Simple Linear Regression Model using R
UNIFE Spring Semester Mini V. 20-02-2019

RESEARCH QUESTION: does a linear causal relationship exist between the number of cakes sold in a week (by a firm) and the unit price (the price charged per cake)?
Let's observe a given dataset and perform a simple linear regression analysis.

# Analysis: step by step
# 0. LET'S PREPARE THE DATASET
# 1. Visualize the relationship: the scatter plot
# 2. Identify the estimated model
# 3. The model on a graph
# 4. Prediction: the expected Y values given an X value
# 5. The model's goodness of fit
# 6. Graphical analysis of the Linear Regression Model's assumptions
# 7. What about inference?

#0. LET'S PREPARE THE DATASET
# we import an external dataset
# A) CHECK THE WORKING DIRECTORY
getwd()
# B) CHANGE THE WORKING DIRECTORY (IF NECESSARY) FROM THE FILE MENU, or with setwd()
# C) CHECK THE WORKING DIRECTORY AGAIN
getwd()
# D) IMPORT THE DATASET OF INTEREST
# read.csv2() expects ";" as the field separator and "," as the decimal mark
cake <- read.csv2("cake_reg lin.csv")
# E) CHECK THE IMPORTED DATASET
View(cake)
# F) CHECK THE DATASET STRUCTURE
str(cake)   # shows the structure of the data and the type of each variable
head(cake)  # shows the first six rows of the dataset
# G) WE ARE INTERESTED IN TWO VARIABLES (UNITS SOLD AND PRICE), SO WE EXCLUDE THE FIRST COLUMN
cake <- cake[, -1]
# H) ...TO MAKE THE VARIABLES AVAILABLE BY NAME FOR THE NEXT ANALYSIS
# NOTE: the rest of the script calls the unit price x and the cakes sold y;
# this assumes the two remaining columns carry those names (if not, rename them, e.g. with names(cake))
attach(cake)

#1. GRAPHICAL OBSERVATION OF THE DATA
plot(x, y)   # scatter plot: unit price (x) against cakes sold in the week (y)
# What can we say about the relationship between these two variables?
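The scatter plot answers the question visually. The direction and strength of a linear association can also be summarized with Pearson's correlation coefficient r. The sketch below uses made-up prices and quantities (hypothetical values, not the contents of "cake_reg lin.csv") and checks that cor() matches the deviation-based formula that step 2A builds on.

```r
# Hypothetical values (not the cake data); run in a fresh session,
# because these assignments mask the attached x and y
x <- c(5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)      # unit price ($)
y <- c(420, 400, 390, 350, 340, 310, 300)      # cakes sold in the week

# Pearson correlation: the sign gives the direction, the size
# (between -1 and 1) the strength of the LINEAR association
r <- cor(x, y)
print(r)

# The same value from the deviations from the means (SSXY / sqrt(SSX * SSY))
r.manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
print(all.equal(r, r.manual))
```

A value of r close to -1 would indicate a strong, decreasing linear relationship: higher prices, fewer cakes sold.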
#2. IDENTIFY THE ESTIMATED MODEL
# We may identify the model using two different strategies:
# a) following all the steps seen in theory
# b) using the lm() function in R

#2A. LET'S FOLLOW THE STEPS SEEN IN THEORY
x.difference <- x - mean(x)   # xi - mean of x
x.difference
y.difference <- y - mean(y)   # yi - mean of y
y.difference
dev.x <- sum(x.difference^2)  # SSX: sum of the squared deviations of x
dev.x
dev.y <- sum(y.difference^2)  # SST: sum of the squared deviations of y
dev.y
# sum of the products of the x and y deviations (SSXY)
codev.xy <- sum(x.difference * y.difference)
codev.xy
# now we have all the elements to compute the coefficients of our model
b1 <- codev.xy / dev.x        # slope: SSXY / SSX
b1
b0 <- mean(y) - b1 * mean(x)  # intercept: mean of y - b1 * mean of x
b0
# with these values we can write the equation of the estimated model:
# y.hat = b0 + b1 * xi
# and predict the weekly number of cakes sold for a given unit price.
# Before making any prediction, it is important to identify the range of X,
# given by the minimum and maximum values that X takes in our dataset.
# We have two possibilities: the two extremes separately...
max(x)
min(x)
# ...or both at once
range(x)
# let's now make predictions using the model -> prediction = b0 + b1 * x
prediction <- b0 + b1 * x   # predicted y for every observed x (the fitted values)
# How many cakes do we expect to sell in a week in which the unit price is $5.3?
prediction5.3 <- b0 + b1 * 5.3
prediction5.3
# please interpret the result -> when the unit price is $5.3, in that week we expect to sell ... cakes
# How many cakes do we expect to sell in a week in which the unit price is $7.2?
prediction7.2 <- b0 + b1 * 7.2
prediction7.2
# please interpret the result

#--------------#
#2B. LET'S COMPUTE THE SIMPLE LINEAR REGRESSION MODEL USING THE R FUNCTION lm()
# the syntax is lm(dependent variable (Y) ~ explanatory variable (X))
# how do you type the tilde "~" on your keyboard?
# (on Windows: hold Alt and type 126 on the numeric keypad on the right side)
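A minimal sketch checking that lm(y ~ x) returns the same coefficients as the step-2A formulas, and that predict() reproduces the hand computation b0 + b1 * 6.8 used in step 4. The helper ols.by.hand() and the data values are hypothetical, introduced only for this illustration; lm(), coef() and predict() are standard functions of R's stats package.

```r
# Hypothetical helper (not part of the original script): the step-2A
# formulas wrapped in a function that returns the intercept and slope
ols.by.hand <- function(x, y) {
  x.difference <- x - mean(x)
  y.difference <- y - mean(y)
  b1 <- sum(x.difference * y.difference) / sum(x.difference^2)  # SSXY / SSX
  b0 <- mean(y) - b1 * mean(x)                                  # mean(y) - b1 * mean(x)
  c(b0 = b0, b1 = b1)
}

# Hypothetical values (not the cake data); run in a fresh session,
# because these assignments mask the attached x and y
x <- c(5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
y <- c(420, 400, 390, 350, 340, 310, 300)

manual <- ols.by.hand(x, y)
fit <- lm(y ~ x)   # "y ~ x": y modelled as a linear function of x
print(manual)
print(coef(fit))

# Same estimates up to floating-point error (the names differ, so drop them)
print(all.equal(unname(manual), unname(coef(fit))))

# predict() gives the same value as b0 + b1 * 6.8
pred6.8 <- predict(fit, newdata = data.frame(x = 6.8))
print(unname(pred6.8))
```

The column name in newdata must match the name of the explanatory variable in the formula (here x); otherwise predict() cannot find it.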
reg.lin <- lm(y ~ x)
# the result is an R object: we can display the fitted regression simply by typing the object's name
reg.lin
# to display a specific component of the model, put the dollar sign "$" between the model's name
# and the name of the component we are interested in, i.e. model$component
# for instance, the coefficients of our model:
reg.lin$coefficients
# we have now identified the equation of our estimated model

#__________________#
#3. PLOT OF OUR LINEAR MODEL
plot(x, y)        # the observed pairs of coordinates
lines(x, y)       # joins the points in the order they appear in the dataset (not the regression line)
abline(reg.lin)   # draws the estimated regression line

#__________________#
#4. PREDICTION: THE EXPECTED Y VALUES GIVEN AN X VALUE -> already seen in step 2A
# If in a given week the company we work for decides to set a unit price of $6.8,
# how many cakes do we expect to sell in that week?
prev6.8 <- b0 + b1 * 6.8
prev6.8
# comment on the result: how many cakes should the company prepare for that week?

#_________________#
#5. THE MODEL'S GOODNESS OF FIT: THE COEFFICIENT OF DETERMINATION (R2)
# how much of the total variation in Y is explained by our simple regression model?
# three ways to obtain R2:
# a) computing SSR/SST
# b) checking the regression model's output
# c) checking the ANOVA table

#5A. LET'S COMPUTE R2 = SSR/SST
dev.tot <- sum((y - mean(y))^2)       # total sum of squares, SST
dev.disp <- sum(reg.lin$residuals^2)  # error (residual) sum of squares, SSE
dev.reg <- dev.tot - dev.disp         # regression sum of squares, SSR = SST - SSE
RQ <- dev.reg / dev.tot               # R2
RQ
# How can we interpret the result?
# Does the model we have estimated explain a large share of the variation in Y?
# Is it a good model or not?
# How much of the variation in Y is NOT explained by the model (SSE/SST = 1 - RQ)?
# In other words, how much unexplained variation in Y remains?
# (the part of the variation due to other factors, or not captured by the linear relationship)
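The three routes to R2 listed in step 5 should agree. The sketch below, on hypothetical values rather than the cake data, computes SSR directly from the fitted values, checks the decomposition SST = SSR + SSE, and compares SSR/SST with summary()$r.squared and, since this is a simple regression, with the squared correlation r^2.

```r
# Hypothetical values (not the cake data); run in a fresh session,
# because these assignments mask the attached x and y
x <- c(5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
y <- c(420, 400, 390, 350, 340, 310, 300)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)              # total sum of squares
SSE <- sum(residuals(fit)^2)             # error (residual) sum of squares
SSR <- sum((fitted(fit) - mean(y))^2)    # regression sum of squares, computed directly

# For a least-squares fit with an intercept, SST = SSR + SSE
print(all.equal(SST, SSR + SSE))

R2.a <- SSR / SST                        # definition used in step 5A
R2.b <- summary(fit)$r.squared           # "Multiple R-squared" shown by summary()
R2.c <- cor(x, y)^2                      # simple regression only: R2 = r^2
print(c(R2.a, R2.b, R2.c))
```

The identity R2 = r^2 holds only with a single explanatory variable; with several regressors R2 is the squared correlation between y and the fitted values instead.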
#------------#
#5B. WE CAN OBTAIN THE COEFFICIENT OF DETERMINATION (R2) FROM THE SUMMARY OF OUR REGRESSION MODEL -> we use the "summary" command
summary(reg.lin)
# R2 is reported as "Multiple R-squared" on the penultimate row of the output

#_________________#
#5C. WE CAN ALSO OBTAIN R2 FROM THE ANOVA OUTPUT -> we use the anova command (analysis of variance)
anova(reg.lin)
# reading the "Sum Sq" column of the ANOVA table for our data:
SSR <- 77991           # row "x": regression sum of squares
SST <- 77991 + 91998   # SSR + SSE (row "Residuals") = 169989
R2 <- SSR / SST        # 77991 / 169989 = 0.4588
R2

#__________________#
#6. CHECKING THE LINEAR REGRESSION ASSUMPTIONS
# We look at one plot for each assumption:
# a) linearity of the relationship between Y and X
plot(x, y)
abline(reg.lin)
# b) independence of the error terms from the explanatory variable
e <- reg.lin$residuals
plot(x, e)
abline(h = 0, lty = 2)   # dashed reference line at zero
# c) constant variance of the errors for all levels of X
plot(x, e)
# d) normal distribution of the error terms
hist(e)
# please comment on each plot in light of the basic assumptions
#_______________#
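The plots in step 6 can be complemented with a few standard checks. This sketch, again on hypothetical values rather than the cake data, shows why the residuals-versus-x plot is read for patterns other than a straight-line trend, adds a normal Q-Q plot and a Shapiro-Wilk test for normality, and draws R's built-in diagnostic plots for lm objects. qqnorm(), qqline(), shapiro.test() and plot() on an lm object are standard base-R functions.

```r
# Hypothetical values (not the cake data); run in a fresh session,
# because these assignments mask the attached x and y
x <- c(5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
y <- c(420, 400, 390, 350, 340, 310, 300)
fit <- lm(y ~ x)
e <- residuals(fit)

# Least-squares residuals always have mean 0 and zero linear correlation
# with x, so plot(x, e) is read for curvature or a funnel shape,
# not for a straight-line trend
print(c(mean.e = mean(e), cor.x.e = cor(x, e)))

# Normality: a normal Q-Q plot is often easier to judge than hist(e)
# when there are few observations
qqnorm(e)
qqline(e)

# Shapiro-Wilk test (H0: the residuals come from a normal distribution);
# with a small sample it has little power, so read it together with the plot
sw <- shapiro.test(e)
print(sw)

# R's built-in diagnostic plots for an lm object, four panels on one page
old.par <- par(mfrow = c(2, 2))
plot(fit)
par(old.par)
```

The four default panels are residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage; the first and third address linearity and constant variance, the second normality.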