Simple Linear Regression Model using R
UNIFE
Spring Semester
Mini V. 20-02-2019
RESEARCH QUESTION:
Is there a linear causal relationship between the number of cakes a firm sells in a week and the unit price (the price charged per cake)?
Let's examine a given dataset and carry out a simple linear regression analysis.
#Analysis: step by step
#0. Prepare the dataset
#1. Visualize the relationship: the scatter plot
#2. Identify the estimated model
#3. The model on a graph
#4. Prediction: the expected Y value for a given X value
#5. The model's goodness of fit
#6. Graphical analysis of the linear regression model's assumptions
#7. What about inference?#
#0. LET'S PREPARE THE DATASET
#we import an external dataset
#A) CHECK THE WORKING DIRECTORY
getwd()
#B) CHANGE THE CURRENT DIRECTORY (IF NECESSARY) FROM THE FILE MENU
#C) CHECK THE WORKING DIRECTORY AGAIN
getwd()
#D) IMPORT THE DATASET OF INTEREST
cake<-read.csv2("cake_reg lin.csv")
#E) VIEW THE IMPORTED DATASET
View(cake)
#F) CHECK THE DATASET STRUCTURE
str(cake) #this command shows the structure and characteristics of the data
head(cake) #this command shows the first six rows of our dataset
#G) MAKE THE DATASET'S COLUMNS AVAILABLE BY NAME IN R FOR THE NEXT STEPS
attach(cake)
#BECAUSE WE ARE INTERESTED IN ONLY TWO VARIABLES (UNITS AND PRICE), WE DROP THE FIRST COLUMN
cake=cake[,-1]
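#A minimal sketch of what a negative column index does, using a hypothetical data frame. The name toy_cake, its column names and its values are assumptions made for illustration; they are not taken from the lab's file.

```r
# Hypothetical stand-in for an imported file: a week index plus two variables
toy_cake <- data.frame(week = 1:4,
                       x = c(4.5, 5.0, 5.5, 6.0),   # unit price (made-up values)
                       y = c(420, 380, 350, 300))   # cakes sold (made-up values)
toy_cake <- toy_cake[, -1]   # a negative index drops that column, here the first one
names(toy_cake)              # only the two remaining column names are left
with(toy_cake, mean(x))      # with() reaches a column by name without attach()
```

#with() is shown only as an alternative way to reach columns by name; the lab itself relies on attach().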
#1. Graphical observation of the data#
plot(x,y)
#What can we say about the relationship between these two variables?#
#2. We may identify the model using two different strategies:
# a) Following all the steps seen in theory
# b) Using the lm function in R#
#2A: Let's follow the steps we've seen in theory#
x.difference=x-mean(x) #xi - mean of x#
x.difference
y.difference=y-mean(y) #yi - mean of y#
y.difference
dev.x=sum(x.difference^2) #sum of the squared deviations (xi - mean of x)^2#
dev.x
dev.y=sum(y.difference^2) #sum of the squared deviations (yi - mean of y)^2#
dev.y
#let's compute the sum of the products of the x and y deviations#
codev.xy=sum(x.difference*y.difference)
codev.xy
#now we have all the elements to compute the coefficients of our model#
b1=codev.xy/dev.x #SSYX/SSX #
b1
b0=mean(y)-mean(x)*b1 # average y -b1*average x#
b0
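#A quick cross-check of the step 2A formulas, sketched on made-up numbers (toy_x and toy_y are hypothetical, not the cake data). The slope from the sums of deviations should equal the sample covariance divided by the sample variance, because the (n - 1) divisors cancel.

```r
# Made-up prices and weekly sales, for illustration only
toy_x <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
toy_y <- c(420, 380, 350, 300, 310, 250, 200, 190)

# Slope and intercept from the sums of deviations, as in step 2A
toy_b1 <- sum((toy_x - mean(toy_x)) * (toy_y - mean(toy_y))) /
  sum((toy_x - mean(toy_x))^2)
toy_b0 <- mean(toy_y) - toy_b1 * mean(toy_x)

# Same slope from the built-in covariance and variance functions
toy_b1_check <- cov(toy_x, toy_y) / var(toy_x)
all.equal(toy_b1, toy_b1_check)
```

#The same check can be run on the lab's own b1 by comparing it with cov(x,y)/var(x).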
#using this information we can write out the equation of our estimated model#
#y = b0 + b1 * xi -->#
#we can predict the weekly number of cakes sold for a given unit price#
#before making any prediction, it is important to identify the range of X, given by the minimum and maximum values that X takes in our dataset. There are two ways to do this:
max(x) #option 1: maximum and minimum separately
min(x)
range(x) #option 2: minimum and maximum together
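#The range check can be wrapped in a small helper. This is only a sketch: in_observed_range and toy_prices are hypothetical names and values, not part of the lab.

```r
# Hypothetical helper: TRUE when a new price lies inside the observed price range
in_observed_range <- function(new_price, observed_prices) {
  limits <- range(observed_prices)   # c(minimum, maximum)
  new_price >= limits[1] & new_price <= limits[2]
}

toy_prices <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)   # made-up prices
in_observed_range(5.3, toy_prices)    # inside the observed range
in_observed_range(12.0, toy_prices)   # outside: the prediction would be an extrapolation
```

#A prediction for a price outside the observed range relies on the linear relationship holding where we have no data, so it should be treated with caution.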
#let's now make a prediction using the model: prediction = b0 + b1*x
#How many cakes do we expect to sell in a week in which the unit price is $5.30?#
prediction5.3=b0+b1*5.3
prediction5.3
#please interpret the result: when the unit price is $5.30, we expect to sell ... cakes in that week#
#How many cakes do we expect to sell in a week in which the unit price is $7.20?#
prediction7.2=b0+b1*7.2
prediction7.2
#please interpret the result#
#--------------#
#2B: Let's compute the simple linear regression model using the R function lm()#
#the syntax is lm(dependent variable (Y) ~ explanatory variable (X))#
#how do you type the tilde (~) on your keyboard? Alt+126 on the numeric keypad (on the right side of the keyboard)#
reg.lin=lm(y~x)
#the result is an R object: we can display the fitted linear regression simply by typing the object's name#
reg.lin
#when we want to display a specific component of our analysis,
#we put the dollar symbol ($) between the model's name and
#the name of the component we are interested in#
#i.e. model$component
#for instance, we may want to display the coefficients of our model#
reg.lin$coefficients
#we have now identified the equation of our estimated model#
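#Besides the dollar symbol, R offers accessor functions for the parts of an lm object. A minimal sketch on made-up data (toy_x, toy_y and toy_fit are hypothetical names):

```r
# Made-up prices and weekly sales, for illustration only
toy_x <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
toy_y <- c(420, 380, 350, 300, 310, 250, 200, 190)
toy_fit <- lm(toy_y ~ toy_x)

coef(toy_fit)               # named vector: "(Intercept)" and "toy_x"
coef(toy_fit)[["toy_x"]]    # the slope alone
toy_fit$coefficients        # same content, reached with the dollar symbol
head(residuals(toy_fit))    # first residuals, same as toy_fit$residuals
head(fitted(toy_fit))       # first fitted values, same as toy_fit$fitted.values
```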
#__________________#
#3. Plot of our linear model#
plot(x,y) #scatter plot of the (x, y) pairs
lines(x,y) #line segments joining the points in the order they appear in the data
abline(reg.lin) #adds the estimated regression line to the plot
#__________________#
#4. Prediction: the expected Y value for a given X value
#--> already seen in step 2A
#If in a given week the company we work for decides to charge a unit price of $6.80 per cake, how many cakes do we expect to sell in that week?#
prev6.8=b0+b1*6.8
prev6.8
#comment on the result: how many cakes should the company prepare for that week?#
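#The same kind of prediction can also be obtained with R's predict() function. The sketch below uses a hypothetical data frame (toy_data, with made-up columns price and units); predict() needs the new value in a data frame whose column name matches the explanatory variable used in lm().

```r
# Made-up data frame, for illustration only
toy_data <- data.frame(price = c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0),
                       units = c(420, 380, 350, 300, 310, 250, 200, 190))
toy_fit <- lm(units ~ price, data = toy_data)

new_week <- data.frame(price = 6.8)               # same column name as in the model
toy_pred <- predict(toy_fit, newdata = new_week)  # b0 + b1 * 6.8, computed by R

# Manual computation from the coefficients, as in step 2A
toy_manual <- coef(toy_fit)[[1]] + coef(toy_fit)[[2]] * 6.8
all.equal(unname(toy_pred), toy_manual)
```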
#_________________#
#5. The model's goodness of fit: the coefficient of determination (R2)
#how much of the total variation in Y is explained by our simple regression model?
#three ways to obtain R2:
# a) Computing SSR/SST
# b) Checking the regression model's output
# c) Checking the ANOVA table
#5A. let's compute R2=SSR/SST#
dev.tot=sum((y-mean(y))^2) #total sum of squares SST
dev.disp=sum(reg.lin$residuals^2) #residual (error) sum of squares SSE
dev.reg=dev.tot-dev.disp #regression sum of squares SSR
RQ=dev.reg/dev.tot
RQ
#how can we interpret the result?
#does the fitted model explain much of the variation in Y?
#is it a good model or not?
#how much of the variation in Y is not explained by the model, i.e. how much unexplained variation in Y remains? (variation due to other factors, or not captured by the linear relationship)
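#In simple linear regression, R2 equals the squared correlation between X and Y, and 1 - R2 is the share of variation left unexplained. A sketch on made-up data (all toy_ names are hypothetical):

```r
# Made-up prices and weekly sales, for illustration only
toy_x <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
toy_y <- c(420, 380, 350, 300, 310, 250, 200, 190)
toy_fit <- lm(toy_y ~ toy_x)

toy_sst <- sum((toy_y - mean(toy_y))^2)   # total sum of squares
toy_sse <- sum(residuals(toy_fit)^2)      # residual sum of squares
toy_r2 <- (toy_sst - toy_sse) / toy_sst   # SSR/SST

all.equal(toy_r2, cor(toy_x, toy_y)^2)    # same value as the squared correlation
1 - toy_r2                                # share of variation in Y left unexplained
```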
#------------#
#5B. we can read the coefficient of determination (R2) from the summary of our regression model --> we use the summary command
summary(reg.lin)
#the R2 value appears on the penultimate row of the output, labelled Multiple R-squared
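#Instead of reading R2 off the printed output, it can be extracted from the summary object. A minimal sketch on made-up data (toy_ names are hypothetical):

```r
# Made-up prices and weekly sales, for illustration only
toy_x <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
toy_y <- c(420, 380, 350, 300, 310, 250, 200, 190)
toy_fit <- lm(toy_y ~ toy_x)

toy_summary <- summary(toy_fit)   # the summary is itself an R object
toy_summary$r.squared             # the value printed as "Multiple R-squared"
toy_summary$adj.r.squared         # the value printed as "Adjusted R-squared"
```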
#_________________#
#5C. we can obtain the coefficient of determination (R2) from the ANOVA output --> we use the anova command (analysis of variance)
anova(reg.lin)
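#values read from the sums of squares in the ANOVA output
SSR=77991 #regression sum of squares
SSE=91998 #residual sum of squares
SST=SSR+SSE #169989
R2=SSR/SST #0.4588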
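#The sums of squares can also be taken from the ANOVA table programmatically rather than copied by hand. A sketch on made-up data (toy_ names are hypothetical); for a simple regression the "Sum Sq" column holds the regression sum of squares first and the residual sum of squares second.

```r
# Made-up prices and weekly sales, for illustration only
toy_x <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
toy_y <- c(420, 380, 350, 300, 310, 250, 200, 190)
toy_fit <- lm(toy_y ~ toy_x)

toy_anova <- anova(toy_fit)
toy_ss <- toy_anova[["Sum Sq"]]   # column of sums of squares
toy_ssr <- toy_ss[1]              # row of the explanatory variable: SSR
toy_sse <- toy_ss[2]              # row "Residuals": SSE
toy_r2_anova <- toy_ssr / (toy_ssr + toy_sse)
toy_r2_anova
```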
#__________________#
#6. CHECKING THE LINEAR REGRESSION ASSUMPTIONS#
#We look at one plot for each assumption:
#a) linearity between Y and X
plot(x,y)
abline(reg.lin)
#b) independence of the error terms from the explanatory variable
e=reg.lin$residuals
plot(x,e)
#c) constant variance for all levels of X
plot(x,e)
#d) normal distribution of the error terms
hist(e)
#please comment on each plot with reference to the basic assumptions#
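#Two further plots are often used alongside the ones above: residuals against fitted values, and a normal Q-Q plot of the residuals. This is an optional sketch on made-up data (toy_ names are hypothetical), not part of the lab's required steps.

```r
# Made-up prices and weekly sales, for illustration only
toy_x <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)
toy_y <- c(420, 380, 350, 300, 310, 250, 200, 190)
toy_fit <- lm(toy_y ~ toy_x)
toy_e <- residuals(toy_fit)

plot(fitted(toy_fit), toy_e)   # look for a patternless band of roughly constant width
abline(h = 0, lty = 2)         # dashed reference line at zero
qqnorm(toy_e)                  # sample quantiles of the residuals against normal quantiles
qqline(toy_e)                  # points close to this line are consistent with normality
```

#With very few observations, as in this toy example, such plots give only a rough indication.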
#_______________#