Linear Regression Modelling Using R

Regression can be thought of as an extension of correlation in which you have a definite response (dependent) variable and at least one predictor (independent) variable. With more than one predictor the analysis is called multiple regression. Basic linear regression assumes that the residuals are normally distributed, and uses this to test the relationship between the response and each predictor.

The process is often called regression modelling or linear modelling and is carried out in R with the lm() command.

Linear Regression Models

The lm() function requires a formula that describes the model: the response goes on the left of the ~ and the predictors, separated by +, go on the right. It is generally a good idea to assign the result to a named object, as it contains useful components:

> head(airquality, n = 4)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
> mod = lm(Ozone ~ Solar.R + Temp, data = airquality)
> summary(mod)

Call:
lm(formula = Ozone ~ Solar.R + Temp, data = airquality)

Residuals:
    Min      1Q  Median      3Q     Max
-36.610 -15.976  -2.928  12.371 115.555

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -145.70316   18.44672  -7.899 2.53e-12 ***
Solar.R        0.05711    0.02572   2.221   0.0285 *
Temp           2.27847    0.24600   9.262 2.22e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.5 on 108 degrees of freedom
  (42 observations deleted due to missingness)
Multiple R-squared:  0.5103,  Adjusted R-squared:  0.5012
F-statistic: 56.28 on 2 and 108 DF,  p-value: < 2.2e-16

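The summary() command gives a standard report. The section labelled Coefficients: gives the estimate, standard error, t-value and p-value for each term of the regression model. There is a row for each predictor plus one for the intercept.

The bottom part of the summary gives the overall model “result”: the residual standard error, the R-squared values, and the F-statistic with its p-value. It also reports that 42 observations were dropped because they contained missing values.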
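The individual parts of this report can also be pulled out of the summary object, which is useful when you need the values for further calculation. The following is a minimal sketch that refits the same model; the object name s is only illustrative:

```r
# Refit the model from the text so this block runs on its own
mod <- lm(Ozone ~ Solar.R + Temp, data = airquality)
s <- summary(mod)      # 's' is an illustrative name

s$coefficients         # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared            # Multiple R-squared
s$adj.r.squared        # Adjusted R-squared
s$sigma                # residual standard error
s$fstatistic           # F value with numerator and denominator df
nobs(mod)              # number of rows actually used in the fit
```

Typing names(s) lists every component that is available.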

Regression coefficients

Once you have a regression model result you can “extract” the coefficients (that is, the slopes and intercept) with the coef() command:

> coef(mod)
  (Intercept)       Solar.R          Temp
-145.70315510    0.05710959    2.27846684

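The regular coefficients are in the units of the original variables: each slope is the change in the response for a one-unit change in that predictor, with the other predictors held constant. Here, Ozone rises by about 2.28 units for each one-unit rise in Temp.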
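Because the coefficients are on the original scale, you can use them directly to work out a predicted value. The sketch below does this by hand and checks it against predict(); the values in new_day are hypothetical and chosen only for illustration:

```r
# Refit the model from the text so this block runs on its own
mod <- lm(Ozone ~ Solar.R + Temp, data = airquality)
b <- coef(mod)

# Hypothetical day (illustrative values, not taken from the data)
new_day <- data.frame(Solar.R = 200, Temp = 80)

# Intercept plus slope * value for each predictor
by_hand <- b["(Intercept)"] +
  b["Solar.R"] * new_day$Solar.R +
  b["Temp"] * new_day$Temp

# predict() carries out the same calculation
by_predict <- predict(mod, newdata = new_day)

unname(by_hand)
unname(by_predict)
```

The two results should agree, which confirms that the fitted model is simply the intercept plus each slope multiplied by its predictor.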

Beta coefficients

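R does not compute beta coefficients as standard. Beta coefficients are regression coefficients standardized to units of standard deviation: each one gives the change in the response, in standard deviations, for a one standard deviation change in the predictor. Because they share a common scale, you can compare the relative influence of predictors that were measured in different units.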

You can calculate a beta coefficient like so:

beta = coeff * SD(x) / SD(y)

Where coeff is the regression coefficient for predictor x, y is the response and SD is standard deviation.

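This dataset contains some missing values, so use na.rm = TRUE to get the standard deviations. Note that this uses every non-missing value of each variable, whereas the model was fitted only to the 111 complete rows, so beta coefficients computed from these values are approximate: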

> attach(airquality)
> sd(Ozone, na.rm = TRUE)
[1] 32.98788

> sd(Solar.R, na.rm = TRUE)
[1] 90.05842

> sd(Temp, na.rm = TRUE)
[1] 9.46527

> detach(airquality)

You can get an individual coefficient with the square brackets:

> coef(mod)[2]
   Solar.R
0.05710959

Now it is a simple matter of computing the various beta coefficients!
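As a sketch of that calculation, the block below applies beta = coeff * SD(x) / SD(y) to both predictors at once. It takes the standard deviations from model.frame(), which holds only the rows used in the fit, so the result is based on the same observations as the model; the names mf, sd_x and sd_y are illustrative:

```r
# Refit the model from the text so this block runs on its own
mod <- lm(Ozone ~ Solar.R + Temp, data = airquality)

# model.frame() returns only the rows used in the fit (missing values dropped)
mf <- model.frame(mod)

sd_y <- sd(mf$Ozone)                             # SD of the response
sd_x <- sapply(mf[, c("Solar.R", "Temp")], sd)   # SD of each predictor

# beta = coeff * SD(x) / SD(y), for both slopes at once
beta <- coef(mod)[c("Solar.R", "Temp")] * sd_x / sd_y
beta
```

Using the model frame rather than the full columns matters here because Ozone and Solar.R contain missing values, so the rows in the model are not the same as the rows in each raw column.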

A simpler method is to install the lm.beta package and use its lm.beta() function:

> install.packages("lm.beta")
> library(lm.beta)
> lm.beta(mod)
Call:
lm(formula = Ozone ~ Solar.R + Temp, data = airquality)

Standardized Coefficients:
(Intercept)     Solar.R        Temp
  0.0000000   0.1564394   0.6525345
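If you would rather not install a package, one way to cross-check these values is to standardize every variable first and then refit the model: the slopes of that model should equal the beta coefficients. This is a sketch using base R only; the names vars, cc, z and mod_z are illustrative:

```r
vars <- c("Ozone", "Solar.R", "Temp")

# Keep only complete rows, matching the rows lm() used for the original model
cc <- na.omit(airquality[, vars])

# scale() centres each column on 0 and divides it by its standard deviation
z <- as.data.frame(scale(cc))

# Slopes of the model on standardized data are the beta coefficients
mod_z <- lm(Ozone ~ Solar.R + Temp, data = z)
coef(mod_z)
```

The intercept of this model is effectively zero, because every standardized variable has a mean of zero.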