Multiple linear regression

STATS 191

2024-04-01

Outline

Case studies:

A. Effect of light on meadowfoam flowering

B. Studying the brain sizes of mammals
Specifying the model.
Fitting the model: least squares.
Interpretation of the coefficients.
\(F\)-statistic revisited
Matrix approach to linear regression.
Investigating the design matrix

Case study A:

Researchers manipulate the timing and intensity of light to investigate their effect on the number of flowers.

Case study B:

How are litter size and gestation period associated with brain size in mammals?

A model for the brains data

The figure depicts our model. To generate \(Y_i\):

First fix \(X=(X_1,\dots,X_p)\), then form the mean \(\beta_0 + \sum_j \beta_j X_j\), and finally add an error \(\epsilon \sim N(0, \sigma^2)\), independently for each \(i\).
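Here is a minimal Python sketch of this generative recipe. The predictor values, the coefficients and \(\sigma\) are made up for illustration and do not come from the brains data. `simulate_response` is a hypothetical helper name.

```python
import random


def simulate_response(X, beta, sigma, rng):
    """Generate Y_i = beta_0 + sum_j beta_j * X_ij + eps_i with eps_i ~ N(0, sigma^2)."""
    Y = []
    for row in X:
        mean = beta[0] + sum(b * x for b, x in zip(beta[1:], row))  # regression function
        Y.append(mean + rng.gauss(0.0, sigma))                       # add independent error
    return Y


# Hypothetical values: n = 3 observations of p = 2 predictors.
X = [[1.0, 2.0], [3.0, 1.0], [0.5, 4.0]]
beta = [1.0, 2.0, -0.5]
Y = simulate_response(X, beta, sigma=1.0, rng=random.Random(191))
print(Y)
```

Setting `sigma=0.0` returns the means \(\beta_0 + \sum_j \beta_j X_{ij}\) themselves. This separates the two ingredients: the regression function and the error.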

A model for `brains`

Multiple linear regression model

\[ \texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Gestation}_i + \beta_3 \cdot \texttt{Litter}_i + \epsilon_i \]

Another model for `brains`

\[ \texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Litter}_i + \epsilon_i \]

Fitting a multiple linear regression model

Just as in simple linear regression, the model is fit by minimizing

\[\begin{aligned} SSE(\beta_0, \dots, \beta_p) &= \sum_{i=1}^n\left(Y_i - \left(\beta_0 + \sum_{j=1}^p \beta_j X_{ij} \right) \right)^2 \\ &= \|Y - \widehat{Y}(\beta)\|^2 \end{aligned}\]
The minimizers \(\widehat{\beta} = (\widehat{\beta}_0, \dots, \widehat{\beta}_p)\) are the “least squares estimates”. As in simple linear regression, they are normally distributed.
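To make "minimize SSE" concrete, here is a small Python sketch on hypothetical data. It assumes \(X^TX\) is invertible and solves the normal equations (derived in the matrix section) by Gauss-Jordan elimination. `fit_least_squares` and `sse` are illustrative helper names, not R functions. Nudging any coordinate of \(\widehat{\beta}\) should only increase SSE.

```python
def fit_least_squares(X, Y):
    """Least squares estimates (beta_0, ..., beta_p) for Y ~ intercept + columns of X.

    Solves (D^T D) b = D^T Y, where D is X with a leading column of ones,
    by Gauss-Jordan elimination with partial pivoting.
    """
    D = [[1.0] + list(row) for row in X]
    k = len(D[0])
    # Augmented matrix [D^T D | D^T Y].
    A = [[sum(r[a] * r[b] for r in D) for b in range(k)]
         + [sum(r[a] * y for r, y in zip(D, Y))] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def sse(beta, X, Y):
    """SSE(beta) = sum_i (Y_i - (beta_0 + sum_j beta_j X_ij))^2."""
    return sum((y - beta[0] - sum(b * x for b, x in zip(beta[1:], row))) ** 2
               for row, y in zip(X, Y))


# Hypothetical data: n = 5 observations of p = 2 predictors.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
Y = [3.1, 3.9, 7.2, 7.8, 11.1]

beta_hat = fit_least_squares(X, Y)
print(beta_hat, sse(beta_hat, X, Y))

# Nudging any coordinate of beta_hat increases SSE.
for j in range(3):
    nudged = list(beta_hat)
    nudged[j] += 0.1
    print(j, sse(nudged, X, Y) > sse(beta_hat, X, Y))
```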

Estimating \(\sigma^2\)

As in simple regression

\[\widehat{\sigma}^2 = \frac{SSE}{n-p-1} \sim \sigma^2 \cdot \frac{\chi^2_{n-p-1}}{n-p-1}\]

independent of \(\widehat{\beta}\).
Why \(\chi^2_{n-p-1}\)? Typically, the degrees of freedom in the estimate of \(\sigma^2\) is \(n - \#\{\text{parameters in the regression function}\}\). Here there are \(p+1\) parameters \(\beta_0, \dots, \beta_p\), which leaves \(n-p-1\).
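This quick Monte Carlo sketch shows why the divisor is \(n-p-1\). It uses the simplest case, \(p=0\) (intercept only), where every fitted value is the sample mean. The data are simulated, and `sigma2_hat` is a hypothetical helper. Averaged over many samples, \(\widehat{\sigma}^2\) should land near \(\sigma^2 = 4\). Dividing SSE by \(n\) instead would target \(4 \cdot 4/5 = 3.2\).

```python
import random


def sigma2_hat(Y, Y_fitted, p):
    """Estimate sigma^2 as SSE / (n - p - 1).

    p is the number of predictors, so the regression function has p + 1
    parameters (beta_0, ..., beta_p) and n - p - 1 degrees of freedom remain.
    """
    n = len(Y)
    sse = sum((y - f) ** 2 for y, f in zip(Y, Y_fitted))
    return sse / (n - p - 1)


# Simulated data, intercept-only model: fitted values are the sample mean.
rng = random.Random(2024)
n, sigma = 5, 2.0
estimates = []
for _ in range(2000):
    Y = [rng.gauss(10.0, sigma) for _ in range(n)]
    ybar = sum(Y) / n
    estimates.append(sigma2_hat(Y, [ybar] * n, p=0))

avg = sum(estimates) / len(estimates)
print(avg)  # should be close to sigma^2 = 4
```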

Interpretation of \(\beta_j\) in `brains.lm`

Take \(\beta_1=\beta_{\tt Body}\) for example. This is the amount by which the average Brain weight increases for a one-kg increase in Body, keeping all other variables constant.
We refer to this as the effect of Body allowing for, or controlling for, the other variables.

Example

Let’s take the Beaked whale, artificially add one kg to its Body, and compute the predicted Brain weight.
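The calculation on this slide can be sketched as follows. The coefficients and the whale's covariate values are invented placeholders; the fitted `brains.lm` coefficients are not reproduced here. `predict_brain` is a hypothetical helper. Whatever values are used, the change in the prediction equals the Body coefficient, because Gestation and Litter are held fixed.

```python
def predict_brain(coef, body, gestation, litter):
    """Mean Brain under Brain ~ Body + Gestation + Litter with the given coefficients."""
    b0, b_body, b_gest, b_litter = coef
    return b0 + b_body * body + b_gest * gestation + b_litter * litter


# Invented placeholder values, NOT the fitted brains.lm coefficients or the whale's data.
coef = (-225.0, 1.0, 1.8, 27.0)
whale = {"body": 1000.0, "gestation": 365.0, "litter": 1.0}

before = predict_brain(coef, **whale)
after = predict_brain(coef, **{**whale, "body": whale["body"] + 1.0})
print(after - before)  # matches coef[1], the Body coefficient
```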

Same example in `simpler.lm`

To emphasize that the parameters depend on which other variables are in the model, let’s redo this in the `simpler.lm` model.
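A sketch with hypothetical, noise-free data illustrates the point. When a correlated predictor is dropped, the least squares slope on the remaining predictor absorbs part of the dropped predictor's effect. Here `x1` plays the role of Body and `x2` the role of Gestation only by analogy, and the numbers are made up. `fit_least_squares` is an illustrative helper that solves the normal equations.

```python
def fit_least_squares(X, Y):
    """Least squares estimates for Y ~ intercept + columns of X (normal equations, Gauss-Jordan)."""
    D = [[1.0] + list(row) for row in X]
    k = len(D[0])
    A = [[sum(r[a] * r[b] for r in D) for b in range(k)]
         + [sum(r[a] * y for r, y in zip(D, Y))] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical, noise-free data; x2 is correlated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
Y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

full = fit_least_squares([[a, b] for a, b in zip(x1, x2)], Y)  # Y ~ x1 + x2
reduced = fit_least_squares([[a] for a in x1], Y)               # Y ~ x1
print(full[1], reduced[1])  # slope on x1: 2 in the full model, about 4.66 without x2
```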

\(R^2\) for multiple regression

\[\begin{aligned} SSE &= \sum_{i=1}^n(Y_i - \widehat{Y}_i)^2 \\ SSR &= \sum_{i=1}^n(\overline{Y} - \widehat{Y}_i)^2 \\ SST &= \sum_{i=1}^n(Y_i - \overline{Y})^2 = SSE + SSR \\ R^2 &= \frac{SSR}{SST} \end{aligned}\]

\(R^2\) is now called the coefficient of multiple determination. Its square root \(R\) is the multiple correlation coefficient, the correlation between \(Y\) and \(\widehat{Y}\).

The sums of squares and \(R^2\) are defined analogously to those in simple linear regression.

Computing \(R^2\) by hand
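The same computation can be sketched in Python. The data are hypothetical, and the fitted values come from the simple least squares line (\(p=1\)). The identity \(SST = SSE + SSR\) requires a least squares fit that includes an intercept. `r_squared` is a hypothetical helper name.

```python
def r_squared(Y, Y_fitted):
    """Return SSE, SSR, SST and R^2 = SSR / SST (fitted values from a model with an intercept)."""
    n = len(Y)
    ybar = sum(Y) / n
    sse = sum((y - f) ** 2 for y, f in zip(Y, Y_fitted))
    ssr = sum((ybar - f) ** 2 for f in Y_fitted)
    sst = sum((y - ybar) ** 2 for y in Y)
    return sse, ssr, sst, ssr / sst


# Hypothetical data; fitted values from the simple least squares line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
xbar, ybar = sum(x) / len(x), sum(Y) / len(Y)
slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, Y))
         / sum((a - xbar) ** 2 for a in x))
fitted = [ybar + slope * (a - xbar) for a in x]

sse, ssr, sst, r2 = r_squared(Y, fitted)
print(sse, ssr, sst, r2)  # SST equals SSE + SSR up to rounding
```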

Adjusted \(R^2\)

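As we add more and more variables to the model, even random ones, \(R^2\) never decreases. It reaches 1 when there are as many parameters as observations (\(p+1 = n\), with \(X\) of full rank).
Adjusted \(R^2\) tries to take this into account by replacing sums of squares with mean squares: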

\[ R^2_a = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - \frac{MSE}{MST}. \]

Computing \(R^2_a\) by hand
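Here is a sketch of the adjusted version, using hypothetical sums of squares held fixed while \(p\) grows. \(R^2\) does not move, but the \(MSE\) denominator \(n-p-1\) shrinks, so \(R^2_a\) drops. Equivalently, \(R^2_a = 1 - (1-R^2)\frac{n-1}{n-p-1}\). `adjusted_r_squared` is a hypothetical helper name.

```python
def adjusted_r_squared(sse, sst, n, p):
    """R^2_a = 1 - MSE / MST with MSE = SSE / (n - p - 1) and MST = SST / (n - 1)."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))


# Hypothetical sums of squares: SSE and SST held fixed while p grows.
sse, sst, n = 40.0, 100.0, 30
r2 = 1 - sse / sst
adj = {p: adjusted_r_squared(sse, sst, n, p) for p in (1, 5, 10)}
print(r2, adj)  # R^2 stays at 0.6 while R^2_a shrinks as p grows
```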

\(F\)-test in `summary(brains.lm)`

Full model:

\[\texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Gestation}_i + \beta_3 \cdot \texttt{Litter}_i + \epsilon_i\]

Reduced model:

\[\texttt{Brain}_i = \beta_0 + \varepsilon_i\]
Statistic (for this reduced model \(SSE_R = SST\), so \(SSE_R - SSE_F = SSR\)):

\[F=\frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F} = \frac{SSR/df(SSR)}{SSE/df(SSE)} = \frac{MSR}{MSE}.\]
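The two forms of \(F\) can be checked numerically. The sums of squares below are hypothetical. The degrees of freedom are those of the right-triangle slide: \(df_R = n-1 = 95\) gives \(n = 96\), and \(p = 3\). The p-value would come from an \(F_{3,92}\) distribution. The Python standard library has no F distribution, so the p-value is not computed here.

```python
def f_statistic(sse_reduced, df_reduced, sse_full, df_full):
    """F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)."""
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)


n, p = 96, 3                  # df_R = n - 1 = 95, df_F = n - p - 1 = 92
sst, sse_full = 500.0, 120.0  # hypothetical sums of squares
ssr = sst - sse_full          # the reduced model Brain ~ 1 has SSE_R = SST

F = f_statistic(sst, n - 1, sse_full, n - p - 1)
msr_over_mse = (ssr / p) / (sse_full / (n - p - 1))
print(F, msr_over_mse)        # the two expressions agree
```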

Right triangle again

Sides of the triangle: \(df_R-df_F=3\), \(df_F=92\)
Hypotenuse: \(df_R=95\)
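These degrees of freedom can be checked by hand, inferring \(n\) from \(df_R = n - 1\):

\[\begin{aligned} df_R &= n - 1 = 95 \quad\Rightarrow\quad n = 96, \\ df_F &= n - p - 1 = 96 - 3 - 1 = 92, \\ df_R - df_F &= p = 3, \qquad 95 = 92 + 3. \end{aligned}\]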

Matrix formulation

\[{ Y}_{n \times 1} = {X}_{n \times (p + 1)} {\beta}_{(p+1) \times 1} + {\varepsilon}_{n \times 1}\]

\({X}\) is called the design matrix of the model
\({\varepsilon} \sim N(0, \sigma^2 I_{n \times n})\) is multivariate normal

\(SSE\) in matrix form

\[ SSE(\beta) = (Y - X\beta)^T(Y - X\beta) = \|Y-X\beta\|^2 \]

Design matrix

The design matrix is the \(n \times (p+1)\) matrix with entries

\[ X = \begin{pmatrix} 1 & X_{11} & X_{12} & \dots & X_{1p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{np} \end{pmatrix} \]

The matrix \(X\) is the same as the one R forms (for example with `model.matrix`).

Math aside: least squares solution

Normal equations

\[\frac{\partial}{\partial \beta_j} SSE \biggl|_{\beta = \widehat{\beta}} = -2 \left(Y - X \widehat{\beta} \right)^T X_j = 0, \qquad 0 \leq j \leq p,\]
where \(X_j\) is the \(j\)-th column of \(X\) (\(X_0\) is the column of ones).
These are equivalent to (the last step assumes \(X^TX\) is invertible)

\[\begin{aligned} (Y - X\widehat{\beta})^T X &= 0 \\ X^TX\widehat{\beta} &= X^TY \\ \widehat{\beta} &= (X^TX)^{-1}X^TY \end{aligned}\]
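Here is a sketch of the normal equations in pure Python, with a hypothetical design matrix. It solves \(X^TX\widehat{\beta} = X^TY\) directly by Gauss-Jordan elimination instead of forming \((X^TX)^{-1}\), which is the usual numerical choice. R's `lm` itself uses a QR decomposition. The residual vector should be orthogonal to every column of \(X\). All helper names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(row) + [v] for row, v in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


# Hypothetical design matrix: a column of ones plus p = 2 predictors.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0], [1.0, 4.0, 3.0], [1.0, 5.0, 6.0]]
Y = [3.1, 3.9, 7.2, 7.8, 11.1]

Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = [row[0] for row in matmul(Xt, [[y] for y in Y])]
beta_hat = solve(XtX, XtY)  # solves X^T X beta = X^T Y

residuals = [y - sum(x * b for x, b in zip(row, beta_hat)) for row, y in zip(X, Y)]
ortho = [sum(e * x for e, x in zip(residuals, col)) for col in Xt]  # (Y - X beta_hat)^T X
print(beta_hat)
print(ortho)  # each entry is 0 up to rounding error
```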

Distribution: \(\widehat{\beta} \sim N(\beta, \sigma^2 (X^TX)^{-1}).\)

Math aside: multivariate normal

To obtain the distribution of \(\hat{\beta}\) we used the following fact about the multivariate Normal.
Suppose \(Z \sim N(\mu,\Sigma)\). Then, for any fixed matrix \(A\)

\[ AZ \sim N(A\mu, A\Sigma A^T). \]

Math aside: how did we derive the distribution of \(\hat{\beta}\)?

Above, we saw that \(\hat{\beta} = (X^TX)^{-1}X^TY\) is a fixed matrix times \(Y\). The matrix form of our model is

\[ Y \sim N(X\beta, \sigma^2 I). \]

Therefore,

\[ \begin{aligned} \hat{\beta} &\sim N\left((X^TX)^{-1}X^T (X\beta), (X^TX)^{-1}X^T (\sigma^2 I) X (X^TX)^{-1}\right) \\ &\sim N(\beta, \sigma^2 (X^TX)^{-1}). \end{aligned} \]
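The second line follows from the first once we set \(A = (X^TX)^{-1}X^T\). Because \(X^TX\) is symmetric, \(A^T = X(X^TX)^{-1}\), so

\[\begin{aligned} A(X\beta) &= (X^TX)^{-1}(X^TX)\beta = \beta, \\ A(\sigma^2 I)A^T &= \sigma^2 (X^TX)^{-1}(X^TX)(X^TX)^{-1} = \sigma^2 (X^TX)^{-1}. \end{aligned}\]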

Math aside: checking the equation

Categorical variables

Recall case study A: the flower experiment

Design matrix with categorical variables

R has used a binary (0/1) indicator column for `factor(Time)`.

How categorical variables are encoded

We can change the columns in the design matrix:

Design matrix with categorical variables

By default, R discards one of the columns. Why?
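With an intercept and one indicator per level, the indicators add up to the intercept column in every row. Then \(X\) does not have full column rank, and \(X^TX\) has no inverse. The sketch below uses hypothetical Time values and checks this with a \(3 \times 3\) determinant. Dropping the first level's column, as R's default treatment coding does, restores full rank. `det3` is a hypothetical helper.

```python
# Hypothetical Time values for six plants (levels 1 and 2).
time = [1, 1, 2, 2, 1, 2]

# Intercept plus one indicator column per level (nothing dropped).
X = [[1, int(t == 1), int(t == 2)] for t in time]
print(all(row[1] + row[2] == row[0] for row in X))  # the indicators sum to the intercept


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


XtX = [[sum(r[j] * r[k] for r in X) for k in range(3)] for j in range(3)]
print(det3(XtX))  # 0: X^T X is singular, so (X^T X)^{-1} does not exist

# Treatment coding (drop the indicator for level 1) gives a full-rank design.
X_coded = [[1, int(t == 2)] for t in time]
G = [[sum(r[j] * r[k] for r in X_coded) for k in range(2)] for j in range(2)]
det2 = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(det2)  # nonzero
```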

Some additional models

~ Intensity

Some additional models

~ Intensity + factor(Time)

Some additional models

~ factor(Intensity) + factor(Time)
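The following sketch builds a treatment-coded design matrix for `~ factor(Intensity) + factor(Time)`. It assumes a small hypothetical set of Intensity and Time values, not the case study A data. A factor with \(k\) levels contributes \(k-1\) indicator columns, whereas `~ Intensity` uses a single slope column. The column labels only imitate R's naming, and `treatment_columns` is a hypothetical helper.

```python
def treatment_columns(values):
    """Indicator columns for every level except the first (lowest) level."""
    levels = sorted(set(values))
    cols = [[int(v == lev) for v in values] for lev in levels[1:]]
    return cols, levels[1:]


# Hypothetical (Intensity, Time) values for six observations.
intensity = [150, 300, 450, 150, 300, 450]
time = [1, 1, 1, 2, 2, 2]

cols_int, lev_int = treatment_columns(intensity)
cols_time, lev_time = treatment_columns(time)

names = (["(Intercept)"]
         + [f"factor(Intensity){lev}" for lev in lev_int]
         + [f"factor(Time){lev}" for lev in lev_time])
columns = [[1] * len(time)] + cols_int + cols_time
X = [list(row) for row in zip(*columns)]  # one row per observation

print(names)
for row in X:
    print(row)
```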

Interactions

Suppose we believe that Flowers varies linearly with Intensity but the slope depends on Time.
We would need two parameters for Intensity, so that each level of Time gets its own slope.

What is the regression line when Time==1? And Time==2?

Different models across groups

Set \(\beta_1=\beta_{\tt Intensity}\), \(\beta_2=\beta_{\tt Time2}\), \(\beta_3=\beta_{\tt Time2:Intensity}\).
In the Time==1 group, a one-unit change in Intensity leads to a change of \(\beta_1\) units in Flowers.
In the Time==2 group, a one-unit change in Intensity leads to a change of \(\beta_1 + \beta_3\) units in Flowers.
Test \(H_0\): the slope is the same in both groups, i.e. \(H_0: \beta_3 = 0\).
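This sketch uses hypothetical, noise-free data generated from two different lines. It fits a model with the columns `(Intercept)`, `Intensity`, `Time2` and `Time2:Intensity`. The least squares fit recovers intercept \(\beta_0\) and slope \(\beta_1\) for Time==1, and intercept \(\beta_0+\beta_2\) and slope \(\beta_1+\beta_3\) for Time==2. The numbers are invented, and `solve` is a hypothetical helper.

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(row) + [v] for row, v in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical, noise-free data with a different line in each Time group.
intensity = [150, 300, 450, 600, 150, 300, 450, 600]
time = [1, 1, 1, 1, 2, 2, 2, 2]
flowers = [70 - 0.04 * x if t == 1 else 80 - 0.03 * x
           for x, t in zip(intensity, time)]

# Columns: (Intercept), Intensity, Time2, Time2:Intensity.
X = [[1.0, x, float(t == 2), x * float(t == 2)] for x, t in zip(intensity, time)]
XtX = [[sum(r[a] * r[b] for r in X) for b in range(4)] for a in range(4)]
XtY = [sum(r[a] * y for r, y in zip(X, flowers)) for a in range(4)]
b0, b1, b2, b3 = solve(XtX, XtY)

print("Time==1 line: intercept", b0, "slope", b1)
print("Time==2 line: intercept", b0 + b2, "slope", b1 + b3)
```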

Visualizing interaction
