Multiple linear regression

STATS 191

2024-04-01

Outline

  • Case studies:

    A. Effect of light on meadowfoam flowering

    B. Studying the brain sizes of mammals

  • Specifying the model.

  • Fitting the model: least squares.

  • Interpretation of the coefficients.

  • \(F\)-statistic revisited

  • Matrix approach to linear regression.

  • Investigating the design matrix

Case study A:

  • Researchers manipulate timing and intensity of light to investigate effect on number of flowers.

Case study B:

  • How are litter size and gestation period associated with brain size in mammals?

A model for the brains data

The figure depicts our model. To generate \(Y_i\):

  • First fix \(X=(X_1,\dots,X_p)\), then form the mean (\(\beta_0 + \sum_j \beta_j X_{j}\)), and finally add an error \(\epsilon\).

A model for brains

Multiple linear regression model

\[ \texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Gestation}_i + \beta_3 \cdot \texttt{Litter}_i + \epsilon_i \]

Another model for brains

\[ \texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Litter}_i + \epsilon_i \]

Fitting a multiple linear regression model

  • Just as in simple linear regression, model is fit by minimizing

    \[\begin{aligned} SSE(\beta_0, \dots, \beta_p) &= \sum_{i=1}^n\left(Y_i - \left(\beta_0 + \sum_{j=1}^p \beta_j X_{ij} \right) \right)^2 \\ &= \|Y - \widehat{Y}(\beta)\|^2 \end{aligned}\]

  • The minimizers \(\widehat{\beta} = (\widehat{\beta}_0, \dots, \widehat{\beta}_p)\) are the “least squares estimates”. As in simple linear regression, they are normally distributed.

Estimating \(\sigma^2\)

  • As in simple regression

    \[\widehat{\sigma}^2 = \frac{SSE}{n-p-1} \sim \sigma^2 \cdot \frac{\chi^2_{n-p-1}}{n-p-1}\]

    independent of \(\widehat{\beta}\).

  • Why \(\chi^2_{n-p-1}\)? Typically, the degrees of freedom in the estimate of \(\sigma^2\) are \(n - \#\{\text{parameters in the regression function}\}\); here the regression function has \(p+1\) parameters \(\beta_0, \dots, \beta_p\).
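The estimate \(\widehat{\sigma}^2 = SSE/(n-p-1)\) can be sketched in a few lines. The course itself works in R; this is a plain-Python stand-in, and the function name `sigma2_hat` and the residuals are made up for illustration.

```python
def sigma2_hat(residuals, p):
    """Estimate sigma^2 as SSE / (n - p - 1) for a model with p predictors and an intercept."""
    n = len(residuals)
    df = n - p - 1  # n minus the p + 1 parameters in the regression function
    if df <= 0:
        raise ValueError("need more observations than parameters")
    sse = sum(r ** 2 for r in residuals)
    return sse / df

# Made-up residuals from a hypothetical fit with one predictor (p = 1), n = 5.
residuals = [-0.4, 0.8, -1.0, 1.2, -0.6]
estimate = sigma2_hat(residuals, p=1)  # SSE = 3.6 on 3 degrees of freedom
```

Dividing by \(n-p-1\) rather than \(n\) is what makes the scaled \(\chi^2_{n-p-1}\) distribution above the right reference distribution.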

Interpretation of \(\beta_j\) in brains.lm

  • Take \(\beta_1=\beta_{\tt Body}\) for example. This is the amount by which the average Brain weight increases for a one kg increase in Body, holding all the other variables constant.

  • We refer to this as the effect of Body allowing for or controlling for the other variables.
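To see where this interpretation comes from, compare the model mean at two values of Body that differ by one kg, with the other variables held fixed. The symbols \(b\), \(g\), \(l\) are placeholder values introduced here for illustration.

\[ \begin{aligned} &E[\texttt{Brain} \mid \texttt{Body}=b+1, \texttt{Gestation}=g, \texttt{Litter}=l] - E[\texttt{Brain} \mid \texttt{Body}=b, \texttt{Gestation}=g, \texttt{Litter}=l] \\ &\quad = \bigl(\beta_0 + \beta_1 (b+1) + \beta_2 g + \beta_3 l\bigr) - \bigl(\beta_0 + \beta_1 b + \beta_2 g + \beta_3 l\bigr) = \beta_1. \end{aligned} \]

The difference does not depend on \(b\), \(g\) or \(l\), but it does depend on which variables are in the model.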

Example

  • Let’s take the Beaked whale, artificially add one kg to its Body, and compute the predicted Brain weight.

Same example in simpler.lm

  • To emphasize that the parameters depend on which other variables are in the model, let’s redo this in the simpler.lm model.

\(R^2\) for multiple regression

\[\begin{aligned} SSE &= \sum_{i=1}^n(Y_i - \widehat{Y}_i)^2 \\ SSR &= \sum_{i=1}^n(\overline{Y} - \widehat{Y}_i)^2 \\ SST &= \sum_{i=1}^n(Y_i - \overline{Y})^2 = SSE + SSR \\ R^2 &= \frac{SSR}{SST} \end{aligned}\]

\(R^2\) is now called the coefficient of multiple determination; its square root \(R\) is the multiple correlation coefficient of the model.

The sums of squares and \(R^2\) are defined analogously to those in simple linear regression.

Computing \(R^2\) by hand

Adjusted \(R^2\)

  • As we add more and more variables to the model, even random ones, \(R^2\) never decreases and eventually reaches 1.

  • Adjusted \(R^2\) tries to take this into account by replacing sums of squares by mean squares.

\[\begin{equation} R^2_a = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - \frac{MSE}{MST}. \end{equation}\]
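A minimal sketch of both quantities, written in plain Python rather than the R used in the course. The data are a made-up toy example (the fitted values are the least squares fit \(0.6 + 0.8x\) of `y` on \(x = 1,\dots,5\)), and the function name `r_squared` is hypothetical.

```python
def r_squared(y, fitted, p):
    """Return (R^2, adjusted R^2) for a least squares fit with p predictors and an intercept."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((ybar - fi) ** 2 for fi in fitted)
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = ssr / sst
    # Adjusted version: replace sums of squares by mean squares.
    r2_adj = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    return r2, r2_adj

# Toy data, for illustration only.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]  # least squares fit of y on x = 1, ..., 5
r2, r2_adj = r_squared(y, fitted, p=1)
```

Here \(SSE = 3.6\), \(SSR = 6.4\) and \(SST = 10\), so \(R^2 = 0.64\), while \(R^2_a = 1 - 1.2/2.5 = 0.52\) is smaller because of the penalty for the fitted parameter.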

Computing \(R^2_a\) by hand

\(F\)-test in summary(brains.lm)

  • Full model:

\[\texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Gestation}_i + \beta_3 \cdot \texttt{Litter}_i + \epsilon_i\]

  • Reduced model:

    \[\texttt{Brain}_i = \beta_0 + \epsilon_i\]

  • Statistic:

    \[F=\frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F} = \frac{SSR/df(SSR)}{SSE/df(SSE)} = \frac{MSR}{MSE}.\]
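The statistic is simple arithmetic once the two error sums of squares and their degrees of freedom are known. The sketch below is plain Python (not the course's R), the name `f_statistic` is hypothetical, and the numbers are a made-up toy example rather than the brains data.

```python
def f_statistic(sse_reduced, sse_full, df_reduced, df_full):
    """F = [(SSE_R - SSE_F) / (df_R - df_F)] / [SSE_F / df_F]."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Toy example with n = 5 observations and one predictor:
#   reduced (intercept-only) model: SSE_R = SST = 10.0 on n - 1 = 4 df
#   full model:                     SSE_F = 3.6       on n - 2 = 3 df
F = f_statistic(sse_reduced=10.0, sse_full=3.6, df_reduced=4, df_full=3)
```

When the reduced model is the intercept-only model, \(SSE_R = SST\), so the numerator is \(SSR\) divided by its degrees of freedom, which gives the \(MSR/MSE\) form.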

Right triangle again

  • Sides of the triangle: \(df_R-df_F=3\), \(df_F=92\)

  • Hypotenuse: \(df_R=95\)
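The degrees of freedom above can be checked against the two models. The reduced model has one parameter and the full model has \(p + 1 = 4\), so the stated values correspond to \(n = 96\):

\[ \begin{aligned} df_R &= n - 1 = 95 \implies n = 96, \\ df_F &= n - p - 1 = 96 - 3 - 1 = 92, \\ df_R - df_F &= p = 3. \end{aligned} \]

The degrees of freedom add along the triangle in the same way as the sums of squares: \(df_R = df_F + (df_R - df_F)\), just as \(SSE_R = SSE_F + (SSE_R - SSE_F)\).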

Matrix formulation

\[{ Y}_{n \times 1} = {X}_{n \times (p + 1)} {\beta}_{(p+1) \times 1} + {\varepsilon}_{n \times 1}\]

  • \({X}\) is called the design matrix of the model

  • \({\varepsilon} \sim N(0, \sigma^2 I_{n \times n})\) is multivariate normal

\(SSE\) in matrix form

\[\begin{equation} SSE(\beta) = ({Y} - {X} {\beta})^T({Y} - {X} {\beta}) = \|Y-X\beta\|^2 \end{equation}\]

Design matrix

  • The design matrix is the \(n \times (p+1)\) matrix with entries

\[\begin{equation} X = \begin{pmatrix} 1 & X_{11} & X_{12} & \dots & X_{1p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{np} \end{pmatrix} \end{equation}\]

The matrix \(X\) is the same as the one formed by R.

Math aside: least squares solution

  • Normal equations

    \[\frac{\partial}{\partial \beta_j} SSE \biggl|_{\beta = \widehat{\beta}} = -2 \left(Y - X \widehat{\beta} \right)^T X_j = 0, \qquad 0 \leq j \leq p.\]

  • Equivalent to

\[\begin{aligned} ({Y} - {X}\widehat{\beta})^T{X} &= 0 \\ \widehat{\beta} &= ({X}^T{X})^{-1}{X}^T{Y} \end{aligned}\]

  • Distribution: \(\widehat{\beta} \sim N(\beta, \sigma^2 (X^TX)^{-1}).\)
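The normal equations can be checked numerically on a small example. This is a sketch in plain Python (standing in for the R used in the course), assuming \(X^TX\) is invertible. The helper names and the data are made up for illustration, and in practice one would use a linear algebra library rather than hand-written elimination.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def least_squares(X, y):
    """Return beta_hat solving the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Design matrix: a column of ones plus two made-up predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
# Noise-free response 1 + 2*x1 - 0.5*x2, so the fit should recover (1, 2, -0.5).
y = [1.0 + 2.0 * a - 0.5 * b for a, b in zip(x1, x2)]

beta_hat = least_squares(X, y)
fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
# Normal equations: the residual vector is orthogonal to every column of X.
orthogonality = [sum(r * x for r, x in zip(residuals, col)) for col in transpose(X)]
```

Solving the linear system \((X^TX)\beta = X^TY\) directly avoids forming the inverse \((X^TX)^{-1}\) explicitly, which is the usual numerical choice.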

Math aside: multivariate normal

  • To obtain the distribution of \(\hat{\beta}\) we used the following fact about the multivariate Normal.

  • Suppose \(Z \sim N(\mu,\Sigma)\). Then, for any fixed matrix \(A\)

\[ AZ \sim N(A\mu, A\Sigma A^T). \]

Math aside: how did we derive the distribution of \(\hat{\beta}\)?

Above, we saw that \(\hat{\beta}\) is equal to a matrix times \(Y\). The matrix form of our model is

\[ Y \sim N(X\beta, \sigma^2 I). \]

Therefore,

\[ \begin{aligned} \hat{\beta} &\sim N\left((X^TX)^{-1}X^T (X\beta), (X^TX)^{-1}X^T (\sigma^2 I) X (X^TX)^{-1}\right) \\ &\sim N(\beta, \sigma^2 (X^TX)^{-1}). \end{aligned} \]
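The simplification in the last step uses the multivariate normal fact with \(A = (X^TX)^{-1}X^T\). Since \(X^TX\) is symmetric, so is its inverse, and \(A^T = X(X^TX)^{-1}\). Written out:

\[ \begin{aligned} A(X\beta) &= (X^TX)^{-1}(X^TX)\beta = \beta, \\ A(\sigma^2 I)A^T &= \sigma^2 (X^TX)^{-1}(X^TX)(X^TX)^{-1} = \sigma^2 (X^TX)^{-1}. \end{aligned} \]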

Math aside: checking the equation

Categorical variables

  • Recall case study A: the flower experiment

Design matrix with categorical variables

  • R has used a binary column for factor(Time).

How categorical variables are encoded

  • We can change the columns in the design matrix:

Design matrix with categorical variables

  • By default, R discards one of the columns. Why?
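One way to see why a column is dropped: with an intercept in the model, the indicator columns for all levels of a factor sum to the column of ones, so the design matrix would have linearly dependent columns and \(X^TX\) would not be invertible. A small Python sketch with a made-up two-level factor standing in for factor(Time); the helper `indicator_columns` is hypothetical, not R's own encoding routine.

```python
def indicator_columns(levels):
    """One 0/1 column per distinct level, keyed by level label."""
    distinct = sorted(set(levels))
    return {lev: [1 if v == lev else 0 for v in levels] for lev in distinct}

# Hypothetical two-level factor, for illustration only.
time = [1, 1, 2, 2, 1, 2]
cols = indicator_columns(time)
intercept = [1] * len(time)

# The indicator columns add up to the intercept column, so including the
# intercept and all indicator columns makes the columns linearly dependent.
row_sums = [sum(vals) for vals in zip(*cols.values())]

# Dropping the first level's column removes the redundancy:
# each row is (intercept, indicator of Time == 2).
design = [[1, t2] for t2 in cols[2]]
```

With this coding the intercept is the mean of the baseline group (Time==1) and the coefficient on the remaining column is the difference between the two groups.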

Some additional models

~ Intensity

Some additional models

~ Intensity + factor(Time)

Some additional models

~ factor(Intensity) + factor(Time)

Interactions

  • Suppose we believe that Flowers varies linearly with Intensity but the slope depends on Time.

  • We’d need two parameters for Intensity

  • What is the regression line when Time==1? And Time==2?

Different models across groups

  • Set \(\beta_1=\beta_{\tt Intensity}\), \(\beta_2=\beta_{\tt Time2}\), \(\beta_3=\beta_{\tt Time2:Intensity}\).

  • In the Time==1 group, a one-unit change in Intensity leads to \(\beta_1\) units of change in Flowers.

  • In the Time==2 group, a one-unit change in Intensity leads to \(\beta_1 + \beta_3\) units of change in Flowers.

  • Test \(H_0\): the slope is the same within each group, i.e. \(\beta_3 = 0\).
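Writing the model out makes the two regression lines explicit. Here \(\beta_0\) denotes the intercept and \(\texttt{Time2}_i\) the 0/1 indicator of Time==2; both names are assumed for illustration and follow the coefficient labels above.

\[ \begin{aligned} \texttt{Flowers}_i &= \beta_0 + \beta_1 \cdot \texttt{Intensity}_i + \beta_2 \cdot \texttt{Time2}_i + \beta_3 \cdot \texttt{Time2}_i \cdot \texttt{Intensity}_i + \epsilon_i \\ \text{Time==1:} \quad E[\texttt{Flowers}_i] &= \beta_0 + \beta_1 \cdot \texttt{Intensity}_i \\ \text{Time==2:} \quad E[\texttt{Flowers}_i] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \cdot \texttt{Intensity}_i \end{aligned} \]

The two lines are parallel exactly when \(\beta_3 = 0\).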

Visualizing interaction