2024-04-01
Case studies:
A. Effect of light on meadowfoam flowering
B. Studying the brain sizes of mammals
Specifying the model.
Fitting the model: least squares.
Interpretation of the coefficients.
\(F\)-statistic revisited
Matrix approach to linear regression.
Investigating the design matrix
Full model for the brains data:
\[ \texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Gestation}_i + \beta_3 \cdot \texttt{Litter}_i + \epsilon_i \]
A simpler model for the brains data, without Gestation:
\[ \texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Litter}_i + \epsilon_i \]
Just as in simple linear regression, the model is fit by minimizing
\[\begin{aligned} SSE(\beta_0, \dots, \beta_p) &= \sum_{i=1}^n\left(Y_i - \left(\beta_0 + \sum_{j=1}^p \beta_j X_{ij} \right) \right)^2 \\ &= \|Y - \widehat{Y}(\beta)\|^2 \end{aligned}\]
The minimizers \(\widehat{\beta} = (\widehat{\beta}_0, \dots, \widehat{\beta}_p)\) are the “least squares estimates”. As in simple linear regression, they are normally distributed.
As in simple regression
\[\widehat{\sigma}^2 = \frac{SSE}{n-p-1} \sim \sigma^2 \cdot \frac{\chi^2_{n-p-1}}{n-p-1}\]
independent of \(\widehat{\beta}\).
Why \(\chi^2_{n-p-1}\)? Typically, the degrees of freedom in the estimate of \(\sigma^2\) is \(n - \#\{\text{parameters in the regression function}\}\). Here the regression function has \(p+1\) parameters, \(\beta_0, \dots, \beta_p\).
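As a worked count for the full brains model, assuming the sample size \(n = 96\) implied by the residual degrees of freedom \(df_R = 95\) quoted for the intercept-only model in the \(F\)-statistic section: the regression function has \(p + 1 = 4\) parameters (the intercept plus the coefficients of Body, Gestation and Litter), so
\[ n - p - 1 = 96 - 3 - 1 = 92. \]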
Take \(\beta_1=\beta_{\tt Body}\) in the fitted model brains.lm, for example. This is the amount the average Brain weight increases for one kg of increase in Body, keeping everything else constant.
We refer to this as the effect of Body allowing for or controlling for the other variables.
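To see where this interpretation comes from, compare the regression function of the full model at two inputs that differ only by one kg of Body. In this short derivation \(b\), \(g\) and \(\ell\) are placeholder values of Body, Gestation and Litter, introduced here for illustration only:
\[ \left(\beta_0 + \beta_1 (b+1) + \beta_2 g + \beta_3 \ell\right) - \left(\beta_0 + \beta_1 b + \beta_2 g + \beta_3 \ell\right) = \beta_1. \]
The difference does not depend on \(b\), \(g\) or \(\ell\), so the same shift in average Brain weight applies at every starting point, as long as Gestation and Litter are held fixed.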
Example: take the Beaked whale, artificially add a kg to its Body, and compute the predicted weight.
The same comparison can be made for the simpler.lm model.
\[\begin{aligned} SSE &= \sum_{i=1}^n(Y_i - \widehat{Y}_i)^2 \\ SSR &= \sum_{i=1}^n(\overline{Y} - \widehat{Y}_i)^2 \\ SST &= \sum_{i=1}^n(Y_i - \overline{Y})^2 = SSE + SSR \\ R^2 &= \frac{SSR}{SST} \end{aligned}\]
\(R^2\) is now called the coefficient of multiple determination of the model; its square root \(R\) is the multiple correlation coefficient.
The sums of squares and \(R^2\) are defined analogously to those in simple linear regression.
As we add more and more variables to the model, even random ones, \(R^2\) never decreases, and it typically reaches 1 once the model has as many parameters as observations.
Adjusted \(R^2\) tries to take this into account by replacing sums of squares by mean squares:
\[\begin{equation} R^2_a = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - \frac{MSE}{MST}. \end{equation}\]
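The two formulas can be compared side by side in a few lines of plain Python. This is a minimal sketch: the function names are made up for illustration, and the sums of squares are hypothetical numbers, not values from the brains data. Only \(n = 96\) and \(p = 3\) match the degrees of freedom quoted in these notes.

```python
def r_squared(sse: float, sst: float) -> float:
    """Coefficient of multiple determination, R^2 = SSR / SST = 1 - SSE / SST."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse: float, sst: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - MSE / MST for a model with p predictors plus an intercept."""
    mse = sse / (n - p - 1)  # mean squared error
    mst = sst / (n - 1)      # mean total sum of squares
    return 1.0 - mse / mst


# Hypothetical sums of squares (NOT taken from the brains data).
sse, sst = 20.0, 100.0
n, p = 96, 3

r2 = r_squared(sse, sst)
r2_adj = adjusted_r_squared(sse, sst, n, p)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

Because \((n-1)/(n-p-1) > 1\) whenever \(p \geq 1\), the adjusted value is smaller than \(R^2\) whenever \(SSE > 0\), and the gap widens as more variables are added without a matching drop in \(SSE\).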
summary(brains.lm)
Full model:
\[\texttt{Brain}_i = \beta_0 + \beta_1 \cdot \texttt{Body}_i + \beta_2 \cdot \texttt{Gestation}_i + \beta_3 \cdot \texttt{Litter}_i + \epsilon_i\]
Reduced model:
\[\texttt{Brain}_i = \beta_0 + \epsilon_i\]
Statistic:
\[F=\frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F} = \frac{SSR/df(SSR)}{SSE/df(SSE)} = \frac{MSR}{MSE}.\]
Sides of the triangle: \(df_R-df_F=3\), \(df_F=92\)
Hypotenuse: \(df_R=95\)
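The statistic is simple arithmetic once the two residual sums of squares are known. The sketch below, in plain Python, uses the degrees of freedom quoted above; the function name and the \(SSE\) values are hypothetical and are not results from the brains data.

```python
def f_statistic(sse_r: float, sse_f: float, df_r: int, df_f: int) -> float:
    """F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)."""
    if df_r <= df_f:
        raise ValueError("the reduced model must have more residual degrees of freedom")
    numerator = (sse_r - sse_f) / (df_r - df_f)  # mean square for the extra terms
    denominator = sse_f / df_f                   # MSE of the full model
    return numerator / denominator


# Degrees of freedom from the text; the SSE values are hypothetical.
df_r, df_f = 95, 92
sse_r, sse_f = 100.0, 20.0

f = f_statistic(sse_r, sse_f, df_r, df_f)
print(f"F = {f:.2f} on {df_r - df_f} and {df_f} degrees of freedom")
```

Under \(H_0\) (the reduced model is adequate) the statistic follows an \(F_{3, 92}\) distribution, so large values count as evidence against the reduced model.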
\[{ Y}_{n \times 1} = {X}_{n \times (p + 1)} {\beta}_{(p+1) \times 1} + {\varepsilon}_{n \times 1}\]
\({X}\) is called the design matrix of the model
\({\varepsilon} \sim N(0, \sigma^2 I_{n \times n})\) is multivariate normal
\[\begin{equation} SSE(\beta) = ({Y} - {X} {\beta})'({Y} - {X} {\beta}) = \|Y-X\beta\|^2 \end{equation}\]
\[\begin{equation} X = \begin{pmatrix} 1 & X_{11} & X_{12} & \dots & X_{1p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} &\dots & X_{np} \\ \end{pmatrix} \end{equation}\]
The matrix \(X\) is the same as the design matrix formed by R.
Normal equations
\[\frac{\partial}{\partial \beta_j} SSE \biggl|_{\beta = \widehat{\beta}} = -2 \left(Y - X \widehat{\beta}\right)^T X_j = 0, \qquad 0 \leq j \leq p,\]
where \(X_j\) is column \(j\) of \(X\) (with \(X_0\) the column of ones).
Equivalent to
\[\begin{aligned} (Y - X\widehat{\beta})^T X &= 0 \\ X^TX\widehat{\beta} &= X^TY \\ \widehat{\beta} &= (X^TX)^{-1}X^TY \end{aligned}\]
provided \(X^TX\) is invertible.
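The normal equations can be solved directly on a small example. The sketch below is plain Python with exact rational arithmetic, so the orthogonality condition can be checked without rounding error. The helper `solve_normal_equations` and the toy data are hypothetical, written for illustration. This is not how R fits a model, and for real data a numerically stable routine is preferable.

```python
from fractions import Fraction


def solve_normal_equations(X, y):
    """Return beta_hat solving (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y], built with exact rational arithmetic.
    A = [
        [sum(Fraction(X[i][r]) * Fraction(X[i][c]) for i in range(n)) for c in range(k)]
        + [sum(Fraction(X[i][r]) * Fraction(y[i]) for i in range(n))]
        for r in range(k)
    ]
    for col in range(k):
        # Find a row with a nonzero pivot; none exists if X^T X is singular.
        pivot = next((r for r in range(col, k) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X^T X is singular: columns of X are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Toy data (hypothetical): an intercept column plus two predictors, n = 5.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 0, 2, 5]

beta_hat = solve_normal_equations(X, y)
residuals = [yi - sum(b * x for b, x in zip(beta_hat, row)) for row, yi in zip(X, y)]
# Normal equations: every column of X is orthogonal to the residual vector.
orthogonality = [sum(row[j] * e for row, e in zip(X, residuals)) for j in range(3)]
print(beta_hat, orthogonality)
```

The `ValueError` branch corresponds to the case where \(X^TX\) is not invertible, for example when one column of the design matrix is a linear combination of the others.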
To obtain the distribution of \(\hat{\beta}\) we used the following fact about the multivariate Normal.
Suppose \(Z \sim N(\mu,\Sigma)\). Then, for any fixed matrix \(A\)
\[ AZ \sim N(A\mu, A\Sigma A^T). \]
Above, we saw that \(\hat{\beta}\) is equal to a matrix times \(Y\). The matrix form of our model is
\[ Y \sim N(X\beta, \sigma^2 I). \]
Therefore,
\[ \begin{aligned} \hat{\beta} &\sim N\left((X^TX)^{-1}X^T (X\beta), (X^TX)^{-1}X^T (\sigma^2 I) X (X^TX)^{-1}\right) \\ &= N(\beta, \sigma^2 (X^TX)^{-1}). \end{aligned} \]
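One consequence worth spelling out follows by applying the same fact once more, this time with \(A = e_j^T\), the row vector that picks out coordinate \(j\) (the notation \(e_j\) is introduced here and is not part of the original notes):
\[ \hat{\beta}_j = e_j^T \hat{\beta} \sim N\left(\beta_j, \sigma^2 \left[(X^TX)^{-1}\right]_{jj}\right), \qquad 0 \leq j \leq p. \]
Each coefficient estimate is therefore unbiased, with variance given by the corresponding diagonal entry of \(\sigma^2 (X^TX)^{-1}\).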
flower experiment
R has used a binary column for factor(Time).
R discards one of the columns. Why?
Suppose we believe that Flowers varies linearly with Intensity but the slope depends on Time.
We’d need two parameters for Intensity: what is the slope when Time==1? And when Time==2?
Set \(\beta_1=\beta_{\tt Intensity}\), \(\beta_2=\beta_{\tt Time2}\), \(\beta_3=\beta_{\tt Time2:Intensity}\).
In the Time==1 group, one unit change of Intensity leads to \(\beta_1\) units of change in Flowers.
In the Time==2 group, one unit change of Intensity leads to \(\beta_1 + \beta_3\) units of change in Flowers.
Test \(H_0: \beta_3 = 0\), i.e. that the slope is the same within each group.
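The interpretation of the interaction coefficient can be checked mechanically by writing out one row of the design matrix. The sketch below is plain Python and assumes an intercept \(\beta_0\), Time==1 as the baseline level, and made-up coefficient values. The function names are hypothetical and nothing here is fitted to the flower data.

```python
def design_row(intensity: float, time: int) -> list:
    """Row of the design matrix: intercept, Intensity, Time2 indicator, Time2:Intensity."""
    time2 = 1.0 if time == 2 else 0.0  # Time==1 is the baseline level
    return [1.0, intensity, time2, time2 * intensity]


def predict(beta: list, intensity: float, time: int) -> float:
    """Mean of Flowers under the interaction model for given Intensity and Time."""
    return sum(b * x for b, x in zip(beta, design_row(intensity, time)))


# Hypothetical coefficients (beta_0, beta_1, beta_2, beta_3), NOT fitted values.
beta = [70.0, -4.0, 12.0, 1.5]

# Slope in each group: change in the mean for a one-unit change of Intensity.
slope_time1 = predict(beta, 3.0, 1) - predict(beta, 2.0, 1)
slope_time2 = predict(beta, 3.0, 2) - predict(beta, 2.0, 2)
print(slope_time1, slope_time2)
```

The first slope equals `beta[1]` and the second equals `beta[1] + beta[3]`. Setting the last coefficient to 0 makes the two slopes coincide, which is the situation described by the null hypothesis.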