Inference in multiple linear regression

STATS 191

2024-04-01

Outline

  • Case studies:

    A. Energy costs of echolocation

    B. Galileo’s falling bodies

Energy costs of echolocation

Fitting a model

Other models

Case study B: Galileo’s data

  • Galileo fit a quadratic model to his data

  • Note the notation I(Height^2) – without I, the quadratic term would not be added, since ^ has a special meaning inside an R formula

  • A different way to fit the model
  • Predictions / CIs are the same
  • Tests of quadratic effect are the same
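A minimal sketch of the two parametrizations, using simulated stand-ins for Galileo's Height and Distance variables (not his actual measurements):

```r
# Simulated stand-ins for Galileo's variables (not his actual data)
set.seed(191)
Height <- c(100, 200, 300, 450, 600, 800, 1000)
Distance <- 0.5 + 0.004 * Height - 1e-6 * Height^2 + rnorm(7, sd = 0.01)

# I() protects ^, which otherwise has a formula meaning in R
fit1 <- lm(Distance ~ Height + I(Height^2))
# A different way to fit the same model: raw polynomial basis
fit2 <- lm(Distance ~ poly(Height, 2, raw = TRUE))

# Identical fitted values, hence identical predictions, CIs, and tests
max(abs(fitted(fit1) - fitted(fit2)))
```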

Confidence intervals

  • Suppose we want a \((1-\alpha)\cdot 100\%\) CI for \(\sum_{j=0}^p a_j\beta_j\).

  • Just as in simple linear regression:

    \[\sum_{j=0}^p a_j \widehat{\beta}_j \pm t_{1-\alpha/2, n-p-1} \cdot SE\left(\sum_{j=0}^p a_j\widehat{\beta}_j\right).\]
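As a sanity check, the formula can be computed by hand and compared with R's confint(); a sketch on simulated data (variable names are illustrative):

```r
# CI for a single coefficient (a = unit vector) computed by hand
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

est <- unname(coef(fit)["x1"])
se  <- sqrt(vcov(fit)["x1", "x1"])
df  <- fit$df.residual                       # n - p - 1
ci  <- est + c(-1, 1) * qt(0.975, df) * se   # 95% CI

ci
confint(fit)["x1", ]                         # agrees
```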

\(T\)-statistics revisited

Of course, these confidence intervals are based on the standard ingredients of a \(T\)-statistic.

  • Suppose we want to test

\[\begin{equation} H_0:\sum_{j=0}^p a_j\beta_j= h. \end{equation}\]

  • As in simple linear regression, it is based on

\[\begin{equation} T = \frac{\sum_{j=0}^p a_j \widehat{\beta}_j - h}{SE(\sum_{j=0}^p a_j \widehat{\beta}_j)}. \end{equation}\]

  • If \(H_0\) is true, then \(T \sim t_{n-p-1}\), so we reject \(H_0\) at level \(\alpha\) if

\[\begin{equation} \begin{aligned} |T| &\geq t_{1-\alpha/2,n-p-1}, \qquad \text{ OR} \\ p\text{-value} &= {\tt 2*(1-pt(|T|, n-p-1))} \leq \alpha. \end{aligned} \end{equation}\]

Let’s do a quick calculation to remind ourselves of the relationships among the quantities in the table above.
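A sketch of that calculation on simulated data (the course's actual dataset isn't reproduced here), rebuilding one row of summary()'s coefficient table by hand:

```r
# Reproducing one row of summary()'s coefficient table by hand
set.seed(2)
n <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)

tab   <- summary(fit)$coefficients
Tstat <- tab["x2", "Estimate"] / tab["x2", "Std. Error"]
pval  <- 2 * (1 - pt(abs(Tstat), fit$df.residual))

c(Tstat, tab["x2", "t value"])     # same statistic
c(pval,  tab["x2", "Pr(>|t|)"])    # same p-value
```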

One-sided tests

  • Suppose, instead, we wanted to test the one-sided hypothesis

\[\begin{equation} H_0:\sum_{j=0}^p a_j\beta_j \leq h, \ \text{vs.} \ H_a: \sum_{j=0}^p a_j\beta_j > h \end{equation}\]

  • If \(H_0\) is true, then \(T\) is no longer exactly \(t_{n-p-1}\), but we still have

\[\begin{equation} \mathbb{P}\left(T > t_{1-\alpha, n-p-1}\right) \leq \alpha \end{equation}\]

  • We reject \(H_0\) at level \(\alpha\) if

\[\begin{equation} \begin{aligned} T &\geq t_{1-\alpha,n-p-1}, \qquad \text{ OR} \\ p\text{-value} &= {\tt (1-pt(T, n-p-1))} \leq \alpha. \end{aligned} \end{equation}\]
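In R the one-sided p-value is one tail of the same t distribution; a small sketch on simulated data:

```r
# One-sided test of H0: beta_1 <= 0 vs H_a: beta_1 > 0
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 2 + 0.8 * x + rnorm(n)
fit <- lm(y ~ x)

Tstat <- coef(summary(fit))["x", "t value"]
p_one <- pt(Tstat, fit$df.residual, lower.tail = FALSE)  # = 1 - pt(T, n-p-1)
```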

Standard error of \(\sum_{j=0}^p a_j \widehat{\beta}_j\)

  • In order to form these \(T\) statistics, we need the \(SE\) of our estimate \(\sum_{j=0}^p a_j \widehat{\beta}_j\).

  • Based on the matrix approach to regression,

\[\begin{equation} SE\left(\sum_{j=0}^p a_j\widehat{\beta}_j \right) = SE\left(a^T\widehat{\beta} \right) = \sqrt{\widehat{\sigma}^2 \cdot a^T (X^TX)^{-1} a}. \end{equation}\]

  • Don’t worry too much about the specific implementation – for most of the effects we care about, R will compute this for you.

The standard errors of the coefficient estimates are the square roots of the diagonal entries of \(\widehat{\sigma}^2 (X^TX)^{-1}\). They appear as the Std. Error column in the coefficient table.
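A sketch of the matrix formula on simulated data, checked against vcov() and the Std. Error column:

```r
set.seed(4)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X      <- model.matrix(fit)
sigma2 <- sum(resid(fit)^2) / fit$df.residual
V      <- sigma2 * solve(t(X) %*% X)      # same as vcov(fit)

# Square roots of the diagonal reproduce the Std. Error column
cbind(sqrt(diag(V)), coef(summary(fit))[, "Std. Error"])

# SE of an arbitrary combination a^T beta-hat, e.g. beta_1 + beta_2
a       <- c(0, 1, 1)
se_comb <- sqrt(drop(t(a) %*% V %*% a))
```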

Prediction / forecasting interval

  • Basically identical to simple linear regression.

  • Prediction interval at \(X_{1,new}, \dots, X_{p,new}\):

\[\begin{equation} \begin{aligned} \widehat{\beta}_0 + \sum_{j=1}^p X_{j,new} \widehat{\beta}_j\pm t_{1-\alpha/2, n-p-1} \times & \\ \qquad \sqrt{\widehat{\sigma}^2 + SE\left(\widehat{\beta}_0 + \sum_{j=1}^p X_{j,new}\widehat{\beta}_j\right)^2} & \end{aligned} \end{equation}\]
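R's predict() computes this directly; a sketch with simulated data, also showing that the prediction interval is wider than the confidence interval for the mean:

```r
set.seed(5)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

new <- data.frame(x1 = 0.5, x2 = -1)
predict(fit, new, interval = "prediction", level = 0.95)  # forecast interval
predict(fit, new, interval = "confidence", level = 0.95)  # narrower: mean only
```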

Questions about many (combinations) of \(\beta_j\)’s

  • In multiple regression we can ask more complicated questions than in simple regression.

  • For instance, in bats.lm we could ask whether Type matters at all.

  • These questions can be answered by \(F\)-statistics.

Dropping one or more variables

  • Suppose we wanted to test the above hypothesis. Formally, the null hypothesis is:

\[ H_0: \beta_1=\beta_2=0 \]

  • The alternative is

\[ H_a: \text{one of $\beta_1,\beta_2$ is not 0}. \]

  • This test is an \(F\)-test based on two models

    Full:    Energy ~ Type + Mass
    Reduced: Energy ~ Mass
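In R the comparison is one call to anova(); a sketch using simulated stand-ins for Energy, Type, and Mass (the actual bats data isn't reproduced here):

```r
# Simulated stand-ins for the bats data
set.seed(6)
n <- 20
Type <- factor(rep(c("echolocating bat", "non-echolocating bat", "bird"),
                   length.out = n))
Mass <- rexp(n, rate = 0.1)
Energy <- 2 + 0.1 * Mass + rnorm(n)

full    <- lm(Energy ~ Type + Mass)
reduced <- lm(Energy ~ Mass)
anova(reduced, full)   # F-test that the Type coefficients are all zero
```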

\(SSE\) of a model

  • In the graphic, a “model”, \({\cal M}\) is a subspace of \(\mathbb{R}^n\) (e.g. column space of \({X}\)).

  • Least squares fit = projection onto the subspace \({\cal M}\), yielding predicted values \(\widehat{Y}_{{\cal M}}\)

  • Error sum of squares:

\[SSE({\cal M}) = \|Y - \widehat{Y}_{{\cal M}}\|^2.\]

Least squares for \(F\) statistic

  • Fits of a full and reduced model \(\hat{Y}_F\) and \(\hat{Y}_R\)

  • The difference \(\hat{Y}_F-\hat{Y}_R\).

Right triangle for \(F\) statistic

  • Sides of the triangle: \(SSE_R-SSE_F\), \(SSE_F\)

  • Hypotenuse: \(SSE_R\)

Right triangle with full and reduced model: degrees of freedom

  • Sides of the triangle: \(df_R-df_F\), \(df_F\)

  • Hypotenuse: \(df_R\)

\(F\)-statistic for \(H_0:\beta_{1}=\beta_{2}=0\)

  • We compute the \(F\) statistic the same way to compare any two nested models

\[\begin{equation} \begin{aligned} F &=\frac{\frac{SSE(R) - SSE(F)}{2}}{\frac{SSE(F)}{n-p-1}} \\ & \sim F_{2, 16} \qquad (\text{if $H_0$ is true}) \end{aligned} \end{equation}\]

  • Reject \(H_0\) at level \(\alpha\) if \(F > F_{1-\alpha, 2, 16}\).

General \(F\)-tests

  • Given two models \(R \subset F\) (i.e. \(R\) is a subspace of \(F\)), we can consider testing

\[\begin{equation} H_0: \text{$R$ is adequate (i.e. $\mathbb{E}(Y) \in R$)} \end{equation}\]

\[\begin{equation} H_a: \text{$F$ is adequate (i.e. $\mathbb{E}(Y) \in F$) but $R$ is not} \end{equation}\]

  • The least squares picture has models \(X_R\) and \(X_F\), with \(X_F\) decomposed into \(X_R\) plus the part of \(X_F\) orthogonal to \(X_R\) \(\dots\)
  • The test statistic is

\[\begin{equation} F = \frac{(SSE(R) - SSE(F)) / (df_R - df_F)}{SSE(F)/df_F} \end{equation}\]

  • If \(H_0\) is true, \(F \sim F_{df_R-df_F, df_F}\) so we reject \(H_0\) at level \(\alpha\) if \(F > F_{1-\alpha, df_R-df_F, df_F}\).
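The general statistic can be assembled from the two SSEs and degrees of freedom; a sketch on simulated data, checked against anova():

```r
set.seed(7)
n <- 40
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + x1 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1)

sseF <- sum(resid(full)^2);    dfF <- full$df.residual
sseR <- sum(resid(reduced)^2); dfR <- reduced$df.residual

Fstat <- ((sseR - sseF) / (dfR - dfF)) / (sseF / dfF)
pval  <- pf(Fstat, dfR - dfF, dfF, lower.tail = FALSE)
```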

Constraining coefficients

  • Suppose we wanted to test \(H_0\): the line for non-echolocating bats has the same intercept as the line for non-echolocating birds.

  • Can be expressed as \(H_0:\beta_1=\beta_2\) in bats.lm.

Strategy 1: fit a model in which this is forced to be true
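A sketch of this strategy on simulated data (generic x1 and x2 rather than the bats variables): forcing \(\beta_1=\beta_2\) amounts to regressing on the summed variable.

```r
set.seed(8)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + 2 * x2 + rnorm(n)

full <- lm(y ~ x1 + x2)
null <- lm(y ~ I(x1 + x2))   # beta_1 = beta_2 forced to hold
anova(null, full)            # 1-df F-test of the constraint
```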

Strategy 2: a \(T\)-statistic

  • Hypothesis is \(H_0:\beta_1-\beta_2=0\)

  • This method doesn’t require fitting the special model null_bats.lm!

  • Can be generalized to \(F\) tests (hypotheses involving multiple contrasts of \(\beta\))
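A sketch of the contrast T-statistic on simulated data; only coef() and vcov() from the fitted full model are needed, with no second fit:

```r
set.seed(9)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

a     <- c(0, 1, -1)                     # contrast beta_1 - beta_2
est   <- sum(a * coef(fit))
se    <- sqrt(drop(t(a) %*% vcov(fit) %*% a))
Tstat <- est / se
pval  <- 2 * (1 - pt(abs(Tstat), fit$df.residual))
```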

Math aside: general linear hypothesis

  • Suppose we want to test the null hypothesis

\[\begin{equation} H_0:C_{q \times (p+1)}\beta_{(p+1) \times 1} = h \end{equation}\]

  • Alternative is

\[\begin{equation} H_a :C_{q \times (p+1)}\beta_{(p+1) \times 1} \neq h. \end{equation}\]

Math aside: \(F\) statistic in general linear hypothesis

  • Numerator

\[ (C\hat{\beta}-h)^T \left(C(X^TX)^{-1}C^T \right)^{-1} (C\hat{\beta}-h) / q \]

  • Denominator: the usual MSE

  • We just used special case \(q=1\) above…
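For concreteness, a sketch assembling this \(F\) statistic directly from the matrix formula, on simulated data; with \(q = 2\) and \(h = 0\) it reproduces the overall F-test against the intercept-only model:

```r
set.seed(10)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X <- model.matrix(fit)
b <- coef(fit)
C <- rbind(c(0, 1, 0),
           c(0, 0, 1))      # H0: beta_1 = 0 and beta_2 = 0
h <- c(0, 0)
q <- nrow(C)

mse   <- sum(resid(fit)^2) / fit$df.residual        # the usual MSE
num   <- t(C %*% b - h) %*% solve(C %*% solve(t(X) %*% X) %*% t(C)) %*%
         (C %*% b - h) / q
Fstat <- drop(num) / mse
```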