Inference in multiple linear regression

STATS 191

2024-04-01

Outline

  • Case studies:

    A. Energy costs of echolocation

    B. Galileo’s falling bodies

Energy costs of echolocation

Fitting a model

Other models

Case study B: Galileo’s data

  • Galileo fit a quadratic model to his data

  • Note the notation I(Height^2) – without I, the quadratic term would not be added, since ^ has a special meaning inside an R formula

  • A different way to fit the model
  • Predictions / CIs are the same
  • Tests of quadratic effect are the same
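A minimal sketch of the two parametrizations, using simulated stand-ins for Galileo's Height and Distance variables (not his actual measurements):

```r
# Simulated stand-ins for Galileo's variables (not his actual data)
set.seed(191)
Height <- c(100, 200, 300, 450, 600, 800, 1000)
Distance <- 0.5 + 0.004 * Height - 1e-6 * Height^2 + rnorm(7, sd = 0.01)

# I() protects ^, which otherwise has a formula meaning in R
fit1 <- lm(Distance ~ Height + I(Height^2))
# A different way to fit the same model: raw polynomial basis
fit2 <- lm(Distance ~ poly(Height, 2, raw = TRUE))

# Identical fitted values, hence identical predictions, CIs, and tests
max(abs(fitted(fit1) - fitted(fit2)))
```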

Confidence intervals

  • Suppose we want a \((1-\alpha)\cdot 100\%\) CI for \(\sum_{j=0}^p a_j\beta_j\).

  • Just as in simple linear regression:

    \[\sum_{j=0}^p a_j \widehat{\beta}_j \pm t_{1-\alpha/2, n-p-1} \cdot SE\left(\sum_{j=0}^p a_j\widehat{\beta}_j\right).\]
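As a sanity check, the formula can be computed by hand and compared with R's confint(); a sketch on simulated data (variable names are illustrative):

```r
# CI for a single coefficient (a = unit vector) computed by hand
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

est <- unname(coef(fit)["x1"])
se  <- sqrt(vcov(fit)["x1", "x1"])
df  <- fit$df.residual                       # n - p - 1
ci  <- est + c(-1, 1) * qt(0.975, df) * se   # 95% CI

ci
confint(fit)["x1", ]                         # agrees
```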

\(T\)-statistics revisited

Of course, these confidence intervals are based on the standard ingredients of a \(T\)-statistic.

  • Suppose we want to test

\[\begin{equation} H_0:\sum_{j=0}^p a_j\beta_j= h. \end{equation}\]

  • As in simple linear regression, it is based on

\[\begin{equation} T = \frac{\sum_{j=0}^p a_j \widehat{\beta}_j - h}{SE(\sum_{j=0}^p a_j \widehat{\beta}_j)}. \end{equation}\]

  • If \(H_0\) is true, then \(T \sim t_{n-p-1}\), so we reject \(H_0\) at level \(\alpha\) if

\[\begin{equation} \begin{aligned} |T| &\geq t_{1-\alpha/2,n-p-1}, \qquad \text{ OR} \\ p\text{-value} &= {\tt 2*(1-pt(|T|, n-p-1))} \leq \alpha. \end{aligned} \end{equation}\]

Let’s do a quick calculation to remind ourselves of the relationships among the quantities in the table above.
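A sketch of that calculation on simulated data (the course's actual dataset isn't reproduced here), rebuilding one row of summary()'s coefficient table by hand:

```r
# Reproducing one row of summary()'s coefficient table by hand
set.seed(2)
n <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)

tab   <- summary(fit)$coefficients
Tstat <- tab["x2", "Estimate"] / tab["x2", "Std. Error"]
pval  <- 2 * (1 - pt(abs(Tstat), fit$df.residual))

c(Tstat, tab["x2", "t value"])     # same statistic
c(pval,  tab["x2", "Pr(>|t|)"])    # same p-value
```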

One-sided tests

  • Suppose, instead, we wanted to test the one-sided hypothesis

\[\begin{equation} H_0:\sum_{j=0}^p a_j\beta_j \leq h, \ \text{vs.} \ H_a: \sum_{j=0}^p a_j\beta_j > h \end{equation}\]

  • If \(H_0\) is true, then \(T\) is no longer exactly \(t_{n-p-1}\), but we still have

\[\begin{equation} \mathbb{P}\left(T > t_{1-\alpha, n-p-1}\right) \leq \alpha \end{equation}\]

  • We reject \(H_0\) at level \(\alpha\) if

\[\begin{equation} \begin{aligned} T &\geq t_{1-\alpha,n-p-1}, \qquad \text{ OR} \\ p\text{-value} &= {\tt (1-pt(T, n-p-1))} \leq \alpha. \end{aligned} \end{equation}\]
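In R the one-sided p-value is one tail of the same t distribution; a small sketch on simulated data:

```r
# One-sided test of H0: beta_1 <= 0 vs H_a: beta_1 > 0
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 2 + 0.8 * x + rnorm(n)
fit <- lm(y ~ x)

Tstat <- coef(summary(fit))["x", "t value"]
p_one <- pt(Tstat, fit$df.residual, lower.tail = FALSE)  # = 1 - pt(T, n-p-1)
```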

Standard error of \(\sum_{j=0}^p a_j \widehat{\beta}_j\)

  • In order to form these \(T\) statistics, we need the \(SE\) of our estimate \(\sum_{j=0}^p a_j \widehat{\beta}_j\).

  • Based on the matrix approach to regression,

\[\begin{equation} SE\left(\sum_{j=0}^p a_j\widehat{\beta}_j \right) = SE\left(a^T\widehat{\beta} \right) = \sqrt{\widehat{\sigma}^2 \cdot a^T (X^TX)^{-1} a}. \end{equation}\]

  • Don’t worry too much about the specific implementation – for most of the effects we care about, R will compute this for you.

The standard errors of the coefficient estimates are the square roots of the diagonal entries of \(\widehat{\sigma}^2 (X^TX)^{-1}\). They appear as the Std. Error column in the coefficient table.
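A sketch of the matrix formula on simulated data, checked against vcov() and the Std. Error column:

```r
set.seed(4)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X      <- model.matrix(fit)
sigma2 <- sum(resid(fit)^2) / fit$df.residual
V      <- sigma2 * solve(t(X) %*% X)      # same as vcov(fit)

# Square roots of the diagonal reproduce the Std. Error column
cbind(sqrt(diag(V)), coef(summary(fit))[, "Std. Error"])

# SE of an arbitrary combination a^T beta-hat, e.g. beta_1 + beta_2
a       <- c(0, 1, 1)
se_comb <- sqrt(drop(t(a) %*% V %*% a))
```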

Prediction / forecasting interval

  • Basically identical to simple linear regression.

  • Prediction interval at \(X_{1,new}, \dots, X_{p,new}\):

\[\begin{equation} \begin{aligned} \widehat{\beta}_0 + \sum_{j=1}^p X_{j,new} \widehat{\beta}_j\pm t_{1-\alpha/2, n-p-1} \times & \\ \qquad \sqrt{\widehat{\sigma}^2 + SE\left(\widehat{\beta}_0 + \sum_{j=1}^p X_{j,new}\widehat{\beta}_j\right)^2} & \end{aligned} \end{equation}\]
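R's predict() computes this directly; a sketch with simulated data, also showing that the prediction interval is wider than the confidence interval for the mean:

```r
set.seed(5)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

new <- data.frame(x1 = 0.5, x2 = -1)
predict(fit, new, interval = "prediction", level = 0.95)  # forecast interval
predict(fit, new, interval = "confidence", level = 0.95)  # narrower: mean only
```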

Questions about many (combinations) of \(\beta_j\)’s

  • In multiple regression we can ask more complicated questions than in simple regression.

  • For instance, in bats.lm we could ask whether Type matters at all.

  • These questions can be answered by \(F\)-statistics.

Dropping one or more variables

  • Suppose we wanted to test the above hypothesis. Formally, the null hypothesis is:

\[ H_0: \beta_1=\beta_2=0 \]

  • The alternative is

\[ H_a: \text{one of $\beta_1,\beta_2$ is not 0}. \]

  • This test is an \(F\)-test based on two models

    Full:    Energy ~ Type + Mass
    Reduced: Energy ~ Mass
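In R the comparison is one call to anova(); a sketch using simulated stand-ins for Energy, Type, and Mass (the actual bats data isn't reproduced here):

```r
# Simulated stand-ins for the bats data
set.seed(6)
n <- 20
Type <- factor(rep(c("echolocating bat", "non-echolocating bat", "bird"),
                   length.out = n))
Mass <- rexp(n, rate = 0.1)
Energy <- 2 + 0.1 * Mass + rnorm(n)

full    <- lm(Energy ~ Type + Mass)
reduced <- lm(Energy ~ Mass)
anova(reduced, full)   # F-test that the Type coefficients are all zero
```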

\(SSE\) of a model

  • In the graphic, a “model”, \({\cal M}\) is a subspace of \(\mathbb{R}^n\) (e.g. column space of \({X}\)).

  • Least squares fit = projection onto the subspace \({\cal M}\), yielding predicted values \(\widehat{Y}_{{\cal M}}\)

  • Error sum of squares:

\[SSE({\cal M}) = \|Y - \widehat{Y}_{{\cal M}}\|^2.\]

Least squares for \(F\) statistic

  • Fits of a full and reduced model \(\hat{Y}_F\) and \(\hat{Y}_R\)

  • The difference \(\hat{Y}_F-\hat{Y}_R\).

Right triangle for \(F\) statistic

  • Sides of the triangle: \(SSE_R-SSE_F\), \(SSE_F\)

  • Hypotenuse: \(SSE_R\)

Right triangle with full and reduced model: degrees of freedom

  • Sides of the triangle: \(df_R-df_F\), \(df_F\)

  • Hypotenuse: \(df_R\)

\(F\)-statistic for \(H_0:\beta_{1}=\beta_{2}=0\)

  • We compute the \(F\) statistic the same way to compare any two nested models

\[\begin{equation} \begin{aligned} F &=\frac{\frac{SSE(R) - SSE(F)}{2}}{\frac{SSE(F)}{n-p-1}} \\ & \sim F_{2, 16} \qquad (\text{if $H_0$ is true}) \end{aligned} \end{equation}\]

  • Reject \(H_0\) at level \(\alpha\) if \(F > F_{1-\alpha, 2, 16}\).

General \(F\)-tests

  • Given two models \(R \subset F\) (i.e. \(R\) is a subspace of \(F\)), we can consider testing

\[\begin{equation} H_0: \text{$R$ is adequate (i.e. $\mathbb{E}(Y) \in R$)} \end{equation}\]

\[\begin{equation} H_a: \text{$F$ is adequate (i.e. $\mathbb{E}(Y) \in F$) but $R$ is not} \end{equation}\]

  • The least squares picture has models \(X_R\) and \(X_F\), with \(X_F\) decomposed into \(X_R\) plus the part of \(X_F\) orthogonal to \(X_R\) \(\dots\)
  • The test statistic is

\[\begin{equation} F = \frac{(SSE(R) - SSE(F)) / (df_R - df_F)}{SSE(F)/df_F} \end{equation}\]

  • If \(H_0\) is true, \(F \sim F_{df_R-df_F, df_F}\) so we reject \(H_0\) at level \(\alpha\) if \(F > F_{1-\alpha, df_R-df_F, df_F}\).
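The general statistic can be assembled from the two SSEs and degrees of freedom; a sketch on simulated data, checked against anova():

```r
set.seed(7)
n <- 40
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + x1 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1)

sseF <- sum(resid(full)^2);    dfF <- full$df.residual
sseR <- sum(resid(reduced)^2); dfR <- reduced$df.residual

Fstat <- ((sseR - sseF) / (dfR - dfF)) / (sseF / dfF)
pval  <- pf(Fstat, dfR - dfF, dfF, lower.tail = FALSE)
```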

Constraining coefficients

  • Suppose we wanted to test \(H_0\): the line for non-echolocating bats has the same intercept as the line for non-echolocating birds.

  • Can be expressed as \(H_0:\beta_1=\beta_2\) in bats.lm.

Strategy 1: fit a model in which this is forced to be true
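A sketch of this strategy on simulated data (generic x1 and x2 rather than the bats variables): forcing \(\beta_1=\beta_2\) amounts to regressing on the summed variable.

```r
set.seed(8)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + 2 * x2 + rnorm(n)

full <- lm(y ~ x1 + x2)
null <- lm(y ~ I(x1 + x2))   # beta_1 = beta_2 forced to hold
anova(null, full)            # 1-df F-test of the constraint
```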

Strategy 2: a \(T\)-statistic

  • Hypothesis is \(H_0:\beta_1-\beta_2=0\)

  • This method doesn’t require fitting the special model null_bats.lm!

  • Can be generalized to \(F\) tests (hypotheses involving multiple contrasts of \(\beta\))
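A sketch of the contrast T-statistic on simulated data; only coef() and vcov() from the fitted full model are needed, with no second fit:

```r
set.seed(9)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

a     <- c(0, 1, -1)                     # contrast beta_1 - beta_2
est   <- sum(a * coef(fit))
se    <- sqrt(drop(t(a) %*% vcov(fit) %*% a))
Tstat <- est / se
pval  <- 2 * (1 - pt(abs(Tstat), fit$df.residual))
```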

Math aside: general linear hypothesis

  • Suppose we want to test the null hypothesis

\[\begin{equation} H_0:C_{q \times (p+1)}\beta_{(p+1) \times 1} = h \end{equation}\]

  • Alternative is

\[\begin{equation} H_a :C_{q \times (p+1)}\beta_{(p+1) \times 1} \neq h. \end{equation}\]

Math aside: \(F\) statistic in general linear hypothesis

  • Numerator

\[ (C\hat{\beta}-h)^T \left(C(X^TX)^{-1}C^T \right)^{-1} (C\hat{\beta}-h) / q \]

  • Denominator: the usual MSE

  • We just used special case \(q=1\) above…
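For concreteness, a sketch assembling this \(F\) statistic directly from the matrix formula, on simulated data; with \(q = 2\) and \(h = 0\) it reproduces the overall F-test against the intercept-only model:

```r
set.seed(10)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X <- model.matrix(fit)
b <- coef(fit)
C <- rbind(c(0, 1, 0),
           c(0, 0, 1))      # H0: beta_1 = 0 and beta_2 = 0
h <- c(0, 0)
q <- nrow(C)

mse   <- sum(resid(fit)^2) / fit$df.residual        # the usual MSE
num   <- t(C %*% b - h) %*% solve(C %*% solve(t(X) %*% X) %*% t(C)) %*%
         (C %*% b - h) / q
Fstat <- drop(num) / mse
```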