2024-04-01
\[Y|X = X\beta + \epsilon\]
We’ve talked about checking assumptions.
What to do if the assumptions don’t hold?
We will use the bootstrap!
Suppose we think of the pairs \((X_i, Y_i)\) coming from some distribution \(F\) – this is a distribution for both the features and the outcome.
In our usual model, \(\beta\) is clearly defined. What is \(\beta\) without this assumption?
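A natural answer is the population least squares coefficient: the \(\beta\) minimizing \(E_F[(\pmb{Y} - \pmb{X}^T\beta)^2]\). Setting the gradient to zero involves only two moments of \(F\),
\[ E_F[\pmb{X}\pmb{X}^T], \qquad E_F[\pmb{X} \cdot \pmb{Y}] \]
where \((\pmb{X}, \pmb{Y}) \sim F\), leading to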
\[ \beta(F) = \left(E_F[\pmb{X}\pmb{X}^T]\right)^{-1} E_F[\pmb{X} \cdot \pmb{Y}]. \]
In fact, our least squares estimator is \(\beta(\hat{F}_n)\) where \(\hat{F}_n\) is the empirical distribution of our sample of \(n\) observations from \(F\).
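The plug-in claim can be checked numerically. The sketch below is only an illustration on simulated data (the data-generating choices and the names `Exx`, `Exy`, `beta_plugin` are assumptions made here, not part of the lecture): it replaces the expectations in \(\beta(F)\) by sample averages and compares the result with `lm`.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + abs(x) * rnorm(n)   # simulated data; noise is not homoscedastic

X <- cbind(1, x)                     # row i is X_i^T, including an intercept

# Moments under the empirical distribution F_hat_n are sample averages
Exx <- crossprod(X) / n              # (1/n) * sum_i X_i X_i^T
Exy <- crossprod(X, y) / n           # (1/n) * sum_i X_i Y_i

beta_plugin <- unname(drop(solve(Exx, Exy)))   # beta(F_hat_n)
beta_ols <- unname(coef(lm(y ~ x)))            # usual least squares fit

all.equal(beta_plugin, beta_ols)     # agreement up to numerical tolerance
```

The factors of \(1/n\) cancel, which is why the plug-in estimate is exactly the familiar \((X^TX)^{-1}X^TY\).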
As we take a larger and larger sample,
\[ \beta(\hat{F}_n) \to \beta(F) \]
and
\[ n^{1/2}(\beta(\hat{F}_n) - \beta(F)) \to N(0, \Sigma(F)) \]
for some covariance matrix \(\Sigma=\Sigma(F)\) depending only on \(F\).
Recall the variance of the OLS estimator (with \(X\) fixed): \[ (X^TX)^{-1} \text{Var}(X^T(Y-X\beta)) (X^TX)^{-1}. \]
With \(X\) random and \(n\) large this is approximately \[ \frac{1}{n} \left(E_F[\pmb{X}\pmb{X}^T] \right)^{-1} \text{Var}_F(\pmb{X} \cdot (\pmb{Y} - \pmb{X}^T \beta(F))) \left(E_F[\pmb{X}\pmb{X}^T] \right)^{-1}. \]
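Under the usual model, with independent errors of constant variance \(\sigma^2\), the middle term simplifies:
\[\text{Var}(X^T(Y-X\beta)) = \sigma^2 X^TX \approx n \sigma^2 \cdot E_F[\pmb{X} \pmb{X}^T].\]
Without that model, this simplification is wrong in general!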
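A small simulation makes the failure concrete. This is a sketch under assumptions chosen here for illustration (noise whose standard deviation grows with \(|x|\); the names `n_rep`, `mc_sd`, `mean_model_se` are hypothetical): it compares the actual sampling variability of the slope across repeated samples with the standard error reported by the usual formula.

```r
set.seed(2)
n <- 200          # observations per simulated dataset
n_rep <- 500      # number of simulated datasets

slope_hat <- numeric(n_rep)
model_se <- numeric(n_rep)
for (r in 1:n_rep) {
  x <- rnorm(n)
  y <- 1 + 2 * x + abs(x) * rnorm(n)         # error sd depends on x
  fit <- lm(y ~ x)
  slope_hat[r] <- coef(fit)["x"]
  model_se[r] <- sqrt(vcov(fit)["x", "x"])   # from sigma.hat^2 (X^T X)^{-1}
}

mc_sd <- sd(slope_hat)             # Monte Carlo sd of the slope estimate
mean_model_se <- mean(model_se)    # average model-based standard error
c(mc_sd = mc_sd, mean_model_se = mean_model_se)
```

In this setup the model-based standard error should come out noticeably smaller than the Monte Carlo standard deviation, so intervals built from it would be too narrow.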
We will use the OLS estimate, but correct its variance!
Can we get our hands on \(\text{Var}(X^T(Y-X\beta))\) or \(\text{Var}(\hat{\beta})\) without a model?
There are many variants of the bootstrap, most using roughly this structure (this version resamples the pairs \((X_i, Y_i)\) with replacement):
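# assumes X (n x p feature matrix), Y (length-n response) and B (number of resamples) exist
n = nrow(X)
boot_sample = c()
for (b in 1:B) {
  idx_star = sample(1:n, n, replace=TRUE)   # resample row indices with replacement
  X_star = X[idx_star, , drop=FALSE]        # keep X_star a matrix even when p = 1
  Y_star = Y[idx_star]
  boot_sample = rbind(boot_sample, coef(lm(Y_star ~ X_star)))   # refit OLS on the resample
}
cov_beta_boot = cov(boot_sample)            # bootstrap estimate of Var(beta.hat)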
If \(X\) is fixed, it doesn’t make sense to sample new \(X\) values for \(X^*\).
The residual bootstrap keeps \(X\) fixed and instead adds randomly resampled residuals to the fitted values:
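# assumes X (n x p feature matrix), Y (length-n response) and B (number of resamples) exist
boot_sample = c()
M = lm(Y ~ X - 1)
beta.hat = coef(M)
X = model.matrix(M)
n = nrow(X)
Y.hat = drop(X %*% beta.hat)   # fitted values as a plain vector; %*% is R's matrix product
r.hat = Y - Y.hat              # residuals
for (b in 1:B) {
  idx_star = sample(1:n, n, replace=TRUE)   # resample residual indices with replacement
  X_star = X                                # design matrix stays fixed
  Y_star = Y.hat + r.hat[idx_star]
  boot_sample = rbind(boot_sample, coef(lm(Y_star ~ X_star - 1)))
}
cov_beta_boot = cov(boot_sample)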
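The two schemes answer different questions, which a side-by-side run can show. The following is a sketch on simulated heteroscedastic data (the data-generating model and the names `pairs_slope`, `resid_slope`, `se` are assumptions made for illustration): the residual bootstrap treats the errors as exchangeable, so it should roughly reproduce the model-based standard error, while the pairs bootstrap does not rely on that.

```r
set.seed(3)
n <- 1000
B <- 500
x <- rnorm(n)
y <- 1 + 2 * x + abs(x) * rnorm(n)   # error sd depends on x

fit <- lm(y ~ x)
y_hat <- fitted(fit)
r_hat <- resid(fit)

pairs_slope <- numeric(B)
resid_slope <- numeric(B)
for (b in 1:B) {
  idx_star <- sample(1:n, n, replace = TRUE)
  # pairs bootstrap: resample (x_i, y_i) together
  pairs_slope[b] <- coef(lm(y[idx_star] ~ x[idx_star]))[2]
  # residual bootstrap: keep x fixed, resample residuals
  y_star <- y_hat + r_hat[idx_star]
  resid_slope[b] <- coef(lm(y_star ~ x))[2]
}

se <- c(model = sqrt(vcov(fit)["x", "x"]),
        residual = sd(resid_slope),
        pairs = sd(pairs_slope))
se
```

Here the `residual` entry should sit close to `model`, and `pairs` should be visibly larger, reflecting the extra variability that the constant-variance formula misses.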
The estimated covariance cov_beta_boot can be used to estimate \(\text{Var}(a^T\hat{\beta})\) for confidence intervals or general linear hypothesis tests.
Software does something slightly different – using percentiles of the bootstrap sample: bootstrap percentile intervals.
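Both kinds of interval can be computed from the same bootstrap sample. The sketch below uses simulated data and a hypothetical vector `a` (the mean response at \(x = 0.5\)), both chosen here purely for illustration: the first interval uses a normal approximation with the bootstrap variance of \(a^T\hat{\beta}\), the second takes percentiles of the bootstrapped values of \(a^T\hat{\beta}^*\).

```r
set.seed(4)
n <- 200
B <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + abs(x) * rnorm(n)   # simulated, heteroscedastic
fit <- lm(y ~ x)

# pairs bootstrap of the coefficient vector
boot_sample <- matrix(NA_real_, nrow = B, ncol = 2)
for (b in 1:B) {
  idx_star <- sample(1:n, n, replace = TRUE)
  boot_sample[b, ] <- coef(lm(y[idx_star] ~ x[idx_star]))
}
cov_beta_boot <- cov(boot_sample)

a <- c(1, 0.5)                       # hypothetical: intercept + 0.5 * slope
est <- sum(a * coef(fit))            # a^T beta.hat

# normal-approximation interval using the bootstrap variance
se_a <- sqrt(drop(t(a) %*% cov_beta_boot %*% a))
ci_normal <- est + c(-1, 1) * qnorm(0.975) * se_a

# percentile interval: quantiles of a^T beta.star over the resamples
ci_percentile <- quantile(drop(boot_sample %*% a), c(0.025, 0.975))

rbind(normal = ci_normal, percentile = unname(ci_percentile))
```

When the bootstrap distribution is close to symmetric, the two intervals should be similar; they can differ more when it is skewed.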
boot package
The Boot function in car is a wrapper around the more general boot function.
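A minimal usage sketch follows, assuming the car and boot packages are installed and that `Boot` accepts the `R` and `method` arguments as described in car's documentation (worth checking against the installed version). The simulated data frame `dat` is hypothetical, and the code is skipped when the packages are missing.

```r
if (requireNamespace("car", quietly = TRUE) &&
    requireNamespace("boot", quietly = TRUE)) {
  suppressPackageStartupMessages(library(car))

  set.seed(5)
  n <- 200
  dat <- data.frame(x = rnorm(n))
  dat$y <- 1 + 2 * dat$x + abs(dat$x) * rnorm(n)   # simulated data

  fit <- lm(y ~ x, data = dat)

  # method = "case" resamples rows (pairs); "residual" resamples residuals
  boot_fit <- Boot(fit, R = 499, method = "case")

  # percentile intervals for the coefficients
  print(confint(boot_fit, type = "perc"))

  # bootstrap standard errors from the stored replicates (boot_fit$t)
  print(sqrt(diag(cov(boot_fit$t))))
}
```

Fitting the model with an explicit `data` argument keeps the refits inside `Boot` straightforward, since each resample is taken from that data frame.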