Bootstrap#
Another resampling technique often seen in practice.
Cross-validation vs. the Bootstrap#
Cross-validation: provides estimates of the (test) error
The Bootstrap: provides the (standard) error of estimates
Bootstrap#
One of the most important techniques in all of Statistics.
Computer intensive method.
Popularized by Brad Efron \(\leftarrow\) Stanford pride!
Standard errors in linear regression from a sample of size \(n\)#
Classical way to compute Standard Errors#
Example: Estimate the variance of a sample \(x_1,x_2,\dots,x_n\):
Unbiased estimate of \(\sigma^2\): \[\hat \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\overline x)^2.\]
What is the Standard Error of \(\hat \sigma^2\)?
Assume that \(x_1,\dots,x_n\) are normally distributed with common mean \(\mu\) and variance \(\sigma^2\).
Then \((n-1)\hat \sigma^2/\sigma^2\) has a \(\chi^2\) distribution with \(n-1\) degrees of freedom.
For large \(n\), \(\hat{\sigma}^2\) is approximately normally distributed around \(\sigma^2\).
The SD of this sampling distribution is the Standard Error.
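Concretely, since \(\operatorname{Var}(\chi^2_{n-1}) = 2(n-1)\), the normal-theory Standard Error has a closed form:
\[\operatorname{Var}(\hat\sigma^2) = \left(\frac{\sigma^2}{n-1}\right)^2 \cdot 2(n-1) = \frac{2\sigma^4}{n-1}, \qquad \operatorname{SE}(\hat\sigma^2) = \sigma^2\sqrt{\frac{2}{n-1}}.\]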
Limitations of the classical approach#
This approach has served statisticians well for many years; however, what happens if:
The distributional assumption — for example, \(x_1,\dots,x_n\) being normal — breaks down?
The estimator does not have a simple form and its sampling distribution cannot be derived analytically?
Bootstrap can handle these departures from the usual assumptions!
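To preview the method (defined in detail below), here is a minimal sketch of bootstrapping the Standard Error of \(\hat\sigma^2\) when the data are exponential rather than normal, so the \(\chi^2\) derivation above does not apply. The sample sizes and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200)   # non-normal data: chi-squared theory no longer applies

# Resample the data with replacement, recompute the estimator each time,
# and use the SD of the recomputed values as the Standard Error
B = 1000
var_hats = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=x.size, replace=True)
    var_hats[b] = resample.var(ddof=1)     # the unbiased estimate of sigma^2

se_var_hat = var_hats.std(ddof=1)          # bootstrap SE of sigma-hat^2
print(se_var_hat)
```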
Example: Investing in two assets#
Suppose that \(X\) and \(Y\) are the returns of two assets.
These returns are observed every day: \((x_1,y_1),\dots,(x_n,y_n)\).
We have a fixed amount of money to invest and we will invest a fraction \(\alpha\) on \(X\) and a fraction \((1-\alpha)\) on \(Y\).
Therefore, our return will be \(\alpha X + (1-\alpha)Y\).
Our goal will be to minimize the variance of our return as a function of \(\alpha\):
\[\operatorname{Var}\big(\alpha X + (1-\alpha)Y\big) = \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY},\]
where \(\sigma_X^2 = \operatorname{Var}(X)\), \(\sigma_Y^2 = \operatorname{Var}(Y)\), and \(\sigma_{XY} = \operatorname{Cov}(X,Y)\).
Setting the derivative with respect to \(\alpha\) to zero, one can show that the optimal \(\alpha\) is:
\[\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}.\]
Proposal: Use the plug-in estimate computed from the sample:
\[\widehat\alpha = \frac{\hat\sigma_Y^2 - \hat\sigma_{XY}}{\hat\sigma_X^2 + \hat\sigma_Y^2 - 2\hat\sigma_{XY}}.\]
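As a minimal sketch (function and variable names are mine, not from the lecture), the plug-in estimate is a few lines of NumPy:

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing allocation alpha."""
    cov = np.cov(x, y)                     # 2x2 sample covariance matrix (ddof=1)
    var_x, var_y = cov[0, 0], cov[1, 1]
    cov_xy = cov[0, 1]
    return (var_y - cov_xy) / (var_x + var_y - 2.0 * cov_xy)
```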
Suppose we compute the estimate \(\widehat\alpha = 0.6\) using the samples \((x_1,y_1),\dots,(x_n,y_n)\).
How sure can we be of this value? (A somewhat vague question.)
If we had sampled the observations on a different set of 100 days, would we get a wildly different \(\widehat \alpha\)? (A more precise question.)
Resampling the data from the true distribution#
In this thought experiment, we know the actual joint distribution \(P(X,Y)\), so we can resample the \(n\) observations to our hearts’ content.
Computing the standard error of \(\widehat \alpha\)#
We will draw \(S\) independent datasets of size \(n\) from \(P(X,Y)\) to estimate the standard error of \(\widehat{\alpha}\).
For each sampled dataset \(s\), \(1 \leq s \leq S\),
we compute a value of the estimate, giving \(\widehat \alpha^{(1)},\widehat \alpha^{(2)},\dots,\widehat \alpha^{(S)}\).
The Standard Error of \(\widehat \alpha\) is approximated by the standard deviation of these values:
\[\widehat{\operatorname{SE}}(\widehat\alpha) = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\left(\widehat\alpha^{(s)} - \overline{\widehat\alpha}\,\right)^2}, \qquad \overline{\widehat\alpha} = \frac{1}{S}\sum_{s=1}^{S}\widehat\alpha^{(s)}.\]
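A minimal sketch of this thought experiment, assuming for illustration that the true \(P(X,Y)\) is a bivariate normal with known covariance (and reusing `alpha_hat` from above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" joint distribution P(X, Y)
mean = [0.0, 0.0]
cov_true = [[1.0, 0.5],
            [0.5, 1.25]]                   # chosen so the true alpha is 0.6
n, S = 100, 1000

# Draw S independent datasets of size n and compute alpha-hat on each
alphas = np.empty(S)
for s in range(S):
    x, y = rng.multivariate_normal(mean, cov_true, size=n).T
    alphas[s] = alpha_hat(x, y)            # alpha_hat as defined above

se_alpha = alphas.std(ddof=1)              # SD of the S estimates approximates SE(alpha-hat)
print(se_alpha)
```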
In reality, we only have \(n\) samples#
However, these samples can be used to approximate the joint distribution of \(X\) and \(Y\).
The Bootstrap: Sample from the empirical distribution \(\hat P\), which places probability \(1/n\) on each observed pair \((x_i, y_i)\).
Equivalently, resample the data by drawing \(n\) samples with replacement from the actual observations.
Why it works: variances computed under the empirical distribution are good approximations of variances computed under the true distribution (in many cases).
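And a minimal sketch of the bootstrap itself, assuming `x`, `y` hold the single observed dataset of \(n\) return pairs (NumPy arrays, as in the simulation above): the only change is that we resample the observed pairs with replacement instead of drawing fresh data from \(P(X,Y)\).

```python
# x, y: the single observed dataset of n return pairs
B = 1000
boot_alphas = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)       # n indices drawn with replacement
    boot_alphas[b] = alpha_hat(x[idx], y[idx])

se_boot = boot_alphas.std(ddof=1)          # bootstrap Standard Error of alpha-hat
print(se_boot)
```

Drawing one shared index array keeps each pair \((x_i, y_i)\) together, which preserves the dependence between \(X\) and \(Y\).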
A schematic of the Bootstrap#
Comparing Bootstrap sampling to sampling from the true distribution#
The left panel is the population distribution of \(\widehat{\alpha}\), obtained by repeatedly sampling fresh data from the true \(P(X,Y)\); it is centered (approximately) around the true \(\alpha\).
The middle panel is the bootstrap distribution of \(\widehat{\alpha}\); it is centered (approximately) around the observed \(\widehat{\alpha}\).