Another resampling technique often seen in practice.
Cross-validation vs. the Bootstrap#
Cross-validation: provides estimates of the (test) error
The Bootstrap: provides the (standard) error of estimates

One of the most important techniques in all of Statistics.
Computer intensive method.
Popularized by Brad Efron \(\leftarrow\) Stanford pride!
Standard errors in linear regression from a sample of size \(n\)#
Classical way to compute Standard Errors#
Example: Estimate the variance of a sample \(x_1,x_2,\dots,x_n\):
Unbiased estimate of \(\sigma^2\): \[\hat \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\overline x)^2.\]
What is the Standard Error of \(\hat \sigma^2\)?
Assume that \(x_1,\dots,x_n\) are independent and normally distributed with common mean \(\mu\) and variance \(\sigma^2\).
Then \((n-1)\hat \sigma^2/\sigma^2\) has a \(\chi\)-squared distribution with \(n-1\) degrees of freedom.
For large \(n\), \(\hat{\sigma}^2\) is approximately normally distributed around \(\sigma^2\).
The SD of this sampling distribution is the Standard Error.
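Under the normal assumption this standard error can be written down explicitly: a \(\chi\)-squared variable with \(n-1\) degrees of freedom has variance \(2(n-1)\), so \(\text{Var}(\hat\sigma^2) = 2\sigma^4/(n-1)\) and the Standard Error is \(\sigma^2\sqrt{2/(n-1)}\). The sketch below checks that formula against a simulation. The values of `n`, `mu`, `sigma`, the seed, and the number of repetitions are arbitrary choices made for illustration, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n, mu, sigma = 50, 1.0, 2.0      # illustrative values (assumptions)
n_rep = 20000                    # number of simulated samples

# Each row is one normal sample of size n.
samples = rng.normal(mu, sigma, size=(n_rep, n))

# Unbiased variance estimate (divisor n - 1) for every sample.
sigma2_hat = samples.var(axis=1, ddof=1)

# Standard error: SD of the sampling distribution of sigma2_hat.
se_simulated = sigma2_hat.std(ddof=1)
se_formula = sigma**2 * np.sqrt(2.0 / (n - 1))

print("simulated SE:", se_simulated)
print("formula SE:  ", se_formula)
```

In practice \(\sigma^2\) is unknown, so the formula would be evaluated at \(\hat\sigma^2\), and it is only valid when the normal assumption holds.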
Limitations of the classical approach#
This approach has served statisticians well for many years; however, what happens if:
The distributional assumption — for example, \(x_1,\dots,x_n\) being normal — breaks down?
The estimator does not have a simple form and its sampling distribution cannot be derived analytically?
Bootstrap can handle these departures from the usual assumptions!
Example: Investing in two assets#

Suppose that \(X\) and \(Y\) are the returns of two assets.
These returns are observed every day: \((x_1,y_1),\dots,(x_n,y_n)\).
We have a fixed amount of money to invest and we will invest a fraction \(\alpha\) in \(X\) and a fraction \((1-\alpha)\) in \(Y\).
Therefore, our return will be \(\alpha X + (1-\alpha) Y\).
Our goal will be to minimize the variance of our return as a function of \(\alpha\).
One can show that the optimal \(\alpha\) is: \[\alpha = \frac{\sigma_Y^2 - \text{Cov}(X,Y)}{\sigma_X^2 + \sigma_Y^2 - 2\,\text{Cov}(X,Y)}.\]
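A sketch of why, writing \(\sigma_X^2\) and \(\sigma_Y^2\) for the variances of \(X\) and \(Y\) and \(\text{Cov}(X,Y)\) for their covariance. The variance of the return \(\alpha X + (1-\alpha)Y\) is
\[\text{Var}(\alpha X + (1-\alpha)Y) = \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\,\text{Cov}(X,Y).\]
Setting the derivative with respect to \(\alpha\) to zero gives
\[2\alpha\sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + 2(1-2\alpha)\,\text{Cov}(X,Y) = 0,\]
which rearranges to
\[\alpha\left(\sigma_X^2 + \sigma_Y^2 - 2\,\text{Cov}(X,Y)\right) = \sigma_Y^2 - \text{Cov}(X,Y).\]
Dividing by the factor on the left yields the optimal \(\alpha\). That factor equals \(\text{Var}(X-Y) \geq 0\), so the second derivative is nonnegative and the stationary point is a minimum (assuming \(\text{Var}(X-Y) > 0\)).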
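Proposal: Use an estimate, replacing the variances and the covariance by their sample versions: \[\widehat\alpha = \frac{\widehat\sigma_Y^2 - \widehat{\text{Cov}}(X,Y)}{\widehat\sigma_X^2 + \widehat\sigma_Y^2 - 2\,\widehat{\text{Cov}}(X,Y)}.\]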
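As a minimal sketch, the estimate can be computed by replacing the variances and the covariance in the expression for \(\alpha\) with their sample versions. The function name `alpha_hat` and the toy numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction invested in X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y, ddof=1)  # 2x2 sample covariance matrix
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    return (var_y - cov_xy) / (var_x + var_y - 2.0 * cov_xy)

# Toy returns (made up): both assets have the same sample variance,
# so by symmetry the estimate splits the investment evenly.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(alpha_hat(x, y))
```

The estimate is undefined when the denominator, the sample variance of \(x_i - y_i\), is zero; the sketch does not guard against that case.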
Suppose we compute the estimate \(\widehat\alpha = 0.6\) using the samples \((x_1,y_1),\dots,(x_n,y_n)\).
How sure can we be of this value? (A somewhat vague question.)
If we had observed the returns on a different set of \(n = 100\) days, would we get a wildly different \(\widehat \alpha\)? (A more precise question.)
Resampling the data from the true distribution#

In this thought experiment, we know the actual joint distribution \(P(X,Y)\), so we can resample the \(n\) observations to our hearts’ content.
Computing the standard error of \(\widehat \alpha\)#
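We will draw \(S\) independent data sets, each of size \(n\), to estimate the standard error of \(\widehat{\alpha}\).
For each data set \(s\), with \(1 \leq s \leq S\), we can compute a value of the estimate: \(\widehat \alpha^{(1)},\widehat \alpha^{(2)},\dots,\widehat \alpha^{(S)}\).
The Standard Error of \(\widehat \alpha\) is approximated by the standard deviation of these values:
\[\text{SE}(\widehat\alpha) \approx \sqrt{\frac{1}{S-1}\sum_{s=1}^S \left(\widehat\alpha^{(s)} - \overline{\alpha}\right)^2}, \qquad \overline{\alpha} = \frac{1}{S}\sum_{s=1}^S \widehat\alpha^{(s)}.\]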
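The thought experiment can be sketched in code. The notes do not specify the joint distribution, so the bivariate normal below, with \(\sigma_X^2 = 1\), \(\sigma_Y^2 = 1.25\) and \(\text{Cov}(X,Y) = 0.5\) (which gives a true \(\alpha\) of 0.6), is purely an assumption for illustration, as are the seed, `n` and `S`.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction invested in X."""
    cov = np.cov(x, y, ddof=1)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])

rng = np.random.default_rng(1)                   # arbitrary seed
mean = np.zeros(2)
cov_true = np.array([[1.0, 0.5], [0.5, 1.25]])   # assumed "true" distribution
n, S = 100, 1000                                 # sample size, number of data sets

alphas = np.empty(S)
for s in range(S):
    # Draw a fresh data set of n pairs from the true distribution.
    xy = rng.multivariate_normal(mean, cov_true, size=n)
    alphas[s] = alpha_hat(xy[:, 0], xy[:, 1])

# Standard error of alpha_hat: SD of the S estimates.
se_alpha = alphas.std(ddof=1)
print("mean of estimates:", alphas.mean())
print("standard error:   ", se_alpha)
```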
In reality, we only have \(n\) samples#

However, these samples can be used to approximate the joint distribution of \(X\) and \(Y\).
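The Bootstrap: Sample from the empirical distribution, which puts probability \(1/n\) on each observed pair: \[\widehat P(X,Y) = \frac{1}{n}\sum_{i=1}^n \delta(x_i,y_i),\] where \(\delta(x_i,y_i)\) denotes a point mass at \((x_i,y_i)\).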
Equivalently, resample the data by drawing \(n\) samples with replacement from the actual observations.
Why it works: variances computed under the empirical distribution are good approximations of variances computed under the true distribution (in many cases).
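A minimal sketch of the resampling step for the two-asset example. The helper name `bootstrap_se`, the number of bootstrap replicates `B`, and the simulated stand-in data are all hypothetical; with real data, `x` and `y` would be the observed returns.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction invested in X."""
    cov = np.cov(x, y, ddof=1)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])

def bootstrap_se(x, y, statistic, B=1000, rng=None):
    """Bootstrap standard error of statistic(x, y), resampling pairs."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    estimates = np.empty(B)
    for b in range(B):
        # n indices drawn with replacement; pairs (x_i, y_i) stay together.
        idx = rng.integers(0, n, size=n)
        estimates[b] = statistic(x[idx], y[idx])
    return estimates.std(ddof=1), estimates

# Stand-in data (an assumption): one sample of 100 simulated return pairs.
rng = np.random.default_rng(2)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.25]], size=100)
x, y = xy[:, 0], xy[:, 1]

se_boot, boot_estimates = bootstrap_se(x, y, alpha_hat, B=1000, rng=rng)
print("alpha_hat:   ", alpha_hat(x, y))
print("bootstrap SE:", se_boot)
```

Resampling indices rather than `x` and `y` separately keeps each pair intact, which preserves the dependence between the two returns.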
A schematic of the Bootstrap#

Comparing Bootstrap sampling to sampling from the true distribution#

Left panel: the distribution of \(\widehat{\alpha}\) over data sets drawn from the population, centered (approximately) around the true \(\alpha\).
Middle panel: the bootstrap distribution of \(\widehat{\alpha}\), centered (approximately) around the observed \(\widehat{\alpha}\).
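The two distributions described in the panels can be reproduced numerically. This is a sketch under an assumed population (the same hypothetical bivariate normal with true \(\alpha = 0.6\)); it prints the centers and spreads rather than drawing histograms.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction invested in X."""
    cov = np.cov(x, y, ddof=1)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])

rng = np.random.default_rng(3)                   # arbitrary seed
cov_true = np.array([[1.0, 0.5], [0.5, 1.25]])   # assumed; true alpha = 0.6
n, reps = 100, 1000

def draw(size):
    """One data set of `size` pairs from the assumed population."""
    xy = rng.multivariate_normal(np.zeros(2), cov_true, size=size)
    return xy[:, 0], xy[:, 1]

# Left panel: estimates from fresh data sets drawn from the population.
pop = np.array([alpha_hat(*draw(n)) for _ in range(reps)])

# Middle panel: estimates from resamples of a single observed data set.
x, y = draw(n)
observed = alpha_hat(x, y)
idx = rng.integers(0, n, size=(reps, n))         # each row: one resample
boot = np.array([alpha_hat(x[i], y[i]) for i in idx])

print("population center:", pop.mean(), "(true alpha 0.6)")
print("bootstrap center: ", boot.mean(), "(observed", observed, ")")
print("spreads:", pop.std(ddof=1), boot.std(ddof=1))
```

The centers differ because the bootstrap distribution is anchored at the observed estimate, while the spreads, which are what the standard error measures, should be comparable.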