Best subset selection#
Simple idea: let’s compare all models with \(k\) predictors.
There are \({p \choose k} = p!/\left[k!(p-k)!\right]\) possible models.
For every possible \(k\), choose the model with the smallest RSS.
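As a quick check on the counting, R's built-in `choose()` evaluates the binomial coefficient directly (the \(p = 10\) below is purely illustrative):

```r
# Number of candidate models with exactly k predictors, for p = 10 and k = 4
choose(10, 4)   # 210
# Total number of candidate models across all values of k: 2^p
2^10            # 1024
```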
Best subset with regsubsets#
library(ISLR) # where Credit is stored
library(leaps) # where regsubsets is found
summary(regsubsets(Balance ~ ., data=Credit))
Best model with 4 variables includes: Cards, Income, Student, Limit.
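To see how such a model is read off the `regsubsets` output, here is a minimal sketch (variable names as in Credit; the object names are ours):

```r
library(ISLR)  # where Credit is stored
library(leaps) # where regsubsets is found

fit <- regsubsets(Balance ~ ., data = Credit)
reg_summary <- summary(fit)

# Row k of the `which` matrix flags the variables in the best k-variable model
reg_summary$which[4, ]

# coef() with id = 4 returns the fitted coefficients of that model
coef(fit, 4)
```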
Choosing \(k\)#
Naturally, \(\text{RSS}\) and \(R^2 = 1-\frac{\text{RSS}}{\text{TSS}}\) improve as we increase \(k\).
To optimize \(k\), we want to minimize the test error, not the training error.
We could use cross-validation, or alternative estimates of test error:
Akaike Information Criterion (AIC), closely related to Mallows' \(C_p\): given an estimate \(\hat{\sigma}^2\) of the irreducible error,
\[ C_p = \frac{1}{n}\left(\text{RSS} + 2k\hat{\sigma}^2\right). \]
Bayesian Information Criterion (BIC):
\[ \text{BIC} = \frac{1}{n}\left(\text{RSS} + \log(n)\,k\hat{\sigma}^2\right). \]
Adjusted \(R^2\):
\[ \text{Adjusted } R^2 = 1 - \frac{\text{RSS}/(n-k-1)}{\text{TSS}/(n-1)}. \]
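`leaps` computes these criteria for the best model of each size; a minimal sketch (the `nvmax` value is an assumption, chosen so that models of every size are considered):

```r
library(ISLR)
library(leaps)

# nvmax raises the cap on model size from the default of 8 (assumed here)
fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11)
reg_summary <- summary(fit)

# Best model size according to each criterion:
which.min(reg_summary$cp)     # Mallows' Cp (proportional to AIC for least squares)
which.min(reg_summary$bic)    # BIC
which.max(reg_summary$adjr2)  # adjusted R^2
```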
How do these criteria compare to cross-validation?#
They are much less expensive to compute.
They are motivated by asymptotic arguments and rely on model assumptions (e.g., normality of the errors).
Equivalent concepts exist for other models (e.g., logistic regression).
Example: best subset selection for the Credit dataset#
Recall: in \(K\)-fold cross-validation, we can estimate a standard error for our test error estimate. Then, we can apply the 1SE rule.
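For completeness, here is a sketch of this procedure for best subset selection on Credit. `predict.regsubsets` is a small helper (`leaps` does not provide a predict method), and the fold count, seed, and `nvmax` are our choices:

```r
library(ISLR)
library(leaps)

# Helper: predictions from the best id-variable model of a regsubsets fit
predict.regsubsets <- function(object, newdata, id, ...) {
  form  <- as.formula(object$call[[2]])  # recover the formula from the call
  mat   <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  drop(mat[, names(coefi)] %*% coefi)
}

set.seed(1)
K  <- 10
nv <- 11   # assumed maximum model size
folds <- sample(rep(1:K, length.out = nrow(Credit)))
cv.errors <- matrix(NA, K, nv)

for (j in 1:K) {
  fit <- regsubsets(Balance ~ ., data = Credit[folds != j, ], nvmax = nv)
  for (i in 1:nv) {
    pred <- predict(fit, Credit[folds == j, ], id = i)
    cv.errors[j, i] <- mean((Credit$Balance[folds == j] - pred)^2)
  }
}

mean.cv <- colMeans(cv.errors)            # CV error for each model size
se.cv   <- apply(cv.errors, 2, sd) / sqrt(K)  # its standard error

# 1SE rule: smallest k whose CV error is within one SE of the minimum
k.min <- which.min(mean.cv)
which(mean.cv <= mean.cv[k.min] + se.cv[k.min])[1]
```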