Evaluating a classification method#
We have talked about the 0-1 loss, the fraction of misclassified observations:
\[
\frac{1}{n}\sum_{i=1}^n \mathbf{1}(y_i \neq \hat{y}_i).
\]
A classifier may make wrong predictions for some classes much more often than for others, and the 0-1 loss tells you nothing about this.
A much more informative summary of the error is a confusion matrix:
![Table 4.6](http://www.stanford.edu/class/stats202/figs/confusion/confusion-abstract.png)
Fig. 20 Confusion matrix for a two-class problem.#
Confusion matrix for Default example#
```r
library(MASS)  # where the `lda` function lives
library(ISLR)  # where `Default` lives
lda.fit = predict(lda(default ~ balance + student, data = Default))
table(lda.fit$class, Default$default)
```
The error rate among people who do not default (false positive rate) is very low.
However, the rate of false negatives is 76%.
It is possible that false negatives are a bigger source of concern!
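These rates can be read off the confusion matrix directly. A minimal sketch, reusing `lda.fit` from above (the names `cm`, `fpr`, and `fnr` are ours):

```r
cm = table(lda.fit$class, Default$default)   # rows: predicted, columns: true
fpr = cm["Yes", "No"] / sum(cm[, "No"])      # errors among true non-defaulters
fnr = cm["No", "Yes"] / sum(cm[, "Yes"])     # errors among true defaulters
c(FPR = fpr, FNR = fnr)
```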
One possible solution: change the classification threshold.
Changing the decision rule#
```r
new.class = rep("No", length(Default$default))
new.class[lda.fit$posterior[, "Yes"] > 0.2] = "Yes"
table(new.class, Default$default)
```
We now predict Yes if \(P(\mathtt{default}=\text{yes} \mid X) > \color{Red}{0.2}\). Changing the threshold to 0.2 makes it easier to classify to Yes. Note that the rate of false positives became higher! That is the price to pay for fewer false negatives.
Let’s visualize the dependence of the error on the threshold:
![Fig 4.7](http://www.stanford.edu/class/stats202/figs/Chapter4/4.7.png)
Fig. 21 Error rates for LDA classifier on Default dataset.#
Legend: dashed line, false negative rate (error for defaulting customers); dotted line, false positive rate (error for non-defaulting customers); solid line, overall error rate.
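Curves of this kind can be traced by sweeping the threshold over a grid and recomputing the three rates at each value. A minimal sketch, reusing `lda.fit` from above (the grid and line styles are our choices):

```r
thresholds = seq(0.01, 0.5, by = 0.01)
rates = sapply(thresholds, function(t) {
  pred = ifelse(lda.fit$posterior[, "Yes"] > t, "Yes", "No")
  c(fn = mean(pred[Default$default == "Yes"] == "No"),   # false negative rate
    fp = mean(pred[Default$default == "No"] == "Yes"),   # false positive rate
    overall = mean(pred != Default$default))             # overall error rate
})
matplot(thresholds, t(rates), type = "l", lty = c(2, 3, 1), col = 1,
        xlab = "Threshold", ylab = "Error rate")
```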
The ROC curve#
![Fig 4.8](http://www.stanford.edu/class/stats202/figs/Chapter4/4.8.png)
Fig. 22 ROC curve for LDA classifier on Default dataset.#
The ROC curve displays the performance of the method for every choice of threshold.
The area under the curve (AUC) measures the quality of the classifier:
- 0.5 is the AUC of a random classifier.
- The closer the AUC is to 1, the better.
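One convenient way to compute the ROC curve and AUC in R is the `pROC` package; a sketch, assuming `pROC` is installed and reusing `lda.fit` from above:

```r
library(pROC)  # assumption: the pROC package is installed
roc.obj = roc(response = Default$default, predictor = lda.fit$posterior[, "Yes"])
plot(roc.obj)  # the ROC curve
auc(roc.obj)   # area under the curve
```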
Comparing classification methods through simulation#
Simulate data from several different known distributions with \(2\) predictors and a binary response variable.
Compare the test error (0-1 loss) for the following methods (a sketch of one such comparison follows the list):
- KNN-1
- KNN-CV ("optimally tuned" KNN)
- Logistic regression
- Linear discriminant analysis (LDA)
- Quadratic discriminant analysis (QDA)
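A minimal sketch of one simulation run in the flavor of scenario 1 below (the class means, sample sizes, and helper names are our choices, not the book's; KNN-CV is omitted for brevity):

```r
library(MASS)   # mvrnorm, lda, qda
library(class)  # knn

# Draw n points per class: class 0 centered at (0,0), class 1 at (1,1),
# identity covariance (equal variances, no correlation)
simulate = function(n) {
  x = rbind(mvrnorm(n, mu = c(0, 0), Sigma = diag(2)),
            mvrnorm(n, mu = c(1, 1), Sigma = diag(2)))
  data.frame(X1 = x[, 1], X2 = x[, 2], y = factor(rep(c(0, 1), each = n)))
}

train = simulate(100)
test  = simulate(500)

err.lda = mean(predict(lda(y ~ X1 + X2, data = train), test)$class != test$y)
err.qda = mean(predict(qda(y ~ X1 + X2, data = train), test)$class != test$y)
p.log   = predict(glm(y ~ X1 + X2, data = train, family = binomial),
                  test, type = "response")
err.log = mean(ifelse(p.log > 0.5, "1", "0") != test$y)
err.knn = mean(knn(train[, 1:2], test[, 1:2], train$y, k = 1) != test$y)

c(LDA = err.lda, QDA = err.qda, logistic = err.log, KNN1 = err.knn)
```

In the actual study this is repeated over many draws of the training set, so the boxplots below show the distribution of test errors rather than a single number.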
Scenario 1#
![Scenario 1](http://www.stanford.edu/class/stats202/figs/classification_simulation/scenario1.png)
Fig. 23 Instance for simulation scenario #1.#
\(X_1,X_2\) normal with identical variance.
No correlation in either class.
Scenario 2#
![Scenario 2](http://www.stanford.edu/class/stats202/figs/classification_simulation/scenario2.png)
Fig. 24 Instance for simulation scenario #2.#
\(X_1,X_2\) normal with identical variance.
Correlation is -0.5 in both classes.
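Predictors with a given within-class correlation can be drawn with `MASS::mvrnorm`; a sketch for the correlation of -0.5 used here (unit variances are our choice):

```r
library(MASS)
# Covariance matrix with unit variances and correlation -0.5
Sigma = matrix(c(1, -0.5, -0.5, 1), nrow = 2)
x = mvrnorm(1000, mu = c(0, 0), Sigma = Sigma)
cor(x[, 1], x[, 2])  # close to -0.5
```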
Scenario 3#
![Scenario 3](http://www.stanford.edu/class/stats202/figs/classification_simulation/scenario3.png)
Fig. 25 Instance for simulation scenario #3.#
\(X_1,X_2\) Student's \(t\) distributed.
No correlation in either class.
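Heavy-tailed predictors of this kind can be drawn with `rt`; a one-line sketch (the degrees of freedom are our choice):

```r
x = cbind(rt(1000, df = 3), rt(1000, df = 3))  # uncorrelated, heavy-tailed
```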
Results for the first 3 scenarios#
![Figure 4.10](http://www.stanford.edu/class/stats202/figs/Chapter4/4.10.png)
Fig. 26 Simulation results for linear scenarios #1-3.#
Scenario 4#
![Scenario 4](http://www.stanford.edu/class/stats202/figs/classification_simulation/scenario4.png)
Fig. 27 Instance for simulation scenario #4.#
\(X_1, X_2\) normal with identical variance.
First class has correlation 0.5, second class has correlation -0.5.
Scenario 5#
\(X_1, X_2\) normal with identical variance.
Response \(Y\) was sampled from:
\[
P(Y=1 \mid X) = \frac{e^{\beta_0+\beta_1 X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}{1+e^{\beta_0+\beta_1X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}.
\]
The true decision boundary is quadratic, but this is not the QDA model. (Why?)
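A sketch of how \(Y\) could be sampled from such a model (the \(\beta\) values here are placeholders of ours, not those used in the actual simulation):

```r
n = 200
X1 = rnorm(n); X2 = rnorm(n)
b = c(0.5, 1, -1, 2)  # placeholder values for beta_0, ..., beta_3
eta = b[1] + b[2] * X1^2 + b[3] * X2^2 + b[4] * X1 * X2
Y = rbinom(n, size = 1, prob = exp(eta) / (1 + exp(eta)))
```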
Scenario 6#
\(X_1, X_2\) normal with identical variance.
Response \(Y\) was sampled from:
\[
P(Y=1 \mid X) = \frac{e^{f_\text{nonlinear}(X_1,X_2)}}{1+e^{f_\text{nonlinear}(X_1,X_2)}}.
\]
The true decision boundary is very rough.
Results for scenarios 4-6#
![Figure 4.11](http://www.stanford.edu/class/stats202/figs/Chapter4/4.11.png)
Fig. 28 Simulation results for nonlinear scenarios #4-6.#