Support vector classifier#
Problem: It is not always possible to separate the points using a hyperplane.
Support vector classifier: a relaxation of the maximal margin classifier.
It allows some points to be on the wrong side of the margin, or even of the hyperplane, by granting each observation a slack \(\epsilon_i\geq 0\).
![Fig 9.6](http://www.stanford.edu/class/stats202/figs/Chapter9/9.6.png)
A new optimization problem:
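$\( \begin{align*} &\max_{\beta_0,\beta_1,\dots,\beta_p,\,\epsilon_1,\dots,\epsilon_n,\,M}\; M\\ &\text{subject to } \;\; \sum_{j=1}^p \beta_j^2 = 1,\\ &\qquad y_i(\beta_0 +\beta_1x_{i1}+\dots+\beta_p x_{ip}) \geq M(1-\epsilon_i), \quad \epsilon_i\geq 0, \quad \text{ for all }i=1,\dots,n,\\ &\qquad \sum_{i=1}^n \epsilon_i \leq C. \end{align*} \)$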
Since \(\sum_{j=1}^p\beta_j^2=1\), the quantity \(y_i(\beta_0 +\beta_1x_{i1}+\dots+\beta_p x_{ip})\) is the signed distance of \(x_i\) from the hyperplane; it is positive when \(x_i\) is on the correct side.
\(M\) is the width of the margin on either side of the hyperplane.
\(\epsilon=(\epsilon_1,\dots,\epsilon_n)\) are called slack variables: \(\epsilon_i=0\) when \(x_i\) is on the correct side of the margin, \(0<\epsilon_i\leq 1\) when it violates the margin, and \(\epsilon_i>1\) when it falls on the wrong side of the hyperplane.
\(C\geq 0\) is called the budget; it caps the total slack, \(\sum_{i=1}^n\epsilon_i\leq C\).
Tuning the budget, \(C\) (high to low)#
![Fig 9.7](http://www.stanford.edu/class/stats202/figs/Chapter9/9.7.png)
If the budget is too low, we tend to overfit#
![Fig 9.5](http://www.stanford.edu/class/stats202/figs/Chapter9/9.5.png)
Setting \(C=0\) allows no violations, so we recover the maximal margin classifier.
Adding a single observation dramatically changes the classifier.
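As an illustration (a minimal sketch with simulated data, not part of these notes), the effect of this tuning parameter can be explored with scikit-learn. Note that scikit-learn's `C` argument penalizes margin violations, like the \(D\) introduced in the next section, so it is inversely related to the budget \(C\) used here: a large scikit-learn `C` corresponds to a small budget.

```python
# Sketch: cross-validating the regularization of a linear support vector classifier.
# scikit-learn's C is a penalty on violations, so large C <-> small budget (risk of
# overfitting) and small C <-> large budget (wider, more tolerant margin).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Simulated two-class data that is not perfectly separable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for penalty in [0.01, 0.1, 1, 10, 100]:      # candidate penalty values
    clf = SVC(kernel="linear", C=penalty)    # linear kernel = support vector classifier
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"C={penalty:>6}: CV accuracy = {scores.mean():.3f}")
```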
Finding the support vector classifier#
We can reformulate the problem by defining a vector \(w=(w_1,\dots,w_p) = \beta/M\) (and rescaling \(\beta_0\) by \(1/M\) as well):
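$\( \begin{align*} &\min_{\beta_0, w, \epsilon}\;\; \frac{1}{2}\|w\|^2 + D\sum_{i=1}^n \epsilon_i\\ &\text{subject to } \;\; y_i(\beta_0 +w\cdot x_i)\geq 1-\epsilon_i, \quad \epsilon_i\geq 0, \quad \text{ for all }i=1,\dots,n. \end{align*} \)$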
The penalty \(D\geq 0\) serves a function similar to the budget \(C\), but is inversely related to it.
Introducing KKT multipliers, \(\alpha_i\) and \(\mu_i\), this is equivalent to: $\( \begin{align*} &\max_{\alpha,\mu}\; \min_{\beta_0,w,\epsilon}\;\; \frac{1}{2}\|w\|^2 - \sum_{i=1}^n \alpha_i [y_i(\beta_0 +w\cdot x_i)-1+\epsilon_i] + \sum_{i=1}^n (D-\mu_i)\epsilon_i\\ &\text{subject to } \;\;\alpha_i \geq 0, \mu_i\geq 0 , \quad \text{ for all }i=1,\dots,n. \end{align*} \)$
Setting the derivatives with respect to \(w\), \(\beta_0\), and \(\epsilon\) to 0, we obtain: $\(\hat w = \sum_{i=1}^n \alpha_i y_i x_i, \quad \sum_{i=1}^n\alpha_i y_i = 0, \quad \mu_i=D-\alpha_i\)$
Furthermore, the KKT conditions imply that \(\alpha_i>0\) only if \(y_i(\beta_0 +w\cdot x_i)\leq 1\), that is, only when \(x_i\) lies on the margin or on the wrong side of it; these observations are the support vectors.
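As a sanity check (a minimal sketch with simulated data, not part of these notes), the identity \(\hat w = \sum_{i=1}^n \alpha_i y_i x_i\) can be verified on a fitted linear support vector classifier in scikit-learn, which stores \(\alpha_i y_i\) for the support vectors in `dual_coef_`:

```python
# Sketch: check that the fitted coefficient vector equals sum_i alpha_i * y_i * x_i,
# where only the support vectors (alpha_i > 0) contribute.  scikit-learn's dual_coef_
# holds alpha_i * y_i (with labels coded as +/-1) for the support vectors.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ has shape (1, n_support); support_vectors_ has shape (n_support, p).
w_from_dual = clf.dual_coef_ @ clf.support_vectors_

print(np.allclose(w_from_dual, clf.coef_))   # True: w-hat = sum_i alpha_i y_i x_i
print(clf.support_.shape[0], "support vectors out of", X.shape[0], "observations")
```

Only the observations with \(\alpha_i>0\) appear in `support_vectors_`, which is why the fitted classifier depends only on the support vectors.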
Support vectors#
![Fig 9.7](http://www.stanford.edu/class/stats202/figs/Chapter9/9.7.png)