Margins and separating hyperplanes#
Linear classifiers can be described geometrically by separating hyperplanes.
An affine function

$$
x \mapsto \beta_0 + \sum_{j=1}^p x_j \beta_j
$$

determines a hyperplane

$$
H = \left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 = 0 \right\},
$$

and two half-spaces

$$
\left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 > 0 \right\}, \qquad \left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 < 0 \right\}.
$$
The vector \(N=(\beta_1, \dots, \beta_p)\) is the normal vector of the hyperplane \(H\).
For a given \(H\), by scaling, we can always choose \(N\) so that \(\|N\|_2=1\) (we must also scale \(\beta_0\) to keep \(H\) the same).
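As a minimal sketch of this rescaling (the coefficients below are hypothetical, chosen only for illustration): dividing both \(N\) and \(\beta_0\) by \(\|N\|_2\) yields a unit normal vector while leaving the hyperplane \(H\) unchanged.

```python
import numpy as np

# Hypothetical hyperplane {x : beta0 + x . N = 0}
beta0 = 1.0
N = np.array([2.0, 3.0])

# Rescale so the normal vector has unit length; beta0 must be
# divided by the same factor to keep H the same set of points.
scale = np.linalg.norm(N)
N_unit = N / scale
beta0_unit = beta0 / scale

# A point on H satisfies both equations (up to rounding):
x = np.array([-0.5, 0.0])         # 1 + 2*(-0.5) + 3*0 = 0
print(beta0 + x @ N)              # 0.0
print(beta0_unit + x @ N_unit)    # 0.0
```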
Hyperplanes and normal vectors#
(Figure: the hyperplane \(\{x: 1+2x_1+3x_2=0\}\) for the function \(x \mapsto 1 + 2 x_1 + 3 x_2\), with the two half-spaces \(\{x: 1+2x_1+3x_2>0\}\) and \(\{x: 1+2x_1+3x_2<0\}\).)
- If the hyperplane goes through the origin ($\beta_0=0$), the deviation of a point $x=(x_1,\dots,x_p)$ from the hyperplane is the dot product (the signed distance when $\|\beta\|_2=1$): $$x\cdot \beta = x_1\beta_1 + \dots + x_p\beta_p.$$
- The sign of the dot product tells us on which side of the hyperplane the point lies.
- If the hyperplane goes through the point $-\beta_0 \beta$ (assuming $\|\beta\|_2=1$), i.e. it is displaced from the origin by $-\beta_0$ along the normal vector $\beta=(\beta_1, \dots, \beta_p)$, the deviation of a point $(x_1,\dots,x_p)$ from the hyperplane is: $$ \beta_0 + x_1\beta_1 + \dots + x_p\beta_p.$$
- The sign tells us on which side of the hyperplane the point lies.
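The bullets above can be sketched in a few lines; the coefficients and test points here are hypothetical, chosen only to show each sign case.

```python
import numpy as np

beta = np.array([2.0, 3.0])
beta = beta / np.linalg.norm(beta)   # unit normal vector
beta0 = 1.0

def signed_deviation(x):
    # beta0 + x . beta: positive on one side of the hyperplane,
    # negative on the other, zero exactly on it. Since ||beta|| = 1,
    # this is the signed Euclidean distance to the hyperplane.
    return beta0 + x @ beta

for x in [np.array([1.0, 1.0]), np.array([-2.0, -1.0])]:
    side = "positive" if signed_deviation(x) > 0 else "negative"
    print(x, side)
```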
Maximal margin classifier#
- Suppose we have a classification problem with response $Y=-1$ or $Y=1$.
- If the classes can be separated by a hyperplane, there will typically be infinitely many hyperplanes that separate them.
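To see this concretely, the following sketch checks that several distinct hyperplanes all separate the same (hypothetical) toy data set, hinting that infinitely many do.

```python
import numpy as np

# Two hypothetical linearly separable classes in the plane.
X_pos = np.array([[2.0, 2.0], [3.0, 1.5]])      # labeled y = +1
X_neg = np.array([[-1.0, -1.0], [-2.0, 0.0]])   # labeled y = -1

def separates(beta0, beta):
    # A hyperplane separates the classes if beta0 + x . beta is
    # positive for every y = +1 point and negative for every y = -1 point.
    return (np.all(X_pos @ beta + beta0 > 0)
            and np.all(X_neg @ beta + beta0 < 0))

# Three distinct hyperplanes, all separating the same data.
candidates = [(0.0, np.array([1.0, 0.0])),
              (-0.5, np.array([1.0, 0.5])),
              (0.2, np.array([1.0, 1.0]))]
for beta0, beta in candidates:
    print(beta0, beta, separates(beta0, beta))   # all True
```

Since any small perturbation of a strictly separating hyperplane still separates the data, the set of separating hyperplanes is infinite; the maximal margin classifier picks one by a principled criterion.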