Margins and separating hyperplanes#

  • Linear classifiers can be described geometrically by separating hyperplanes.

  • An affine function \(x \mapsto \beta_0 + \sum_{j=1}^p x_j \beta_j\) determines a hyperplane $$H = \left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 = 0 \right\}$$ and two half-spaces $$\left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 > 0 \right\}, \qquad \left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 < 0 \right\}.$$

  • The vector \(N=(\beta_1, \dots, \beta_p)\) is the normal vector of the hyperplane \(H\).

  • For a given \(H\), by scaling, we can always choose \(N\) so that \(\|N\|_2=1\) (we must also scale \(\beta_0\) to keep \(H\) the same).
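The rescaling step can be checked numerically. A minimal sketch, with hypothetical coefficients \(\beta_0=1,\ \beta=(2,3)\): dividing both \(\beta\) and \(\beta_0\) by \(\|\beta\|_2\) leaves the zero set \(H\) unchanged while making the normal vector unit length.

```python
import numpy as np

# Hypothetical coefficients for the affine function x -> beta_0 + x . beta
beta0, beta = 1.0, np.array([2.0, 3.0])

# Rescale so the normal vector has unit length; beta_0 must be scaled by the
# same factor, otherwise the zero set {x : beta_0 + x . beta = 0} would change.
scale = np.linalg.norm(beta)
beta0_unit, beta_unit = beta0 / scale, beta / scale

# A point on the original hyperplane still satisfies the scaled equation.
x = np.array([-0.5, 0.0])          # 1 + 2*(-0.5) + 3*0 = 0
print(beta0 + x @ beta)            # 0.0
print(beta0_unit + x @ beta_unit)  # 0.0
print(np.linalg.norm(beta_unit))   # 1.0
```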

Hyperplanes and normal vectors#

  • Function is \(x \mapsto 1 + 2 x_1 + 3 x_2\)

  • \(\{x: 1+2x_1+3x_2>0\}\)

  • \(\{x: 1+2x_1+3x_2<0\}\)
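For this concrete example, the sign of \(1+2x_1+3x_2\) identifies the half-space a point falls in, as a small sketch shows (the helper name `side` is ours):

```python
import numpy as np

def side(x1, x2):
    """Sign of 1 + 2*x1 + 3*x2: which side of the hyperplane (x1, x2) is on."""
    return np.sign(1 + 2 * x1 + 3 * x2)

print(side(1.0, 1.0))    # 1.0  -> in the positive half-space
print(side(-2.0, -1.0))  # -1.0 -> in the negative half-space
print(side(-0.5, 0.0))   # 0.0  -> on the hyperplane H itself
```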

Hyperplanes and normal vectors#

  • If the hyperplane goes through the origin $(\beta_0=0)$, the deviation between a point $(x_1,\dots,x_p)$ and the hyperplane is the dot product: $$x\cdot \beta = x_1\beta_1 + \dots + x_p\beta_p.$$
  • The sign of the dot product tells us on which side of the hyperplane the point lies.
  • If the hyperplane goes through the point $-\beta_0 \beta$ (taking $\|\beta\|_2=1$), i.e. it is displaced from the origin by $-\beta_0$ along the unit normal vector $(\beta_1, \dots, \beta_p)$, the deviation of a point $(x_1,\dots,x_p)$ from the hyperplane is: $$ \beta_0 + x_1\beta_1 + \dots + x_p\beta_p.$$
  • The sign tells us on which side of the hyperplane the point lies.
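The deviation formula can be sketched directly. Assuming a unit normal (here the hypothetical choice \(\beta=(3/5,4/5)\), \(\beta_0=-2\)), the quantity \(\beta_0 + x\cdot\beta\) is the signed distance from \(x\) to the hyperplane:

```python
import numpy as np

def deviation(x, beta, beta0=0.0):
    """Signed deviation beta_0 + x . beta; when ||beta||_2 = 1 this is the
    signed Euclidean distance from x to the hyperplane."""
    return beta0 + x @ beta

beta = np.array([3.0, 4.0]) / 5.0   # unit normal vector
beta0 = -2.0                        # hyperplane passes through -beta0 * beta

print(deviation(np.array([2.0, 1.0]), beta, beta0))  # 0.0 -> on the hyperplane
print(deviation(np.array([3.0, 3.0]), beta, beta0))  # 2.2 -> positive side
```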

Maximal margin classifier#

  • Suppose we have a classification problem with response $Y=-1$ or $Y=1$.
  • If the classes can be separated by a hyperplane, there will typically be infinitely many hyperplanes separating them.
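This non-uniqueness is easy to demonstrate. In the sketch below (a hypothetical four-point data set), several distinct choices of \((\beta_0, \beta)\) all satisfy $y_i(\beta_0 + x_i \cdot \beta) > 0$ for every observation, i.e. all of them separate the two classes:

```python
import numpy as np

# Tiny linearly separable data set (hypothetical): class +1 vs class -1.
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Three distinct (beta0, beta) pairs; each defines a different hyperplane.
candidates = [(0.0, np.array([1.0, 1.0])),
              (0.1, np.array([1.0, 1.0])),
              (0.0, np.array([0.5, 1.5]))]

for beta0, beta in candidates:
    # A hyperplane separates the classes iff y_i * (beta0 + x_i . beta) > 0
    # for all i (every point lies strictly on its class's side).
    separates = bool(np.all(y * (beta0 + X @ beta) > 0))
    print(beta0, beta, separates)  # each candidate prints ... True
```

The maximal margin classifier resolves this ambiguity by picking the separating hyperplane farthest from the nearest training points.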