Margins and separating hyperplanes#

  • Linear classifiers can be described geometrically by separating hyperplanes.

  • An affine function \(x \mapsto \beta_0 + \sum_{j=1}^p x_j \beta_j\) determines a hyperplane $$H = \left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 = 0 \right\}$$ and two half-spaces $$\left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 > 0 \right\}, \qquad \left\{x: \sum_{j=1}^p x_j \beta_j + \beta_0 < 0 \right\}.$$

  • The vector \(N=(\beta_1, \dots, \beta_p)\) is the normal vector of the hyperplane \(H\).

  • For a given \(H\), by scaling, we can always choose \(N\) so that \(\|N\|_2=1\) (we must also scale \(\beta_0\) to keep \(H\) the same).
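The rescaling step can be checked numerically. A minimal sketch, with hypothetical coefficients \(\beta_0=1,\ \beta=(2,3)\): dividing both \(\beta\) and \(\beta_0\) by \(\|\beta\|_2\) leaves the zero set \(H\) unchanged while making the normal vector unit length.

```python
import numpy as np

# Hypothetical coefficients for the affine function x -> beta_0 + x . beta
beta0, beta = 1.0, np.array([2.0, 3.0])

# Rescale so the normal vector has unit length; beta_0 must be scaled by the
# same factor, otherwise the zero set {x : beta_0 + x . beta = 0} would change.
scale = np.linalg.norm(beta)
beta0_unit, beta_unit = beta0 / scale, beta / scale

# A point on the original hyperplane still satisfies the scaled equation.
x = np.array([-0.5, 0.0])          # 1 + 2*(-0.5) + 3*0 = 0
print(beta0 + x @ beta)            # 0.0
print(beta0_unit + x @ beta_unit)  # 0.0
print(np.linalg.norm(beta_unit))   # 1.0
```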

Hyperplanes and normal vectors#

  • Function is \(x \mapsto 1 + 2 x_1 + 3 x_2\)

  • \(\{x: 1+2x_1+3x_2>0\}\)

  • \(\{x: 1+2x_1+3x_2<0\}\)
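For this concrete example, the sign of \(1+2x_1+3x_2\) identifies the half-space a point falls in, as a small sketch shows (the helper name `side` is ours):

```python
import numpy as np

def side(x1, x2):
    """Sign of 1 + 2*x1 + 3*x2: which side of the hyperplane (x1, x2) is on."""
    return np.sign(1 + 2 * x1 + 3 * x2)

print(side(1.0, 1.0))    # 1.0  -> in the positive half-space
print(side(-2.0, -1.0))  # -1.0 -> in the negative half-space
print(side(-0.5, 0.0))   # 0.0  -> on the hyperplane H itself
```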

Hyperplanes and normal vectors#

  • If the hyperplane goes through the origin $(\beta_0=0)$, the deviation between a point $(x_1,\dots,x_p)$ and the hyperplane is the dot product: $$x\cdot \beta = x_1\beta_1 + \dots + x_p\beta_p.$$
  • The sign of the dot product tells us on which side of the hyperplane the point lies.
  • If the hyperplane goes through the point $-\beta_0 \beta$ (taking $\|\beta\|_2=1$), i.e. it is displaced from the origin by $-\beta_0$ along the unit normal vector $(\beta_1, \dots, \beta_p)$, the deviation of a point $(x_1,\dots,x_p)$ from the hyperplane is: $$ \beta_0 + x_1\beta_1 + \dots + x_p\beta_p.$$
  • The sign tells us on which side of the hyperplane the point lies.
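The deviation formula can be sketched directly. Assuming a unit normal (here the hypothetical choice \(\beta=(3/5,4/5)\), \(\beta_0=-2\)), the quantity \(\beta_0 + x\cdot\beta\) is the signed distance from \(x\) to the hyperplane:

```python
import numpy as np

def deviation(x, beta, beta0=0.0):
    """Signed deviation beta_0 + x . beta; when ||beta||_2 = 1 this is the
    signed Euclidean distance from x to the hyperplane."""
    return beta0 + x @ beta

beta = np.array([3.0, 4.0]) / 5.0   # unit normal vector
beta0 = -2.0                        # hyperplane passes through -beta0 * beta

print(deviation(np.array([2.0, 1.0]), beta, beta0))  # 0.0 -> on the hyperplane
print(deviation(np.array([3.0, 3.0]), beta, beta0))  # 2.2 -> positive side
```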

Maximal margin classifier#

  • Suppose we have a classification problem with response $Y=-1$ or $Y=1$.
  • If the classes can be separated by a hyperplane, there will typically be infinitely many hyperplanes separating them.
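This non-uniqueness is easy to demonstrate. In the sketch below (a hypothetical four-point data set), several distinct choices of \((\beta_0, \beta)\) all satisfy $y_i(\beta_0 + x_i \cdot \beta) > 0$ for every observation, i.e. all of them separate the two classes:

```python
import numpy as np

# Tiny linearly separable data set (hypothetical): class +1 vs class -1.
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Three distinct (beta0, beta) pairs; each defines a different hyperplane.
candidates = [(0.0, np.array([1.0, 1.0])),
              (0.1, np.array([1.0, 1.0])),
              (0.0, np.array([0.5, 1.5]))]

for beta0, beta in candidates:
    # A hyperplane separates the classes iff y_i * (beta0 + x_i . beta) > 0
    # for all i (every point lies strictly on its class's side).
    separates = bool(np.all(y * (beta0 + X @ beta) > 0))
    print(beta0, beta, separates)  # each candidate prints ... True
```

The maximal margin classifier resolves this ambiguity by picking the separating hyperplane farthest from the nearest training points.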