Logistic regression#

  • We model the conditional probability as:

\[\begin{split} \begin{aligned} P(Y=1 \mid X) &= \frac{e^{\beta_0 + \beta_1 X_1 +\dots+\beta_p X_p}}{1+e^{\beta_0 + \beta_1 X_1 +\dots+\beta_p X_p}} \\ P(Y=0 \mid X) &= \frac{1}{1+e^{\beta_0 + \beta_1 X_1 +\dots+\beta_p X_p}}. \end{aligned} \end{split}\]

This is the same as using a linear model for the log odds:

\[\log\left[\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)}\right] = \beta_0 + \beta_1 X_1 +\dots+\beta_p X_p.\]
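As a quick numeric check of these two formulas, here is a minimal sketch in R with made-up coefficient values (not estimates from any data):

# Made-up coefficients for a single predictor: beta0 = -3, beta1 = 0.5
beta0 = -3
beta1 = 0.5
x = 2

eta = beta0 + beta1 * x          # linear predictor (the log odds)
p1  = exp(eta) / (1 + exp(eta))  # P(Y = 1 | X = x), about 0.12 here
p0  = 1 / (1 + exp(eta))         # P(Y = 0 | X = x)

log(p1 / p0)                     # recovers eta, i.e. the linear model for the log odds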

Fitting logistic regression#

  • The training data is a list of pairs \((y_1,x_1), (y_2,x_2), \dots, (y_n,x_n)\).

  • We don’t observe the left-hand side (the log odds) in the model

\[\log\left[\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)}\right] = \beta_0 + \beta_1 X_1 +\dots+\beta_p X_p,\]
  • \(\implies\) We cannot use a least squares fit.


Likelihood#

  • Solution: The likelihood is the probability of the training data, for a fixed set of coefficients \(\beta_0,\dots,\beta_p\):

\[\prod_{i=1}^n P(Y=y_i \mid X=x_i) \]
  • We can rewrite this as

\[ \prod_{i=1}^n \left(\frac{e^{\beta_0 + \beta_1 x_{i1} +\dots+\beta_p x_{ip}}}{1+e^{\beta_0 + \beta_1 x_{i1} +\dots+\beta_p x_{ip}}}\right)^{y_i} \left(\frac{1}{1+e^{\beta_0 + \beta_1 x_{i1} + \dots+\beta_p x_{ip}}}\right)^{1-y_i} \]
  • Choose estimates \(\hat \beta_0, \dots,\hat \beta_p\) which maximize the likelihood.

  • Solved with numerical methods (e.g. Newton’s algorithm); a minimal sketch using a general-purpose optimizer follows below.
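The sketch below makes the idea concrete on simulated data; the simulated coefficients and the use of optim() (rather than Newton’s algorithm) are assumptions for illustration only — glm() is what we use in practice:

set.seed(1)
x = rnorm(100)
y = rbinom(100, 1, plogis(-1 + 2 * x))   # simulate with true beta0 = -1, beta1 = 2

# Negative log-likelihood of (beta0, beta1), from the product above
negloglik = function(beta) {
  eta = beta[1] + beta[2] * x
  -sum(y * eta - log(1 + exp(eta)))
}

optim(c(0, 0), negloglik)$par            # maximize the likelihood numerically
coef(glm(y ~ x, family=binomial))        # glm gives essentially the same estimates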


Logistic regression in R#

library(ISLR)   # contains the Smarket stock market data
# Logistic regression of market direction on the five lagged returns and volume
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              family=binomial, data=Smarket)
summary(glm.fit)
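The estimated coefficients and fitted probabilities can then be pulled out of the fitted object; with this factor coding, predict(..., type="response") returns the fitted probability that Direction is "Up":

coef(glm.fit)                                   # estimated coefficients
glm.probs = predict(glm.fit, type="response")   # fitted P(Direction = "Up" | X)
head(glm.probs)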

Inference for logistic regression#

  1. We can estimate the standard error of each coefficient.

  2. The \(z\)-statistic is the equivalent of the \(t\)-statistic in linear regression:

\[z = \frac{\hat \beta_j}{\text{SE}(\hat\beta_j)}.\]
  3. The \(p\)-values test the null hypothesis \(\beta_j=0\) (Wald test).

  4. Other possible hypothesis tests: the likelihood ratio test (chi-square distribution). Both appear in the sketch below.
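A brief sketch using the Smarket fit from above; comparing against the intercept-only model is an assumption made here for illustration:

summary(glm.fit)$coefficients          # Estimate, Std. Error, z value, Pr(>|z|) for each beta_j

# Likelihood ratio test of the full model against the intercept-only model
glm.null = glm(Direction ~ 1, family=binomial, data=Smarket)
anova(glm.null, glm.fit, test="Chisq")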


Example: Predicting credit card default#

Predictors:

  • student: 1 if student, 0 otherwise

  • balance: credit card balance

  • income: person’s income.


Confounding#

In this dataset, there is confounding, but little collinearity.

  • Students tend to have higher balances, so student status explains some of the variation in balance (though not very well, which is why collinearity is mild).

  • People with a high balance are more likely to default.

  • Among people with a given balance, students are less likely to default.


Results: predicting credit card default#

Fig. 16 Confounding in the Default data (ISLR Figure 4.3)#


Using only balance#

library(ISLR) # where Default data is stored
summary(glm(default ~ balance,
        family=binomial, data=Default))

Using only student#

summary(glm(default ~ student,
        family=binomial, data=Default))

Using both balance and student#

summary(glm(default ~ balance + student,
        family=binomial, data=Default))
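To see the confounding numerically, we can compare fitted default probabilities for a student and a non-student at the same balance; the balance value of 1500 below is an arbitrary choice for illustration:

fit.both = glm(default ~ balance + student, family=binomial, data=Default)
newdata  = data.frame(balance = c(1500, 1500), student = c("Yes", "No"))
predict(fit.both, newdata, type="response")   # the student has the lower predicted probability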

Using all 3 predictors#

summary(glm(default ~ balance + income + student,
        family=binomial, data=Default))

Multinomial logistic regression#

  • Extension of logistic regression to more than two categories.

  • Suppose \(Y\) takes values in \(\{1,2,\dots,K\}\). Then we can use a linear model for the log odds against a baseline category (e.g. category 1): for each \(j \neq 1\),

\[\log\left[\frac{P(Y=j \mid X)}{P(Y=1 \mid X)}\right] = \beta_{0,j} + \beta_{1,j} X_1 +\dots+\beta_{p,j} X_p\]
  • In this case \(\beta \in \mathbb{R}^{p \times (K-1)}\) is a matrix of coefficients.
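glm() only handles the two-class case. One way to fit the multinomial model in R (an assumption, since no function is named above) is multinom() from the nnet package, which also uses the first factor level as the baseline; here is a small sketch on the built-in iris data:

library(nnet)
# Three classes (setosa is the baseline level), two predictors
fit.multi = multinom(Species ~ Sepal.Length + Sepal.Width, data=iris)
summary(fit.multi)   # one row of coefficients per non-baseline class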


Some potential problems#

  • The coefficient estimates become unstable when there is collinearity, which also hurts the convergence of the fitting algorithm.

  • When the classes are well separated, the coefficient estimates become unstable; this is always the case when \(p\geq n-1\). Prediction error is then low, but \(\hat{\beta}\) is very variable (see the small example below).
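A toy example of perfect separation (data simulated here purely for illustration): glm() typically warns that fitted probabilities numerically 0 or 1 occurred, and the estimated slope is huge because the likelihood has no finite maximizer.

# Perfectly separated data: y = 1 exactly when x > 0
x = c(-2, -1, -0.5, 0.5, 1, 2)
y = c( 0,  0,    0,   1, 1, 1)
fit.sep = glm(y ~ x, family=binomial)   # warning: fitted probabilities numerically 0 or 1
coef(fit.sep)                           # very large |beta|; the estimates are unstable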