$\newcommand{\ones}{\mathbf 1}$

In descent methods, the particular choice of search direction does not matter so much.

In descent methods, the particular choice of line search does not matter so much.

When the gradient descent method is started from a point near the solution, it will converge very quickly.


Newton's method with step size $h=1$ always works.

When Newton's method is started from a point near the solution, it will converge very quickly.
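One way to experiment with the last two claims is to run both methods from a point near the minimizer of a simple strictly convex function and count iterations. The function $f(x) = e^x + e^{-x}$, the starting point, the fixed gradient step, and the tolerance below are all arbitrary choices for illustration:

```python
import math

def newton_1d(f_grad, f_hess, x0, tol=1e-10, max_iter=100):
    """Pure Newton iteration (step size 1) on a 1-D function,
    stopping when the gradient is small."""
    x, k = x0, 0
    while abs(f_grad(x)) > tol and k < max_iter:
        x -= f_grad(x) / f_hess(x)
        k += 1
    return x, k

def gradient_1d(f_grad, x0, t=0.25, tol=1e-10, max_iter=10_000):
    """Fixed-step gradient descent on a 1-D function."""
    x, k = x0, 0
    while abs(f_grad(x)) > tol and k < max_iter:
        x -= t * f_grad(x)
        k += 1
    return x, k

# f(x) = e^x + e^{-x}: smooth, strictly convex, minimized at x* = 0.
grad = lambda x: math.exp(x) - math.exp(-x)   # f'(x) = 2 sinh(x)
hess = lambda x: math.exp(x) + math.exp(-x)   # f''(x) = 2 cosh(x)

x_n, k_n = newton_1d(grad, hess, 0.5)
x_g, k_g = gradient_1d(grad, 0.5)
print(k_n, k_g)   # Newton takes far fewer iterations from this nearby start
```

Near the solution, each Newton step roughly squares the error, while fixed-step gradient descent only shrinks it by a constant factor per step, which is what the iteration counts show.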

Using Newton's method to minimize $f(Ty)$, where $Ty=x$ and $T$ is nonsingular, can greatly improve the convergence speed when $T$ is chosen appropriately.
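A useful fact when evaluating this claim: Newton's method is affine invariant, i.e., the Newton iterates for $\tilde f(y) = f(Ty)$ map back exactly (via $x = Ty$) onto the Newton iterates for $f$. This follows from $\nabla \tilde f(y) = T^T \nabla f(Ty)$ and $\nabla^2 \tilde f(y) = T^T \nabla^2 f(Ty)\, T$. A quick numerical check, where the convex test function and the nonsingular $T$ are arbitrary illustrations:

```python
import numpy as np

# A smooth, strictly convex test function with easy derivatives
# (any C^2 convex f would behave the same way here).
def grad(x):
    return np.exp(x) - np.exp(-x)            # elementwise 2*sinh(x_i)

def hess(x):
    return np.diag(np.exp(x) + np.exp(-x))   # diagonal, 2*cosh(x_i)

def newton_step_x(x):
    return x - np.linalg.solve(hess(x), grad(x))

# Change of variables x = T y with a nonsingular T.
T = np.array([[2.0, 1.0], [0.5, 3.0]])

def newton_step_y(y):
    x = T @ y
    g = T.T @ grad(x)             # gradient of f(Ty) with respect to y
    H = T.T @ hess(x) @ T         # Hessian of f(Ty) with respect to y
    return y - np.linalg.solve(H, g)

x0 = np.array([0.7, -0.4])
y0 = np.linalg.solve(T, x0)       # same starting point in y-coordinates

x1 = newton_step_x(x0)
y1 = newton_step_y(y0)
print(np.allclose(T @ y1, x1))    # True: the iterates coincide
```

By contrast, the gradient method is not affine invariant, so a change of variables can change its behavior dramatically.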


If $f$ is self-concordant, its Hessian is Lipschitz continuous.

If the Hessian of $f$ is Lipschitz continuous, then $f$ is self-concordant.

Newton's method should only be used to minimize self-concordant functions.

$f(x) = \exp x$ is self-concordant.

$f(x) = -\log x$ is self-concordant.
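These two scalar examples can be checked directly against the definition of self-concordance for functions on $\mathbf{R}$, namely $|f'''(x)| \le 2 f''(x)^{3/2}$. For $f(x) = -\log x$ on $x > 0$,
\[
f''(x) = \frac{1}{x^2}, \qquad f'''(x) = -\frac{2}{x^3}, \qquad
|f'''(x)| = \frac{2}{x^3} = 2\left(\frac{1}{x^2}\right)^{3/2},
\]
so the inequality holds with equality everywhere on the domain. For $f(x) = \exp x$ we have $f''(x) = f'''(x) = e^x$, and the condition $e^x \le 2 e^{3x/2}$ is equivalent to $2 e^{x/2} \ge 1$, which fails for $x < -2\log 2$.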


Consider the problem of minimizing \[ f(x) = (c^Tx)^4 + \sum_{i=1}^n w_i \exp x_i, \] over $x \in \mathbf{R}^n$, where $w \succ 0$.

Newton's method would probably require fewer iterations than the gradient method, but each iteration would be much more costly.
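For concreteness, this $f$ has gradient $\nabla f(x) = 4(c^Tx)^3 c + w \circ e^{x}$ and Hessian $\nabla^2 f(x) = 12(c^Tx)^2\, cc^T + \mathop{\bf diag}(w \circ e^{x})$, where $\circ$ denotes the elementwise product. A sketch of one damped Newton step, with arbitrary random problem data and a standard backtracking line search:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
c = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, n)      # w > 0 elementwise, as required
x = rng.standard_normal(n)

def f(x):
    return (c @ x) ** 4 + w @ np.exp(x)

def grad(x):
    return 4 * (c @ x) ** 3 * c + w * np.exp(x)

def hess(x):
    # rank-one term from (c^T x)^4 plus a diagonal term from the exponentials
    return 12 * (c @ x) ** 2 * np.outer(c, c) + np.diag(w * np.exp(x))

# One damped Newton step: solve for the direction, then backtrack on t.
g, H = grad(x), hess(x)
dx = -np.linalg.solve(H, g)       # dense solve; dominates the per-step cost
t = 1.0
while f(x + t * dx) > f(x) + 0.3 * t * (g @ dx):   # sufficient decrease
    t *= 0.5
x_new = x + t * dx
print(f(x_new) < f(x))            # the damped step decreases f
```

The Hessian here is positive definite (a positive diagonal plus a positive semidefinite rank-one term), so the Newton direction is always a descent direction and the backtracking loop terminates.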


Newton's method is seldom used in machine learning because