CONTENTS: Implementation of a conjugate-gradient type method
for solving sparse linear equations and
sparse least-squares problems:
\begin{align*}
\text{Solve } & Ax=b
\\ \text{or minimize } & \|Ax-b\|^2
\\ \text{or minimize } & \|Ax-b\|^2 + \lambda^2 \|x\|^2
\end{align*}
where the matrix \(A\) may be square or rectangular
(over-determined or under-determined), and may have any rank.
It is represented by a routine for computing \(Av\) and \(A^T u\)
for given vectors \(v\) and \(u\).
The scalar \(\lambda\) is a damping parameter.
If \(\lambda > 0\), the solution is "regularized" in the sense that
a unique solution always exists, and \(\|x\|\) is bounded.
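For illustration only (not part of the package described here), the same calling pattern is available in SciPy as scipy.sparse.linalg.lsmr: \(A\) may be passed as a LinearOperator that supplies the products \(Av\) and \(A^T u\), and the damp argument plays the role of \(\lambda\). The sizes and test data below are arbitrary.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, lsmr

    m, n = 100, 60
    rng = np.random.default_rng(0)
    M = rng.standard_normal((m, n))              # explicit matrix, used only inside the operator
    b = rng.standard_normal(m)

    # A is supplied purely through the two products A v and A^T u.
    A = LinearOperator((m, n),
                       matvec=lambda v: M @ v,       # A v
                       rmatvec=lambda u: M.T @ u)    # A^T u

    lam = 1e-3                                   # damping parameter lambda
    x, istop, itn, normr, normar = lsmr(A, b, damp=lam)[:5]
    print(istop, itn, normr, normar)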
The method is based on the Golub-Kahan bidiagonalization
process. It is algebraically equivalent to applying MINRES
to the normal equation
\(
(A^T A + \lambda^2 I) x = A^T b,
\)
but has better numerical properties, especially if
\(A\) is ill-conditioned.
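As a quick numerical check of this equivalence (a SciPy-based sketch with an arbitrary dense test matrix), the computed solution should satisfy the damped normal equation to within the stopping tolerance:

    import numpy as np
    from scipy.sparse.linalg import lsmr

    rng = np.random.default_rng(1)
    A = rng.standard_normal((80, 50))
    b = rng.standard_normal(80)
    lam = 1e-2

    x = lsmr(A, b, damp=lam, atol=1e-12, btol=1e-12)[0]

    # Residual of (A^T A + lambda^2 I) x = A^T b, relative to ||A^T b||.
    res = A.T @ (A @ x - b) + lam**2 * x
    print(np.linalg.norm(res) / np.linalg.norm(A.T @ b))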
If \(A\) is symmetric, use
SYMMLQ,
MINRES,
or MINRES-QLP.
If \(A\) does not have full column rank
and \(\lambda=0\), the solution is not unique.
LSMR returns the solution of minimum length.
Thus for under-determined systems, it solves the problem
\(\min \|x\| \text{ subject to } Ax=b\).
More generally, it solves the problem
\(\min \|x\| \text{ subject to } A^T Ax=A^T b\),
where \(A\) may have any shape or rank.
For \(\min \|x\| \text{ subject to } Ax=b\),
consider using CRAIG.
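For example (a SciPy-based sketch; the test problem is arbitrary), on an under-determined compatible system the LSMR solution agrees with the minimum-length solution given by the pseudoinverse:

    import numpy as np
    from scipy.sparse.linalg import lsmr

    rng = np.random.default_rng(2)
    A = rng.standard_normal((30, 80))            # under-determined: more columns than rows
    b = A @ rng.standard_normal(80)              # compatible system Ax = b

    x = lsmr(A, b, atol=1e-12, btol=1e-12)[0]
    x_min = np.linalg.pinv(A) @ b                # minimum-length solution, for reference

    print(np.linalg.norm(A @ x - b))             # ~0: the system is solved
    print(np.linalg.norm(x - x_min))             # ~0: LSMR returns the minimum-length x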
Special feature:
Both \(\|r\|\) and \(\|A^T r\|\) decrease monotonically,
where \(r = b - Ax\) is the current residual.
For LSQR, only \(\|r\|\) is monotonic.
LSQR is recommended for compatible systems \(Ax=b\),
but on least-squares problems with loose stopping tolerances,
LSMR may be able to terminate significantly sooner than LSQR.
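One way to observe this (a SciPy sketch; iteration counts are problem-dependent) is to run both solvers on the same least-squares problem with loose tolerances and compare the reported iteration counts:

    import numpy as np
    from scipy.sparse.linalg import lsqr, lsmr

    rng = np.random.default_rng(3)
    A = rng.standard_normal((500, 200)) * 10.0 ** rng.uniform(-2, 2, size=200)  # uneven column scales
    b = rng.standard_normal(500)                 # generic b: an incompatible least-squares problem

    tol = 1e-3                                   # loose stopping tolerances
    itn_lsqr = lsqr(A, b, atol=tol, btol=tol)[2]
    itn_lsmr = lsmr(A, b, atol=tol, btol=tol)[2]
    print(itn_lsqr, itn_lsmr)                    # LSMR often stops sooner on such problems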
Special application:
To find a null vector of a singular (square or rectangular) matrix \(A\),
apply LSMR to the problem \( \min \|A^T x - b\| \)
with any nonzero vector \(b\) (e.g. a random \(b\)).
At a minimizer, the residual vector \(r = b - A^T x \) will satisfy \( Ar=0 \).
See [1] for examples.
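A SciPy-based sketch of this idea (the singular test matrix below is an arbitrary construction):

    import numpy as np
    from scipy.sparse.linalg import lsmr

    rng = np.random.default_rng(4)
    A = rng.standard_normal((60, 59)) @ rng.standard_normal((59, 60))  # 60 x 60, rank <= 59

    b = rng.standard_normal(60)                  # any nonzero right-hand side
    x = lsmr(A.T, b, atol=1e-12, btol=1e-12)[0]  # minimize ||A^T x - b||

    r = b - A.T @ x                              # residual at the minimizer
    print(np.linalg.norm(A @ r) / np.linalg.norm(r))   # ~0: r is a null vector of A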
Preconditioning:
If \(A\) is an explicit sparse matrix, it is straightforward to scale its columns
so that every column has unit 2-norm. This is called
diagonal preconditioning and should be done wherever possible.
It is likely to reduce the number of LSMR iterations significantly
without affecting the time per iteration. LSMR solves the problem
\( \min \|ADy - b\| \) and then \( x = Dy \) solves the original problem,
where the \(j\)th diagonal of \(D\) is the reciprocal of the 2-norm
of the \(j\)th column of \(A\).
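For an explicit sparse matrix, the scaling takes only a few lines (a SciPy sketch; the test matrix and sizes are arbitrary):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import lsmr

    rng = np.random.default_rng(5)
    A = sp.random(2000, 500, density=0.01, random_state=rng, format='csr')
    A = A @ sp.diags(10.0 ** rng.uniform(-3, 3, size=500))   # columns with very different scales
    b = rng.standard_normal(2000)

    colnorms = np.sqrt(np.asarray(A.multiply(A).sum(axis=0))).ravel()
    colnorms[colnorms == 0] = 1.0                # guard against empty columns
    D = sp.diags(1.0 / colnorms)                 # D_jj = 1 / ||j-th column of A||_2

    y = lsmr(A @ D, b)[0]                        # solve the scaled problem  min ||A D y - b||
    x = D @ y                                    # x = D y solves the original problem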
If \(A\) is an operator, diagonal preconditioning is not simple unless
the user can estimate the column norms of \(A\) accurately.
A general preconditioner depends on the user's knowledge of \(A\).
For some square nonsingular matrix \(B\), the user asks LSMR
to solve the problem \( \min \|AB^{-1}y - b\| \) and then recovers the
final solution by solving \(Bx = y\). The user now needs to code
matrix-vector products of the form \(AB^{-1}v\) and \(B^{-T}A^T u\)
by solving linear systems involving \(B\) and \(B^T\) before and after
the products with \(A\) and \(A^T\) respectively.
\(B\) will be a good preconditioner if \(AB^{-1}\)
is significantly better conditioned than \(A\). An ideal
preconditioner would be \(B=R\), where \(R\) is a nonsingular triangular
matrix coming from the QR factors of \(A\). Thus incomplete QR factors
of \(A\) might give a good preconditioner.
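A small dense sketch of this pattern (for illustration, \(B\) is taken to be the full triangular factor \(R\) from a QR factorization, the ideal case; sizes and data are arbitrary):

    import numpy as np
    from scipy.linalg import solve_triangular
    from scipy.sparse.linalg import LinearOperator, lsmr

    rng = np.random.default_rng(6)
    m, n = 200, 80
    A = rng.standard_normal((m, n)) * 10.0 ** rng.uniform(-4, 4, size=n)   # ill-conditioned columns
    b = rng.standard_normal(m)

    B = np.linalg.qr(A, mode='r')                # here B = R from QR(A): the ideal preconditioner

    # Products with A B^{-1}: solve with B before A, and with B^T after A^T.
    ABinv = LinearOperator((m, n),
        matvec=lambda v: A @ solve_triangular(B, v),                 # A B^{-1} v
        rmatvec=lambda u: solve_triangular(B, A.T @ u, trans='T'))   # B^{-T} A^T u

    y = lsmr(ABinv, b)[0]                        # solve  min ||A B^{-1} y - b||
    x = solve_triangular(B, y)                   # recover x from B x = y
    print(np.linalg.norm(A.T @ (A @ x - b)))     # optimality measure ||A^T r||

With \(B=R\), \(AB^{-1}\) has orthonormal columns, so LSMR converges in very few iterations; an incomplete factor trades extra iterations for cheaper solves with \(B\) and \(B^T\).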
Reverse communication:
One f90 version (lsmrReverse) implements reverse communication for the
products \(Av\) and \(A^T u\). Each time LSMR needs a product, it returns
control to the user. This permits users to implement their own
stopping rule, and perhaps their own preconditioner. Ideally the stopping
rule should apply to the original problem and not to the preconditioned
problem.
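The f90 interface itself is not reproduced here, but the control flow can be sketched with a Python generator. To keep the sketch short and runnable, the inner iteration below is plain CGLS (conjugate gradients on the normal equations), not LSMR, and every name in it is illustrative rather than part of the lsmrReverse interface.

    import numpy as np

    def cgls_reverse(b, n, maxiter=100):
        # Reverse-communication CGLS (CG on the normal equations), shown only
        # to illustrate the calling pattern; this is not the LSMR algorithm.
        # Yields ('Av', v) or ('ATu', u); the caller computes the product
        # and passes it back via .send().
        x = np.zeros(n)
        r = b.copy()
        s = yield ('ATu', r)                 # request s = A^T r
        p = s.copy()
        gamma = gamma0 = s @ s
        for _ in range(maxiter):
            q = yield ('Av', p)              # request q = A p
            alpha = gamma / (q @ q)
            x += alpha * p
            r -= alpha * q
            s = yield ('ATu', r)             # request s = A^T r
            gamma_new = s @ s
            if gamma_new <= 1e-16 * gamma0:  # a user-chosen stopping rule could go here
                break
            p = s + (gamma_new / gamma) * p
            gamma = gamma_new
        return x

    rng = np.random.default_rng(7)
    A = rng.standard_normal((100, 40))
    b = rng.standard_normal(100)

    gen = cgls_reverse(b, n=40)
    request, vec = next(gen)                 # start the solver; it returns its first request
    try:
        while True:
            product = A @ vec if request == 'Av' else A.T @ vec
            request, vec = gen.send(product) # hand the product back; receive the next request
    except StopIteration as stop:
        x = stop.value
    print(np.linalg.norm(A.T @ (A @ x - b)))     # ||A^T r|| is ~0 at the least-squares solution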