September 16, 2015

Spectral Theory

Resolvent

Definition


The resolvent of a square matrix \(A \in \mathbb{C}^{n \times n}\) is the matrix-valued mapping:

\[ \mathbf{R}(z) := (zI - A)^{-1}. \]

The entries of the resolvent are rational functions of the scalar \(z\). The resolvent fails to exist exactly at those \(z \in \mathbb{C}\) that are poles of these rational functions.

The resolvent reveals the existence of eigenvalues, indicates where they lie in the complex plane, and shows how sensitive they are to perturbations.

Eigenvalues and eigenvectors

Definition


A scalar \(\lambda\) is called an eigenvalue of \(A \in \mathbb{C}^{n \times n}\) if \(A - \lambda I\) is singular, i.e. the resolvent \(\mathbf{R}(z)\) does not exist at \(z = \lambda\).

A nonzero vector \(u \in \mathbb{C}^n \setminus \{\mathbf{0}\}\) is called an eigenvector of \(A\) associated with the eigenvalue \(\lambda\) if \(u \in \mathcal{N}(A - \lambda I)\).

Equivalently,

\[ Au = \lambda u \]

Geometric Interpretation of Eigenvectors

Under the transformation matrix \[A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\] the vectors parallel to
\(v_1 = \begin{bmatrix} 1 \\ 1\end{bmatrix}\)
and
\(v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}\)
are mapped onto themselves (scaled by the corresponding eigenvalues, \(3\) and \(1\)).
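A quick check in R with the matrix above: `eigen()` returns eigenvector columns proportional to \(v_1\) and \(v_2\), with eigenvalues \(3\) and \(1\).

A <- matrix(c(2, 1, 1, 2), nrow = 2)
e <- eigen(A)
e$values                        # 3 and 1
e$vectors                       # columns proportional to (1, 1) and (1, -1)
A %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]   # ~ 0, i.e. Au = lambda u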

Additionally, we call \(\mathcal{N}(A - \lambda I)\) the eigenspace of \(A\) associated with the eigenvalue \(\lambda\). The set of all eigenvalues is called the spectrum of \(A\).

\[ \sigma(A) := \{\lambda \in \mathbb{C}: A - \lambda I \text{ is singular}\}. \]

The largest eigenvalue magnitude is called the spectral radius of \(A\):

\[ \rho(A) := \max_{\lambda\in \sigma(A)} |\lambda| \]


Question: How large can the eigenvalues of \(A\) be? To answer this question, we can use the Neumann series theorem.

Neumann series theorem

Theorem


Let \(A\) be a square matrix with \(\|A\| < 1\). Then the matrix \(I - A\) is nonsingular and

\[ (I - A)^{-1} = \sum_{i = 0}^\infty A^i \] \[ \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|} \]

We use this theorem to show that \(\rho(A) \le \|A\|\).

For \(|z| > \|A\|\), define \(B = \frac{1}{z}A\), so that \(zI - A = z(I - B)\) and \(\|B\| = \|A\|/|z| < 1\); the theorem above then gives:

\[ \begin{align} \|\mathbf{R}(z)\| = \|(zI - A)^{-1}\| & \le \frac{1}{|z|}\|(I - B)^{-1}\| \\ &\le \frac{1}{|z|}\,\frac{1}{1 - \|B\|} \\ & = \frac{1}{|z| - \|A\|} < \infty \end{align} \]

Since the resolvent exists for every \(z\) with \(|z| > \|A\|\), there can be no eigenvalues of magnitude greater than \(\|A\|\).
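As a small numerical sanity check (an arbitrary random matrix, scaled so that \(\|A\| < 1\)): the spectral radius is indeed bounded by the norm, and the partial sums of the Neumann series approach \((I - A)^{-1}\).

set.seed(1)
A <- matrix(rnorm(9), 3, 3)
A <- A / (2 * norm(A, "2"))          # scale so that ||A||_2 < 1
c(rho = max(Mod(eigen(A)$values)),   # spectral radius
  nrm = norm(A, "2"))                # rho <= ||A||

S <- diag(3); P <- diag(3)           # partial sums of the Neumann series
for (k in 1:50) { P <- P %*% A; S <- S + P }
max(abs(S - solve(diag(3) - A)))     # ~ 0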

Characteristic polynomial

Since the matrix \(A - \lambda I\) is singular at an eigenvalue \(\lambda\), its determinant must be zero. \[ \det (A - \lambda I) = 0 \]

  • The above equation can be used to find the eigenvalues of \(A\)
  • The left-hand side of the equation is an \(n\)-th degree polynomial in \(\lambda\).
  • There are \(n\) eigenvalues \(\lambda_i\), \(i = 1, \dots, n\), not necessarily distinct.

The polynomial used to find the eigenvalues is called the characteristic polynomial.

\[ P_{A}(z) = \det (z I - A) = \prod_{i = 1}^n(z - \lambda_i) \]

From the given definition we have, \[ P_A(0) = \det(-A) = (-1)^n\det(A) \]

and so,

\[ \det(A) = (-1)^n P_A(0) = (-1)^n \prod_{i = 1}^n(-\lambda_i) =\prod_{i = 1}^n\lambda_i \]

This gives a more practical way to compute the determinant of a matrix: the entry-wise (cofactor expansion) formula becomes highly cumbersome when \(n\) is large.
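A quick numerical check in R (any matrix will do; here a random \(4 \times 4\) example):

set.seed(2)
A <- matrix(rnorm(16), 4, 4)
Re(prod(eigen(A)$values))   # product of the eigenvalues (complex ones come in conjugate pairs)
det(A)                      # matches up to rounding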

Cayley-Hamilton Theorem

The characteristic polynomial \(P_A(z)\) of a square matrix \(A\) annihilates \(A\), i.e. \(P_A(A) = 0\).
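For a \(2 \times 2\) matrix, \(P_A(z) = z^2 - \operatorname{tr}(A)\,z + \det(A)\), so the theorem is easy to check numerically (a minimal sketch with a random matrix):

set.seed(3)
A <- matrix(rnorm(4), 2, 2)
A %*% A - sum(diag(A)) * A + det(A) * diag(2)   # P_A(A): the zero matrix, up to rounding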

Multiplicity

Definition


The algebraic multiplicity of an eigenvalue \(\lambda\) is the number of times \(\lambda\) appears on the diagonal of the \(T\) factor in the Schur form \(A = UTU^*\).

This is equivalent to the power to which \((z - \lambda)\) divides the characteristic polynomial.

The geometric multiplicity of an eigenvalue \(\lambda\) is the dimension of the associated eigenspace, \(\dim(\mathcal{N}(A - \lambda I))\).

The geometric multiplicity of an eigenvalue is always less than or equal to the algebraic multiplicity.

Similarity Transform

Definition

If \(A = XBX^{-1}\) for some nonsingular matrix \(X\), then \(A\) and \(B\) are said to be similar. The transform \(A = XBX^{-1}\) is called a similarity transform.

Similarity transforms preserve the eigenvalues.

Let \(\lambda\) and \(u\) be an eigenpair of \(B\) and \(X\) be nonsingular. We claim that \(\lambda\) is also an eigenvalue of \(A\). To show this, let \(y = Xu\) and see that: \[ Ay = XBX^{-1}(Xu) = XBu = \lambda (Xu) = \lambda y \]

So \(\lambda\) is also an eigenvalue of \(A\) with an associated eigenvector \(y\).
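A quick numerical illustration (random \(B\) and \(X\); a random \(X\) is almost surely nonsingular): the spectra of \(A = XBX^{-1}\) and \(B\) coincide.

set.seed(4)
B <- matrix(rnorm(9), 3, 3)
X <- matrix(rnorm(9), 3, 3)          # almost surely nonsingular
A <- X %*% B %*% solve(X)
sort(Mod(eigen(A)$values))           # sorted by modulus for easy comparison
sort(Mod(eigen(B)$values))           # same values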

The Schur Form

Theorem


For every square matrix \(A \in \mathbb{C}^{n \times n}\), there exist a unitary matrix \(U \in \mathbb{C}^{n \times n}\) (\(U^*U = I\)) and an upper triangular matrix \(T \in \mathbb{C}^{n \times n}\) such that

\[ \displaystyle A = UTU^* \]

The above formula is called the Schur form or Schur decomposition. Note that the above is also a similarity transform.

Since the eigenvalues of a triangular matrix are its diagonal entries, the diagonal terms \(T_{ii}\) are the eigenvalues of \(T\); and since \(T\) and \(A\) are similar, the \(T_{ii}\) are also the eigenvalues of \(A\).
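This is easy to see numerically for a triangular matrix (a random upper triangular example):

set.seed(5)
Tmat <- matrix(rnorm(16), 4, 4)
Tmat[lower.tri(Tmat)] <- 0    # make it upper triangular
sort(diag(Tmat))
sort(eigen(Tmat)$values)      # the same numbers: the diagonal entries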

The Schur Form

Proof (By induction)

Base case: for \(A \in \mathbb{C}^{1 \times 1}\), take \(U = 1\) and \(T = A\) (a scalar); then \(A = UTU^*\).

Assume the result holds for \((n - 1) \times (n - 1)\) matrices.

Let \(u\) be a unit eigenvector of \(A\), i.e. \(Au = \lambda u\) with \(\|u\| = 1\), and let the columns of \(V\) form an orthonormal basis for \(\text{span}(u)^\perp\), so that \([u \; V]\) is a unitary matrix. Then,

\[ [u \; V]^* A [u \; V] = \begin{bmatrix} u^*Au & u^*AV\\ V^*Au & V^*AV \end{bmatrix} = \begin{bmatrix} \lambda & u^*AV\\ 0 & V^*AV \end{bmatrix} \]

By the induction hypothesis, \(V^*AV \in \mathbb{C}^{(n-1)\times(n-1)}\) must have a Schur form: \(V^*AV = \hat U \hat T \hat U^*\). Moreover, since \([u \; V]\) is unitary, we have:

\[ \begin{aligned} A &= [u \; V] \begin{bmatrix} \lambda & u^*AV\\ 0 & \hat U \hat T \hat U^* \end{bmatrix} [u \; V]^* \\ &= [u \; V] \begin{bmatrix} 1 & 0\\ 0 & \hat U \end{bmatrix} \begin{bmatrix} \lambda & u^*AV\hat U\\ 0 & \hat T \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & \hat U^* \end{bmatrix} [u \; V]^* \\ &= [u \; V\hat U] \begin{bmatrix} \lambda & u^*AV\hat U\\ 0 & \hat T \end{bmatrix} [u \; V \hat U]^* \\ &=\tilde U \tilde T \tilde U^* \end{aligned} \]

Note that \(\tilde T\) is clearly upper triangular and \(\tilde U\) is unitary since:

\[ \begin{aligned} \tilde U^* \tilde U &= [u \; V \hat U]^* [u \; V \hat U] \\ &= \begin{bmatrix} u^*u & u^*V\hat U\\ \hat U^* V^* u & \hat U^* V^* V \hat U \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & I \end{bmatrix} \end{aligned} \]


Thus, we have shown that every \(A \in \mathbb{C}^{n\times n}\) has a Schur decomposition.

Trace Theorem

Theorem

Let \(A \in \mathbb{C}^{n \times n}\); then the trace of \(A\) is the sum of its eigenvalues.

Proof

Suppose \(A = UTU^*\) is the Schur decomposition of \(A\). Then, using the cyclic property of the trace, we have:

\[ \begin{aligned} Tr(A) &= Tr(UTU^*) = Tr(TU^*U) \\ &= Tr(T) = \sum_{i}T_{ii} = \sum_{i} \lambda_i \end{aligned} \]
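A quick check in R (random real matrix; complex eigenvalues come in conjugate pairs, so their sum is real):

set.seed(6)
A <- matrix(rnorm(25), 5, 5)
sum(diag(A))                # trace
Re(sum(eigen(A)$values))    # sum of the eigenvalues, equal up to rounding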

Hermitian Matrices

For Hermitian matrices, \(A = A^*\), we have:

\[ UTU^* = A = A^* = U T^* U^* \]

Since \(U\) is unitary (hence nonsingular), we must have \(T = T^*\). So,

  • \(T\) must be diagonal (it is upper triangular while \(T^*\) is lower triangular),
  • \(T_{ii} = \overline{T_{ii}} \quad \forall i\), that is, all eigenvalues of Hermitian matrices are real.

This means that Hermitian matrices are unitarily diagonalizable.

It can be shown that normal matrices \((A^*A = AA^*)\) are also unitarily diagonalizable.

The spectral decomposition of Hermitian matrices

\[ \begin{aligned} A &= U\Lambda U^* = [u_1 \; \dots \; u_n] \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix}\\ &= \sum_{i = 1}^n \lambda_i u_iu_i^* = \sum_{i = 1}^n \lambda_i P_i \end{aligned} \]

The matrices \(P_i\) are orthogonal projections since

  • \(P_i = P_i^*\)
  • \(P_i^2 = u_i(u_i^*u_i)u_i^* = u_iu_i^* = P_i\)
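A minimal sketch in R with a random real symmetric matrix: reconstructing \(A\) as \(\sum_i \lambda_i u_i u_i^*\) and checking that \(P_1\) is a projection.

set.seed(7)
M <- matrix(rnorm(16), 4, 4)
A <- (M + t(M)) / 2                      # real symmetric, hence Hermitian
e <- eigen(A)
recon <- Reduce(`+`, lapply(1:4, function(i)
  e$values[i] * e$vectors[, i] %*% t(e$vectors[, i])))
max(abs(A - recon))                      # ~ 0
P1 <- e$vectors[, 1] %*% t(e$vectors[, 1])
max(abs(P1 %*% P1 - P1))                 # ~ 0: P1^2 = P1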

Diagonalization

Definition


We call a square matrix \(A\) diagonalizable if it is similar to a diagonal matrix, that is, if there exist a nonsingular matrix \(X\) and a diagonal matrix \(D\) such that

\(A = XDX^{-1}\)

The above factorization is called diagonalization.

Diagonalizable matrices

An \((n \times n)\) matrix is diagonalizable if and only if it has \(n\) linearly independent eigenvectors.

Examples of diagonalizable matrices:

  • normal matrices \((A^*A = AA^*)\)
  • matrices with \(n\) distinct eigenvalues
  • Hermitian matrices \((A = A^*)\)

Eigendecomposition

If \(A \in \mathbb{C}^{n \times n}\) is a diagonalizable matrix, then it can be factorized as:

\[ A = U\Lambda U^{-1} \]

where \(\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)\) and \(U\) is a square matrix whose columns are the corresponding eigenvectors.

We also refer to \(A = U\Lambda U^{-1}\) as the canonical form of \(A\).

Note

Eigendecomposition does not exist for every square matrix, since not all matrices are diagonalizable.

Exercises

  1. Give another proof that \(\rho(A) \le \|A\|\) (not using the Neumann series).

  2. Prove that if \(A\) is diagonalizable and has only one eigenvalue, then \(A\) is a constant multiple of the identity matrix.

  3. Can any invertible \((n\times n)\)-matrix be diagonalized?

Answers

Give another proof that \(\rho(A) \le \|A\|\) (not using the Neumann series).

Here we use the fact that the induced operator norm is compatible with the vector norm, \(\|Av\| \le \|A\|\,\|v\|\). Let \(\lambda_1\) be an eigenvalue of largest magnitude and \(u_1\) an associated unit eigenvector: \[ \begin{aligned} \rho(A) &= |\lambda_1| = |\lambda_1|\|u_1\| = \|\lambda_1u_1\| \\ &= \|Au_1\| \le \|A\| \|u_1\| = \|A\| \end{aligned} \]

Answers

Prove that if \(A\) is diagonalizable and has only one eigenvalue, then \(A\) is a constant multiple of the identity matrix.

Since the matrix is diagonalizable, we have \(A = X\Lambda X^{-1}\) for some nonsingular \(X\).

Since \(A\) has only one eigenvalue, \(\Lambda = \lambda I\), and therefore \[A = X (\lambda I) X^{-1} = \lambda XX^{-1} = \lambda I\]

Answers

Can any invertible \((n\times n)\)-matrix be diagonalized?

No. To show this, take \(A = \begin{pmatrix} 1 & 1 \\ 0 & 1\end{pmatrix}\).

Since \(\det(A) = 1\), \(A\) is invertible.

The eigenvalues of \(A\) are \(\lambda_1 = \lambda_2 = 1\), but the associated eigenspace is only one-dimensional:

\(0 = (A - \lambda_1 I)u = \begin{pmatrix} 0 & 1 \\ 0 & 0\end{pmatrix} u \;\Rightarrow\; u = \alpha\begin{pmatrix} 1 \\ 0\end{pmatrix}\)

Thus, \(A\) does not have 2 linearly independent eigenvectors and so is not diagonalizable.
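The same conclusion can be seen numerically: `eigen()` returns two (numerically) parallel eigenvector columns for this matrix.

A <- matrix(c(1, 0, 1, 1), 2, 2)   # the matrix above (filled column-wise)
e <- eigen(A)
e$values                           # 1, 1
e$vectors                          # the two columns are parallel (up to rounding)
abs(det(e$vectors))                # ~ 0: no second independent eigenvector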

Singular Value Decomposition


Hermitian matrices \((A = A^*)\) have many nice properties:

  • an eigendecomposition exists \(A = U\Lambda U^{*}\)
  • the eigenvectors form an orthonormal basis of \(\mathbb{C}^{n}\)
  • eigenvalues are real
  • the spectral radius bound is tight \(\rho(A) = \|A\|\)


Unfortunately, these nice features do not hold in general for other matrices.

Singular Value Decomposition

Theorem

Every matrix \(A \in \mathbb{C}^{m\times n}\) can be factorized as \(A = U\Sigma V^*\), where \(U \in \mathbb{C}^{m\times m}\) and \(V \in \mathbb{C}^{n \times n}\) are unitary matrices and \(\Sigma \in \mathbb{R}^{m \times n}\) is a (rectangular) diagonal matrix with nonnegative entries \(\sigma_i\).

  • This factorization is called a singular value decomposition (SVD) of \(A\).

  • We call the diagonal entries \(\sigma_i\) the singular values of \(A\),

  • the columns of \(U\) and \(V\) are the left and right singular vectors, respectively.

  • The number of nonzero singular values is equal to the rank of \(A\).
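A quick numerical check of the factorization and of the rank statement (a random \(5 \times 4\) matrix of rank \(3\), built as a product):

set.seed(8)
A <- matrix(rnorm(5 * 3), 5, 3) %*% matrix(rnorm(3 * 4), 3, 4)   # rank 3
s <- svd(A)
max(abs(A - s$u %*% diag(s$d) %*% t(s$v)))   # ~ 0: A = U Sigma V^T
sum(s$d > 1e-10)                             # 3 nonzero singular values = rank(A)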

Lemma

The induced 2-norm of a matrix is equal to its largest singular value: \(\|A\|_2 = \sigma_{\max}\).

Lemma

The Frobenius norm of a matrix \(\|A\|_F\) is equal to the Euclidean norm of the vector of its singular values.

\(\|A\|_F =\sqrt{\sum_{i = 1}^r \sigma_{i}^2}\)

Proof

The 2-norm and the Frobenius norm are invariant under unitary transformations:

\(\|A\|_2 = \|U\Sigma V^*\|_2 = \|\Sigma\|_2 = \sigma_{\max}\)

\(\|A\|_F= \|U\Sigma V^*\|_F = \|\Sigma\|_F = \sqrt{\sum_{i}\sigma_i^2}\)
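Both identities are easy to verify numerically (random example):

set.seed(9)
A <- matrix(rnorm(12), 4, 3)
s <- svd(A)
c(norm(A, "2"), max(s$d))            # induced 2-norm = largest singular value
c(norm(A, "F"), sqrt(sum(s$d^2)))    # Frobenius norm = Euclidean norm of singular values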

Singular Values and Eigenvalues

Note that both \(A^*A\) and \(AA^*\) are Hermitian matrices and are therefore diagonalizable. Furthermore,

\[ A^*A = (U\Sigma V^*)^* (U\Sigma V^*) = V\Sigma^* U^* U \Sigma V^* = V(\Sigma^*\Sigma) V^* \]

Similarly, \(AA^* = U (\Sigma\Sigma^*) U^*\). These are eigendecompositions, so it must follow that

\[ \sigma_i = \sqrt{\lambda_i(A^*A)} = \sqrt{\lambda_i(AA^*)}, \quad i = 1, \dots, r \]

  • the singular values of \(A\) are the square roots of the eigenvalues of \(A^*A\) and \(AA^*\)
  • the right singular vectors (the columns of \(V\)) are eigenvectors of \(A^*A\)
  • the left singular vectors (the columns of \(U\)) are eigenvectors of \(AA^*\)
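A small check in R (`crossprod(A)` computes \(A^*A\) for a real matrix):

set.seed(10)
A <- matrix(rnorm(12), 4, 3)
svd(A)$d^2
eigen(crossprod(A))$values    # same values, both in decreasing order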

Geometric Interpretation of the SVD

The SVD and the fundamental subspaces

Theorem


Let \(A\in \mathbb{C}^{m \times n}\) be a matrix of rank \(r\). Then the following is true:


\[ \mathcal{R}(A) = \text{span}(u_1, \dots, u_r), \; \; \mathcal{N}(A^*) = \text{span}(u_{r+1}, \dots, u_m) \]


\[ \mathcal{R}(A^*) = \text{span}(v_1, \dots, v_r), \; \; \mathcal{N}(A) = \text{span}(v_{r+1}, \dots, v_n) \]

The SVD and the fundamental subspaces

Proof

First, we show that the \(r\) left singular vectors associated with the \(r\) nonzero singular values form a basis for the range of \(A\). For any \(x\),

\[ Ax = U\Sigma V^* x = \sum_{i = 1}^{\min(m,n)} \sigma_i(v_i^*x)u_i = \sum_{i = 1}^r (\sigma_i c_i) u_i, \quad c_i := v_i^*x, \]

so every vector in \(\mathcal{R}(A)\) is a linear combination of \(u_1, \dots, u_r\).

The fundamental theorem of linear algebra states that for \(A\in \mathbb{C}^{m \times n}\), \(\mathcal{R}(A) \oplus \mathcal{N}(A^*) = \mathbb{C}^{m}\).

Thus, the last \(m - r\) left singular vectors must span \(\mathcal{N}(A^*)\).

We apply the same reasoning to \(A^*\) to see that the first \(r\) and the last \(n - r\) right singular vectors span \(\mathcal{R}(A^*)\) and \(\mathcal{N}(A)\) respectively.
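A numerical illustration with a rank-deficient matrix: the trailing right singular vectors are mapped to (numerically) zero by \(A\), and the trailing left singular vectors by \(A^*\).

set.seed(11)
A <- matrix(rnorm(4 * 2), 4, 2) %*% matrix(rnorm(2 * 3), 2, 3)   # 4 x 3, rank 2
s <- svd(A, nu = nrow(A))          # request the full set of left singular vectors
round(s$d, 10)                     # two nonzero singular values
max(abs(A %*% s$v[, 3]))           # ~ 0: v_3 spans N(A)
max(abs(t(A) %*% s$u[, 3:4]))      # ~ 0: u_3, u_4 span N(A^*)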

Low-rank matrix approximation

Theorem(Low-rank approximation)


Let \(A \in \mathbb{C}^{m\times n}\) with singular values \(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r\), and let \(0 \le k < n\). Then \[ \min_{B: \text{rank}(B) = k} \|A - B\|_2 = \sigma_{k+1} \] and, if \(k < m \le n\), \[ \min_{B: \text{rank}(B) = k} \|A - B\|_F = \sqrt{\sum_{i = k+1}^{\min(m,n)} \sigma_{i}^2}, \] where in both cases the minimum is attained by \(B^* = A_k = \sum_{i = 1}^k \sigma_iu_iv_i^*\).
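A small numerical check of the 2-norm statement (random \(6 \times 5\) matrix, \(k = 2\)):

set.seed(12)
A <- matrix(rnorm(30), 6, 5)
s <- svd(A)
k <- 2
Ak <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # truncated SVD: best rank-2 approximation
c(norm(A - Ak, "2"), s$d[k + 1])                        # the two numbers agree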

One of the useful applications of low-rank matrix approximation is image compression. The R code below compresses each color channel of a JPEG image by keeping only the leading singular values.

rm(list=ls())
library(jpeg)
library(png)
library(grid)
library(gridExtra)

# read the image and split it into its red, green and blue channels
im = readJPEG('~/Documents/images/tiger.jpg')
im.red = im[, , 1]
im.green = im[, , 2]
im.blue = im[, , 3]

col <- rgb(im.red, im.green, im.blue)
dim(col) <- dim(im.blue)

png(paste('~/Documents/images/', letters[1], ' Image Original.png', sep = ''))
grid.raster(col, interpolate=FALSE)
dev.off()

grid.raster(col, interpolate=FALSE)

# target ranks for the compressed images
ranks <- c(5, 10, 30, 50, 100)

# SVD of each color channel
im.red.svd <- svd(im.red)
im.green.svd <- svd(im.green)
im.blue.svd <- svd(im.blue)

for (j in 1:length(ranks)){
  i <- ranks[j]
  # rank-i reconstruction of each channel from its leading singular triplets
  red.compressed <- im.red.svd$u[,1:i] %*% 
    diag(im.red.svd$d[1:i]) %*% t(im.red.svd$v[,1:i])
  green.compressed <- im.green.svd$u[,1:i] %*% 
    diag(im.green.svd$d[1:i]) %*% t(im.green.svd$v[,1:i])
  blue.compressed <- im.blue.svd$u[,1:i] %*% 
    diag(im.blue.svd$d[1:i]) %*% t(im.blue.svd$v[,1:i])
    
  # clamp small negative values introduced by the truncation
  red.compressed[red.compressed < 0] <- 0
  green.compressed[green.compressed < 0] <- 0
  blue.compressed[blue.compressed < 0] <- 0
  
  png(paste('~/Documents/images/',letters[j+1], ' Image SVD Compressed ', i, '.png', sep = ''))
  maxVal <- max(red.compressed, green.compressed, blue.compressed)
  col <- rgb(red.compressed, green.compressed, blue.compressed,
             maxColorValue = maxVal)
  dim(col) <- dim(blue.compressed)
  grid.raster(col, interpolate=FALSE)
  dev.off()
}

filenames <- dir(path = "~/Documents/images", pattern = 'Image', full.names = T)
foo<-list()
for(j in 1:length(filenames)) foo[[j]]<- readPNG(filenames[j])
g <- lapply(foo, rasterGrob) 

do.call(grid.arrange, c(g, ncol=3))

# storage required by the truncated factors of the three channels vs. the original array
orig.size <- object.size(im)
frac.size <- c()
size.compressed <- c()
for (i in c(5, 10, 30, 50, 100)){
  c.size <- 3*(object.size(im.red.svd$u[,1:i]) +
    object.size(im.red.svd$d[1:i]) + object.size(im.red.svd$v[,1:i]))
  f.size <- c.size/orig.size 
  frac.size = c(frac.size, f.size)
  size.compressed = c(size.compressed, c.size)
}

# Size in bytes:

orig.size       # original picture
## 5155408 bytes
size.compressed # compressed pictures
## [1]  113904  226584  676680 1126920 2252520
frac.size       # fraction of the original pictures
## [1] 0.02209408 0.04395074 0.13125634 0.21858988 0.43692371

Definition


We say that a matrix \(A \in \mathbb{R}^{n \times n}\) is symmetric positive-definite (SPD) if \(A = A^T\) and for all \(v \neq 0\) it holds that:

\[ \frac{v^* A v}{v^* v} > 0. \]

The left-hand side is called the Rayleigh quotient of \(A\). The above definition extends to complex matrices, in which case we say that \(A\) is Hermitian positive-definite (HPD).
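A minimal sketch in R: an SPD matrix can be built as \(M^TM + \epsilon I\) (the particular construction is just an illustration); all of its eigenvalues, and hence all Rayleigh quotients, are positive.

set.seed(13)
M <- matrix(rnorm(16), 4, 4)
A <- crossprod(M) + 0.1 * diag(4)        # t(M) %*% M + 0.1 I is SPD
min(eigen(A, symmetric = TRUE)$values)   # > 0
v <- rnorm(4)
as.numeric(t(v) %*% A %*% v / sum(v^2))  # Rayleigh quotient, > 0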