September 16, 2015
The resolvent of a square matrix \(A \in \mathbb{C}^{n \times n}\) is the matrix-valued mapping:
\[ \mathbf{R}(z) := (zI - A)^{-1}. \]
The entries of the resolvent are rational functions of the scalar \(z\). The resolvent fails to exist precisely at those \(z \in \mathbb{C}\) that are poles of these rational functions.
The resolvent reveals the existence of eigenvalues, indicates where they lie in the complex plane, and shows how sensitive they are to perturbations.
A scalar \(\lambda\) is called an eigenvalue of \(A \in \mathbb{C}^{n \times n}\) if \(A - \lambda I\) is singular, i.e. the resolvent \(\mathbf{R}(z)\) does not exist at \(z = \lambda\).
A nonzero vector \(u \in \mathbb{C}^n \setminus \{\mathbf{0}\}\) is called an eigenvector of \(A\) associated with the eigenvalue \(\lambda\) if \(u \in \mathcal{N}(A - \lambda I)\).
Equivalently,
\[ Au = \lambda u \]
Additionally, we call \(\mathcal{N}(A - \lambda I)\) the eigenspace of \(A\) associated with the eigenvalue \(\lambda\). The set of all eigenvalues is called the spectrum of \(A\):
\[ \sigma(A) := \{\lambda \in \mathbb{C}: A - \lambda I \text{ is singular}\}. \]
The largest eigenvalue magnitude is called the spectral radius:
\[ \rho(A) := \max_{\lambda\in \sigma(A)} |\lambda| \]
Question: How large can the eigenvalues of \(A\) be? To answer this question we can use the Neumann series theorem.
Let \(A\) be a square matrix with \(\|A\| < 1\). Then the matrix \(I - A\) is nonsingular and
\[ (I - A)^{-1} = \sum_{i = 0}^\infty A^i \] \[ \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|} \]
We use this theorem to show that \(\rho(A) \le \|A\|\).
For \(|z| > \|A\|\), we define \(B = \frac{1}{z}A\), so that \(zI - A = z(I - B)\) and \(\|B\| = \|A\|/|z| < 1\). The theorem above then gives:
\[ \begin{align} \|\mathbf{R}(z)\| = \|(zI - A)^{-1}\| & = \frac{1}{|z|}\|(I - B)^{-1}\| \\ &\le \frac{1}{|z|}\cdot\frac{1}{1 - \|B\|} \\ & = \frac{1}{|z| - \|A\|} < \infty \end{align} \]
Since the resolvent exists for every \(z\) with \(|z| > \|A\|\), there can be no eigenvalue with magnitude greater than \(\|A\|\), i.e. \(\rho(A) \le \|A\|\).
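As a quick numerical sanity check (a sketch only, using an arbitrary random matrix and the induced 2-norm), the bound can be verified in R:
# Spectral radius vs. operator 2-norm for an illustrative random matrix
set.seed(1)                          # arbitrary seed, for reproducibility
A <- matrix(rnorm(25), 5, 5)
rho <- max(abs(eigen(A, only.values = TRUE)$values))   # spectral radius
rho <= norm(A, type = "2")           # TRUE: rho(A) <= ||A||_2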
If \(\lambda\) is an eigenvalue, the matrix \(A - \lambda I\) is singular, so its determinant must be zero: \[ \det (A - \lambda I) = 0 \]
The polynomial used to find the eigenvalues is called the characteristic polynomial.
\[ P_{A}(z) = \det (z I - A) = \prod_{i = 1}^n(z - \lambda_i) \]
From the given definition we have, \[ P_A(0) = \det(-A) = (-1)^n\det(A) \]
and so,
\[ \det(A) = (-1)^n P_A(0) = (-1)^n \prod_{i = 1}^n(-\lambda_i) =\prod_{i = 1}^n\lambda_i \]
This gives a more practical formula for computing the determinant of a matrix: the entry-wise (cofactor) expansion is highly cumbersome when \(n\) is large.
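A small numerical check of the identity (a sketch, with a random matrix chosen purely for illustration):
set.seed(2)
A <- matrix(rnorm(16), 4, 4)
det(A)                               # determinant as computed by R
Re(prod(eigen(A)$values))            # product of eigenvalues (imaginary parts cancel for real A)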
The characteristic polynomial \(P_A(z)\) of a square matrix \(A\) annihilates \(A\), i.e. \(P_A(A) = 0\) (the Cayley-Hamilton theorem).
The algebraic multiplicity of an eigenvalue \(\lambda\) is the number of times \(\lambda\) appears on the diagonal of the \(T\) factor in the Schur form \(A =UTU^*\).
This is equivalent to the power to which \((z - \lambda)\) divides the characteristic polynomial.
The geometric multiplicity of an eigenvalue \(\lambda\) is the dimension of the associated eigenspace, \(\dim(\mathcal{N}(A - \lambda I))\).
The geometric multiplicity of an eigenvalue is always less than or equal to the algebraic multiplicity.
If \(A = XBX^{-1}\) for some nonsingular matrix \(X\), then \(A\) and \(B\) are said to be similar. The map \(B \mapsto XBX^{-1}\) is called a similarity transform.
Similarity transforms preserve the eigenvalues.
Let \(\lambda\) and \(u\) be an eigenpair of \(B\) and \(X\) be nonsingular. We claim that \(\lambda\) is also an eigenvalue of \(A\). To show this, let \(y = Xu\) and see that: \[ Ay = XBX^{-1}(Xu) = XBu = \lambda (Xu) = \lambda y \]
So \(\lambda\) is also an eigenvalue of \(A\) with an associated eigenvector \(y\).
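A quick numerical illustration of this fact (a sketch with random \(B\) and \(X\); a random \(X\) is nonsingular with probability one):
set.seed(3)
B <- matrix(rnorm(9), 3, 3)
X <- matrix(rnorm(9), 3, 3)          # almost surely nonsingular
A <- X %*% B %*% solve(X)            # similarity transform
sort(abs(eigen(A)$values))           # the two spectra agree ...
sort(abs(eigen(B)$values))           # ... here compared via their sorted magnitudes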
For every square matrix \(A \in \mathbb{C}^{n \times n}\), there exist a unitary matrix \(U \in \mathbb{C}^{n \times n}\) (\(U^*U = I\)) and an upper triangular matrix \(T \in \mathbb{C}^{n \times n}\) such that
\[ \displaystyle A = UTU^* \]
This factorization is called the Schur form or Schur decomposition of \(A\). Note that it is also a similarity transform.
Since the eigenvalues of a triangular matrix are its diagonal entries, the diagonal terms \(T_{ii}\) are the eigenvalues of \(T\), and since \(T\) and \(A\) are similar, the \(T_{ii}\)'s are also the eigenvalues of \(A\).
Base case: for \(A \in \mathbb{C}^{1 \times 1}\), take \(U = 1\) and \(T = A\) (a scalar); then \(A = UTU^*\).
Assume the result holds for \((n - 1) \times (n - 1)\) matrices.
Let \(u\) be a unit eigenvector of \(A\), i.e. \(Au = \lambda u\) with \(\|u\| = 1\), and let the columns of \(V\) form an orthonormal basis for \(\text{span}(u)^\perp\), so that \([u \; V]\) is a unitary matrix. Then,
\[ [u \; V]^* A [u \; V] = \begin{bmatrix} u^*Au & u^*AV\\ V^*Au & V^*AV \end{bmatrix} = \begin{bmatrix} \lambda & u^*AV\\ 0 & V^*AV \end{bmatrix} \]
By the induction hypothesis, \(V^*AV \in \mathbb{C}^{(n-1)\times(n-1)}\) has a Schur form: \(V^*AV = \hat U \hat T \hat U^*\). Moreover, since \([u \; V]\) is unitary, we have:
\[ \begin{aligned} A &= [u \; V] \begin{bmatrix} \lambda & u^*AV\\ 0 & \hat U \hat T \hat U^* \end{bmatrix} [u \; V]^* \\ &= [u \; V] \begin{bmatrix} 1 & 0\\ 0 & \hat U \end{bmatrix} \begin{bmatrix} \lambda & u^*AV\hat U\\ 0 & \hat T \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & \hat U^* \end{bmatrix} [u \; V]^* \\ &= [u \; V\hat U] \begin{bmatrix} \lambda & u^*AV\hat U\\ 0 & \hat T \end{bmatrix} [u \; V \hat U]^* \\ &=\tilde U \tilde T \tilde U^* \end{aligned} \]
Note that \(\tilde T\) is clearly upper triangular and \(\tilde U\) is unitary since:
\[ \begin{aligned} \tilde U^* \tilde U &= [u \; V \hat U]^* [u \; V \hat U] \\ &= \begin{bmatrix} u^*u & u^*V\hat U\\ \hat U^* V^* u & \hat U^* V^* V \hat U \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & I \end{bmatrix} \end{aligned} \]
Thus, we have shown that every \(A \in \mathbb{C}^{n\times n}\) has a Schur decomposition.
Let \(A \in \mathbb{C}^{n \times n}\). Then the trace of \(A\) is the sum of its eigenvalues.
Suppose \(A = UTU^*\) is the Schur decomposition of \(A\). Then, using the cyclic property of the trace, we have:
\[ \begin{aligned} Tr(A) &= Tr(UTU^*) = Tr(TU^*U) \\ &= Tr(T) = \sum_{i}T_{ii} = \sum_{i} \lambda_i \end{aligned} \]
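A numerical check with an arbitrary random matrix (illustrative sketch only):
set.seed(4)
A <- matrix(rnorm(25), 5, 5)
sum(diag(A))                         # trace of A
Re(sum(eigen(A)$values))             # sum of eigenvalues (imaginary parts cancel for real A)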
For Hermitian matrices, \(A = A^*\), we have:
\[ UTU^* = A = A^* = U T^* U^* \]
Since \(U\) is unitary (hence nonsingular), we must have \(T = T^*\). A matrix that is both upper triangular and Hermitian must be diagonal with real entries, so \(T\) is a real diagonal matrix.
This means that Hermitian matrices are unitarily diagonalizable and have real eigenvalues.
It can be shown that normal matrices \((A^*A = AA^*)\) are also unitarily diagonalizable. Writing the unitary diagonalization column by column gives the spectral decomposition:
\[ \begin{aligned} A &= U\Lambda U^* = [u_1 \; \dots \; u_n] \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix}\\ &= \sum_{i = 1}^n \lambda_i u_iu_i^* = \sum_{i = 1}^n \lambda_i P_i \end{aligned} \]
The matrices \(P_i = u_iu_i^*\) are orthogonal projections, since \(P_i^* = P_i\) and \(P_i^2 = u_i(u_i^*u_i)u_i^* = u_iu_i^* = P_i\).
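The spectral decomposition can be verified numerically by rebuilding a (randomly generated, purely illustrative) real symmetric matrix from its rank-one projections:
set.seed(5)
S <- crossprod(matrix(rnorm(16), 4, 4))   # a real symmetric matrix
e <- eigen(S, symmetric = TRUE)
# Sum of lambda_i * u_i u_i^T over all eigenpairs
S.rebuilt <- Reduce(`+`, lapply(1:4, function(i) e$values[i] * outer(e$vectors[, i], e$vectors[, i])))
max(abs(S - S.rebuilt))                   # ~ 0, up to rounding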
We call a square matrix diagonalizable if it is similar to a diagonal matrix, that is, if there is a nonsingular matrix \(X\) such that
\(A = XDX^{-1}\)
The above factorization is called diagonalization.
An \((n \times n)\)-matrix is diagonalizable if and only if it has \(n\) linearly independent eigenvectors.
Examples of diagonalizable matrices include the Hermitian and, more generally, normal matrices discussed above, as well as any matrix with \(n\) distinct eigenvalues.
If \(A \in \mathbb{C}^{n \times n}\) is a diagonalizable matrix, then it can be factorized as:
\[ A = U\Lambda U^{-1} \]
where \(\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)\) contains the eigenvalues and \(U\) is a nonsingular matrix whose columns are the corresponding eigenvectors.
We also refer to \(A = U\Lambda U^{-1}\) as the canonical form of \(A\).
Eigendecomposition does not exist for every square matrix, since not all matrices are diagonalizable.
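A sketch of the eigendecomposition in R, using a random matrix (diagonalizable with probability one, since its eigenvalues are almost surely distinct):
set.seed(6)
A <- matrix(rnorm(9), 3, 3)
e <- eigen(A)
U      <- e$vectors                  # eigenvectors as columns (possibly complex)
Lambda <- diag(e$values)             # eigenvalues on the diagonal
max(abs(A - U %*% Lambda %*% solve(U)))   # ~ 0: A = U Lambda U^{-1}, up to rounding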
Give another proof that \(\rho(A) \le \|A\|\) (not using the Neumann series).
Prove that if \(A\) is diagonalizable and has only one eigenvalue, then \(A\) is a constant multiple of the identity matrix.
Can any invertible \((n\times n)\)-matrix be diagonalized?
Give another proof that \(\rho(A) \le \|A\|\) (not using the Neumann series).
Here we use the defining property of the induced operator norm, \(\|Av\| \le \|A\|\|v\|\). Let \(\lambda_1\) be an eigenvalue of largest magnitude and \(u_1\) an associated unit eigenvector: \[ \begin{aligned} \rho(A) &= |\lambda_1| = |\lambda_1|\|u_1\| = \|\lambda_1u_1\| \\ &= \|Au_1\| \le \|A\| \|u_1\| = \|A\| \end{aligned} \]
Prove that if \(A\) is diagonalizable and has only one eigenvalue, then \(A\) is a constant multiple of the identity matrix.
Since the matrix is diagonalizable, it has an eigendecomposition \(A = U\Lambda U^{-1}\).
Since \(A\) has only one eigenvalue \(\lambda\), we have \(\Lambda = \lambda I\), and so \[A = U (\lambda I) U^{-1} = \lambda U U^{-1} = \lambda I\]
Can any invertible \((n\times n)\)-matrix be diagonalized?
No. To show this, take \(A = \begin{pmatrix} 1 & 1 \\ 0 & 1\end{pmatrix}\).
Since \(\det(A) = 1\), \(A\) is invertible.
The eigenvalues of \(A\) are \(\lambda_1 = \lambda_2 = 1\), but the associated eigenspace is only one-dimensional:
\(0 = (A - \lambda_1 I)u = \begin{pmatrix} 0 & 1 \\ 0 & 0\end{pmatrix} u \;\Rightarrow\; u = c\begin{pmatrix} 1 \\ 0\end{pmatrix}\)
Thus, \(A\) does not have 2 linearly independent eigenvectors and so is not diagonalizable.
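Numerically, the eigenvector matrix returned by eigen() for this \(A\) is (nearly) singular, consistent with the matrix being defective (a small illustrative check):
A <- matrix(c(1, 0, 1, 1), 2, 2)     # the matrix above, in column-major order
V <- eigen(A)$vectors                # columns: the computed "eigenvectors"
abs(det(V))                          # ~ 0: the columns are numerically linearly dependent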
Hermitian matrices \((A = A^*)\) have many nice properties: their eigenvalues are real and they admit an orthonormal basis of eigenvectors, i.e. a unitary diagonalization.
Unfortunately, these nice features do not hold in general for other matrices.
Every matrix \(A \in \mathbb{C}^{m\times n}\) can be factorized as \(A = U\Sigma V^*\), where \(U \in \mathbb{C}^{m\times m}\) and \(V \in \mathbb{C}^{n \times n}\) are square unitary matrices and \(\Sigma \in \mathbb{R}^{m \times n}\) is a (rectangular) diagonal matrix with nonnegative entries \(\sigma_i\).
This factorization is called a singular value decomposition (SVD) of \(A\).
We call the diagonal entries \(\sigma_i\) the singular values of \(A\);
the columns of \(U\) and \(V\) are the left and right singular vectors, respectively.
The number of nonzero singular values is equal to the rank of \(A\).
The induced 2-norm of a matrix is equal to its largest singular value: \(\|A\|_2 = \sigma_{\max}\).
The Frobenius norm of a matrix \(\|A\|_F\) is equal to the Euclidean norm of the vector of its singular values.
\(\|A\|_F =\sqrt{\sum_{i = 1}^r \sigma_{i}^2}\)
The 2-norm and the Frobenius norm are invariant under unitary transformations:
\(\|A\|_2 = \|U\Sigma V^*\|_2 = \|\Sigma\|_2 = \sigma_{\max}\)
\(\|A\|_F= \|U\Sigma V^*\|_F = \|\Sigma\|_F = \sqrt{\sum_{i}\sigma_i^2}\)
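Both identities are easy to confirm in R on an arbitrary random matrix (illustration only):
set.seed(7)
A <- matrix(rnorm(20), 5, 4)
s <- svd(A)$d                        # singular values
c(norm(A, "2"), max(s))              # both equal sigma_max
c(norm(A, "F"), sqrt(sum(s^2)))      # both equal the Frobenius norm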
Note that both \(A^*A\) and \(AA^*\) are Hermitian matrices, and therefore unitarily diagonalizable. Furthermore,
\[ A^*A = (U\Sigma V^*)^* (U\Sigma V^*) = V\Sigma U^* U \Sigma V^* = V\Sigma^2 V^* \]
Similarly \(AA^* = U \Sigma^2 U^*\). These are eigendecompositions, so it must follow that
\[ \sigma_i = \sqrt{\lambda_i(A^*A)} = \sqrt{\lambda_i(AA^*)}, \quad i = 1, \dots, r \]
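A numerical check of this relationship on a random real matrix (sketch only):
set.seed(8)
A <- matrix(rnorm(20), 5, 4)
svd(A)$d                                             # singular values of A
sqrt(eigen(crossprod(A), symmetric = TRUE)$values)   # sqrt of eigenvalues of A^*A: same values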
Let \(A\in \mathbb{C}^{m \times n}\) be a matrix of rank \(r\). Then the following is true:
\[ \mathcal{R}(A) = \text{span}(u_1, \dots, u_r), \; \; \mathcal{N}(A^*) = \text{span}(u_{r+1}, \dots, u_m) \]
\[ \mathcal{R}(A^*) = \text{span}(v_1, \dots, v_r), \; \; \mathcal{N}(A) = \text{span}(v_{r+1}, \dots, v_n) \]
First, we show that the \(r\) left singular vectors associated with the \(r\) nonzero singular values form a basis for the range of \(A\):
\[ Ax = U\Sigma V^* x = \sum_{i = 1}^n \sigma_i(v_i^*x)u_i =\sum_{i = 1}^r (\sigma_i c_i) u_i, \quad c_i := v_i^*x, \] since \(\sigma_i = 0\) for \(i > r\). Thus every vector \(Ax\) is a linear combination of \(u_1, \dots, u_r\).
The fundamental theorem of linear algebra states that for \(A\in \mathbb{C}^{m \times n}\), \(\mathcal{R}(A) \oplus \mathcal{N}(A^*) = \mathbb{C}^{m}\).
Thus, the last \(m - r\) left singular vectors must span \(\mathcal{N}(A^*)\).
We apply the same reasoning to \(A^*\) to see that the first \(r\) and the last \(n - r\) right singular vectors span \(\mathcal{R}(A^*)\) and \(\mathcal{N}(A)\) respectively.
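These relations can be checked numerically on a hypothetical rank-2 example built from random factors (an illustrative sketch):
set.seed(9)
A <- matrix(rnorm(10), 5, 2) %*% matrix(rnorm(8), 2, 4)   # 5 x 4 matrix of rank 2
s <- svd(A, nu = nrow(A), nv = ncol(A))                   # full SVD
round(s$d, 10)                       # only the first two singular values are nonzero
max(abs(A %*% s$v[, 3:4]))           # ~ 0: v_3, v_4 span N(A)
max(abs(t(s$u[, 3:5]) %*% A))        # ~ 0: u_3, u_4, u_5 span N(A^*)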
Let \(A \in \mathbb{C}^{m\times n}\) have rank \(r\), with singular values \(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0\), and let \(0 \le k < r\). Then \[
\min_{B:\, \text{rank}(B) = k} \|A - B\|_2 = \sigma_{k+1}
\] and \[
\min_{B:\, \text{rank}(B) = k} \|A - B\|_F = \sqrt{\sum_{i = k+1}^r \sigma_{i}^2},
\] where in both cases the minimum is attained by the truncated SVD \(A_k = \sum_{i = 1}^k \sigma_iu_iv_i^*\).
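A quick numerical illustration of the theorem for a random matrix and \(k = 2\) (sketch only):
set.seed(10)
A <- matrix(rnorm(30), 6, 5)
s <- svd(A)
k <- 2
A_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # best rank-k approximation
c(norm(A - A_k, "2"), s$d[k + 1])                        # both equal sigma_{k+1}
c(norm(A - A_k, "F"), sqrt(sum(s$d[(k + 1):5]^2)))       # Frobenius error matches the tail sum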
One of the useful applications of low-rank matrix approximation is image compression: each color channel of the image is replaced by its rank-\(k\) truncated SVD.
rm(list = ls())
library(jpeg)
library(png)
library(grid)
library(gridExtra)

# Read the image and split it into its red, green and blue channels
im <- readJPEG('~/Documents/images/tiger.jpg')
im.red   <- im[, , 1]
im.green <- im[, , 2]
im.blue  <- im[, , 3]

# Reassemble the channels into a color raster and save the original image
col <- rgb(im.red, im.green, im.blue)
dim(col) <- dim(im.blue)
png(paste('~/Documents/images/', letters[1], ' Image Original.png', sep = ''))
grid.raster(col, interpolate = FALSE)
dev.off()
grid.raster(col, interpolate=FALSE)
# Compute the SVD of each channel once, then rebuild the image from the
# leading i singular triplets for several ranks i
ranks <- c(5, 10, 30, 50, 100)
im.red.svd   <- svd(im.red)
im.green.svd <- svd(im.green)
im.blue.svd  <- svd(im.blue)

for (j in 1:length(ranks)) {
  i <- ranks[j]

  # Rank-i truncated SVD of each channel: U[, 1:i] diag(d[1:i]) V[, 1:i]^T
  red.compressed   <- im.red.svd$u[, 1:i] %*% diag(im.red.svd$d[1:i]) %*% t(im.red.svd$v[, 1:i])
  green.compressed <- im.green.svd$u[, 1:i] %*% diag(im.green.svd$d[1:i]) %*% t(im.green.svd$v[, 1:i])
  blue.compressed  <- im.blue.svd$u[, 1:i] %*% diag(im.blue.svd$d[1:i]) %*% t(im.blue.svd$v[, 1:i])

  # Clip negative values introduced by the truncation
  red.compressed[red.compressed < 0]     <- 0
  green.compressed[green.compressed < 0] <- 0
  blue.compressed[blue.compressed < 0]   <- 0

  # Rescale to [0, 1] and save the compressed image
  png(paste('~/Documents/images/', letters[j + 1], ' Image SVD Compressed ', i, '.png', sep = ''))
  maxVal <- max(red.compressed, green.compressed, blue.compressed)
  col <- rgb(red.compressed, green.compressed, blue.compressed, maxColorValue = maxVal)
  dim(col) <- dim(blue.compressed)
  grid.raster(col, interpolate = FALSE)
  dev.off()
}
# Read the saved PNG files back in and turn them into raster grobs
filenames <- dir(path = "~/Documents/images", pattern = 'Image', full.names = TRUE)
foo <- list()
for (j in 1:length(filenames)) foo[[j]] <- readPNG(filenames[j])
g <- lapply(foo, rasterGrob)
do.call(grid.arrange, c(g, ncol=3))
# Compare the storage needed by the truncated SVD factors with the original image
orig.size <- object.size(im)
frac.size <- c()
size.compressed <- c()
for (i in c(5, 10, 30, 50, 100)) {
  c.size <- 3 * (object.size(im.red.svd$u[, 1:i]) +
                 object.size(im.red.svd$d[1:i]) +
                 object.size(im.red.svd$v[, 1:i]))
  f.size <- c.size / orig.size
  frac.size <- c(frac.size, f.size)
  size.compressed <- c(size.compressed, c.size)
}
# Size in bytes:
orig.size # original picture
## 5155408 bytes
size.compressed # compressed pictures
## [1] 113904 226584 676680 1126920 2252520
frac.size # fraction of the original pictures
## [1] 0.02209408 0.04395074 0.13125634 0.21858988 0.43692371
We say that a matrix \(A \in \mathbb{R}^{n \times n}\) with \(A = A^T\) is symmetric positive-definite (SPD) if for all \(v \neq 0\) it is true that:
\[ \frac{v^* A v}{v^* v} > 0. \]
The left-hand side is called the Rayleigh quotient of \(A\). The definition extends to complex matrices, in which case we say that \(A\) is Hermitian positive-definite (HPD).
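A small sketch in R: a random matrix of the form \(M^TM\) (with \(M\) nonsingular) is SPD, which we can confirm by checking that its eigenvalues, and hence its Rayleigh quotients, are positive (the matrix and test vector below are purely illustrative):
set.seed(11)
M <- matrix(rnorm(16), 4, 4)
A <- crossprod(M)                               # A = M^T M is SPD (almost surely)
min(eigen(A, symmetric = TRUE)$values) > 0      # TRUE: all eigenvalues are positive
v <- rnorm(4)
as.numeric(t(v) %*% A %*% v / (t(v) %*% v))     # a sample Rayleigh quotient: positive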