Linear Algebra and Calculus refresher
By Afshine Amidi and Shervine Amidi
General notations
Definitions
Vector We note $x\in\mathbb{R}^n$ a vector with $n$ entries, where $x_i\in\mathbb{R}$ is the $i^{th}$ entry:
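\[\boxed{x=\left(\begin{array}{c}x_1\\x_2\\\vdots\\x_n\end{array}\right)\in\mathbb{R}^n}\]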
Matrix We note $A\in\mathbb{R}^{m\times n}$ a matrix with $m$ rows and $n$ columns, where $A_{i,j}\in\mathbb{R}$ is the entry located in the $i^{th}$ row and $j^{th}$ column:
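\[\boxed{A=\left(\begin{array}{ccc}A_{1,1}& \cdots&A_{1,n}\\\vdots&& \vdots\\A_{m,1}& \cdots&A_{m,n}\end{array}\right)\in\mathbb{R}^{m\times n}}\]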
Remark: the vector $x$ defined above can be viewed as an $n\times1$ matrix and is more specifically called a column vector.
Main matrices
Identity matrix The identity matrix $I\in\mathbb{R}^{n\times n}$ is a square matrix with ones on its diagonal and zeros everywhere else:
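\[\boxed{I=\left(\begin{array}{cccc}1&0&\cdots&0\\0&\ddots&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\cdots&0&1\end{array}\right)\in\mathbb{R}^{n\times n}}\]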
Remark: for all matrices $A\in\mathbb{R}^{n\times n}$, we have $A\times I=I\times A=A$.
Diagonal matrix A diagonal matrix $D\in\mathbb{R}^{n\times n}$ is a square matrix with values $d_1,...,d_n$ on its diagonal and zeros everywhere else:
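\[\boxed{D=\left(\begin{array}{cccc}d_1&0&\cdots&0\\0&\ddots&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\cdots&0&d_n\end{array}\right)\in\mathbb{R}^{n\times n}}\]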
Remark: we also note $D$ as $\textrm{diag}(d_1,...,d_n)$.
Matrix operations
Multiplication
Vector-vector There are two types of vector-vector products:
- inner product: for $x,y\in\mathbb{R}^n$, we have:
\[\boxed{x^Ty=\sum_{i=1}^nx_iy_i\in\mathbb{R}}\]
- outer product: for $x\in\mathbb{R}^m, y\in\mathbb{R}^n$, we have:
\[\boxed{xy^T=\left(\begin{array}{ccc}x_1y_1& \cdots&x_1y_n\\\vdots&& \vdots\\x_my_1& \cdots&x_my_n\end{array}\right)\in\mathbb{R}^{m\times n}}\]
Matrix-vector The product of a matrix $A\in\mathbb{R}^{m\times n}$ and a vector $x\in\mathbb{R}^{n}$ is a vector $Ax\in\mathbb{R}^{m}$, such that:
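\[\boxed{(Ax)_i=\sum_{j=1}^{n}A_{i,j}x_j\quad\textrm{for }i=1,...,m}\]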
Matrix-matrix The product of matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times p}$ is a matrix $AB\in\mathbb{R}^{m\times p}$, such that:
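\[\boxed{(AB)_{i,j}=\sum_{k=1}^{n}A_{i,k}B_{k,j}\quad\textrm{for }i=1,...,m,\;j=1,...,p}\]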
Other operations
Transpose The transpose of a matrix $A\in\mathbb{R}^{m\times n}$, noted $A^T\in\mathbb{R}^{n\times m}$, is the matrix obtained by swapping its rows and columns:
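\[\boxed{\forall i,j,\quad(A^T)_{i,j}=A_{j,i}}\]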
Remark: for matrices $A,B$, we have $(AB)^T=B^TA^T$.
Inverse The inverse of an invertible square matrix $A$ is noted $A^{-1}$ and is the only matrix such that:
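\[\boxed{AA^{-1}=A^{-1}A=I}\]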
Remark: not all square matrices are invertible. Also, for invertible matrices $A,B$, we have $(AB)^{-1}=B^{-1}A^{-1}$.
Trace The trace of a square matrix $A$, noted $\textrm{tr}(A)$, is the sum of its diagonal entries:
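\[\boxed{\textrm{tr}(A)=\sum_{i=1}^{n}A_{i,i}}\]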
Remark: for matrices $A,B$, we have $\textrm{tr}(A^T)=\textrm{tr}(A)$ and $\textrm{tr}(AB)=\textrm{tr}(BA)$.
Determinant The determinant of a square matrix $A\in\mathbb{R}^{n\times n}$, noted $|A|$ or $\textrm{det}(A)$, is expressed recursively in terms of $A_{\backslash i,\backslash j}$, which is the matrix $A$ without its $i^{th}$ row and $j^{th}$ column, as follows:
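\[\boxed{|A|=\sum_{j=1}^{n}(-1)^{i+j}A_{i,j}|A_{\backslash i,\backslash j}|\quad\textrm{for any fixed row }i}\]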
Remark: $A$ is invertible if and only if $|A|\neq0$. Also, $|AB|=|A||B|$ and $|A^T|=|A|$.
Matrix properties
Definitions
Symmetric decomposition A given square matrix $A\in\mathbb{R}^{n\times n}$ can be expressed in terms of its symmetric and antisymmetric parts as follows:
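\[\boxed{A=\underbrace{\frac{A+A^T}{2}}_{\textrm{Symmetric}}+\underbrace{\frac{A-A^T}{2}}_{\textrm{Antisymmetric}}}\]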
Norm A norm is a function $N:V\longrightarrow[0,+\infty[$ where $V$ is a vector space, and such that for all $x,y\in V$, we have:
- $N(x+y)\leqslant N(x)+N(y)$
- $N(ax)=|a|N(x)$ for any scalar $a$
- if $N(x)=0$, then $x=0$
For $x\in V$, the most commonly used norms are summed up in the table below:
Norm | Notation | Definition | Use case |
Manhattan, $L^1$ | $||x||_1$ | $\displaystyle\sum_{i=1}^n|x_i|$ | LASSO regularization |
Euclidean, $L^2$ | $||x||_2$ | $\displaystyle\sqrt{\sum_{i=1}^nx_i^2}$ | Ridge regularization |
$p$-norm, $L^p$ | $||x||_p$ | $\displaystyle\left(\sum_{i=1}^n|x_i|^p\right)^{\frac{1}{p}}$ | Hölder inequality |
Infinity, $L^{\infty}$ | $||x||_{\infty}$ | $\underset{i}{\textrm{max }}|x_i|$ | Uniform convergence |
Linear dependence A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.
Remark: if no vector can be written this way, then the vectors are said to be linearly independent.
Matrix rank The rank of a given matrix $A$ is noted $\textrm{rank}(A)$ and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of $A$.
Positive semi-definite matrix A matrix $A\in\mathbb{R}^{n\times n}$ is positive semi-definite (PSD) and is noted $A\succeq 0$ if we have:
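\[\boxed{A=A^T\quad\textrm{and}\quad\forall x\in\mathbb{R}^n,\;x^TAx\geqslant0}\]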
Remark: similarly, a matrix $A$ is said to be positive definite, and is noted $A\succ0$, if it is a PSD matrix which satisfies $x^TAx>0$ for all non-zero vectors $x$.
Eigenvalue, eigenvector Given a matrix $A\in\mathbb{R}^{n\times n}$, $\lambda$ is said to be an eigenvalue of $A$ if there exists a vector $z\in\mathbb{R}^n\backslash\{0\}$, called an eigenvector, such that we have:
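\[\boxed{Az=\lambda z}\]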
Spectral theorem Let $A\in\mathbb{R}^{n\times n}$. If $A$ is symmetric, then $A$ is diagonalizable by a real orthogonal matrix $U\in\mathbb{R}^{n\times n}$. By noting $\Lambda=\textrm{diag}(\lambda_1,...,\lambda_n)$ the diagonal matrix of eigenvalues of $A$, we have:
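\[\boxed{A=U\Lambda U^T}\]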
Singular-value decomposition For a given matrix $A$ of dimensions $m\times n$, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of a unitary $m\times m$ matrix $U$, a diagonal $m\times n$ matrix $\Sigma$ and a unitary $n\times n$ matrix $V$, such that:
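\[\boxed{A=U\Sigma V^T}\]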
Matrix calculus
Gradient Let $f:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}$ be a function and $A\in\mathbb{R}^{m\times n}$ be a matrix. The gradient of $f$ with respect to $A$ is an $m\times n$ matrix, noted $\nabla_A f(A)$, such that:
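\[\boxed{\left(\nabla_Af(A)\right)_{i,j}=\frac{\partial f(A)}{\partial A_{i,j}}}\]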
Remark: the gradient of $f$ is only defined when $f$ is a function that returns a scalar.
Hessian Let $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ be a function and $x\in\mathbb{R}^{n}$ be a vector. The Hessian of $f$ with respect to $x$ is an $n\times n$ symmetric matrix, noted $\nabla_x^2 f(x)$, such that:
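\[\boxed{\left(\nabla_x^2f(x)\right)_{i,j}=\frac{\partial^2f(x)}{\partial x_i\partial x_j}}\]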
Remark: the Hessian of $f$ is only defined when $f$ is a function that returns a scalar.
Gradient operations For matrices $A,B,C$, the following gradient properties are worth having in mind:
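\[\boxed{\nabla_A\textrm{tr}(AB)=B^T}\qquad\boxed{\nabla_{A^T}f(A)=\left(\nabla_Af(A)\right)^T}\]
\[\boxed{\nabla_A\textrm{tr}(ABA^TC)=CAB+C^TAB^T}\qquad\boxed{\nabla_A|A|=|A|(A^{-1})^T}\]
Remark: in the last identity, $A$ is assumed to be invertible.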