linear-algebra
Basic Idea
Basic Operators
Vector
Inner product
\[\begin{align} \vec{x} \cdot \vec{y} = \vec{x}^{\intercal} \vec{y} = \sum_{i=1}^{d} x_{i}y_{i} & & \vec{x}, \vec{y} \in \mathbb{R}^{d} \end{align}\]
Outer product
\[\begin{align} \vec{x} \otimes \vec{y} = \vec{x} \vec{y}^{\intercal} = \begin{bmatrix} x_{1}y_{1} & \cdots & x_{1}y_{d} \\ \vdots & \ddots & \vdots \\ x_{d}y_{1} & \cdots & x_{d}y_{d} \end{bmatrix} & & \vec{x}, \vec{y} \in \mathbb{R}^{d} \end{align}\]
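As a quick numerical sanity check, a minimal sketch using numpy (the library and the specific vectors are assumptions of the example, not part of the definitions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Inner product: a scalar, the sum of element-wise products.
inner = x @ y                             # equivalent to np.dot(x, y)
assert np.isclose(inner, np.sum(x * y))   # 32.0

# Outer product: a d x d matrix with entries x_i * y_j.
outer = np.outer(x, y)
assert outer.shape == (3, 3)
assert outer[0, 2] == x[0] * y[2]
```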
\(L_p\) Norm
\[\begin{align} \lVert \vec{x} \rVert_p = \left( \sum_{i=1}^{d} \lvert x_i \rvert ^p \right)^{\frac{1}{p}} & & \vec{x} \in \mathbb{R}^d \end{align}\]
For \(p = 1\), it reduces to the Manhattan norm
\[\lVert \vec{x} \rVert_1 = \sum_{i=1}^{d} \lvert x_i \rvert\]
For \(p = 2\), it reduces to the Euclidean norm
\[\lVert \vec{x} \rVert = \sqrt{ \sum_{i=1}^{d} x_i^2}\]
The Euclidean norm can also be expressed through the inner product
\[\lVert \vec{x} \rVert^2 = \vec{x} \cdot \vec{x}\]
For \(p = 0\), it counts the number of non-zero entries: \(\lVert \vec{x} \rVert_0 = |\{i \mid x_i \ne 0\}|\). Be careful: it is not a true norm!
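The three cases above can be checked with numpy's norm routines; a minimal sketch with an arbitrary example vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(x, ord=1)    # Manhattan norm: 7.0
l2 = np.linalg.norm(x)           # Euclidean norm (default ord=2): 5.0
l0 = np.count_nonzero(x)         # "L0 norm": 2 non-zero entries (not a true norm)

# Euclidean norm via the inner product: ||x||^2 = x . x
assert np.isclose(l2 ** 2, x @ x)
```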
Orthogonal vector
\[\begin{cases} \vec{x} \cdot \vec{y} = 0 \\ \cos(\vec{x}, \vec{y}) = 0 \end{cases} \Rightarrow \vec{x}, \vec{y} \text{ are a pair of orthogonal vectors}\]
Orthonormal vector
\[\begin{cases} \vec{x} \cdot \vec{y} = 0 \\ \cos(\vec{x}, \vec{y}) = 0 \\ \lVert \vec{x} \rVert = \lVert \vec{y} \rVert = 1 \end{cases} \Rightarrow \vec{x}, \vec{y} \text{ are a pair of orthonormal vectors}\]
Basic inequalities
Cauchy-Schwarz inequality
\[\lvert \vec{x} \cdot \vec{y} \rvert \le \lVert \vec{x} \rVert \lVert \vec{y} \rVert\]
Triangle Inequality
\[\lVert \vec{x} + \vec{y} \rVert \le \lVert \vec{x} \rVert + \lVert \vec{y} \rVert\]
Angle
\[\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\lVert \vec{x} \rVert \lVert \vec{y} \rVert}\]
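The angle formula ties the previous definitions together: orthogonality, Cauchy-Schwarz, and the triangle inequality can all be verified numerically. A minimal sketch with arbitrary example vectors:

```python
import numpy as np

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 3.0, 0.0])

# Cosine of the angle between x and y.
cos_xy = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

# x . y == 0, so x and y are orthogonal and the cosine is 0.
assert x @ y == 0 and cos_xy == 0

# Cauchy-Schwarz: |x . y| <= ||x|| ||y||
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
```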
Matrix
Matrix product
Numerical view
\[\begin{align} [AB]_{ij} = \sum_{l = 1}^{k} a_{il}b_{lj} & & A \in \mathbb{R}^{n \times k}, B \in \mathbb{R}^{k \times d} \end{align}\]
Row vector view
\[\begin{align} [AB]_{i\cdot} = \sum_{l=1}^{k} a_{il} B_{l \cdot} & & A \in \mathbb{R}^{n \times k}, B \in \mathbb{R}^{k \times d} \end{align}\]
This means each row \(i\) of the product is a linear combination of the rows of \(B\), with weights given by \(A_{i \cdot}\)
Column vector view
\[\begin{align} [AB]_{\cdot j} = \sum_{l=1}^{k} A_{\cdot l} b_{lj} & & A \in \mathbb{R}^{n \times k}, B \in \mathbb{R}^{k \times d} \end{align}\]
This means each column \(j\) of the product is a linear combination of the columns of \(A\), with weights given by \(B_{\cdot j}\)
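Both views say the same thing as the numerical view, just grouped differently; a minimal numpy sketch (random matrices, fixed seed) that checks one row and one column:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # n x k
B = rng.standard_normal((3, 5))   # k x d
AB = A @ B

# Row view: row i of AB is a combination of the rows of B, weighted by A[i, :].
i = 2
assert np.allclose(AB[i, :], sum(A[i, l] * B[l, :] for l in range(3)))

# Column view: column j of AB is a combination of the columns of A, weighted by B[:, j].
j = 1
assert np.allclose(AB[:, j], sum(A[:, l] * B[l, j] for l in range(3)))
```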
Frobenius norm
\[\lVert A \rVert_F = \lVert A^\intercal \rVert_F = \sqrt{ \sum_{i = 1}^{n} \sum_{j=1}^{d} a_{ij}^2 }\]\[\lVert \vec{x} \otimes \vec{y} \rVert_F = \lVert \vec{x} \rVert \lVert \vec{y} \rVert\]
It is a special case of the entry-wise matrix norms, and also a special case of the \(L_{p, q}\) norm (with \(p = q = 2\)).
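A minimal sketch of both identities with numpy (random data, fixed seed; numpy's `'fro'` order is the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Frobenius norm: square root of the sum of squared entries; invariant under transpose.
fro = np.linalg.norm(A, ord='fro')
assert np.isclose(fro, np.sqrt(np.sum(A ** 2)))
assert np.isclose(fro, np.linalg.norm(A.T, ord='fro'))

# ||x (outer) y||_F = ||x|| ||y||
x, y = rng.standard_normal(4), rng.standard_normal(3)
assert np.isclose(np.linalg.norm(np.outer(x, y), ord='fro'),
                  np.linalg.norm(x) * np.linalg.norm(y))
```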
Trace
\[\begin{align} tr(A) = \sum_{i=1}^{n} a_{ii} & & A \in \mathbb{R}^{n \times n} \end{align}\]
Trace identities
\[\lVert A \rVert_{F}^{2} = \text{Energy}(A) = tr(AA^\intercal) = tr(A^\intercal A)\] \[\begin{align} tr(CD^\intercal) = tr(DC^\intercal) = \sum_{i=1}^n \sum_{j=1}^{d} c_{ij}d_{ij} & & C, D \in \mathbb{R}^{n \times d} \\ tr(AB) = tr(BA) = \sum_{i=1}^n \sum_{j=1}^{d} a_{ij}b_{ji} & & A \in \mathbb{R}^{n \times d}, B \in \mathbb{R}^{d \times n} \end{align}\]
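A minimal numpy sketch of the identities above (random matrices with matching shapes, fixed seed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
C, D = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))

# ||A||_F^2 = tr(A A^T) = tr(A^T A)
assert np.isclose(np.linalg.norm(A, ord='fro') ** 2, np.trace(A @ A.T))
assert np.isclose(np.trace(A @ A.T), np.trace(A.T @ A))

# Cyclic property: tr(AB) = tr(BA), even though AB is 4x4 and BA is 3x3.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# tr(C D^T) = sum of element-wise products c_ij * d_ij
assert np.isclose(np.trace(C @ D.T), np.sum(C * D))
```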
Rank
- TODO()
Matrix Decomposition
Decompose a complex matrix into a product of several simpler matrices.
Elementary matrix operations
- Interchange
- Addition
- Scaling
Using the row and column views of the matrix product, each of these operations can be carried out by multiplying with an elementary matrix: left-multiplication acts on the rows, right-multiplication on the columns.
Examples (row operations), with a numerical check after the table:
| Interchange | Addition | Scaling |
| --- | --- | --- |
| \(\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\) | \(\begin{bmatrix} 1 & c & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\) | \(\begin{bmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix}\) |
| Interchange rows 1, 2 | Add \(c \times\) (row 2) to row 1 | Multiply row 2 by \(c\) |
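A minimal numpy sketch of the three row operations, each applied by left-multiplying the corresponding elementary matrix (the matrix `A` and the scalar `c` are arbitrary examples):

```python
import numpy as np

A = np.arange(9, dtype=float).reshape(3, 3)
c = 2.0

# Interchange rows 1 and 2 (0-indexed rows 0 and 1).
P = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.]])
assert np.allclose(P @ A, A[[1, 0, 2], :])

# Add c * (row 2) to row 1.
E = np.array([[1., c, 0.],
              [0., 1., 0.],
              [0., 0., 1.]])
expected = A.copy()
expected[0, :] += c * A[1, :]
assert np.allclose(E @ A, expected)

# Multiply row 2 by c.
S = np.diag([1., c, 1.])
expected = A.copy()
expected[1, :] *= c
assert np.allclose(S @ A, expected)
```

Right-multiplication by the same matrices acts on columns instead (e.g. \(AP\) swaps columns 1 and 2).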
Geometrically decomposable operators
These are designed to transform a column vector \(\vec{x}\):
- Rotation
- Reflection
- Scaling
Examples (2-dimensional), with a numerical check after the table:
| Rotation | Reflection | Scaling |
| --- | --- | --- |
| \(\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}\) | \(\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\) | \(\begin{bmatrix} c_1 & 0 \\ 0 & c_2 \end{bmatrix}\) |
| Rotate counter-clockwise by \(\theta\) | Reflect across the X-axis | Scale \(x\) and \(y\) by factors \(c_1\) and \(c_2\) |
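A minimal numpy sketch applying each operator to a 2-d vector (\(\theta\), \(c_1\), \(c_2\) are arbitrary example values):

```python
import numpy as np

theta = np.pi / 2   # rotate by 90 degrees

# Rotation: counter-clockwise by theta.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(R @ np.array([1., 0.]), [0., 1.])   # x-axis maps onto y-axis

# Reflection across the X-axis.
F = np.array([[1., 0.],
              [0., -1.]])
assert np.allclose(F @ np.array([1., 2.]), [1., -2.])

# Scaling: x by c1, y by c2.
c1, c2 = 2.0, 3.0
S = np.diag([c1, c2])
assert np.allclose(S @ np.array([1., 1.]), [c1, c2])
```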
Matrix factorization (Optional)
Linear Transformation
Linear Transform
A vector-to-vector function \(f(\vec{x})\) defines a linear transform of \(\vec{x}\) if it satisfies:
\[\begin{align} f(c \vec{x}) & = c \cdot f(\vec{x}) \\ f(\vec{x} + \vec{y})& = f(\vec{x}) + f(\vec{y}) \end{align}\]
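Any map of the form \(f(\vec{x}) = A\vec{x}\) satisfies both properties, while a pure translation does not; a minimal numpy sketch (random data, fixed seed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
f = lambda v: A @ v      # a linear transform

x, y = rng.standard_normal(4), rng.standard_normal(4)
c = 2.5

assert np.allclose(f(c * x), c * f(x))        # homogeneity
assert np.allclose(f(x + y), f(x) + f(y))     # additivity

# A translation g(x) = x + b is NOT linear (unless b = 0).
b = np.ones(4)
g = lambda v: v + b
assert not np.allclose(g(2 * x), 2 * g(x))
```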
Translation
\[f(\vec{x}) = \vec{x} + \vec{b}\]
Affine Transform
An affine transform is the composition of a linear transform and a translation.
A vector-to-vector function \(f(\vec{x})\) defines an affine transform of \(\vec{x}\) if, for any scalar \(\lambda\), it satisfies:
\[f(\lambda \vec{x} + (1-\lambda)\vec{y}) = \lambda f(\vec{x}) + (1-\lambda)f(\vec{y})\]
All linear transforms are special cases of affine transforms, but not vice versa.
Any linear mapping \(f(\vec{x})\) from \(d\)-dimensional vectors to \(n\)-dimensional vectors can be represented as the matrix-vector product \(A\vec{x}\) for some matrix \(A \in \mathbb{R}^{n \times d}\).
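A minimal numpy sketch of this representation: the columns of \(A\) are the images of the standard basis vectors, \(A_{\cdot j} = f(\vec{e}_j)\). The function `f` here is a hypothetical example built from a fixed matrix:

```python
import numpy as np

M = np.array([[1., 2., 0.],
              [0., 1., 3.]])
f = lambda v: M @ v       # some linear map from R^3 to R^2

# Recover the representing matrix column by column: column j is f(e_j).
A = np.column_stack([f(e) for e in np.eye(3)])
assert np.allclose(A, M)

# Then f(x) == A x for every x.
x = np.array([4., 5., 6.])
assert np.allclose(f(x), A @ x)
```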