Gaussian elimination applies a sequence of elementary operator matrices to transform the linear system \(\mathbf{A} \mathbf{x} = \mathbf{b}\) to an upper triangular system \[
\begin{eqnarray*}
\mathbf{E}_{n,n-1}(c_{n,n-1}) \cdots \mathbf{E}_{21}(c_{21}) \mathbf{A} \mathbf{x} &=& \mathbf{E}_{n,n-1}(c_{n,n-1}) \cdots \mathbf{E}_{21}(c_{21}) \mathbf{b} \\
\mathbf{U} \mathbf{x} &=& \mathbf{b}_{\text{new}}.
\end{eqnarray*}
\]
The whole LU algorithm is done in place, i.e., \(\mathbf{A}\) is overwritten by \(\mathbf{L}\) and \(\mathbf{U}\).
LU decomposition exists if the principal sub-matrix \(\mathbf{A}[1:k, 1:k]\) is non-singular for \(k=1,\ldots,n-1\).
If the LU decomposition exists and \(\mathbf{A}\) is non-singular, then the LU decmpositon is unique and \[
\det(\mathbf{A}) = \det(\mathbf{L}) \det(\mathbf{U}) = \prod_{k=1}^n u_{kk}.
\]
Actual implementations can differ: outer product LU (\(kij\) loop), block outer product LU (higher level-3 fraction), Crout’s algorithm (\(jki\) loop).
Given LU decomposition \(\mathbf{A} = \mathbf{L} \mathbf{U}\), solving \((\mathbf{L} \mathbf{U}) \mathbf{x} = \mathbf{b}\) for one right hand side costs \(2n^2\) flops:
One forward substitution (\(n^2\) flops) to solve \[
\mathbf{L} \mathbf{y} = \mathbf{b}
\]
One back substitution (\(n^2\) flops) to solve \[
\mathbf{U} \mathbf{x} = \mathbf{y}
\]
Therefore to solve \(\mathbf{A} \mathbf{x} = \mathbf{b}\) via LU, we need a total of \[
\frac 23 n^3 + 2n^2 \quad \text{flops}.
\]
If there are multiple right hand sides, LU only needs to be done once.
4 Matrix inversion
For matrix inversion, we need to solve \(\mathbf{A} \mathbf{X} = \mathbf{I}\), or \(\mathbf{A} \mathbf{x}_i = \mathbf{e}_i\) for \(i=1,\ldots,n\). There are \(n\) right hand sides \(\mathbf{e}_1, \ldots, \mathbf{e}_n\). Naively, we may need \(\frac 23 n^3 + 2n^3\) flops. But taking advantage of 0s reduces the second term \(2n^3\) to \(\frac 43 n^3\).
So matrix inversion via LU costs \[
\frac 23 n^3 + \frac 43 n^3 = 2n^3 \quad \text{flops}.
\] This is 3x more expensive than just solving one equation.
No inversion mentality:
Whenever we see matrix inverse, we should think in terms of solving linear equations.
We do not compute matrix inverse unless
it is necessary to compute standard errors
number of right hand sides is much larger than \(n\)
\(n\) is small
5 Pivoting
Let \[
\mathbf{A} = \begin{pmatrix}
0 & 1 \\
1 & 0 \\
\end{pmatrix}.
\] Is there a solution to \(\mathbf{A} \mathbf{x} = \mathbf{b}\) for an arbitrary \(\mathbf{b}\)? Does GE/LU work for \(\mathbf{A}\)?
What if, during LU procedure, the pivot\(a_{kk}\) is 0 or nearly 0 due to underflow?
Solution: pivoting.
Partial pivoting: before zeroing the \(k\)-th column, the row with \(\max_{i=k}^n |a_{ik}|\) is moved into the \(k\)-th row.
LU with partial pivoting yields \[
\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U},
\] where \(\mathbf{P}\) is a permutation matrix, \(\mathbf{L}\) is unit lower triangular with \(|\ell_{ij}| \le 1\), and \(\mathbf{U}\) is upper triangular.
Complete pivoting: Do both row and column interchanges so that the largest entry in the sub matrix A[k:n, k:n] is permuted to the \((k,k)\)-th entry. This yields the decomposition \[
\mathbf{P} \mathbf{A} \mathbf{Q} = \mathbf{L} \mathbf{U},
\] where \(|\ell_{ij}| \le 1\).
LU decomposition with partial pivoting is the most commonly used methods for solving general linear systems. Complete pivoting is the most stable but costs more computation. Partial pivoting is stable most of times.
6 LAPACK and Julia implementation
LAPACK: ?getrf does \(\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}\) (LU decomposition with partial pivoting) in place.
R: solve() implicitly performs LU decomposition (wrapper of LAPACK routine dgesv). solve() allows specifying a single or multiple right hand sides. If none, it computes the matrix inverse. The matrix package contains lu() function that outputs L, U, and pivot.