Triangular Systems

Biostat/Biomath M257

Author

Dr. Hua Zhou @ UCLA

Published

April 19, 2023

System information (for reproducibility):

versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

Load packages:

using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
  Activating project at `~/Documents/github.com/ucla-biostat-257/2023spring/slides/10-trisys`
Status `~/Documents/github.com/ucla-biostat-257/2023spring/slides/10-trisys/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random

For the next couple of lectures, we consider computer algorithms for solving linear equations \(\mathbf{A} \mathbf{x} = \mathbf{b}\), a ubiquitous task in statistics.

Key idea: turn the original problem into an easy one, e.g., a triangular system.

1 Lower triangular system

To solve \(\mathbf{A} \mathbf{x} = \mathbf{b}\), where \(\mathbf{A} \in \mathbb{R}^{n \times n}\) is lower triangular

\[ \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}. \]

  • Forward substitution: \[ \begin{eqnarray*} x_1 &=& b_1 / a_{11} \\ x_2 &=& (b_2 - a_{21} x_1) / a_{22} \\ x_3 &=& (b_3 - a_{31} x_1 - a_{32} x_2) / a_{33} \\ &\vdots& \\ x_n &=& (b_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{n,n-1} x_{n-1}) / a_{nn}. \end{eqnarray*} \]

  • Computing \(x_i\) takes \(i-1\) multiplications, \(i-1\) subtractions, and 1 division, i.e., \(2i-1\) flops, so the total cost is \(1 + 3 + 5 + \cdots + (2n-1) = n^2\) flops.

  • \(\mathbf{A}\) can be accessed by row (\(ij\) loop) or column (\(ji\) loop).
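
As an illustration, here is a minimal Julia sketch of forward substitution using the column-oriented (\(ji\)) loop; the function name forward_sub! and the small test problem are made up for illustration, and in practice one would call the optimized routines discussed in the implementation section below.

using LinearAlgebra

# a sketch of forward substitution with the column-oriented (ji) loop;
# overwrites b with the solution x; assumes all a_jj ≠ 0
function forward_sub!(b::AbstractVector, A::AbstractMatrix)
    n = length(b)
    for j in 1:n
        b[j] /= A[j, j]             # x_j = (b_j - Σ_{k<j} a_jk x_k) / a_jj
        for i in (j + 1):n
            b[i] -= A[i, j] * b[j]  # fold x_j into the remaining right-hand side
        end
    end
    return b
end

# sanity check against Julia's built-in triangular solve
Atest, btest = tril(randn(5, 5)), randn(5)
forward_sub!(copy(btest), Atest) ≈ LowerTriangular(Atest) \ btest  # should be true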

2 Upper triangular system

To solve \(\mathbf{A} \mathbf{x} = \mathbf{b}\), where \(\mathbf{A} \in \mathbb{R}^{n \times n}\) is upper triangular
\[ \begin{pmatrix} a_{11} & \cdots & a_{1,n-1} & a_{1n} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ 0 & \cdots & 0 & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix}. \]

  • Back substitution \[ \begin{eqnarray*} x_n &=& b_n / a_{nn} \\ x_{n-1} &=& (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1} \\ x_{n-2} &=& (b_{n-2} - a_{n-2,n-1} x_{n-1} - a_{n-2,n} x_n) / a_{n-2,n-2} \\ &\vdots& \\ x_1 &=& (b_1 - a_{12} x_2 - a_{13} x_3 - \cdots - a_{1,n} x_{n}) / a_{11}. \end{eqnarray*} \]

  • \(n^2\) flops.

  • \(\mathbf{A}\) can be accessed by row (\(ij\) loop) or column (\(ji\) loop).
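
Analogously, here is a minimal Julia sketch of back substitution with the column-oriented (\(ji\)) loop; the function name back_sub! and the test problem are again made up for illustration.

using LinearAlgebra

# a sketch of back substitution with the column-oriented (ji) loop;
# overwrites b with the solution x; assumes all a_jj ≠ 0
function back_sub!(b::AbstractVector, A::AbstractMatrix)
    n = length(b)
    for j in n:-1:1
        b[j] /= A[j, j]             # x_j = (b_j - Σ_{k>j} a_jk x_k) / a_jj
        for i in 1:(j - 1)
            b[i] -= A[i, j] * b[j]  # fold x_j into the remaining right-hand side
        end
    end
    return b
end

# sanity check against Julia's built-in triangular solve
Atest, btest = triu(randn(5, 5)), randn(5)
back_sub!(copy(btest), Atest) ≈ UpperTriangular(Atest) \ btest  # should be true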

3 Implementation

  • BLAS level 2 function: ?trsv (triangular solve with one right-hand side).

  • BLAS level 3 function: ?trsm (matrix triangular solve, i.e., multiple right-hand sides).

  • Julia

    • The left-divide operator \ in Julia is used for solving linear equations and least squares problems.
    • If A is a triangular matrix, the command A \ b uses forward or back substitution.
    • Alternatively, we can call the BLAS wrapper functions directly: trsv!, trsv, trsm!, trsm.
using LinearAlgebra, Random

Random.seed!(257) # seed
n = 5
A = randn(n, n)
b = randn(n)
# a random matrix
A
5×5 Matrix{Float64}:
  0.679063   1.52556    0.234923  -0.111974  -1.10328
  1.24568   -1.69501   -1.12138   -0.16883    1.88349
 -1.21007   -0.245347   0.222238  -0.501868   0.488434
 -0.817491  -0.131288  -1.15676    0.86352    0.0769591
  1.04395   -1.06533    0.708506  -1.32251   -2.28938
Al = LowerTriangular(A) # does not create extra matrix
5×5 LowerTriangular{Float64, Matrix{Float64}}:
  0.679063    ⋅          ⋅          ⋅         ⋅ 
  1.24568   -1.69501     ⋅          ⋅         ⋅ 
 -1.21007   -0.245347   0.222238    ⋅         ⋅ 
 -0.817491  -0.131288  -1.15676    0.86352    ⋅ 
  1.04395   -1.06533    0.708506  -1.32251  -2.28938
dump(Al)
LowerTriangular{Float64, Matrix{Float64}}
  data: Array{Float64}((5, 5)) [0.6790633442371218 1.5255628642992316 … -0.11197407057583378 -1.1032824444790374; 1.2456776800889142 -1.6950136862944665 … -0.16883028457206983 1.8834907119704818; … ; -0.8174908512677062 -0.1312875922954256 … 0.8635202236283535 0.07695906622703877; 1.0439509789805828 -1.0653277558175929 … -1.3225143450097687 -2.289382892165633]
Al.data
5×5 Matrix{Float64}:
  0.679063   1.52556    0.234923  -0.111974  -1.10328
  1.24568   -1.69501   -1.12138   -0.16883    1.88349
 -1.21007   -0.245347   0.222238  -0.501868   0.488434
 -0.817491  -0.131288  -1.15676    0.86352    0.0769591
  1.04395   -1.06533    0.708506  -1.32251   -2.28938
# same data
pointer(Al.data), pointer(A)
(Ptr{Float64} @0x000000017c93c140, Ptr{Float64} @0x000000017c93c140)
Al \ b # dispatched to BLAS function for triangular solve
5-element Vector{Float64}:
 -0.6752595578784236
 -0.3076650040919988
 -4.102671130071105
 -5.384670255453669
  1.4844958985110646
# or use BLAS wrapper directly
BLAS.trsv('L', 'N', 'N', A, b)
5-element Vector{Float64}:
 -0.6752595578784236
 -0.3076650040919988
 -4.102671130071105
 -5.384670255453669
  1.4844958985110646
?BLAS.trsv
trsv(ul, tA, dA, A, b)

Return the solution to A*x = b or one of the other two variants determined by tA and ul. dA determines if the diagonal values are read or are assumed to be all ones.

  • Some other BLAS functions for triangular systems such as triangular matrix-vector and triangular matrix-matrix multiplications: trmv, trmv!, trmm, trmm!
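
A quick sketch of these routines, continuing with the n, A, b defined above (the right-hand-side matrix B is made up for illustration): trsm solves a triangular system with several right-hand sides at once, and trmv multiplies a vector by a triangular matrix; both read only the specified triangle of A.

# solve A * X = B for 3 right-hand sides, using only the lower triangle of A
B = randn(n, 3)                               # example right-hand sides (made up)
X = BLAS.trsm('L', 'L', 'N', 'N', 1.0, A, B)  # side, uplo, trans, diag, alpha, A, B
X ≈ LowerTriangular(A) \ B                    # should be true

# triangular matrix-vector multiplication, LowerTriangular(A) * b
BLAS.trmv('L', 'N', 'N', A, b) ≈ LowerTriangular(A) * b  # should be true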

4 Some algebraic facts about triangular systems (HW1)

  • The eigenvalues of a triangular matrix \(\mathbf{A}\) are its diagonal entries \(\lambda_i = a_{ii}\).

  • Determinant \(\det(\mathbf{A}) = \prod_i a_{ii}\).

  • The product of two upper (lower) triangular matrices is upper (lower) triangular.

  • The inverse of an upper (lower) triangular matrix is upper (lower) triangular.

  • The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular.

  • The inverse of a unit upper (lower) triangular matrix is unit upper (lower) triangular.
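
These facts are to be proved in HW1; as a numerical sanity check (not a proof), one can verify them on a random example, as in the following sketch.

using LinearAlgebra, Random

Random.seed!(257)
U1, U2 = UpperTriangular(randn(5, 5)), UpperTriangular(randn(5, 5))
eigvals(U1) ≈ diag(U1)      # eigenvalues are the diagonal entries
det(U1) ≈ prod(diag(U1))    # determinant is the product of diagonal entries
istriu(U1 * U2)             # product of upper triangulars is upper triangular
istriu(inv(U1))             # inverse of an upper triangular is upper triangular

# unit upper triangular: product and inverse keep the unit diagonal
V1, V2 = UnitUpperTriangular(randn(5, 5)), UnitUpperTriangular(randn(5, 5))
istriu(V1 * V2) && diag(V1 * V2) ≈ ones(5)
istriu(inv(V1)) && diag(inv(V1)) ≈ ones(5)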

5 Julia types for triangular matrices

LowerTriangular, UnitLowerTriangular, UpperTriangular, UnitUpperTriangular.

A
5×5 Matrix{Float64}:
  0.679063   1.52556    0.234923  -0.111974  -1.10328
  1.24568   -1.69501   -1.12138   -0.16883    1.88349
 -1.21007   -0.245347   0.222238  -0.501868   0.488434
 -0.817491  -0.131288  -1.15676    0.86352    0.0769591
  1.04395   -1.06533    0.708506  -1.32251   -2.28938
LowerTriangular(A)
5×5 LowerTriangular{Float64, Matrix{Float64}}:
  0.679063    ⋅          ⋅          ⋅         ⋅ 
  1.24568   -1.69501     ⋅          ⋅         ⋅ 
 -1.21007   -0.245347   0.222238    ⋅         ⋅ 
 -0.817491  -0.131288  -1.15676    0.86352    ⋅ 
  1.04395   -1.06533    0.708506  -1.32251  -2.28938
LinearAlgebra.UnitLowerTriangular(A)
5×5 UnitLowerTriangular{Float64, Matrix{Float64}}:
  1.0         ⋅          ⋅          ⋅        ⋅ 
  1.24568    1.0         ⋅          ⋅        ⋅ 
 -1.21007   -0.245347   1.0         ⋅        ⋅ 
 -0.817491  -0.131288  -1.15676    1.0       ⋅ 
  1.04395   -1.06533    0.708506  -1.32251  1.0
using BenchmarkTools, LinearAlgebra, Random

Random.seed!(257) # seed
A = randn(1000, 1000);
# if we don't tell Julia it's triangular: O(n^3) complexity
# tril(A) returns a dense matrix (lower triangle of A, zeros above), same as Matlab
@benchmark eigvals(tril($A))
BenchmarkTools.Trial: 204 samples with 1 evaluation.
 Range (minmax):  23.438 ms 26.971 ms   GC (min … max): 0.00% … 4.19%
 Time  (median):     24.436 ms                GC (median):    0.00%
 Time  (mean ± σ):   24.552 ms ± 684.572 μs   GC (mean ± σ):  1.13% ± 1.59%
   ▂      ▄▃▂▄ ▇   ▇   ▄▃▃                                  
  ▃███▇▃▆▅████████▇███▇▇████▁▆█▆▇▅▃▆▃▃▆▆▆▃▅▃▁▅▃▁▃▁▁▁▁▁▃▁▁▁▃▃ ▅
  23.4 ms         Histogram: frequency by time         26.7 ms <
 Memory estimate: 15.55 MiB, allocs estimate: 15.
# if we tell Julia it's triangular: O(n) complexity
@benchmark eigvals(LowerTriangular($A))
BenchmarkTools.Trial: 10000 samples with 197 evaluations.
 Range (minmax):  435.492 ns18.383 μs   GC (min … max):  0.00% … 94.06%
 Time  (median):     835.868 ns               GC (median):     0.00%
 Time  (mean ± σ):     1.076 μs ±  1.354 μs   GC (mean ± σ):  23.57% ± 16.54%
  ▅▅█                                                       ▂
  ███▆▃▆▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▅▅▅▆▇▆▇▇▇▇▇▆▇▇▇▇▅ █
  435 ns        Histogram: log(frequency) by time      8.54 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.
# if we don't tell Julia it's triangular: O(n^3) complexity
# tril(A) returns a dense matrix (lower triangle of A, zeros above), same as Matlab
@benchmark det(tril($A))
BenchmarkTools.Trial: 5322 samples with 1 evaluation.
 Range (minmax):  507.208 μs  2.788 ms   GC (min … max):  0.00% … 69.42%
 Time  (median):     818.458 μs                GC (median):     0.00%
 Time  (mean ± σ):   938.497 μs ± 287.467 μs   GC (mean ± σ):  12.61% ± 17.23%
             █▇▃                                                 
  ▂▁▁▁▁▁▁▁▂▂▇████▇▇▆▅▃▃▂▂▂▂▂▂▂▂▁▂▂▁▁▂▁▂▂▃▃▃▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂ ▃
  507 μs           Histogram: frequency by time         1.81 ms <
 Memory estimate: 7.64 MiB, allocs estimate: 3.
# if we tell Julia it's triangular: O(n) complexity
@benchmark det(LowerTriangular($A))
BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (minmax):  673.923 ns17.028 μs   GC (min … max):  0.00% … 92.98%
 Time  (median):       1.074 μs               GC (median):     0.00%
 Time  (mean ± σ):     1.329 μs ±  1.601 μs   GC (mean ± σ):  20.24% ± 14.86%
  ▅▆                                                        ▁
  ██▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▅▅▆▄▆▇▇█▇▇ █
  674 ns        Histogram: log(frequency) by time      10.9 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.
@benchmark det(LowerTriangular($A))
BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (minmax):  672.852 ns17.444 μs   GC (min … max):  0.00% … 93.73%
 Time  (median):       1.065 μs               GC (median):     0.00%
 Time  (mean ± σ):     1.340 μs ±  1.613 μs   GC (mean ± σ):  19.93% ± 14.72%
  ▄▇                                                        ▂
  ███▅▁▃▅▅▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▆▇█▇▇▇▇▆▆▆ █
  673 ns        Histogram: log(frequency) by time      10.9 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.
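
The same point applies to linear system solves: A \ b with the dense 1000×1000 matrix above goes through an \(O(n^3)\) LU factorization, whereas LowerTriangular(A) \ b dispatches to an \(O(n^2)\) forward substitution. A minimal benchmark sketch (timings vary by machine; the vector b below is made up for illustration):

b = randn(1000)

# general dense solve: LU factorization, O(n^3) flops
@benchmark $A \ $b

# triangular solve by forward substitution: O(n^2) flops
@benchmark LowerTriangular($A) \ $b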