A list of “easy” linear systems

Biostat/Biomath M257

Author

Dr. Hua Zhou @ UCLA

Published

May 4, 2023

System information (for reproducibility):

versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

Load packages:

using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
  Activating project at `~/Documents/github.com/ucla-biostat-257/2023spring/slides/19-easylineq`
Status `~/Documents/github.com/ucla-biostat-257/2023spring/slides/19-easylineq/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [42fd0dbc] IterativeSolvers v0.9.2
  [b51810bb] MatrixDepot v1.0.10
  [af69fa37] Preconditioners v0.6.0
  [b8865327] UnicodePlots v3.5.2
  [efce3f68] WoodburyMatrices v0.5.5
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random
  [2f01184e] SparseArrays

1 Introduction

Consider \(\mathbf{A} \mathbf{x} = \mathbf{b}\), \(\mathbf{A} \in \mathbb{R}^{n \times n}\), or, if you really need it, the matrix inverse \(\mathbf{A}^{-1}\). \(\mathbf{A}\) can be huge. Keep massive data in mind: 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, … We should stay alert to the many easy linear systems that arise in practice.

Don’t blindly use A \ b and inv in Julia, or the solve function in R. Don’t waste computing resources with a bad choice of algorithm!

1.1 Diagonal matrix

Diagonal \(\mathbf{A}\): \(n\) flops. Use the Diagonal type in Julia.

using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
# @which locates the method being dispatched to (with a link to its source code)
@which A \ b
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
BenchmarkTools.Trial: 9747 samples with 1 evaluation.
 Range (min … max):  489.041 μs … 2.349 ms   GC (min … max): 0.00% … 78.27%
 Time  (median):     502.334 μs               GC (median):    0.00%
 Time  (mean ± σ):   511.837 μs ± 27.936 μs   GC (mean ± σ):  0.04% ±  0.79%
  ▅█▄▄▆▄▃▂▃▃▂▂▂▂▂▁▁▁     ▁     ▁       ▄▇▄▃▆▃▂▁▂▂▁▁▁▁         ▂
  ██████████████████████████████████▇███████████████████▇▇▇▆ █
  489 μs        Histogram: log(frequency) by time       554 μs <
 Memory estimate: 15.88 KiB, allocs estimate: 2.
# O(n) computation, no extra array allocation
@benchmark Diagonal($A) \ $b
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  950.000 ns … 197.258 μs   GC (min … max): 0.00% … 97.54%
 Time  (median):       1.646 μs                GC (median):     0.00%
 Time  (mean ± σ):     2.224 μs ±   8.637 μs   GC (mean ± σ):  22.07% ±  5.62%
                   ▂▆█▇▁    ▁                                   
  ▃▂▂▁▁▁▂▂▂▂▂▂▂▄▇▅▇█████▆▅▅▇██▆▆▄▄▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  950 ns           Histogram: frequency by time         2.95 μs <
 Memory estimate: 15.88 KiB, allocs estimate: 2.

1.2 Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded \(\mathbf{A}\): band LU, band Cholesky, … roughly \(O(n)\) flops.

  • Use the Bidiagonal, Tridiagonal, and SymTridiagonal types in Julia.

Random.seed!(123) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)
1000×1000 SymTridiagonal{Float64, Vector{Float64}}:
 0.808288   0.598212    ⋅         ⋅        …    ⋅           ⋅          ⋅ 
 0.598212  -1.12207   -1.00753    ⋅             ⋅           ⋅          ⋅ 
  ⋅        -1.00753   -1.10464   1.97923        ⋅           ⋅          ⋅ 
  ⋅          ⋅         1.97923  -0.416993       ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅       -0.34622        ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
 ⋮                                         ⋱                         
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅            1.18201      ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅           -0.0625319  -0.271376    ⋅ 
  ⋅          ⋅          ⋅         ⋅           -0.271376   -1.25053    1.49717
  ⋅          ⋅          ⋅         ⋅             ⋅          1.49717   -0.34637
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b
BenchmarkTools.Trial: 1284 samples with 1 evaluation.
 Range (min … max):  3.468 ms … 10.802 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     3.796 ms                GC (median):    0.00%
 Time  (mean ± σ):   3.896 ms ± 305.271 μs   GC (mean ± σ):  2.44% ± 4.85%
                   ▅█                                         
  ▃▃▁▁▁▁▂▁▁▁▁▁▁▁▁▁▄██▅▄▃▃▃▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▂▃▃▃▃▄▃▃▂▂▂▂ ▂
  3.47 ms         Histogram: frequency by time         4.5 ms <
 Memory estimate: 7.64 MiB, allocs estimate: 4.
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  10.125 μs … 1.831 ms   GC (min … max): 0.00% … 98.21%
 Time  (median):     11.458 μs               GC (median):    0.00%
 Time  (mean ± σ):   12.270 μs ± 26.361 μs   GC (mean ± σ):  3.64% ±  1.70%
      ▂█▇▁▃▁                                                  
  ▇▅▂▇██████▆█▅▃▃▂▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  10.1 μs         Histogram: frequency by time        19.8 μs <
 Memory estimate: 23.81 KiB, allocs estimate: 3.

1.3 Triangular matrix

Triangular \(\mathbf{A}\): \(n^2\) flops to solve linear system.

Random.seed!(123)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  299.917 μs … 552.833 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     322.125 μs                GC (median):    0.00%
 Time  (mean ± σ):   333.401 μs ±  41.167 μs   GC (mean ± σ):  0.00% ± 0.00%
       █▅▄▂▂                                          ▁▁    ▁ ▂
  ▅▃▅▅▄███████▇▅▆▆█▇▅▄▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅███████ █
  300 μs        Histogram: log(frequency) by time        525 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  91.750 μs … 305.833 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     93.125 μs                GC (median):    0.00%
 Time  (mean ± σ):   94.271 μs ±   9.133 μs   GC (mean ± σ):  0.00% ± 0.00%
  ██▆▃▆▇▆▃  ▃▂▂▂▃▄▃▂▂▁     ▁▁   ▁                            ▃
  ████████▇▆███████████████████████▇▇▆▇▇▇▆▆▆▆▇▆▇▆▇▆▆▄▆▁▅▆▄▃▅ █
  91.8 μs       Histogram: log(frequency) by time       109 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.

1.4 Block diagonal matrix

Block diagonal: Suppose \(n = \sum_b n_b\). Solving the linear system costs \((\sum_b n_b)^3\) flops without using the block diagonal structure vs \(\sum_b n_b^3\) flops using it.

Julia’s SparseArrays package provides a blockdiag function that generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package? A block-wise solve sketch appears at the end of this section.

using SparseArrays

Random.seed!(123)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
1000×1000 SparseMatrixCSC{Float64, Int64} with 975 stored entries:
⢶⡷⢬⡮⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣬⣿⣽⣿⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⢻⣿⢿⣿⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⡿⣽⣳⡾⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠁⠁⢸⣬⣷⣟⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⢺⣿⣛⣲⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠈⠈⠁⠈⣝⢾⢻⣟⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣯⣫⣗⣮⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⢐⣟⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣶⢫⣿⣽⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢾⠾⢿⣛⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⣺⠿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢈⣉⡾⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⡞⣾⢿⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣎⣜⣿⣭⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢟⣭⡟⣇⣀⠀⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⣼⢾⣋⡿⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢿⣺⣯⡽⡀⢀⠀⡀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⣿⣷⢿⡵
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠾⡛⠼⡯
using UnicodePlots
spy(A)
[UnicodePlots spy plot: 1000×1000 sparse matrix, 975 nonzeros, showing the block diagonal sparsity pattern]
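
Exploiting the structure amounts to solving each diagonal block separately. Here is a minimal sketch (the function blockdiag_solve is hypothetical, not from a package), assuming square dense blocks and a conformably partitioned right-hand side:

# hypothetical block-wise solver: solve each diagonal block separately
function blockdiag_solve(blocks::Vector{<:AbstractMatrix}, b::AbstractVector)
    x = similar(b)
    offset = 0
    for Ab in blocks
        nb  = size(Ab, 1)
        idx = (offset + 1):(offset + nb)
        x[idx] = Ab \ b[idx] # O(nb^3) per dense block instead of O((∑ nb)^3) overall
        offset += nb
    end
    return x
end

# usage with B dense blocks (shifted by 10I to keep each block nonsingular)
blocks = [Matrix(sprandn(ni, ni, 0.01)) + 10I for _ in 1:B]
x = blockdiag_solve(blocks, randn(B * ni));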

1.5 Kronecker product

Use \[ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} \] to avoid forming and doing costly computation on the potentially huge Kronecker \(\mathbf{A} \otimes \mathbf{B}\).

Anyone interested in writing a package?

using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix
[ Info: verify download of index files...
[ Info: reading database
[ Info: adding metadata...
[ Info: adding svd data...
[ Info: writing database
[ Info: used remote sites are sparse.tamu.edu with MAT index and math.nist.gov with HTML index
50×50 Matrix{Float64}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮                                           ⋱                        
 0.025641   0.0512821  0.0769231  0.102564      0.8125     0.795918   0.78
 0.025      0.05       0.075      0.1           0.833333   0.816327   0.8
 0.0243902  0.0487805  0.0731707  0.097561   …  0.854167   0.836735   0.82
 0.0238095  0.047619   0.0714286  0.0952381     0.875      0.857143   0.84
 0.0232558  0.0465116  0.0697674  0.0930233     0.895833   0.877551   0.86
 0.0227273  0.0454545  0.0681818  0.0909091     0.916667   0.897959   0.88
 0.0222222  0.0444444  0.0666667  0.0888889     0.9375     0.918367   0.9
 0.0217391  0.0434783  0.0652174  0.0869565  …  0.958333   0.938776   0.92
 0.0212766  0.0425532  0.0638298  0.0851064     0.979167   0.959184   0.94
 0.0208333  0.0416667  0.0625     0.0833333     1.0        0.979592   0.96
 0.0204082  0.0408163  0.0612245  0.0816327     0.979592   1.0        0.98
 0.02       0.04       0.06       0.08          0.96       0.98       1.0
B = matrixdepot("oscillate", 100) # pd matrix
100×100 Matrix{Float64}:
  0.539265      0.403989     -0.00217105   …   4.34736e-15  -2.06948e-14
  0.403989      0.531479      0.00497288      -4.47758e-15   2.13147e-14
 -0.00217105    0.00497288    0.660112         1.55726e-14  -7.41304e-14
  9.57703e-5   -0.000206496   0.0413618       -1.6975e-13    8.08067e-13
  7.50775e-6    0.000158391  -0.00736855       1.77959e-13  -8.47144e-13
  0.000158924  -8.79789e-6   -0.00523002   …  -2.20985e-13   1.05196e-12
 -1.92377e-5    2.14476e-5   -0.000180051      7.96194e-12  -3.79014e-11
  9.91107e-6   -1.07804e-5    0.000129376     -4.44381e-12   2.1154e-11
 -1.50826e-5    1.42813e-5    0.000145861      6.83341e-12  -3.25292e-11
  6.37565e-6   -5.92068e-6   -7.21201e-5      -4.05359e-12   1.92964e-11
  1.49406e-6   -1.03353e-6   -7.66895e-5   …   2.01848e-11  -9.60863e-11
 -8.76766e-7    8.08713e-7    1.27876e-5      -8.7208e-12    4.15138e-11
  3.24596e-6   -3.30136e-6    2.49045e-6       3.11066e-11  -1.48077e-10
  ⋮                                        ⋱                
  6.56008e-15  -6.75658e-15   2.35002e-14     -1.71847e-5    8.32939e-5
 -1.41705e-14   1.4595e-14   -5.07629e-14      4.02757e-5   -0.000189988
  1.29533e-13  -1.33413e-13   4.64024e-13  …  -0.000369986   0.00175252
 -7.28656e-15   7.50482e-15  -2.61026e-14      1.87337e-5   -0.000117048
  3.42848e-15  -3.53118e-15   1.22818e-14     -9.42239e-6    8.54268e-5
 -1.51144e-15   1.55671e-15  -5.41433e-15      9.28256e-7   -9.16845e-5
  1.40856e-15  -1.45075e-15   5.04573e-15     -6.35794e-5    0.000138679
 -2.11535e-15   2.17871e-15  -7.57738e-15  …   0.000251782  -0.000405406
  3.4777e-15   -3.58187e-15   1.24574e-14     -0.000439703   0.000823466
 -6.81829e-14   7.02252e-14  -2.44236e-13      0.0177542    -0.0172616
  4.34736e-15  -4.47758e-15   1.55726e-14      0.627671      0.0664583
 -2.06948e-14   2.13147e-14  -7.41304e-14      0.0664583     0.103314
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
# relative error
norm(x1 - x2) / norm(x1)
1.548434035267782e-7
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ $c
BenchmarkTools.Trial: 29 samples with 1 evaluation.
 Range (min … max):  162.684 ms … 253.965 ms   GC (min … max): 0.22% … 35.39%
 Time  (median):     169.107 ms                GC (median):    2.80%
 Time  (mean ± σ):   176.389 ms ±  21.658 ms   GC (mean ± σ):  5.91% ±  8.13%
                                                                
  ▃▁▃▇▁▁▁▁▃▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃ ▁
  163 ms           Histogram: frequency by time          254 ms <
 Memory estimate: 381.51 MiB, allocs estimate: 7.
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \ 
    transpose(cholesky(Symmetric($B)) \ reshape($c, p, m))))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  68.666 μs … 34.489 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     75.833 μs                GC (median):    0.00%
 Time  (mean ± σ):   88.548 μs ± 445.869 μs   GC (mean ± σ):  9.38% ± 2.97%
                ▂▅▇██▆▃▃ ▁                                     
  ▁▁▁▁▁▁▁▁▁▁▂▃▅███████████▇▇▆▅▅▅▄▃▃▃▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  68.7 μs         Histogram: frequency by time         89.9 μs <
 Memory estimate: 176.23 KiB, allocs estimate: 15.
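
The determinant identity gives the same kind of savings. A quick sanity check (forming the Kronecker product here only to verify the identity):

# log-determinant via det(A ⊗ B) = det(A)^p det(B)^m, A ∈ R^{m×m}, B ∈ R^{p×p}
logdet_kron = p * logdet(Symmetric(A)) + m * logdet(Symmetric(B))
logdet_kron ≈ logdet(Symmetric(kron(A, B))) # true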

1.6 Sparse matrix

Sparsity: use a sparse matrix decomposition or an iterative method.

  • The easiest recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (Matrix package), MKL (sparse BLAS), … as much as possible.
using MatrixDepot

Random.seed!(123)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# randomly generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
0.001994776158751544
using UnicodePlots
spy(A)
[UnicodePlots spy plot: 7701×7701 Wathen matrix, 118 301 nonzeros]

1.6.1 Matrix-vector multiplication

# dense matrix-vector multiplication
@benchmark $Afull * $b
BenchmarkTools.Trial: 654 samples with 1 evaluation.
 Range (min … max):  7.450 ms … 11.020 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     7.611 ms                GC (median):    0.00%
 Time  (mean ± σ):   7.647 ms ± 189.017 μs   GC (mean ± σ):  0.00% ± 0.00%
        ▁ ▁▁▂█▇                                             
  ▂▂▃▃▅▆█▇██████▆▅▄▆▇▆█▆▃▄▃▂▁▃▂▃▃▂▃▃▁▂▂▁▁▁▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▂ ▃
  7.45 ms         Histogram: frequency by time         8.2 ms <
 Memory estimate: 60.23 KiB, allocs estimate: 2.
# sparse matrix-vector multiplication
@benchmark $A * $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  62.250 μs … 12.169 ms   GC (min … max): 0.00% … 99.28%
 Time  (median):     66.834 μs                GC (median):    0.00%
 Time  (mean ± σ):   69.565 μs ± 121.076 μs   GC (mean ± σ):  1.74% ±  0.99%
          ▆█▇▁                                                  
  ▁▁▁▁▁▁▂▆████▆▅▄▄▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  62.2 μs         Histogram: frequency by time         85.9 μs <
 Memory estimate: 60.23 KiB, allocs estimate: 2.

1.6.2 Solve linear equation

# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b
BenchmarkTools.Trial: 9 samples with 1 evaluation.
 Range (min … max):  490.664 ms … 694.548 ms   GC (min … max): 0.03% … 1.51%
 Time  (median):     542.714 ms                GC (median):    0.03%
 Time  (mean ± σ):   564.897 ms ±  78.646 ms   GC (mean ± σ):  0.73% ± 0.87%
  ▁█  ▁                     ▁     ▁          ▁  
  ██▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  491 ms           Histogram: frequency by time          695 ms <
 Memory estimate: 452.52 MiB, allocs estimate: 4.
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b
norm(xchol - xcholsp) = 5.062481022858237e-15
BenchmarkTools.Trial: 755 samples with 1 evaluation.
 Range (min … max):  6.157 ms … 17.571 ms   GC (min … max): 0.00% … 1.90%
 Time  (median):     6.388 ms               GC (median):    0.00%
 Time  (mean ± σ):   6.623 ms ±  1.075 ms   GC (mean ± σ):  0.14% ± 0.58%
  ▅█▂                                                      
  ███▇▇▅▄▅▆▇▁▄▄▁▄▁▁▁▄▁▁▁▄▁▄▁▁▄▄▁▁▁▁▁▁▁▁▁▅▄▁▄▁▁▁▁▁▅▁▄▄▄▁▁▅ ▇
  6.16 ms      Histogram: log(frequency) by time     12.8 ms <
 Memory estimate: 12.43 MiB, allocs estimate: 64.
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 1.418530254611637e-7
BenchmarkTools.Trial: 251 samples with 1 evaluation.
 Range (min … max):  19.705 ms … 24.431 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     19.895 ms                GC (median):    0.00%
 Time  (mean ± σ):   19.958 ms ± 413.344 μs   GC (mean ± σ):  0.00% ± 0.00%
          ▆▇▄▇ ▁▄ ▅▃▂▁▁▂                                      
  ▄▅▆▆▆▆▇▇███████▇██████▅▄▆▆▃▃▃▄▅▆▁▅▁▁▄▄▁▃▃▃▁▁▁▃▁▃▁▁▁▃▁▁▃▁▁▆ ▄
  19.7 ms         Histogram: frequency by time         20.4 ms <
 Memory estimate: 241.64 KiB, allocs estimate: 16.
# sparse solve via preconditioned conjugate gradient
using Preconditioners

xpcg = cg(A, b, Pl = DiagonalPreconditioner(A))
@show norm(xpcg - xchol)
@benchmark cg($A, $b, Pl = $(DiagonalPreconditioner(A)))
norm(xpcg - xchol) = 8.656816762415348e-8
BenchmarkTools.Trial: 1493 samples with 1 evaluation.
 Range (min … max):  3.281 ms … 4.863 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     3.336 ms               GC (median):    0.00%
 Time  (mean ± σ):   3.349 ms ± 75.892 μs   GC (mean ± σ):  0.00% ± 0.00%
           ▁▆▆███▂▄▃                                        
  ▂▂▂▃▃▄▄▆██████████▇▇▆▄▆▆▅▅▄▄▆▅▃▃▃▄▃▂▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▂▂ ▄
  3.28 ms        Histogram: frequency by time        3.49 ms <
 Memory estimate: 241.64 KiB, allocs estimate: 16.

1.7 Easy plus low rank

Easy plus low rank: \(\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}\), \(r \ll n\). Woodbury formula \[\begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}\]

  • Keep HW3 (multivariate density) and HW4 (PageRank) problems in mind.

  • WoodburyMatrices.jl package can be useful.

using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(123)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}
1000×1000 Matrix{Float64}:
  4.59798    -0.281279   -2.22717   …   2.15423     0.381752   -0.563599
 -0.281279    3.46848     1.39267       1.68392    -2.2848      0.956956
 -2.22717     1.39267     4.81456       0.879207   -0.516142    1.95953
  0.995152    0.835961   -0.198421      1.22304    -0.534476   -0.158006
 -4.18483     0.83762     2.07489      -3.41192    -2.40336     0.12795
  1.2769      0.450253    1.94231   …   3.05973     1.06062     1.58223
  1.2693     -0.993521   -0.522544     -0.180006   -0.0885834   0.433461
  0.263329    1.33478     1.84111       2.02565    -0.281757    1.85271
  1.56504    -1.2659     -1.79125      -0.0459542   0.927798   -0.648945
 -1.7482      0.478207    2.26841      -1.72566    -2.54041     1.36365
  0.0224748   1.0181      0.580171  …   0.640478   -0.998439   -0.171846
 -2.50815     1.01533     1.43186      -0.950055   -0.619325    0.162349
 -0.192541   -2.14724    -2.73878      -2.82316     0.861588   -1.74536
  ⋮                                 ⋱                          
 -1.68931    -0.0964689   0.122875     -1.87988    -0.748008   -1.02043
 -0.733929   -1.38688    -1.87072      -2.05063     1.15057    -0.971421
  0.124628    2.41238     1.6913    …   2.44864    -1.01345     1.0207
  1.31426    -0.690013   -1.36949       0.387585    0.746292   -0.804991
 -0.156504    0.385906    0.589231      1.38713     1.44466     0.62347
  3.51625    -1.71361    -0.157013      3.01664     2.82686     2.12007
  3.06339    -0.905018   -0.865003      1.95288     0.854144    0.166709
  0.480269   -1.14852     1.09075   …   0.405336    1.02454     1.58487
  0.282433   -0.537644   -0.713335     -1.08749    -0.736468   -0.090602
  2.15423     1.68392     0.879207      3.87128     0.0305312   0.956922
  0.381752   -2.2848     -0.516142      0.0305312   3.77871    -0.351256
 -0.563599    0.956956    1.95953       0.956922   -0.351256    2.04652
# compare storage
Base.summarysize(W), Base.summarysize(Wfull)
(64720, 8000040)

1.7.1 Solve linear equation

# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b
BenchmarkTools.Trial: 1837 samples with 1 evaluation.
 Range (min … max):  2.235 ms … 12.204 ms   GC (min … max): 0.00% … 78.08%
 Time  (median):     2.582 ms                GC (median):    0.00%
 Time  (mean ± σ):   2.721 ms ± 975.335 μs   GC (mean ± σ):  4.71% ±  9.61%
    █                                                         
  ▇▃█▃▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▅ █
  2.23 ms      Histogram: log(frequency) by time      9.87 ms <
 Memory estimate: 7.64 MiB, allocs estimate: 3.
# solve using Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack; need to file an issue 
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.653 μs … 3.653 ms   GC (min … max): 0.00% … 99.71%
 Time  (median):     7.611 μs               GC (median):     0.00%
 Time  (mean ± σ):   9.206 μs ± 69.443 μs   GC (mean ± σ):  15.07% ±  1.99%
                         ▁██                                 
  ▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▃▃▄▄▃▃▄████▆▄▄▄▅▅▄▄▄▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂ ▃
  5.65 μs        Histogram: frequency by time        9.97 μs <
 Memory estimate: 32.03 KiB, allocs estimate: 8.

1.7.2 Matrix-vector multiplication

# multiplication without using Woodbury structure
@benchmark $Wfull * $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  114.125 μs … 10.155 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     157.625 μs                GC (median):    0.00%
 Time  (mean ± σ):   163.001 μs ± 141.497 μs   GC (mean ± σ):  0.00% ± 0.00%
        ▁▂▁▁    ▃▄▇▇  ▂▂▂▃▂▂                         ▁▁      ▂
  ▃▄▄▄▃▃████████████▇▇███████▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇████▇██ █
  114 μs        Histogram: log(frequency) by time        260 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.
# multiplication using Woodbury structure
@benchmark $W * $b
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.190 μs … 2.203 ms   GC (min … max): 0.00% … 99.78%
 Time  (median):     2.579 μs               GC (median):    0.00%
 Time  (mean ± σ):   2.852 μs ± 22.005 μs   GC (mean ± σ):  7.71% ±  1.00%
         ▁▄▄▆██▇██▆▅▄▄▂▂▁▁▁▁▁▂▁▁▁▁▁         ▁▁ ▁           ▃
  ▆▆▆▆▆▇▇████████████████████████████▆▇▆▇▇▆██████▇▇▅▇▅▅▅▆▆ █
  2.19 μs      Histogram: log(frequency) by time     3.71 μs <
 Memory estimate: 7.94 KiB, allocs estimate: 1.

1.7.3 Determinant

# determinant without using Woodbury structure
@benchmark det($Wfull)
BenchmarkTools.Trial: 1364 samples with 1 evaluation.
 Range (min … max):  3.216 ms … 10.901 ms   GC (min … max): 0.00% … 66.40%
 Time  (median):     3.547 ms                GC (median):    0.00%
 Time  (mean ± σ):   3.665 ms ± 861.235 μs   GC (mean ± σ):  3.10% ±  8.42%
    █                                                         
  ▇▁█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▄▅ ▇
  3.22 ms      Histogram: log(frequency) by time        10 ms <
 Memory estimate: 7.64 MiB, allocs estimate: 3.
# determinant using Woodbury structure
@benchmark det($W)
BenchmarkTools.Trial: 10000 samples with 197 evaluations.
 Range (min … max):  459.391 ns … 91.162 μs   GC (min … max): 0.00% … 99.38%
 Time  (median):     484.772 ns                GC (median):    0.00%
 Time  (mean ± σ):   500.909 ns ± 906.955 ns   GC (mean ± σ):  1.81% ±  0.99%
           ▅█▇▇▇▆                                       
  ▄▆▄▁▁▁▁▂▇███████▇▆▅▅▄▄▅▅▅▄▃▃▃▃▃▂▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  459 ns           Histogram: frequency by time          564 ns <
 Memory estimate: 352 bytes, allocs estimate: 2.

1.8 Easy plus border

Easy plus border: For \(\mathbf{A}\) pd and \(\mathbf{V}\) full row rank, \[ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. \] Anyone interested in writing a package?
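
Below is a minimal sketch of such a solve (all variable names are illustrative), taking \(\mathbf{A}\) diagonal for concreteness; only easy solves with \(\mathbf{A}\) and one \(r \times r\) solve with the Schur complement \(\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T\) are needed:

using LinearAlgebra, Random

Random.seed!(123)
n, r = 1000, 5
A = Diagonal(rand(n) .+ 1) # easy pd matrix
V = randn(r, n)            # full row rank
b, c = randn(n), randn(r)

# solve [A Vᵀ; V 0] [x; y] = [b; c] via the Schur complement S = V A⁻¹ Vᵀ
Ainvb  = A \ b
AinvVt = A \ transpose(V)  # r easy solves with A
S = V * AinvVt             # r × r
y = S \ (V * Ainvb - c)
x = Ainvb - AinvVt * y
# verify against the explicitly formed bordered matrix
M = [Matrix(A) transpose(V); V zeros(r, r)]
norm(M * [x; y] - [b; c])  # ≈ 0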

1.9 Orthogonal matrix

Orthogonal \(\mathbf{A}\): \(n^2\) flops at most, since \(\mathbf{A}^{-1} = \mathbf{A}^T\) and solving \(\mathbf{A} \mathbf{x} = \mathbf{b}\) reduces to the matrix-vector multiply \(\mathbf{x} = \mathbf{A}^T \mathbf{b}\). Special cases (permutation matrices, Householder reflections, Jacobi rotations, …) take even fewer flops.
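
For example, with a Householder-form \(\mathbf{Q}\) from a QR factorization (a small sketch):

using LinearAlgebra, Random

Random.seed!(123)
n = 1000
Q = qr(randn(n, n)).Q # orthogonal, kept in compact Householder form
b = randn(n)
x = Q' * b            # solves Q x = b, since Q⁻¹ = Qᵀ; no factorization of Q needed
norm(Q * x - b)       # ≈ 0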

1.10 Toeplitz matrix

Toeplitz systems (constant diagonals): \[ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. \] \(\mathbf{T} \mathbf{x} = \mathbf{b}\), where \(\mathbf{T}\) is pd and Toeplitz, can be solved in \(O(n^2)\) flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general \(\mathbf{b}\)), and the Trench algorithm (inverse). These matrices occur in autoregressive models and econometrics.
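
A sketch of the Durbin algorithm (after Golub and Van Loan, Algorithm 4.7.1; the function name durbin is ours) for the Yule-Walker system \(\mathbf{T} \mathbf{y} = -(r_1, \ldots, r_n)^T\), where \(\mathbf{T}\) is symmetric pd Toeplitz with unit diagonal and first row \((1, r_1, \ldots, r_{n-1})\):

using LinearAlgebra

function durbin(r::AbstractVector{T}) where T <: Real
    n = length(r)
    y = zeros(T, n)
    y[1] = -r[1]
    β, α = one(T), -r[1]
    for k in 1:(n - 1)
        β *= one(T) - α^2
        α  = -(r[k + 1] + dot(r[k:-1:1], y[1:k])) / β
        y[1:k] .= y[1:k] .+ α .* y[k:-1:1] # rank-one update of the solution
        y[k + 1] = α
    end
    return y
end

# check against a dense solve for AR(1)-type autocorrelations
n = 100
r = 0.5 .^ (1:n)
Tmat = [0.5^abs(i - j) for i in 1:n, j in 1:n]
norm(Tmat * durbin(r) + r) # ≈ 0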

1.11 Circulant matrix

Circulant systems: Toeplitz matrix with wraparound \[ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}. \] Since the DFT diagonalizes every circulant matrix, circulant systems can be solved in \(O(n \log n)\) flops with FFT-type algorithms; related FFT-type algorithms include the DCT (discrete cosine transform) and DST (discrete sine transform).
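
A sketch of the FFT solve (assumes the FFTW package, which is not in this project’s environment):

using FFTW, LinearAlgebra

n = 1000
z = randn(n) # first column of C(z)
b = randn(n)
# the DFT diagonalizes C(z): its eigenvalues are fft(z) (assumed nonzero here)
x = real(ifft(fft(b) ./ fft(z)))

# verify against the explicitly formed circulant matrix
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]
norm(C * x - b) # ≈ 0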

1.12 Vandermonde matrix

Vandermonde matrices arise in interpolation and approximation problems: \[ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. \] \(\mathbf{V} \mathbf{x} = \mathbf{b}\) or \(\mathbf{V}^T \mathbf{x} = \mathbf{b}\) can be solved in \(O(n^2)\) flops.
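
A sketch of a Björck-Pereyra-type solve (after Golub and Van Loan, Algorithm 4.6.1; the function name vandermonde_solve is ours) for the dual system \(\mathbf{V}^T \mathbf{a} = \mathbf{f}\), i.e. polynomial interpolation:

using LinearAlgebra

function vandermonde_solve(x::AbstractVector, f::AbstractVector)
    n = length(x)
    a = float.(f)
    # Newton divided differences
    for k in 1:(n - 1), i in n:-1:(k + 1)
        a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k])
    end
    # convert Newton form to monomial coefficients
    for k in (n - 1):-1:1, i in k:(n - 1)
        a[i] -= a[i + 1] * x[k]
    end
    return a # a[j + 1] is the coefficient of t^j
end

# check: interpolate sin at 6 points
x = collect(0.0:0.2:1.0)
f = sin.(x)
a = vandermonde_solve(x, f)
N = length(x)
V = [x[j]^(i - 1) for i in 1:N, j in 1:N]
norm(transpose(V) * a - f) # ≈ 0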

1.13 Cauchy-like matrix

Cauchy-like matrices: \[ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, \] where \(\Omega = \text{diag}(\omega_1,\ldots,\omega_n)\) and \(\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)\). \(O(n)\) flops for LU and QR.

1.14 Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take \(O(n)\) flops), quasiseparable matrices, …