A list of "easy" linear systems

Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, or the matrix inverse, if you really need it. $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... Many linear systems in practice have structure that makes them "easy," and we should stay alert to it.

Don't blindly use A \ b and inv in Julia, or the solve function in R. Don't waste computing resources on bad algorithm choices!

In [1]:
versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops (one division per entry). Use the Diagonal type in Julia.

In [2]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
In [3]:
# should give link: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b
In [4]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
Out[4]:
BenchmarkTools.Trial: 4622 samples with 1 evaluation.
 Range (min … max):  977.440 μs … 1.912 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):       1.052 ms               GC (median):    0.00%
 Time  (mean ± σ):     1.076 ms ± 86.446 μs   GC (mean ± σ):  0.00% ± 0.00%

    ▁▃▁▂▇█▅▃                                                 
  ▃▅██████████▇▆▅▆▅▄▅▄▄▄▃▃▃▃▃▃▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▄
  977 μs          Histogram: frequency by time         1.45 ms <

 Memory estimate: 15.88 KiB, allocs estimate: 2.
In [5]:
# O(n) computation, no extra array allocation
@benchmark Diagonal($A) \ $b
Out[5]:
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.802 μs … 1.145 ms   GC (min … max):  0.00% … 99.34%
 Time  (median):     3.769 μs               GC (median):     0.00%
 Time  (mean ± σ):   7.243 μs ± 47.692 μs   GC (mean ± σ):  35.48% ±  5.41%

  ▅▅▁  █▄▃▃▄▃▁                        ▁▂▂▂▂▂▁▁▁▁▁           ▂
  ███▇█████████▇██▇▅▆▇▆▅▅▅▇▄▅▄▆▇▅▄▅▇█████████████▇▇▇▆▆▅▄▅▅ █
  2.8 μs       Histogram: log(frequency) by time     11.7 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops.

In [6]:
Random.seed!(123) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)
Out[6]:
1000×1000 SymTridiagonal{Float64, Vector{Float64}}:
 0.808288   0.598212    ⋅         ⋅        …    ⋅           ⋅          ⋅ 
 0.598212  -1.12207   -1.00753    ⋅             ⋅           ⋅          ⋅ 
  ⋅        -1.00753   -1.10464   1.97923        ⋅           ⋅          ⋅ 
  ⋅          ⋅         1.97923  -0.416993       ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅       -0.34622        ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
 ⋮                                         ⋱                         
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅            1.18201      ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅           -0.0625319  -0.271376    ⋅ 
  ⋅          ⋅          ⋅         ⋅           -0.271376   -1.25053    1.49717
  ⋅          ⋅          ⋅         ⋅             ⋅          1.49717   -0.34637
In [7]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b
Out[7]:
BenchmarkTools.Trial: 408 samples with 1 evaluation.
 Range (min … max):   8.896 ms … 21.030 ms   GC (min … max): 0.00% … 17.63%
 Time  (median):     12.052 ms               GC (median):    0.00%
 Time  (mean ± σ):   12.241 ms ±  2.155 ms   GC (mean ± σ):  4.05% ±  7.74%

     █▇▁▅▄ ▇▅     ▁ ▃▇▄ ▁▃ ▁  ▃▄ ▁                           
  ▆▃▇████████▆▅▆█▇█████▆████▆▇██▆██▅▇▄▆▅▅▄▃▅▄▃▃▄▃▃▃▃▃▃▁▁▁▃▃ ▅
  8.9 ms          Histogram: frequency by time        18.3 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 4.
In [8]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
Out[8]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  13.607 μs … 8.883 ms   GC (min … max):  0.00% … 99.38%
 Time  (median):     16.601 μs                GC (median):     0.00%
 Time  (mean ± σ):   19.552 μs ± 125.824 μs   GC (mean ± σ):  10.91% ±  1.72%

         ▃██▂                                                   
  ▂▃▅▄▃▂▆████▇▇▆▅▅▅▅▆▆▇▇▆▅▄▅▅▄▃▂▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂ ▃
  13.6 μs         Histogram: frequency by time         27.8 μs <

 Memory estimate: 23.81 KiB, allocs estimate: 3.
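
Bidiagonal matrices are handled the same way by a specialized $O(n)$ substitution. A minimal sketch (not benchmarked here), reusing dv, ev, and b from above:

# O(n) back substitution for an upper bidiagonal matrix
Abd = Bidiagonal(dv, ev, :U)
xbd = Abd \ b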

Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve a linear system by forward or back substitution.

In [9]:
Random.seed!(123)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b
Out[9]:
BenchmarkTools.Trial: 6573 samples with 1 evaluation.
 Range (min … max):  597.896 μs … 5.014 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     703.333 μs                GC (median):    0.00%
 Time  (mean ± σ):   753.879 μs ± 161.659 μs   GC (mean ± σ):  0.00% ± 0.00%

    ▃▄█▅▅▃▁                                                      
  ▃██████████▇▆▅▆▄▃▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  598 μs           Histogram: frequency by time         1.31 ms <

 Memory estimate: 7.94 KiB, allocs estimate: 1.
In [10]:
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b
Out[10]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   96.218 μs … 751.130 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     118.454 μs                GC (median):    0.00%
 Time  (mean ± σ):   133.487 μs ±  46.246 μs   GC (mean ± σ):  0.00% ± 0.00%

   ▄▂█▅▃                                                        
  ██████▇▆▆▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  96.2 μs          Histogram: frequency by time          336 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

Block diagonal matrix

Block diagonal: Suppose $n = \sum_b n_b$. Solving the linear system costs $(\sum_b n_b)^3$ flops if the block diagonal structure is ignored vs $\sum_b n_b^3$ flops if it is exploited.

Julia's SparseArrays standard library has a blockdiag function that generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package? A block-wise solve sketch follows the sparsity plot below.

In [11]:
using SparseArrays

Random.seed!(123)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
Out[11]:
1000×1000 SparseMatrixCSC{Float64, Int64} with 975 stored entries:
⢶⡷⢬⡮⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣬⣿⣽⣿⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⢻⣿⢿⣿⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⡿⣽⣳⡾⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠁⠁⢸⣬⣷⣟⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⢺⣿⣛⣲⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠈⠈⠁⠈⣝⢾⢻⣟⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣯⣫⣗⣮⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⢐⣟⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣶⢫⣿⣽⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢾⠾⢿⣛⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⣺⠿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢈⣉⡾⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⡞⣾⢿⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣎⣜⣿⣭⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢟⣭⡟⣇⣀⠀⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⣼⢾⣋⡿⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢿⣺⣯⡽⡀⢀⠀⡀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⣿⣷⢿⡵
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠾⡛⠼⡯
In [12]:
using UnicodePlots
spy(A)
Out[12]:
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀Sparsity Pattern⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ┌──────────────────────────────────────────┐    
      1 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ > 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ < 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
   1000 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        └──────────────────────────────────────────┘1⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀1000⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀975 nonzeros⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    

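With the blocks at hand, one can also solve block by block, realizing the $\sum_b n_b^3$ cost directly. A minimal sketch with hypothetical pd blocks (the names Ablocks, bblk, xblk are made up here):

# solve block-by-block: sum_b n_b^3 flops instead of (sum_b n_b)^3
Ablocks = [let X = randn(ni, ni); X * X' + I; end for _ in 1:B] # pd blocks
bblk    = randn(B * ni)
xblk    = vcat([Ablocks[k] \ bblk[(k - 1) * ni + 1:k * ni] for k in 1:B]...)
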
Kronecker product

Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming, and computing with, the potentially huge Kronecker product $\mathbf{A} \otimes \mathbf{B}$.

Anyone interested in writing a package?
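
The vec identity is equally useful for multiplication. A small sanity-check sketch (sizes and names are hypothetical):

# (C' ⊗ A) vec(B) = vec(A * B * C) avoids forming the Kronecker product
A1, C1 = randn(3, 3), randn(4, 4)
B1 = randn(3, 4)
kron(transpose(C1), A1) * vec(B1) ≈ vec(A1 * B1 * C1) # true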

In [13]:
using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix
┌ Info: verify download of index files...
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/MatrixDepot.jl:139
┌ Info: reading database
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:23
┌ Info: adding metadata...
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:67
┌ Info: adding svd data...
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:69
┌ Info: writing database
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:74
┌ Info: used remote sites are sparse.tamu.edu with MAT index and math.nist.gov with HTML index
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/MatrixDepot.jl:141
Out[13]:
50×50 Matrix{Float64}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮                                           ⋱                        
 0.025641   0.0512821  0.0769231  0.102564      0.8125     0.795918   0.78
 0.025      0.05       0.075      0.1           0.833333   0.816327   0.8
 0.0243902  0.0487805  0.0731707  0.097561   …  0.854167   0.836735   0.82
 0.0238095  0.047619   0.0714286  0.0952381     0.875      0.857143   0.84
 0.0232558  0.0465116  0.0697674  0.0930233     0.895833   0.877551   0.86
 0.0227273  0.0454545  0.0681818  0.0909091     0.916667   0.897959   0.88
 0.0222222  0.0444444  0.0666667  0.0888889     0.9375     0.918367   0.9
 0.0217391  0.0434783  0.0652174  0.0869565  …  0.958333   0.938776   0.92
 0.0212766  0.0425532  0.0638298  0.0851064     0.979167   0.959184   0.94
 0.0208333  0.0416667  0.0625     0.0833333     1.0        0.979592   0.96
 0.0204082  0.0408163  0.0612245  0.0816327     0.979592   1.0        0.98
 0.02       0.04       0.06       0.08          0.96       0.98       1.0
In [14]:
B = matrixdepot("oscillate", 100) # pd matrix
Out[14]:
100×100 Matrix{Float64}:
  0.539265      0.403989     -0.00217105   …   4.34736e-15  -2.06948e-14
  0.403989      0.531479      0.00497288      -4.47758e-15   2.13147e-14
 -0.00217105    0.00497288    0.660112         1.55726e-14  -7.41304e-14
  9.57703e-5   -0.000206496   0.0413618       -1.6975e-13    8.08067e-13
  7.50775e-6    0.000158391  -0.00736855       1.77959e-13  -8.47144e-13
  0.000158924  -8.79789e-6   -0.00523002   …  -2.20985e-13   1.05196e-12
 -1.92377e-5    2.14476e-5   -0.000180051      7.96194e-12  -3.79014e-11
  9.91107e-6   -1.07804e-5    0.000129376     -4.44381e-12   2.1154e-11
 -1.50826e-5    1.42813e-5    0.000145861      6.83341e-12  -3.25292e-11
  6.37565e-6   -5.92068e-6   -7.21201e-5      -4.05359e-12   1.92964e-11
  1.49406e-6   -1.03353e-6   -7.66895e-5   …   2.01848e-11  -9.60863e-11
 -8.76766e-7    8.08713e-7    1.27876e-5      -8.7208e-12    4.15138e-11
  3.24596e-6   -3.30136e-6    2.49045e-6       3.11066e-11  -1.48077e-10
  ⋮                                        ⋱                
  6.56008e-15  -6.75658e-15   2.35002e-14     -1.71847e-5    8.32939e-5
 -1.41705e-14   1.4595e-14   -5.07629e-14      4.02757e-5   -0.000189988
  1.29533e-13  -1.33413e-13   4.64024e-13  …  -0.000369986   0.00175252
 -7.28656e-15   7.50482e-15  -2.61026e-14      1.87337e-5   -0.000117048
  3.42848e-15  -3.53118e-15   1.22818e-14     -9.42239e-6    8.54268e-5
 -1.51144e-15   1.55671e-15  -5.41433e-15      9.28256e-7   -9.16845e-5
  1.40856e-15  -1.45075e-15   5.04573e-15     -6.35794e-5    0.000138679
 -2.11535e-15   2.17871e-15  -7.57738e-15  …   0.000251782  -0.000405406
  3.4777e-15   -3.58187e-15   1.24574e-14     -0.000439703   0.000823466
 -6.81829e-14   7.02252e-14  -2.44236e-13      0.0177542    -0.0172616
  4.34736e-15  -4.47758e-15   1.55726e-14      0.627671      0.0664583
 -2.06948e-14   2.13147e-14  -7.41304e-14      0.0664583     0.103314
In [15]:
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
In [16]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
In [17]:
# relative error
norm(x1 - x2) / norm(x1)
Out[17]:
1.3222136567513794e-7
In [18]:
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ c
Out[18]:
BenchmarkTools.Trial: 7 samples with 1 evaluation.
 Range (min … max):  632.106 ms … 877.267 ms   GC (min … max): 0.14% … 19.49%
 Time  (median):     703.717 ms                GC (median):    5.37%
 Time  (mean ± σ):   739.803 ms ±  91.636 ms   GC (mean ± σ):  9.22% ±  8.19%

  █        █ █             █       █            █  
  █▁▁▁▁▁▁▁▁█▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  632 ms           Histogram: frequency by time          877 ms <

 Memory estimate: 381.51 MiB, allocs estimate: 9.
In [19]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \ 
    transpose(cholesky(Symmetric($B)) \ reshape($c, p, m))))
Out[19]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  120.861 μs … 31.357 ms   GC (min … max):  0.00% … 98.96%
 Time  (median):     207.064 μs                GC (median):     0.00%
 Time  (mean ± σ):   269.288 μs ± 870.394 μs   GC (mean ± σ):  10.59% ±  3.28%

            █▆                                                   
  ▁▁▁▁▁▁▁▁▁▂██▇▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  121 μs           Histogram: frequency by time          532 μs <

 Memory estimate: 176.33 KiB, allocs estimate: 19.

Sparse matrix

Sparsity: use sparse matrix decompositions or iterative methods.

  • The most easily recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (Matrix package), MKL (sparse BLAS), ... as much as possible.
In [20]:
using MatrixDepot

Random.seed!(123)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# random generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
Out[20]:
0.001994776158751544
In [21]:
using UnicodePlots
spy(A)
Out[21]:
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀Sparsity Pattern⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ┌──────────────────────────────────────────┐    
      1 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ > 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ < 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
   7701 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        └──────────────────────────────────────────┘1⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀7701⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀118301 nonzeros⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    

Matrix-vector multiplication

In [22]:
# dense matrix-vector multiplication
@benchmark $Afull * $b
Out[22]:
BenchmarkTools.Trial: 181 samples with 1 evaluation.
 Range (min … max):  23.293 ms … 43.998 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     26.031 ms               GC (median):    0.00%
 Time  (mean ± σ):   27.628 ms ±  4.264 ms   GC (mean ± σ):  0.00% ± 0.00%

     ▄▂ ▆▅▆█                                              
  ▅▃███▇██████▆▆▆▆▄▄▁▅▁▄▃▃▁▁▁▁▁▁▃▃▁▁▁▄▄▁▁▃▄▅▃▃▃▁▃▁▃▄▁▃▄▁▃▁▄ ▃
  23.3 ms         Histogram: frequency by time        40.1 ms <

 Memory estimate: 60.23 KiB, allocs estimate: 2.
In [23]:
# sparse matrix-vector multiplication
@benchmark $A * $b
Out[23]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  110.667 μs … 50.684 ms   GC (min … max): 0.00% … 99.57%
 Time  (median):     132.206 μs                GC (median):    0.00%
 Time  (mean ± σ):   142.435 μs ± 505.685 μs   GC (mean ± σ):  3.54% ±  1.00%

            ▃▆██▆▆▅▃▃▂▂▃▃▂▂▁▁▁▁    ▁▁                         ▂
  ▃▄▅▅▃▃▁▁▁▅██████████████████████████████▇▇▇▇▇█▇▇▇▇▅▅▆▆▆▆▆▆▆ █
  111 μs        Histogram: log(frequency) by time        206 μs <

 Memory estimate: 60.23 KiB, allocs estimate: 2.

Solve linear equation

In [24]:
# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b
Out[24]:
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.624 s … 1.677 s   GC (min … max): 0.03% … 3.15%
 Time  (median):     1.656 s               GC (median):    0.81%
 Time  (mean ± σ):   1.653 s ± 24.347 ms   GC (mean ± σ):  1.20% ± 1.50%

  █                                              █       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
  1.62 s         Histogram: frequency by time        1.68 s <

 Memory estimate: 452.52 MiB, allocs estimate: 6.
In [25]:
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b
norm(xchol - xcholsp) = 5.0342422076653274e-15
Out[25]:
BenchmarkTools.Trial: 316 samples with 1 evaluation.
 Range (min … max):  12.786 ms … 35.710 ms   GC (min … max): 0.00% … 2.75%
 Time  (median):     15.258 ms               GC (median):    0.00%
 Time  (mean ± σ):   15.840 ms ±  2.563 ms   GC (mean ± σ):  0.15% ± 0.48%

        ▂█▁▁▁                                               
  ▇▆▆▄▁▄███████▆█▁▄▄▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▄▁▁▁▁▁▁▁▄▄▄ ▆
  12.8 ms      Histogram: log(frequency) by time      30.2 ms <

 Memory estimate: 12.43 MiB, allocs estimate: 64.
In [26]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 1.418440389790897e-7
Out[26]:
BenchmarkTools.Trial: 159 samples with 1 evaluation.
 Range (min … max):  29.333 ms … 37.392 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     30.911 ms               GC (median):    0.00%
 Time  (mean ± σ):   31.600 ms ±  1.834 ms   GC (mean ± σ):  0.00% ± 0.00%

  ▂ ▂▃▃▃▆▆▂▃ █                         ▃                
  ████████████▇▇▇█▅▇▄▅▅▁▅▅▅█▅▅▅▁▅▅▅▅▁▄▄▇█▄▇▇█▁▁▄▄▄▁▁▅▁▁▄▁▁▄ ▄
  29.3 ms         Histogram: frequency by time        36.1 ms <

 Memory estimate: 241.64 KiB, allocs estimate: 16.

Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}

  • Keep HW2 Q2 (multivariate density) and the PageRank problem in mind.
  • The WoodburyMatrices.jl package can be useful.
In [27]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(123)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}
Out[27]:
1000×1000 Matrix{Float64}:
  4.59798    -0.281279   -2.22717   …   2.15423     0.381752   -0.563599
 -0.281279    3.46848     1.39267       1.68392    -2.2848      0.956956
 -2.22717     1.39267     4.81456       0.879207   -0.516142    1.95953
  0.995152    0.835961   -0.198421      1.22304    -0.534476   -0.158006
 -4.18483     0.83762     2.07489      -3.41192    -2.40336     0.12795
  1.2769      0.450253    1.94231   …   3.05973     1.06062     1.58223
  1.2693     -0.993521   -0.522544     -0.180006   -0.0885834   0.433461
  0.263329    1.33478     1.84111       2.02565    -0.281757    1.85271
  1.56504    -1.2659     -1.79125      -0.0459542   0.927798   -0.648945
 -1.7482      0.478207    2.26841      -1.72566    -2.54041     1.36365
  0.0224748   1.0181      0.580171  …   0.640478   -0.998439   -0.171846
 -2.50815     1.01533     1.43186      -0.950055   -0.619325    0.162349
 -0.192541   -2.14724    -2.73878      -2.82316     0.861588   -1.74536
  ⋮                                 ⋱                          
 -1.68931    -0.0964689   0.122875     -1.87988    -0.748008   -1.02043
 -0.733929   -1.38688    -1.87072      -2.05063     1.15057    -0.971421
  0.124628    2.41238     1.6913    …   2.44864    -1.01345     1.0207
  1.31426    -0.690013   -1.36949       0.387585    0.746292   -0.804991
 -0.156504    0.385906    0.589231      1.38713     1.44466     0.62347
  3.51625    -1.71361    -0.157013      3.01664     2.82686     2.12007
  3.06339    -0.905018   -0.865003      1.95288     0.854144    0.166709
  0.480269   -1.14852     1.09075   …   0.405336    1.02454     1.58487
  0.282433   -0.537644   -0.713335     -1.08749    -0.736468   -0.090602
  2.15423     1.68392     0.879207      3.87128     0.0305312   0.956922
  0.381752   -2.2848     -0.516142      0.0305312   3.77871    -0.351256
 -0.563599    0.956956    1.95953       0.956922   -0.351256    2.04652
In [28]:
# compare storage: Woodbury form vs full matrix
Base.summarysize(W), Base.summarysize(Wfull)
Out[28]:
(64720, 8000040)

Solve linear equation

In [29]:
# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b
Out[29]:
BenchmarkTools.Trial: 490 samples with 1 evaluation.
 Range (min … max):   5.969 ms … 69.702 ms   GC (min … max): 0.00% … 83.72%
 Time  (median):      9.521 ms               GC (median):    0.00%
 Time  (mean ± σ):   10.186 ms ±  7.209 ms   GC (mean ± σ):  9.23% ± 11.25%

  ▃  █                                                        
  █▅▄█▅▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂ ▂
  5.97 ms         Histogram: frequency by time        59.4 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 5.
In [30]:
# solve using Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack; need to file an issue
Out[30]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.217 μs … 96.229 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     21.811 μs               GC (median):    0.00%
 Time  (mean ± σ):   20.264 μs ±  6.391 μs   GC (mean ± σ):  0.00% ± 0.00%

     ▂▃▃             ▃                                      
  ▂▁▂████▅▃▂▁▁▁▁▁▁▁▁▆██▇▄▃▂▂▂▂▃▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  11.2 μs         Histogram: frequency by time        41.7 μs <

 Memory estimate: 32.03 KiB, allocs estimate: 8.

Matrix-vector multiplication

In [31]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b
Out[31]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  123.121 μs … 747.017 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     182.658 μs                GC (median):    0.00%
 Time  (mean ± σ):   188.649 μs ±  32.262 μs   GC (mean ± σ):  0.00% ± 0.00%

             ▁▂▄▅▆▅█▇█▆▇▃▃▁▂                                  
  ▁▁▁▁▂▂▃▄▅▆██████████████████▇▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁ ▄
  123 μs           Histogram: frequency by time          291 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.
In [32]:
# multiplication using Woodbury structure
@benchmark $W * $b
Out[32]:
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.759 μs … 9.332 ms   GC (min … max):  0.00% … 99.86%
 Time  (median):     5.751 μs               GC (median):     0.00%
 Time  (mean ± σ):   6.968 μs ± 93.302 μs   GC (mean ± σ):  13.37% ±  1.00%

    █▁        ▄                                         
  ▃▇██▄▃▂▂▂▂▂▅██▇▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.76 μs        Histogram: frequency by time        15.9 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

Determinant

In [33]:
# determinant without using Woodbury structure
@benchmark det($Wfull)
Out[33]:
BenchmarkTools.Trial: 400 samples with 1 evaluation.
 Range (min … max):   8.839 ms … 65.424 ms   GC (min … max): 0.00% … 79.60%
 Time  (median):     11.930 ms               GC (median):    0.00%
 Time  (mean ± σ):   12.488 ms ±  6.716 ms   GC (mean ± σ):  6.80% ± 10.35%

  ▅▅▂█▃                                                       
  ██████▆▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅ ▆
  8.84 ms      Histogram: log(frequency) by time      59.9 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 3.
In [34]:
# determinant using Woodbury structure
@benchmark det($W)
Out[34]:
BenchmarkTools.Trial: 10000 samples with 195 evaluations.
 Range (min … max):  461.533 ns … 302.622 μs   GC (min … max): 0.00% … 99.75%
 Time  (median):     499.359 ns                GC (median):    0.00%
 Time  (mean ± σ):   583.170 ns ±   3.022 μs   GC (mean ± σ):  5.18% ±  1.00%

  ▄ ▆▃█▃▂▁▁     ▁▄▅▅▄▄▃▂▁▁                                    ▂
  ███████████▇▇▇██████████████▆▆▇▆▆▆▆▆▅▅▆▆▅▆▇▆▆▆▆▅▅▅▅▅▅▄▄▅▄▅▅ █
  462 ns        Histogram: log(frequency) by time        939 ns <

 Memory estimate: 352 bytes, allocs estimate: 2.

Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ Anyone interested in writing a package? A block-elimination solve sketch follows.
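
A minimal sketch of solving the bordered system by block elimination, using only solves with the easy $\mathbf{A}$ plus one small $r \times r$ system (all names and data below are hypothetical):

using LinearAlgebra, Random

Random.seed!(123)
n, r = 1000, 5
A = Diagonal(rand(n) .+ 1) # an "easy" pd A
V = randn(r, n)            # full row rank border
b, c = randn(n), randn(r)

# solve [A V'; V 0] * [x; y] = [b; c]
Ainvb  = A \ b              # O(n) solve with diagonal A
AinvVt = A \ transpose(V)   # r more solves with A
S = V * AinvVt              # r-by-r matrix V * inv(A) * V'
y = S \ (V * Ainvb - c)     # block elimination
x = Ainvb - AinvVt * y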

Orthogonal matrix

Orthogonal $\mathbf{A}$: at most $n^2$ flops, since $\mathbf{A}^{-1} = \mathbf{A}^T$ and the solve reduces to one matrix-vector multiply $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even fewer flops.
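
For example, a Householder matrix solve can exploit the rank-one structure. A minimal sketch (data are hypothetical):

using LinearAlgebra

n = 1000
u = normalize(randn(n))   # unit vector
b = randn(n)
H = I - 2 * u * u'        # Householder matrix: orthogonal and symmetric
x1 = H' * b               # n^2 flops, using inv(H) = H'
x2 = b - 2 * u * (u' * b) # O(n), using the rank-one structure
x1 ≈ x2                   # true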

Toeplitz matrix

Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), the Trench algorithm (inverse). These matrices occur in autoregressive models and econometrics.
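
A sketch of the Durbin algorithm for the Yule-Walker system, following Golub and Van Loan, Algorithm 4.7.1 (the function name and test data are made up):

using LinearAlgebra

# solve T * y = -r in O(n^2) flops, where T is symmetric pd Toeplitz
# with unit diagonal and first row (1, r[1], ..., r[n-1])
function durbin(r::AbstractVector)
    n = length(r)
    y = zeros(eltype(r), n)
    y[1] = -r[1]
    β, α = one(eltype(r)), -r[1]
    for k in 1:n-1
        β *= 1 - α^2
        α  = -(r[k+1] + dot(reverse(r[1:k]), y[1:k])) / β
        y[1:k] .= y[1:k] .+ α .* reverse(y[1:k])
        y[k+1] = α
    end
    return y
end

r = [0.5^k for k in 1:5]                    # AR(1)-like autocorrelations
T = [0.5^abs(i - j) for i in 1:5, j in 1:5] # the corresponding Toeplitz matrix
durbin(r) ≈ T \ (-r)                        # true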

Circulant matrix

Circulant systems: Toeplitz matrix with wraparound $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}. $$ Circulant matrices are diagonalized by the discrete Fourier transform, so $C(\mathbf{z}) \mathbf{x} = \mathbf{b}$ can be solved in $O(n \log n)$ flops by FFT-type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform).
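
A sketch of the FFT solve, assuming the FFTW.jl package (data are hypothetical; the eigenvalues fft(z) must all be nonzero):

using FFTW, LinearAlgebra

n = 8
z = randn(n) # first column of C(z)
b = randn(n)
# C(z) = F⁻¹ * Diagonal(fft(z)) * F, so the solve is O(n log n)
x = real(ifft(fft(b) ./ fft(z)))
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n] # dense C(z), for checking only
norm(C * x - b) # ≈ 0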

Vandermonde matrix

Vandermonde matrix: such as in interpolation and approximation problems $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
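
A sketch of the $O(n^2)$ divided-difference solve of the dual system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ (polynomial interpolation), following Golub and Van Loan, Algorithm 4.6.1 (the function name and test data are made up):

# solve V' * a = f in O(n^2) flops via Newton's divided differences
function vandermonde_dual(x::AbstractVector, f::AbstractVector)
    a, n = copy(f), length(x)
    for k in 1:n-1         # divided differences
        for i in n:-1:k+1
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
        end
    end
    for k in n-1:-1:1      # Newton form -> monomial coefficients
        for i in k:n-1
            a[i] -= a[i+1] * x[k]
        end
    end
    return a
end

x = [1.0, 2.0, 3.0]
V = [xj^i for i in 0:2, xj in x] # Vandermonde matrix as defined above
f = [2.0, 3.0, 6.0]
vandermonde_dual(x, f) ≈ transpose(V) \ f # true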

Cauchy-like matrix

Cauchy-like matrices satisfy the displacement equation $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$, $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$, and $\mathbf{R}, \mathbf{S} \in \mathbb{R}^{n \times r}$ with small displacement rank $r$. $O(n)$ flops for LU and QR.

Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...