A list of "easy" linear systems

Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, or the matrix inverse, if you really need it. $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... Many linear systems in practice have structure that makes them "easy," and we should stay alert to it.

Don't blindly use A \ b and inv in Julia, or the solve function in R. Don't waste computing resources on bad algorithm choices!

In [1]:
versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops (one division per entry). Use the Diagonal type in Julia.

In [2]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
In [3]:
# should give link: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b
In [4]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
Out[4]:
BenchmarkTools.Trial: 4622 samples with 1 evaluation.
 Range (min … max):  977.440 μs … 1.912 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):       1.052 ms               GC (median):    0.00%
 Time  (mean ± σ):     1.076 ms ± 86.446 μs   GC (mean ± σ):  0.00% ± 0.00%

    ▁▃▁▂▇█▅▃                                                 
  ▃▅██████████▇▆▅▆▅▄▅▄▄▄▃▃▃▃▃▃▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▄
  977 μs          Histogram: frequency by time         1.45 ms <

 Memory estimate: 15.88 KiB, allocs estimate: 2.
In [5]:
# O(n) computation, no extra array allocation
@benchmark Diagonal($A) \ $b
Out[5]:
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.802 μs … 1.145 ms   GC (min … max):  0.00% … 99.34%
 Time  (median):     3.769 μs               GC (median):     0.00%
 Time  (mean ± σ):   7.243 μs ± 47.692 μs   GC (mean ± σ):  35.48% ±  5.41%

  ▅▅▁  █▄▃▃▄▃▁                        ▁▂▂▂▂▂▁▁▁▁▁           ▂
  ███▇█████████▇██▇▅▆▇▆▅▅▅▇▄▅▄▆▇▅▄▅▇█████████████▇▇▇▆▆▅▄▅▅ █
  2.8 μs       Histogram: log(frequency) by time     11.7 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops.

In [6]:
Random.seed!(123) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)
Out[6]:
1000×1000 SymTridiagonal{Float64, Vector{Float64}}:
 0.808288   0.598212    ⋅         ⋅        …    ⋅           ⋅          ⋅ 
 0.598212  -1.12207   -1.00753    ⋅             ⋅           ⋅          ⋅ 
  ⋅        -1.00753   -1.10464   1.97923        ⋅           ⋅          ⋅ 
  ⋅          ⋅         1.97923  -0.416993       ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅       -0.34622        ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
 ⋮                                         ⋱                         
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅            1.18201      ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅           -0.0625319  -0.271376    ⋅ 
  ⋅          ⋅          ⋅         ⋅           -0.271376   -1.25053    1.49717
  ⋅          ⋅          ⋅         ⋅             ⋅          1.49717   -0.34637
In [7]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b
Out[7]:
BenchmarkTools.Trial: 408 samples with 1 evaluation.
 Range (min … max):   8.896 ms … 21.030 ms   GC (min … max): 0.00% … 17.63%
 Time  (median):     12.052 ms               GC (median):    0.00%
 Time  (mean ± σ):   12.241 ms ±  2.155 ms   GC (mean ± σ):  4.05% ±  7.74%

     █▇▁▅▄ ▇▅     ▁ ▃▇▄ ▁▃ ▁  ▃▄ ▁                           
  ▆▃▇████████▆▅▆█▇█████▆████▆▇██▆██▅▇▄▆▅▅▄▃▅▄▃▃▄▃▃▃▃▃▃▁▁▁▃▃ ▅
  8.9 ms          Histogram: frequency by time        18.3 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 4.
In [8]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
Out[8]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  13.607 μs … 8.883 ms   GC (min … max):  0.00% … 99.38%
 Time  (median):     16.601 μs                GC (median):     0.00%
 Time  (mean ± σ):   19.552 μs ± 125.824 μs   GC (mean ± σ):  10.91% ±  1.72%

         ▃██▂                                                   
  ▂▃▅▄▃▂▆████▇▇▆▅▅▅▅▆▆▇▇▆▅▄▅▅▄▃▂▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂ ▃
  13.6 μs         Histogram: frequency by time         27.8 μs <

 Memory estimate: 23.81 KiB, allocs estimate: 3.
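
Bidiagonal matrices are handled the same way by a specialized $O(n)$ substitution. A minimal sketch (not benchmarked here), reusing dv, ev, and b from above:

# O(n) back substitution for an upper bidiagonal matrix
Abd = Bidiagonal(dv, ev, :U)
xbd = Abd \ b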

Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve a linear system by forward or back substitution.

In [9]:
Random.seed!(123)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b
Out[9]:
BenchmarkTools.Trial: 6573 samples with 1 evaluation.
 Range (min … max):  597.896 μs … 5.014 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     703.333 μs                GC (median):    0.00%
 Time  (mean ± σ):   753.879 μs ± 161.659 μs   GC (mean ± σ):  0.00% ± 0.00%

    ▃▄█▅▅▃▁                                                      
  ▃██████████▇▆▅▆▄▃▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  598 μs           Histogram: frequency by time         1.31 ms <

 Memory estimate: 7.94 KiB, allocs estimate: 1.
In [10]:
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b
Out[10]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   96.218 μs … 751.130 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     118.454 μs                GC (median):    0.00%
 Time  (mean ± σ):   133.487 μs ±  46.246 μs   GC (mean ± σ):  0.00% ± 0.00%

   ▄▂█▅▃                                                        
  ██████▇▆▆▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  96.2 μs          Histogram: frequency by time          336 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

Block diagonal matrix

Block diagonal: Suppose $n = \sum_b n_b$. Solving the linear system costs $(\sum_b n_b)^3$ flops if the block diagonal structure is ignored vs $\sum_b n_b^3$ flops if it is exploited.

Julia's SparseArrays standard library has a blockdiag function that generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package? A block-wise solve sketch follows the sparsity plot below.

In [11]:
using SparseArrays

Random.seed!(123)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
Out[11]:
1000×1000 SparseMatrixCSC{Float64, Int64} with 975 stored entries:
⢶⡷⢬⡮⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣬⣿⣽⣿⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⢻⣿⢿⣿⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⡿⣽⣳⡾⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠁⠁⢸⣬⣷⣟⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⢺⣿⣛⣲⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠈⠈⠁⠈⣝⢾⢻⣟⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣯⣫⣗⣮⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⢐⣟⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣶⢫⣿⣽⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢾⠾⢿⣛⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⣺⠿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢈⣉⡾⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⡞⣾⢿⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣎⣜⣿⣭⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢟⣭⡟⣇⣀⠀⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⣼⢾⣋⡿⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢿⣺⣯⡽⡀⢀⠀⡀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⣿⣷⢿⡵
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠾⡛⠼⡯
In [12]:
using UnicodePlots
spy(A)
Out[12]:
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀Sparsity Pattern⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ┌──────────────────────────────────────────┐    
      1 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ > 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ < 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
   1000 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        └──────────────────────────────────────────┘1⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀1000⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀975 nonzeros⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    

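With the blocks at hand, one can also solve block by block, realizing the $\sum_b n_b^3$ cost directly. A minimal sketch with hypothetical pd blocks (the names Ablocks, bblk, xblk are made up here):

# solve block-by-block: sum_b n_b^3 flops instead of (sum_b n_b)^3
Ablocks = [let X = randn(ni, ni); X * X' + I; end for _ in 1:B] # pd blocks
bblk    = randn(B * ni)
xblk    = vcat([Ablocks[k] \ bblk[(k - 1) * ni + 1:k * ni] for k in 1:B]...)
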
Kronecker product

Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming, and computing with, the potentially huge Kronecker product $\mathbf{A} \otimes \mathbf{B}$.

Anyone interested in writing a package?
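
The vec identity is equally useful for multiplication. A small sanity-check sketch (sizes and names are hypothetical):

# (C' ⊗ A) vec(B) = vec(A * B * C) avoids forming the Kronecker product
A1, C1 = randn(3, 3), randn(4, 4)
B1 = randn(3, 4)
kron(transpose(C1), A1) * vec(B1) ≈ vec(A1 * B1 * C1) # true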

In [13]:
using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix
┌ Info: verify download of index files...
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/MatrixDepot.jl:139
┌ Info: reading database
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:23
┌ Info: adding metadata...
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:67
┌ Info: adding svd data...
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:69
┌ Info: writing database
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/download.jl:74
┌ Info: used remote sites are sparse.tamu.edu with MAT index and math.nist.gov with HTML index
└ @ MatrixDepot /Users/huazhou/.julia/packages/MatrixDepot/GEDc3/src/MatrixDepot.jl:141
Out[13]:
50×50 Matrix{Float64}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮                                           ⋱                        
 0.025641   0.0512821  0.0769231  0.102564      0.8125     0.795918   0.78
 0.025      0.05       0.075      0.1           0.833333   0.816327   0.8
 0.0243902  0.0487805  0.0731707  0.097561   …  0.854167   0.836735   0.82
 0.0238095  0.047619   0.0714286  0.0952381     0.875      0.857143   0.84
 0.0232558  0.0465116  0.0697674  0.0930233     0.895833   0.877551   0.86
 0.0227273  0.0454545  0.0681818  0.0909091     0.916667   0.897959   0.88
 0.0222222  0.0444444  0.0666667  0.0888889     0.9375     0.918367   0.9
 0.0217391  0.0434783  0.0652174  0.0869565  …  0.958333   0.938776   0.92
 0.0212766  0.0425532  0.0638298  0.0851064     0.979167   0.959184   0.94
 0.0208333  0.0416667  0.0625     0.0833333     1.0        0.979592   0.96
 0.0204082  0.0408163  0.0612245  0.0816327     0.979592   1.0        0.98
 0.02       0.04       0.06       0.08          0.96       0.98       1.0
In [14]:
B = matrixdepot("oscillate", 100) # pd matrix
Out[14]:
100×100 Matrix{Float64}:
  0.539265      0.403989     -0.00217105   …   4.34736e-15  -2.06948e-14
  0.403989      0.531479      0.00497288      -4.47758e-15   2.13147e-14
 -0.00217105    0.00497288    0.660112         1.55726e-14  -7.41304e-14
  9.57703e-5   -0.000206496   0.0413618       -1.6975e-13    8.08067e-13
  7.50775e-6    0.000158391  -0.00736855       1.77959e-13  -8.47144e-13
  0.000158924  -8.79789e-6   -0.00523002   …  -2.20985e-13   1.05196e-12
 -1.92377e-5    2.14476e-5   -0.000180051      7.96194e-12  -3.79014e-11
  9.91107e-6   -1.07804e-5    0.000129376     -4.44381e-12   2.1154e-11
 -1.50826e-5    1.42813e-5    0.000145861      6.83341e-12  -3.25292e-11
  6.37565e-6   -5.92068e-6   -7.21201e-5      -4.05359e-12   1.92964e-11
  1.49406e-6   -1.03353e-6   -7.66895e-5   …   2.01848e-11  -9.60863e-11
 -8.76766e-7    8.08713e-7    1.27876e-5      -8.7208e-12    4.15138e-11
  3.24596e-6   -3.30136e-6    2.49045e-6       3.11066e-11  -1.48077e-10
  ⋮                                        ⋱                
  6.56008e-15  -6.75658e-15   2.35002e-14     -1.71847e-5    8.32939e-5
 -1.41705e-14   1.4595e-14   -5.07629e-14      4.02757e-5   -0.000189988
  1.29533e-13  -1.33413e-13   4.64024e-13  …  -0.000369986   0.00175252
 -7.28656e-15   7.50482e-15  -2.61026e-14      1.87337e-5   -0.000117048
  3.42848e-15  -3.53118e-15   1.22818e-14     -9.42239e-6    8.54268e-5
 -1.51144e-15   1.55671e-15  -5.41433e-15      9.28256e-7   -9.16845e-5
  1.40856e-15  -1.45075e-15   5.04573e-15     -6.35794e-5    0.000138679
 -2.11535e-15   2.17871e-15  -7.57738e-15  …   0.000251782  -0.000405406
  3.4777e-15   -3.58187e-15   1.24574e-14     -0.000439703   0.000823466
 -6.81829e-14   7.02252e-14  -2.44236e-13      0.0177542    -0.0172616
  4.34736e-15  -4.47758e-15   1.55726e-14      0.627671      0.0664583
 -2.06948e-14   2.13147e-14  -7.41304e-14      0.0664583     0.103314
In [15]:
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
In [16]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
In [17]:
# relative error
norm(x1 - x2) / norm(x1)
Out[17]:
1.3222136567513794e-7
In [18]:
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ c
Out[18]:
BenchmarkTools.Trial: 7 samples with 1 evaluation.
 Range (min … max):  632.106 ms … 877.267 ms   GC (min … max): 0.14% … 19.49%
 Time  (median):     703.717 ms                GC (median):    5.37%
 Time  (mean ± σ):   739.803 ms ±  91.636 ms   GC (mean ± σ):  9.22% ±  8.19%

  █        █ █             █       █            █  
  █▁▁▁▁▁▁▁▁█▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  632 ms           Histogram: frequency by time          877 ms <

 Memory estimate: 381.51 MiB, allocs estimate: 9.
In [19]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \ 
    transpose(cholesky(Symmetric($B)) \ reshape($c, p, m))))
Out[19]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  120.861 μs … 31.357 ms   GC (min … max):  0.00% … 98.96%
 Time  (median):     207.064 μs                GC (median):     0.00%
 Time  (mean ± σ):   269.288 μs ± 870.394 μs   GC (mean ± σ):  10.59% ±  3.28%

            █▆                                                   
  ▁▁▁▁▁▁▁▁▁▂██▇▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  121 μs           Histogram: frequency by time          532 μs <

 Memory estimate: 176.33 KiB, allocs estimate: 19.

Sparse matrix

Sparsity: use sparse matrix decompositions or iterative methods.

  • The most easily recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (Matrix package), MKL (sparse BLAS), ... as much as possible.
In [20]:
using MatrixDepot

Random.seed!(123)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# random generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
Out[20]:
0.001994776158751544
In [21]:
using UnicodePlots
spy(A)
Out[21]:
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀Sparsity Pattern⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ┌──────────────────────────────────────────┐    
      1 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ > 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ < 0
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
   7701 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    
        └──────────────────────────────────────────┘1⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀7701⠀    
        ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀118301 nonzeros⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀    

Matrix-vector multiplication

In [22]:
# dense matrix-vector multiplication
@benchmark $Afull * $b
Out[22]:
BenchmarkTools.Trial: 181 samples with 1 evaluation.
 Range (min … max):  23.293 ms … 43.998 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     26.031 ms               GC (median):    0.00%
 Time  (mean ± σ):   27.628 ms ±  4.264 ms   GC (mean ± σ):  0.00% ± 0.00%

     ▄▂ ▆▅▆█                                              
  ▅▃███▇██████▆▆▆▆▄▄▁▅▁▄▃▃▁▁▁▁▁▁▃▃▁▁▁▄▄▁▁▃▄▅▃▃▃▁▃▁▃▄▁▃▄▁▃▁▄ ▃
  23.3 ms         Histogram: frequency by time        40.1 ms <

 Memory estimate: 60.23 KiB, allocs estimate: 2.
In [23]:
# sparse matrix-vector multiplication
@benchmark $A * $b
Out[23]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  110.667 μs … 50.684 ms   GC (min … max): 0.00% … 99.57%
 Time  (median):     132.206 μs                GC (median):    0.00%
 Time  (mean ± σ):   142.435 μs ± 505.685 μs   GC (mean ± σ):  3.54% ±  1.00%

            ▃▆██▆▆▅▃▃▂▂▃▃▂▂▁▁▁▁    ▁▁                         ▂
  ▃▄▅▅▃▃▁▁▁▅██████████████████████████████▇▇▇▇▇█▇▇▇▇▅▅▆▆▆▆▆▆▆ █
  111 μs        Histogram: log(frequency) by time        206 μs <

 Memory estimate: 60.23 KiB, allocs estimate: 2.

Solve linear equation

In [24]:
# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b
Out[24]:
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.624 s … 1.677 s   GC (min … max): 0.03% … 3.15%
 Time  (median):     1.656 s               GC (median):    0.81%
 Time  (mean ± σ):   1.653 s ± 24.347 ms   GC (mean ± σ):  1.20% ± 1.50%

  █                                              █       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
  1.62 s         Histogram: frequency by time        1.68 s <

 Memory estimate: 452.52 MiB, allocs estimate: 6.
In [25]:
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b
norm(xchol - xcholsp) = 5.0342422076653274e-15
Out[25]:
BenchmarkTools.Trial: 316 samples with 1 evaluation.
 Range (min … max):  12.786 ms … 35.710 ms   GC (min … max): 0.00% … 2.75%
 Time  (median):     15.258 ms               GC (median):    0.00%
 Time  (mean ± σ):   15.840 ms ±  2.563 ms   GC (mean ± σ):  0.15% ± 0.48%

        ▂█▁▁▁                                               
  ▇▆▆▄▁▄███████▆█▁▄▄▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▄▁▁▁▁▁▁▁▄▄▄ ▆
  12.8 ms      Histogram: log(frequency) by time      30.2 ms <

 Memory estimate: 12.43 MiB, allocs estimate: 64.
In [26]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 1.418440389790897e-7
Out[26]:
BenchmarkTools.Trial: 159 samples with 1 evaluation.
 Range (min … max):  29.333 ms … 37.392 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     30.911 ms               GC (median):    0.00%
 Time  (mean ± σ):   31.600 ms ±  1.834 ms   GC (mean ± σ):  0.00% ± 0.00%

  ▂ ▂▃▃▃▆▆▂▃ █                         ▃                
  ████████████▇▇▇█▅▇▄▅▅▁▅▅▅█▅▅▅▁▅▅▅▅▁▄▄▇█▄▇▇█▁▁▄▄▄▁▁▅▁▁▄▁▁▄ ▄
  29.3 ms         Histogram: frequency by time        36.1 ms <

 Memory estimate: 241.64 KiB, allocs estimate: 16.

Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}

  • Keep HW2 Q2 (multivariate density) and the PageRank problem in mind.
  • The WoodburyMatrices.jl package can be useful.
In [27]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(123)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}
Out[27]:
1000×1000 Matrix{Float64}:
  4.59798    -0.281279   -2.22717   …   2.15423     0.381752   -0.563599
 -0.281279    3.46848     1.39267       1.68392    -2.2848      0.956956
 -2.22717     1.39267     4.81456       0.879207   -0.516142    1.95953
  0.995152    0.835961   -0.198421      1.22304    -0.534476   -0.158006
 -4.18483     0.83762     2.07489      -3.41192    -2.40336     0.12795
  1.2769      0.450253    1.94231   …   3.05973     1.06062     1.58223
  1.2693     -0.993521   -0.522544     -0.180006   -0.0885834   0.433461
  0.263329    1.33478     1.84111       2.02565    -0.281757    1.85271
  1.56504    -1.2659     -1.79125      -0.0459542   0.927798   -0.648945
 -1.7482      0.478207    2.26841      -1.72566    -2.54041     1.36365
  0.0224748   1.0181      0.580171  …   0.640478   -0.998439   -0.171846
 -2.50815     1.01533     1.43186      -0.950055   -0.619325    0.162349
 -0.192541   -2.14724    -2.73878      -2.82316     0.861588   -1.74536
  ⋮                                 ⋱                          
 -1.68931    -0.0964689   0.122875     -1.87988    -0.748008   -1.02043
 -0.733929   -1.38688    -1.87072      -2.05063     1.15057    -0.971421
  0.124628    2.41238     1.6913    …   2.44864    -1.01345     1.0207
  1.31426    -0.690013   -1.36949       0.387585    0.746292   -0.804991
 -0.156504    0.385906    0.589231      1.38713     1.44466     0.62347
  3.51625    -1.71361    -0.157013      3.01664     2.82686     2.12007
  3.06339    -0.905018   -0.865003      1.95288     0.854144    0.166709
  0.480269   -1.14852     1.09075   …   0.405336    1.02454     1.58487
  0.282433   -0.537644   -0.713335     -1.08749    -0.736468   -0.090602
  2.15423     1.68392     0.879207      3.87128     0.0305312   0.956922
  0.381752   -2.2848     -0.516142      0.0305312   3.77871    -0.351256
 -0.563599    0.956956    1.95953       0.956922   -0.351256    2.04652
In [28]:
# compare storage: Woodbury form vs full matrix
Base.summarysize(W), Base.summarysize(Wfull)
Out[28]:
(64720, 8000040)

Solve linear equation

In [29]:
# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b
Out[29]:
BenchmarkTools.Trial: 490 samples with 1 evaluation.
 Range (min … max):   5.969 ms … 69.702 ms   GC (min … max): 0.00% … 83.72%
 Time  (median):      9.521 ms               GC (median):    0.00%
 Time  (mean ± σ):   10.186 ms ±  7.209 ms   GC (mean ± σ):  9.23% ± 11.25%

  ▃  █                                                        
  █▅▄█▅▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂ ▂
  5.97 ms         Histogram: frequency by time        59.4 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 5.
In [30]:
# solve using Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack; need to file an issue
Out[30]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.217 μs … 96.229 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     21.811 μs               GC (median):    0.00%
 Time  (mean ± σ):   20.264 μs ±  6.391 μs   GC (mean ± σ):  0.00% ± 0.00%

     ▂▃▃             ▃                                      
  ▂▁▂████▅▃▂▁▁▁▁▁▁▁▁▆██▇▄▃▂▂▂▂▃▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  11.2 μs         Histogram: frequency by time        41.7 μs <

 Memory estimate: 32.03 KiB, allocs estimate: 8.

Matrix-vector multiplication

In [31]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b
Out[31]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  123.121 μs … 747.017 μs   GC (min … max): 0.00% … 0.00%
 Time  (median):     182.658 μs                GC (median):    0.00%
 Time  (mean ± σ):   188.649 μs ±  32.262 μs   GC (mean ± σ):  0.00% ± 0.00%

             ▁▂▄▅▆▅█▇█▆▇▃▃▁▂                                  
  ▁▁▁▁▂▂▃▄▅▆██████████████████▇▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁ ▄
  123 μs           Histogram: frequency by time          291 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.
In [32]:
# multiplication using Woodbury structure
@benchmark $W * $b
Out[32]:
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.759 μs … 9.332 ms   GC (min … max):  0.00% … 99.86%
 Time  (median):     5.751 μs               GC (median):     0.00%
 Time  (mean ± σ):   6.968 μs ± 93.302 μs   GC (mean ± σ):  13.37% ±  1.00%

    █▁        ▄                                         
  ▃▇██▄▃▂▂▂▂▂▅██▇▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.76 μs        Histogram: frequency by time        15.9 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

Determinant

In [33]:
# determinant without using Woodbury structure
@benchmark det($Wfull)
Out[33]:
BenchmarkTools.Trial: 400 samples with 1 evaluation.
 Range (min … max):   8.839 ms … 65.424 ms   GC (min … max): 0.00% … 79.60%
 Time  (median):     11.930 ms               GC (median):    0.00%
 Time  (mean ± σ):   12.488 ms ±  6.716 ms   GC (mean ± σ):  6.80% ± 10.35%

  ▅▅▂█▃                                                       
  ██████▆▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅ ▆
  8.84 ms      Histogram: log(frequency) by time      59.9 ms <

 Memory estimate: 7.64 MiB, allocs estimate: 3.
In [34]:
# determinant using Woodbury structure
@benchmark det($W)
Out[34]:
BenchmarkTools.Trial: 10000 samples with 195 evaluations.
 Range (min … max):  461.533 ns … 302.622 μs   GC (min … max): 0.00% … 99.75%
 Time  (median):     499.359 ns                GC (median):    0.00%
 Time  (mean ± σ):   583.170 ns ±   3.022 μs   GC (mean ± σ):  5.18% ±  1.00%

  ▄ ▆▃█▃▂▁▁     ▁▄▅▅▄▄▃▂▁▁                                    ▂
  ███████████▇▇▇██████████████▆▆▇▆▆▆▆▆▅▅▆▆▅▆▇▆▆▆▆▅▅▅▅▅▅▄▄▅▄▅▅ █
  462 ns        Histogram: log(frequency) by time        939 ns <

 Memory estimate: 352 bytes, allocs estimate: 2.

Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ Anyone interested in writing a package? A block-elimination solve sketch follows.
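
A minimal sketch of solving the bordered system by block elimination, using only solves with the easy $\mathbf{A}$ plus one small $r \times r$ system (all names and data below are hypothetical):

using LinearAlgebra, Random

Random.seed!(123)
n, r = 1000, 5
A = Diagonal(rand(n) .+ 1) # an "easy" pd A
V = randn(r, n)            # full row rank border
b, c = randn(n), randn(r)

# solve [A V'; V 0] * [x; y] = [b; c]
Ainvb  = A \ b              # O(n) solve with diagonal A
AinvVt = A \ transpose(V)   # r more solves with A
S = V * AinvVt              # r-by-r matrix V * inv(A) * V'
y = S \ (V * Ainvb - c)     # block elimination
x = Ainvb - AinvVt * y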

Orthogonal matrix

Orthogonal $\mathbf{A}$: at most $n^2$ flops, since $\mathbf{A}^{-1} = \mathbf{A}^T$ and the solve reduces to one matrix-vector multiply $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even fewer flops.
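
For example, a Householder matrix solve can exploit the rank-one structure. A minimal sketch (data are hypothetical):

using LinearAlgebra

n = 1000
u = normalize(randn(n))   # unit vector
b = randn(n)
H = I - 2 * u * u'        # Householder matrix: orthogonal and symmetric
x1 = H' * b               # n^2 flops, using inv(H) = H'
x2 = b - 2 * u * (u' * b) # O(n), using the rank-one structure
x1 ≈ x2                   # true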

Toeplitz matrix

Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), the Trench algorithm (inverse). These matrices occur in autoregressive models and econometrics.
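
A sketch of the Durbin algorithm for the Yule-Walker system, following Golub and Van Loan, Algorithm 4.7.1 (the function name and test data are made up):

using LinearAlgebra

# solve T * y = -r in O(n^2) flops, where T is symmetric pd Toeplitz
# with unit diagonal and first row (1, r[1], ..., r[n-1])
function durbin(r::AbstractVector)
    n = length(r)
    y = zeros(eltype(r), n)
    y[1] = -r[1]
    β, α = one(eltype(r)), -r[1]
    for k in 1:n-1
        β *= 1 - α^2
        α  = -(r[k+1] + dot(reverse(r[1:k]), y[1:k])) / β
        y[1:k] .= y[1:k] .+ α .* reverse(y[1:k])
        y[k+1] = α
    end
    return y
end

r = [0.5^k for k in 1:5]                    # AR(1)-like autocorrelations
T = [0.5^abs(i - j) for i in 1:5, j in 1:5] # the corresponding Toeplitz matrix
durbin(r) ≈ T \ (-r)                        # true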

Circulant matrix

Circulant systems: Toeplitz matrix with wraparound $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}. $$ Circulant matrices are diagonalized by the discrete Fourier transform, so $C(\mathbf{z}) \mathbf{x} = \mathbf{b}$ can be solved in $O(n \log n)$ flops by FFT-type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform).
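
A sketch of the FFT solve, assuming the FFTW.jl package (data are hypothetical; the eigenvalues fft(z) must all be nonzero):

using FFTW, LinearAlgebra

n = 8
z = randn(n) # first column of C(z)
b = randn(n)
# C(z) = F⁻¹ * Diagonal(fft(z)) * F, so the solve is O(n log n)
x = real(ifft(fft(b) ./ fft(z)))
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n] # dense C(z), for checking only
norm(C * x - b) # ≈ 0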

Vandermonde matrix

Vandermonde matrix: such as in interpolation and approximation problems $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
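
A sketch of the $O(n^2)$ divided-difference solve of the dual system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ (polynomial interpolation), following Golub and Van Loan, Algorithm 4.6.1 (the function name and test data are made up):

# solve V' * a = f in O(n^2) flops via Newton's divided differences
function vandermonde_dual(x::AbstractVector, f::AbstractVector)
    a, n = copy(f), length(x)
    for k in 1:n-1         # divided differences
        for i in n:-1:k+1
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
        end
    end
    for k in n-1:-1:1      # Newton form -> monomial coefficients
        for i in k:n-1
            a[i] -= a[i+1] * x[k]
        end
    end
    return a
end

x = [1.0, 2.0, 3.0]
V = [xj^i for i in 0:2, xj in x] # Vandermonde matrix as defined above
f = [2.0, 3.0, 6.0]
vandermonde_dual(x, f) ≈ transpose(V) \ f # true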

Cauchy-like matrix

Cauchy-like matrices satisfy the displacement equation $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$, $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$, and $\mathbf{R}, \mathbf{S} \in \mathbb{R}^{n \times r}$ with small displacement rank $r$. $O(n)$ flops for LU and QR.

Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...