Introduction
Consider solving the linear system \(\mathbf{A} \mathbf{x} = \mathbf{b}\), \(\mathbf{A} \in \mathbb{R}^{n \times n}\), or computing a matrix inverse if you really need one. \(\mathbf{A}\) can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, … We should be alert to the many easy linear systems that arise in practice.
Don’t blindly use `A \ b` and `inv` in Julia or the `solve` function in R. Don’t waste computing resources by bad choices of algorithms!
Diagonal matrix
Diagonal \(\mathbf{A}\): \(n\) flops. Use the `Diagonal` type in Julia.
using BenchmarkTools, LinearAlgebra, Random
# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
# shows which method is dispatched to (with a link to its source code)
@which A \ b
# checks `istril(A)` and `istriu(A)` (O(n^2)), then calls `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
BenchmarkTools.Trial: 9747 samples with 1 evaluation.
 Range (min … max): 489.041 μs … 2.349 ms ┊ GC (min … max): 0.00% … 78.27%
 Time (median):     502.334 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   511.837 μs ± 27.936 μs ┊ GC (mean ± σ): 0.04% ± 0.79%
 Memory estimate: 15.88 KiB, allocs estimate: 2.
# O(n) computation, no extra array allocation
@benchmark Diagonal($A) \ $b
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max): 950.000 ns … 197.258 μs ┊ GC (min … max): 0.00% … 97.54%
 Time (median):     1.646 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   2.224 μs ± 8.637 μs ┊ GC (mean ± σ): 22.07% ± 5.62%
 Memory estimate: 15.88 KiB, allocs estimate: 2.
Bidiagonal, tridiagonal, and banded matrices
Bidiagonal, tridiagonal, or banded \(\mathbf{A}\): band LU, band Cholesky, … roughly \(O(n)\) flops. Use the `Bidiagonal`, `Tridiagonal`, and `SymTridiagonal` types in Julia.
Random.seed!(123)
n = 1000
dv = randn(n)
ev = randn(n - 1)
b = randn(n) # rhs
# symmetric tridiagonal matrix
A = SymTridiagonal(dv, ev)
1000×1000 SymTridiagonal{Float64, Vector{Float64}}:
  0.808288   0.598212    ⋅          ⋅        …    ⋅          ⋅
  0.598212  -1.12207   -1.00753     ⋅             ⋅          ⋅
   ⋅        -1.00753   -1.10464    1.97923        ⋅          ⋅
   ⋅          ⋅         1.97923   -0.416993       ⋅          ⋅
  ⋮                                          ⋱
   ⋅          ⋅          ⋅          ⋅           -1.25053    1.49717
   ⋅          ⋅          ⋅          ⋅            1.49717   -0.34637
# convert to a full matrix
Afull = Matrix(A)
# dense LU decomposition: (2/3) n^3 flops!
@benchmark $Afull \ $b
BenchmarkTools.Trial: 1284 samples with 1 evaluation.
 Range (min … max): 3.468 ms … 10.802 ms ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     3.796 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   3.896 ms ± 305.271 μs ┊ GC (mean ± σ): 2.44% ± 4.85%
 Memory estimate: 7.64 MiB, allocs estimate: 4.
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 10.125 μs … 1.831 ms ┊ GC (min … max): 0.00% … 98.21%
 Time (median):     11.458 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   12.270 μs ± 26.361 μs ┊ GC (mean ± σ): 3.64% ± 1.70%
 Memory estimate: 23.81 KiB, allocs estimate: 3.
Triangular matrix
Triangular \(\mathbf{A}\): \(n^2\) flops to solve a linear system by forward or backward substitution.
Random.seed!(123)
n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)
# check istril() then triangular solve
@benchmark $A \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 299.917 μs … 552.833 μs ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     322.125 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   333.401 μs ± 41.167 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 7.94 KiB, allocs estimate: 1.
# triangular solve directly; saves the cost of the istril() check
@benchmark LowerTriangular($A) \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 91.750 μs … 305.833 μs ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     93.125 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   94.271 μs ± 9.133 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 7.94 KiB, allocs estimate: 1.
Block diagonal matrix
Block diagonal: Suppose \(n = \sum_b n_b\). Solving the linear system costs \((\sum_b n_b)^3\) flops without using the block diagonal structure vs \(\sum_b n_b^3\) flops using it. Julia's `blockdiag` function (in the SparseArrays standard library) generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package? A sketch of the block-by-block solve appears after the spy plot below.
using SparseArrays
Random.seed!(123)
B = 10 # number of blocks
ni = 100
A = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
1000×1000 SparseMatrixCSC{Float64, Int64} with 975 stored entries:
[braille display of the block-diagonal sparsity pattern]
using UnicodePlots
spy(A)
[spy plot: 10 diagonal blocks along the 1000×1000 matrix; 975 nonzeros]
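The flop comparison above is easy to act on even without a dedicated package: factor and solve each block separately. Below is a minimal sketch assuming dense, square, invertible blocks; the function name `blockdiag_solve` is ours, not from any library.
# solve a block-diagonal system block by block: costs sum_b n_b^3 flops
# instead of (sum_b n_b)^3; assumes square, invertible, dense blocks
function blockdiag_solve(Bs::Vector{Matrix{Float64}}, b::Vector{Float64})
    x = similar(b)
    offset = 0
    for Bk in Bs
        nk = size(Bk, 1)
        idx = (offset + 1):(offset + nk)
        x[idx] = Bk \ b[idx]   # factorize and solve this block only
        offset += nk
    end
    return x
end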
Kronecker product
Use \[
\begin{eqnarray*}
(\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\
(\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\
\text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p}
\end{eqnarray*}
\] to avoid forming and doing costly computation on the potentially huge Kronecker \(\mathbf{A} \otimes \mathbf{B}\) .
Anyone interested in writing a package?
using MatrixDepot, LinearAlgebra
A = matrixdepot("lehmer", 50) # a pd matrix
50×50 Matrix{Float64}:
1.0 0.5 0.333333 0.25 … 0.0208333 0.0204082 0.02
0.5 1.0 0.666667 0.5 0.0416667 0.0408163 0.04
0.333333 0.666667 1.0 0.75 0.0625 0.0612245 0.06
0.25 0.5 0.75 1.0 0.0833333 0.0816327 0.08
0.2 0.4 0.6 0.8 0.104167 0.102041 0.1
0.166667 0.333333 0.5 0.666667 … 0.125 0.122449 0.12
0.142857 0.285714 0.428571 0.571429 0.145833 0.142857 0.14
0.125 0.25 0.375 0.5 0.166667 0.163265 0.16
0.111111 0.222222 0.333333 0.444444 0.1875 0.183673 0.18
0.1 0.2 0.3 0.4 0.208333 0.204082 0.2
0.0909091 0.181818 0.272727 0.363636 … 0.229167 0.22449 0.22
0.0833333 0.166667 0.25 0.333333 0.25 0.244898 0.24
0.0769231 0.153846 0.230769 0.307692 0.270833 0.265306 0.26
⋮ ⋱
0.025641 0.0512821 0.0769231 0.102564 0.8125 0.795918 0.78
0.025 0.05 0.075 0.1 0.833333 0.816327 0.8
0.0243902 0.0487805 0.0731707 0.097561 … 0.854167 0.836735 0.82
0.0238095 0.047619 0.0714286 0.0952381 0.875 0.857143 0.84
0.0232558 0.0465116 0.0697674 0.0930233 0.895833 0.877551 0.86
0.0227273 0.0454545 0.0681818 0.0909091 0.916667 0.897959 0.88
0.0222222 0.0444444 0.0666667 0.0888889 0.9375 0.918367 0.9
0.0217391 0.0434783 0.0652174 0.0869565 … 0.958333 0.938776 0.92
0.0212766 0.0425532 0.0638298 0.0851064 0.979167 0.959184 0.94
0.0208333 0.0416667 0.0625 0.0833333 1.0 0.979592 0.96
0.0204082 0.0408163 0.0612245 0.0816327 0.979592 1.0 0.98
0.02 0.04 0.06 0.08 0.96 0.98 1.0
B = matrixdepot("oscillate", 100) # pd matrix
100×100 Matrix{Float64}:
0.539265 0.403989 -0.00217105 … 4.34736e-15 -2.06948e-14
0.403989 0.531479 0.00497288 -4.47758e-15 2.13147e-14
-0.00217105 0.00497288 0.660112 1.55726e-14 -7.41304e-14
9.57703e-5 -0.000206496 0.0413618 -1.6975e-13 8.08067e-13
7.50775e-6 0.000158391 -0.00736855 1.77959e-13 -8.47144e-13
0.000158924 -8.79789e-6 -0.00523002 … -2.20985e-13 1.05196e-12
-1.92377e-5 2.14476e-5 -0.000180051 7.96194e-12 -3.79014e-11
9.91107e-6 -1.07804e-5 0.000129376 -4.44381e-12 2.1154e-11
-1.50826e-5 1.42813e-5 0.000145861 6.83341e-12 -3.25292e-11
6.37565e-6 -5.92068e-6 -7.21201e-5 -4.05359e-12 1.92964e-11
1.49406e-6 -1.03353e-6 -7.66895e-5 … 2.01848e-11 -9.60863e-11
-8.76766e-7 8.08713e-7 1.27876e-5 -8.7208e-12 4.15138e-11
3.24596e-6 -3.30136e-6 2.49045e-6 3.11066e-11 -1.48077e-10
⋮ ⋱
6.56008e-15 -6.75658e-15 2.35002e-14 -1.71847e-5 8.32939e-5
-1.41705e-14 1.4595e-14 -5.07629e-14 4.02757e-5 -0.000189988
1.29533e-13 -1.33413e-13 4.64024e-13 … -0.000369986 0.00175252
-7.28656e-15 7.50482e-15 -2.61026e-14 1.87337e-5 -0.000117048
3.42848e-15 -3.53118e-15 1.22818e-14 -9.42239e-6 8.54268e-5
-1.51144e-15 1.55671e-15 -5.41433e-15 9.28256e-7 -9.16845e-5
1.40856e-15 -1.45075e-15 5.04573e-15 -6.35794e-5 0.000138679
-2.11535e-15 2.17871e-15 -7.57738e-15 … 0.000251782 -0.000405406
3.4777e-15 -3.58187e-15 1.24574e-14 -0.000439703 0.000823466
-6.81829e-14 7.02252e-14 -2.44236e-13 0.0177542 -0.0172616
4.34736e-15 -4.47758e-15 1.55726e-14 0.627671 0.0664583
-2.06948e-14 2.13147e-14 -7.41304e-14 0.0664583 0.103314
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form the Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
# relative error
norm(x1 - x2) / norm(x1)
using BenchmarkTools
# Method 1: form the Kronecker product and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ c
BenchmarkTools.Trial: 29 samples with 1 evaluation.
 Range (min … max): 162.684 ms … 253.965 ms ┊ GC (min … max): 0.22% … 35.39%
 Time (median):     169.107 ms ┊ GC (median): 2.80%
 Time (mean ± σ):   176.389 ms ± 21.658 ms ┊ GC (mean ± σ): 5.91% ± 8.13%
 Memory estimate: 381.51 MiB, allocs estimate: 7.
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \
    transpose(cholesky(Symmetric($B)) \ reshape($c, p, m))))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 68.666 μs … 34.489 ms ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     75.833 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   88.548 μs ± 445.869 μs ┊ GC (mean ± σ): 9.38% ± 2.97%
 Memory estimate: 176.23 KiB, allocs estimate: 15.
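The determinant identity can be checked the same way. A short sketch using the `A`, `B`, and `M = kron(A, B)` defined above, on the log scale to avoid overflow:
# det(A ⊗ B) = det(A)^p det(B)^m, verified via log-determinants
m, p = size(A, 1), size(B, 1)
logdet(cholesky(Symmetric(M))) ≈
    p * logdet(cholesky(Symmetric(A))) + m * logdet(cholesky(Symmetric(B)))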
Sparse matrix
Sparsity: use a sparse matrix decomposition or an iterative method. This is the most easily recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (the `Matrix` package), MKL (sparse BLAS), … as much as possible.
using MatrixDepot
Random.seed!(123)
# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# randomly generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity: proportion of nonzero entries
using UnicodePlots
spy(A)
[spy plot: banded sparsity pattern of the 7701×7701 Wathen matrix; 118 301 nonzeros]
Matrix-vector multiplication
# dense matrix-vector multiplication
@benchmark $Afull * $b
BenchmarkTools.Trial: 654 samples with 1 evaluation.
 Range (min … max): 7.450 ms … 11.020 ms ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     7.611 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   7.647 ms ± 189.017 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 60.23 KiB, allocs estimate: 2.
# sparse matrix-vector multiplication
@benchmark $A * $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 62.250 μs … 12.169 ms ┊ GC (min … max): 0.00% … 99.28%
 Time (median):     66.834 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   69.565 μs ± 121.076 μs ┊ GC (mean ± σ): 1.74% ± 0.99%
 Memory estimate: 60.23 KiB, allocs estimate: 2.
Solve linear equation
# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b
BenchmarkTools.Trial: 9 samples with 1 evaluation.
 Range (min … max): 490.664 ms … 694.548 ms ┊ GC (min … max): 0.03% … 1.51%
 Time (median):     542.714 ms ┊ GC (median): 0.03%
 Time (mean ± σ):   564.897 ms ± 78.646 ms ┊ GC (mean ± σ): 0.73% ± 0.87%
 Memory estimate: 452.52 MiB, allocs estimate: 4.
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b
norm(xchol - xcholsp) = 5.062481022858237e-15
BenchmarkTools.Trial: 755 samples with 1 evaluation.
 Range (min … max): 6.157 ms … 17.571 ms ┊ GC (min … max): 0.00% … 1.90%
 Time (median):     6.388 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   6.623 ms ± 1.075 ms ┊ GC (mean ± σ): 0.14% ± 0.58%
 Memory estimate: 12.43 MiB, allocs estimate: 64.
# sparse solve via conjugate gradient
using IterativeSolvers
xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 1.418530254611637e-7
BenchmarkTools.Trial: 251 samples with 1 evaluation.
 Range (min … max): 19.705 ms … 24.431 ms ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     19.895 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   19.958 ms ± 413.344 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 241.64 KiB, allocs estimate: 16.
# sparse solve via preconditioned conjugate gradient
using Preconditioners
xpcg = cg(A, b, Pl = DiagonalPreconditioner(A))
@show norm(xpcg - xchol)
@benchmark cg($A, $b, Pl = $(DiagonalPreconditioner(A)))
norm(xpcg - xchol) = 8.656816762415348e-8
BenchmarkTools.Trial: 1493 samples with 1 evaluation.
 Range (min … max): 3.281 ms … 4.863 ms ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     3.336 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   3.349 ms ± 75.892 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 241.64 KiB, allocs estimate: 16.
Easy plus low rank
Easy plus low rank: \(\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}\), \(r \ll n\). Woodbury formula \[\begin{eqnarray*}
(\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1}, \\
\text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \, \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}).
\end{eqnarray*}\]
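The formula itself is only a few lines of code. A minimal sketch for the \(\mathbf{A} + \mathbf{B} \mathbf{D} \mathbf{B}^T\) form used below, assuming \(\mathbf{A}\) and \(\mathbf{D}\) diagonal and pd; `woodbury_solve` is a hypothetical name, not part of the WoodburyMatrices package.
using LinearAlgebra
# x = (A + B D B')⁻¹ b via the Woodbury formula: O(n r² + r³) flops
function woodbury_solve(A::Diagonal, B::Matrix, D::Diagonal, b::Vector)
    Ainvb = A \ b
    AinvB = A \ B
    S = inv(D) + B' * AinvB   # r×r capacitance matrix
    return Ainvb - AinvB * (S \ (B' * Ainvb))
end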
using BenchmarkTools, Random, WoodburyMatrices
Random.seed!(123)
n = 1000
r = 5
A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}
1000×1000 Matrix{Float64}:
4.59798 -0.281279 -2.22717 … 2.15423 0.381752 -0.563599
-0.281279 3.46848 1.39267 1.68392 -2.2848 0.956956
-2.22717 1.39267 4.81456 0.879207 -0.516142 1.95953
0.995152 0.835961 -0.198421 1.22304 -0.534476 -0.158006
-4.18483 0.83762 2.07489 -3.41192 -2.40336 0.12795
1.2769 0.450253 1.94231 … 3.05973 1.06062 1.58223
1.2693 -0.993521 -0.522544 -0.180006 -0.0885834 0.433461
0.263329 1.33478 1.84111 2.02565 -0.281757 1.85271
1.56504 -1.2659 -1.79125 -0.0459542 0.927798 -0.648945
-1.7482 0.478207 2.26841 -1.72566 -2.54041 1.36365
0.0224748 1.0181 0.580171 … 0.640478 -0.998439 -0.171846
-2.50815 1.01533 1.43186 -0.950055 -0.619325 0.162349
-0.192541 -2.14724 -2.73878 -2.82316 0.861588 -1.74536
⋮ ⋱
-1.68931 -0.0964689 0.122875 -1.87988 -0.748008 -1.02043
-0.733929 -1.38688 -1.87072 -2.05063 1.15057 -0.971421
0.124628 2.41238 1.6913 … 2.44864 -1.01345 1.0207
1.31426 -0.690013 -1.36949 0.387585 0.746292 -0.804991
-0.156504 0.385906 0.589231 1.38713 1.44466 0.62347
3.51625 -1.71361 -0.157013 3.01664 2.82686 2.12007
3.06339 -0.905018 -0.865003 1.95288 0.854144 0.166709
0.480269 -1.14852 1.09075 … 0.405336 1.02454 1.58487
0.282433 -0.537644 -0.713335 -1.08749 -0.736468 -0.090602
2.15423 1.68392 0.879207 3.87128 0.0305312 0.956922
0.381752 -2.2848 -0.516142 0.0305312 3.77871 -0.351256
-0.563599 0.956956 1.95953 0.956922 -0.351256 2.04652
# compare storage
Base.summarysize(W), Base.summarysize(Wfull)
Solve linear equation
# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b
BenchmarkTools.Trial: 1837 samples with 1 evaluation.
 Range (min … max): 2.235 ms … 12.204 ms ┊ GC (min … max): 0.00% … 78.08%
 Time (median):     2.582 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   2.721 ms ± 975.335 μs ┊ GC (mean ± σ): 4.71% ± 9.61%
 Memory estimate: 7.64 MiB, allocs estimate: 3.
# solve using the Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack; need to file an issue
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max): 5.653 μs … 3.653 ms ┊ GC (min … max): 0.00% … 99.71%
 Time (median):     7.611 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   9.206 μs ± 69.443 μs ┊ GC (mean ± σ): 15.07% ± 1.99%
 Memory estimate: 32.03 KiB, allocs estimate: 8.
Matrix-vector multiplication
# multiplication without using the Woodbury structure
@benchmark $Wfull * $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max): 114.125 μs … 10.155 ms ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     157.625 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   163.001 μs ± 141.497 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 7.94 KiB, allocs estimate: 1.
# multiplication using the Woodbury structure
@benchmark $W * $b
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max): 2.190 μs … 2.203 ms ┊ GC (min … max): 0.00% … 99.78%
 Time (median):     2.579 μs ┊ GC (median): 0.00%
 Time (mean ± σ):   2.852 μs ± 22.005 μs ┊ GC (mean ± σ): 7.71% ± 1.00%
 Memory estimate: 7.94 KiB, allocs estimate: 1.
Determinant
# determinant without using the Woodbury structure
@benchmark det($Wfull)
BenchmarkTools.Trial: 1364 samples with 1 evaluation.
 Range (min … max): 3.216 ms … 10.901 ms ┊ GC (min … max): 0.00% … 66.40%
 Time (median):     3.547 ms ┊ GC (median): 0.00%
 Time (mean ± σ):   3.665 ms ± 861.235 μs ┊ GC (mean ± σ): 3.10% ± 8.42%
 Memory estimate: 7.64 MiB, allocs estimate: 3.
# determinant using the Woodbury structure
@benchmark det($W)
BenchmarkTools.Trial: 10000 samples with 197 evaluations.
 Range (min … max): 459.391 ns … 91.162 μs ┊ GC (min … max): 0.00% … 99.38%
 Time (median):     484.772 ns ┊ GC (median): 0.00%
 Time (mean ± σ):   500.909 ns ± 906.955 ns ┊ GC (mean ± σ): 1.81% ± 0.99%
 Memory estimate: 352 bytes, allocs estimate: 2.
Easy plus border
Easy plus border: For \(\mathbf{A}\) pd and \(\mathbf{V}\) full row rank, \[
\begin{pmatrix}
\mathbf{A} & \mathbf{V}^T \\
\mathbf{V} & \mathbf{0}
\end{pmatrix}^{-1} = \begin{pmatrix}
\mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\
(\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1}
\end{pmatrix}.
\] Anyone interested in writing a package? A sketch of the solve follows.
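As a starting point, a minimal sketch that applies the block inverse above; \(\mathbf{A}\) is assumed pd (so Cholesky applies), \(\mathbf{V}\) full row rank, and `border_solve` is a name we made up.
using LinearAlgebra
# solve [A Vᵀ; V 0] * [x; y] = [b; c] via the Schur complement V A⁻¹ Vᵀ
function border_solve(A, V, b, c)
    Achol  = cholesky(Symmetric(A))
    AinvVt = Achol \ transpose(V)   # A⁻¹Vᵀ, n×r
    S      = V * AinvVt             # Schur complement V A⁻¹ Vᵀ, r×r
    Ainvb  = Achol \ b
    y = S \ (V * Ainvb - c)
    x = Ainvb - AinvVt * y
    return x, y
end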
Orthogonal matrix
Orthogonal \(\mathbf{A}\): \(n^2\) flops at most. Why? Since \(\mathbf{A}^{-1} = \mathbf{A}^T\), solving \(\mathbf{A} \mathbf{x} = \mathbf{b}\) reduces to one matrix-vector multiplication \(\mathbf{x} = \mathbf{A}^T \mathbf{b}\). Permutation matrices, Householder matrices, Jacobi rotations, … take even fewer flops.
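A quick numerical check of this claim, using an orthogonal matrix extracted from a QR decomposition:
using LinearAlgebra, Random
Random.seed!(123)
n = 1000
Q = Matrix(qr(randn(n, n)).Q)   # a random orthogonal matrix
b = randn(n)
x = Q' * b        # since Q⁻¹ = Qᵀ, the solve is one matrix-vector multiply
norm(Q * x - b)   # ≈ 0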
Toeplitz matrix
Toeplitz systems (constant diagonals): \[
\mathbf{T} = \begin{pmatrix}
r_0 & r_1 & r_2 & r_3 \\
r_{-1} & r_0 & r_1 & r_2 \\
r_{-2} & r_{-1} & r_0 & r_1 \\
r_{-3} & r_{-2} & r_{-1} & r_0
\end{pmatrix}.
\] \(\mathbf{T} \mathbf{x} = \mathbf{b}\), where \(\mathbf{T}\) is pd and Toeplitz, can be solved in \(O(n^2)\) flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general \(\mathbf{b}\)), and the Trench algorithm (matrix inverse). These matrices occur in autoregressive models and econometrics.
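To make the \(O(n^2)\) claim concrete, here is a minimal sketch of the Durbin algorithm (following the presentation in Golub and Van Loan) for the Yule-Walker system \(\mathbf{T} \mathbf{y} = -(r_1, \ldots, r_n)^T\), where \(\mathbf{T}\) is symmetric pd Toeplitz with first row \((1, r_1, \ldots, r_{n-1})\); `durbin` is our name for it.
using LinearAlgebra
# Durbin algorithm: solve the Yule-Walker system in O(n²) flops
function durbin(r::Vector{T}) where T <: Real
    n = length(r)
    y = zeros(T, n)
    y[1] = -r[1]
    α, β = -r[1], one(T)
    for k in 1:(n - 1)
        β *= one(T) - α^2
        α = -(r[k + 1] + dot(reverse(r[1:k]), y[1:k])) / β
        y[1:k] += α * reverse(y[1:k])   # update the order-k solution
        y[k + 1] = α
    end
    return y
end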
Circulant matrix
Circulant systems: Toeplitz matrix with wraparound \[
C(\mathbf{z}) = \begin{pmatrix}
z_0 & z_4 & z_3 & z_2 & z_1 \\
z_1 & z_0 & z_4 & z_3 & z_2 \\
z_2 & z_1 & z_0 & z_4 & z_3 \\
z_3 & z_2 & z_1 & z_0 & z_4 \\
z_4 & z_3 & z_2 & z_1 & z_0
\end{pmatrix},
\] Circulant systems can be solved in \(O(n \log n)\) flops with FFT-type algorithms, since the DFT diagonalizes any circulant matrix; the related DCT (discrete cosine transform) and DST (discrete sine transform) handle similar structures.
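A minimal sketch, assuming the FFTW package: the eigenvalues of \(C(\mathbf{z})\) are the DFT of its first column, so the solve reduces to elementwise division in the frequency domain.
using FFTW
# solve C(z) x = b in O(n log n) flops; z is the first column of the
# circulant matrix, and fft(z) holds its eigenvalues
circulant_solve(z::Vector{<:Real}, b::Vector{<:Real}) = real.(ifft(fft(b) ./ fft(z)))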
Vandermonde matrix
Vandermonde matrix: arises in interpolation and approximation problems, \[
\mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_0 & x_1 & \cdots & x_n \\
\vdots & \vdots & & \vdots \\
x_0^n & x_1^n & \cdots & x_n^n
\end{pmatrix}.
\] \(\mathbf{V} \mathbf{x} = \mathbf{b}\) or \(\mathbf{V}^T \mathbf{x} = \mathbf{b}\) can be solved in \(O(n^2)\) flops. A sketch for the transposed system follows.
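For the transposed system \(\mathbf{V}^T \mathbf{a} = \mathbf{f}\), i.e., polynomial interpolation, a minimal sketch in the spirit of the Björck-Pereyra algorithm: a forward pass of Newton divided differences, then a backward pass converting to monomial coefficients. The name `vandermonde_solve` is ours.
# solve Vᵀa = f in O(n²) flops; a holds the coefficients of the
# polynomial interpolating the points (x[i], f[i])
function vandermonde_solve(x::Vector{T}, f::Vector{T}) where T <: Real
    n = length(x)
    a = copy(f)
    # forward pass: Newton divided differences
    for k in 1:(n - 1), i in n:-1:(k + 1)
        a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k])
    end
    # backward pass: Newton form -> monomial coefficients
    for k in (n - 1):-1:1, i in k:(n - 1)
        a[i] -= a[i + 1] * x[k]
    end
    return a
end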
Cauchy-like matrix
Cauchy-like matrices: \[
\Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T,
where \(\Omega = \text{diag}(\omega_1,\ldots,\omega_n)\) and \(\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)\). Fast algorithms exploiting this displacement structure compute LU and QR factorizations in \(O(n^2)\) flops.
Structured-rank matrix
Structured-rank problems: semiseparable matrices (LU and QR take \(O(n)\) flops), quasiseparable matrices, …