Summary of linear regression

In [1]:
versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Methods for solving linear regression $\widehat \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$:

Method Flops Remarks Software Stability
Sweep $np^2 + p^3$ $(X^TX)^{-1}$ available SAS less stable
Cholesky $np^2 + p^3/3$ less stable
QR by Householder $2np^2 - (2/3)p^3$ R stable
QR by MGS $2np^2$ $Q_1$ available stable
QR by SVD $4n^2p + 8np^2 + 9p^3$ $X = UDV^T$ most stable

Remarks:

  1. When $n \gg p$, sweep and Cholesky are twice faster than QR and need less space.
  2. Sweep and Cholesky are based on the Gram matrix $\mathbf{X}^T \mathbf{X}$, which can be dynamically updated with incoming data. They can handle huge $n$, moderate $p$ data sets that cannot fit into memory.
  3. QR methods are more stable and produce numerically more accurate solution.
  4. Although sweep is slower than Cholesky, it yields standard errors and so on.
  5. MGS appears slower than Householder, but it yields $\mathbf{Q}_1$.

There is simply no such thing as a universal 'gold standard' when it comes to algorithms.

In [2]:
using SweepOperator, BenchmarkTools, LinearAlgebra

linreg_cholesky(y::Vector, X::Matrix) = cholesky!(X'X) \ (X'y)

linreg_qr(y::Vector, X::Matrix) = X \ y

function linreg_sweep(y::Vector, X::Matrix)
    p = size(X, 2)
    xy = [X y]
    tableau = xy'xy
    sweep!(tableau, 1:p)
    return tableau[1:p, end]
end

function linreg_svd(y::Vector, X::Matrix)
    xsvd = svd(X)
    return xsvd.V * ((xsvd.U'y) ./ xsvd.S)
end
Out[2]:
linreg_svd (generic function with 1 method)
In [3]:
using Random

Random.seed!(123) # seed

n, p = 10, 3
X = randn(n, p)
y = randn(n)

# check these methods give same answer
@show linreg_cholesky(y, X)
@show linreg_qr(y, X)
@show linreg_sweep(y, X)
@show linreg_svd(y, X);
linreg_cholesky(y, X) = [-0.07196570434574734, -0.1357569945585938, -0.1882019968945651]
linreg_qr(y, X) = [-0.0719657043457473, -0.13575699455859397, -0.18820199689456493]
linreg_sweep(y, X) = [-0.07196570434574734, -0.1357569945585939, -0.188201996894565]
linreg_svd(y, X) = [-0.07196570434574728, -0.13575699455859372, -0.188201996894565]
In [4]:
n, p = 1000, 300
X = randn(n, p)
y = randn(n)

@benchmark linreg_cholesky(y, X)
Out[4]:
BenchmarkTools.Trial: 2871 samples with 1 evaluation.
 Range (minmax):  1.284 ms  5.723 ms   GC (min … max): 0.00% … 58.60%
 Time  (median):     1.651 ms                GC (median):    0.00%
 Time  (mean ± σ):   1.730 ms ± 477.394 μs   GC (mean ± σ):  2.64% ±  7.68%

  ▂    █                                                     
  █▄▃▃▆█▆▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▂▂▁▁▂▁▁▂▁▁▂▁▁▁▁▂▁▂▂▂ ▃
  1.28 ms         Histogram: frequency by time        4.64 ms <

 Memory estimate: 708.22 KiB, allocs estimate: 6.
In [5]:
@benchmark linreg_sweep(y, X)
Out[5]:
BenchmarkTools.Trial: 399 samples with 1 evaluation.
 Range (minmax):   9.087 ms65.345 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     11.367 ms               GC (median):    0.00%
 Time  (mean ± σ):   12.506 ms ±  4.600 ms   GC (mean ± σ):  1.90% ± 5.57%

    ▅▃█▆▄▁                                                    
  ▃███████▅▄▃▄▄▄▄▃▁▃▃▂▃▃▃▃▂▃▃▂▂▃▁▂▂▁▁▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
  9.09 ms         Histogram: frequency by time        31.4 ms <

 Memory estimate: 2.99 MiB, allocs estimate: 6.
In [6]:
@benchmark linreg_qr(y, X)
Out[6]:
BenchmarkTools.Trial: 430 samples with 1 evaluation.
 Range (minmax):   8.977 ms24.742 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     11.173 ms               GC (median):    0.00%
 Time  (mean ± σ):   11.625 ms ±  2.096 ms   GC (mean ± σ):  3.86% ± 9.02%

       ▃█▆▃▅▃ ▂▃▁ ▃▂                                          
  ▃▃▂▅▅██████▇█████▇▆▄▅▄▄▃▃▃▃▃▁▃▂▃▁▃▃▄▂▂▁▂▃▃▂▁▃▃▃▃▁▁▃▁▁▂▂▂▃ ▄
  8.98 ms         Histogram: frequency by time        18.7 ms <

 Memory estimate: 3.95 MiB, allocs estimate: 1828.
In [7]:
@benchmark linreg_svd(y, X)
Out[7]:
BenchmarkTools.Trial: 153 samples with 1 evaluation.
 Range (minmax):  26.924 ms44.321 ms   GC (min … max): 0.00% … 7.95%
 Time  (median):     32.201 ms               GC (median):    0.00%
 Time  (mean ± σ):   32.728 ms ±  3.975 ms   GC (mean ± σ):  1.67% ± 3.28%

       ▁▆█▃▁ ▁                                                 
  ▃▁▃▃▇█████▆█▃▆▄▆▃▄▃▄▇▃▇▇▇▆▄█▇▄▆▃▇▁▄▄▄▃▄▃▃▆▄▁▁▄▃▃▇▃▃▁▁▃▁▄ ▃
  26.9 ms         Histogram: frequency by time        41.8 ms <

 Memory estimate: 8.06 MiB, allocs estimate: 14.