Activating project at `~/Documents/github.com/ucla-biostat-257/2023spring/slides/15-linreg`
Status `~/Documents/github.com/ucla-biostat-257/2023spring/slides/15-linreg/Project.toml`
[6e4b80f9] BenchmarkTools v1.3.2
[7522ee7d] SweepOperator v0.3.3
[37e2e46d] LinearAlgebra
1 Comparing methods for linear regression
Methods for solving linear regression \(\widehat \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}\):
| Method | Flops | Remarks | Software | Stability |
|---|---|---|---|---|
| Sweep | \(np^2 + p^3\) | \((X^TX)^{-1}\) available | SAS | less stable |
| Cholesky | \(np^2 + p^3/3\) | | | less stable |
| QR by Householder | \(2np^2 - (2/3)p^3\) | | R | stable |
| QR by MGS | \(2np^2\) | \(Q_1\) available | | stable |
| SVD | \(4n^2p + 8np^2 + 9p^3\) | \(X = UDV^T\) | | most stable |
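As a rough illustration of three of the methods compared here, the sketch below solves the same small least squares problem by Cholesky on the Gram matrix, by Householder QR, and by the SVD, using only the `LinearAlgebra` standard library. The helper names `ols_chol`, `ols_qr`, and `ols_svd` are hypothetical and are not the notebook's own `linreg_*` functions; sweep is left out because it needs a separate sweep routine.

```julia
using LinearAlgebra, Random

# Hypothetical helper names, chosen for this sketch only.

# Cholesky on the Gram matrix: solve (X'X) b = X'y.
ols_chol(y, X) = cholesky(Symmetric(X'X)) \ (X'y)

# Householder QR: for a tall X, `qr(X) \ y` returns the least squares solution.
ols_qr(y, X) = qr(X) \ y

# Thin SVD X = U D V': the solution is V * D^{-1} * U'y.
function ols_svd(y, X)
    F = svd(X)
    return F.V * ((F.U' * y) ./ F.S)
end

Random.seed!(123)
n, p = 10, 3
X = randn(n, p)
y = randn(n)

b_chol = ols_chol(y, X)
b_qr = ols_qr(y, X)
b_svd = ols_svd(y, X)

# On a well-conditioned design the three answers should agree closely.
@show b_chol ≈ b_qr
@show b_svd ≈ b_qr
```

On a small, well-conditioned random design like this one the solutions should agree to roughly machine precision; differences between the methods only become visible for ill-conditioned \(X\).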
Remarks:
When \(n \gg p\), the \(np^2\) term dominates the flop counts, so sweep and Cholesky are about twice as fast as QR; they also need less space.
Sweep and Cholesky are based on the Gram matrix \(\mathbf{X}^T \mathbf{X}\), which can be updated dynamically as new data arrive. They can therefore handle data sets with huge \(n\) and moderate \(p\) that do not fit into memory.
QR methods are more stable and produce numerically more accurate solutions.
Although sweep is slower than Cholesky, it also yields \((\mathbf{X}^T \mathbf{X})^{-1}\), from which standard errors and related quantities follow.
MGS costs slightly more flops than Householder, but it yields \(\mathbf{Q}_1\) explicitly.
There is simply no such thing as a universal ‘gold standard’ when it comes to algorithms.
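The remarks on the Gram matrix and on sweep can be sketched together. The example below accumulates the tableau \([\mathbf{X} \ \mathbf{y}]^T [\mathbf{X} \ \mathbf{y}]\) one block of rows at a time, then sweeps on the first \(p\) diagonal entries to read off \(\widehat \beta\), \((\mathbf{X}^T \mathbf{X})^{-1}\), and the residual sum of squares. It is a minimal sketch under several assumptions: `sweep_sketch!` is a hand-written, unoptimized sweep (not the `sweep!` routine from SweepOperator), the blocks are simulated by slicing an in-memory matrix rather than read from disk, and the coefficients used to generate `y` are made up.

```julia
using LinearAlgebra, Random

# Hypothetical, unoptimized sweep on the k-th diagonal entry of a symmetric matrix.
function sweep_sketch!(A::AbstractMatrix, k::Integer)
    d = A[k, k]
    m = size(A, 1)
    # Update entries outside row k and column k first; row/column k are still intact.
    for j in 1:m, i in 1:m
        if i != k && j != k
            A[i, j] -= A[i, k] * A[k, j] / d
        end
    end
    # Scale row k and column k, then set the pivot.
    for i in 1:m
        if i != k
            A[i, k] /= d
            A[k, i] /= d
        end
    end
    A[k, k] = -1 / d
    return A
end

Random.seed!(123)
n, p = 1_000, 3
X = randn(n, p)
y = X * [1.0, -2.0, 0.5] + 0.1 * randn(n)  # made-up coefficients and noise level

# Accumulate the (p+1)×(p+1) tableau block by block; only one block of rows
# is needed at a time, so the full data need not sit in memory.
T = zeros(p + 1, p + 1)
for rows in Iterators.partition(1:n, 100)
    Xy = [X[rows, :] y[rows]]
    T .+= Xy' * Xy
end

# Sweep on the first p diagonal entries.
for k in 1:p
    sweep_sketch!(T, k)
end

beta = T[1:p, end]            # least squares estimate
XtX_inv = -T[1:p, 1:p]        # (X'X)^{-1}
sse = T[end, end]             # residual sum of squares
sigma2 = sse / (n - p)        # usual unbiased variance estimate
se = sqrt.(sigma2 .* diag(XtX_inv))  # standard errors of beta
@show beta se
```

The memory footprint of the accumulation step depends on \(p\) and the block size, not on \(n\), which is the point of the remark about data sets that do not fit into memory.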
    using Random

    Random.seed!(123) # seed

    n, p = 10, 3
    X = randn(n, p)
    y = randn(n)

    # check these methods give same answer
    @show linreg_cholesky(y, X)
    @show linreg_qr(y, X)
    @show linreg_sweep(y, X)
    @show linreg_svd(y, X);
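The check above uses a small, well-conditioned random design, so all methods are expected to agree. To see the stability gap mentioned in the remarks, one option is to build a design with a prescribed condition number and compare the Gram-matrix route against QR. The sketch below does this with plain `LinearAlgebra` calls; the singular values, the dimensions, and the noiseless response are illustrative assumptions, and the exact error values will vary by machine and Julia version.

```julia
using LinearAlgebra, Random

Random.seed!(123)
n, p = 50, 3

# Build X = U * D * V' with chosen singular values, so that cond(X) is about 1e6
# and cond(X'X) is about 1e12. These values are assumptions for illustration.
U = Matrix(qr(randn(n, p)).Q)   # n×p matrix with orthonormal columns
V = Matrix(qr(randn(p, p)).Q)   # p×p orthogonal matrix
X = U * Diagonal([1.0, 1e-3, 1e-6]) * V'

beta_true = ones(p)
y = X * beta_true               # noiseless response, so the exact solution is known

beta_chol = cholesky(Symmetric(X'X)) \ (X'y)  # normal equations via Cholesky
beta_qr = qr(X) \ y                           # Householder QR

err_chol = norm(beta_chol - beta_true) / norm(beta_true)
err_qr = norm(beta_qr - beta_true) / norm(beta_true)
@show cond(X) err_chol err_qr
```

Forming \(\mathbf{X}^T \mathbf{X}\) squares the condition number, so the Cholesky error would be expected to be several orders of magnitude larger than the QR error here.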