BIOSTAT 257: Statistical Computing

Tue/Thu 1pm-2:50pm @ CHS 33-105A
Instructor: Dr. Hua Zhou, huazhou@ucla.edu

What is statistics?

Statistics, the science of data analysis, is the applied mathematics in the 21st century.
People (scientists, government, health professionals, companies) collect data in order to answer certain questions. A statistician's job is to help them extract knowledge and insights from data.
Must-read for (bio)statistics students:
- 50 Years of Data Science, by David Donoho.
If existing software tools readily solve the problem, all the better.
Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming).
This entails at least two essential skills: programming and fundamental knowledge of algorithms.

What is this course about?

Not a course on statistical packages. It does not answer questions such as "How to fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?"
Not a pure programming course, although programming is important and we do homework in Julia.
BIOSTAT 203A (Data Management) in fall quarter focuses on programming in R and SAS.
Not a course on data science. BIOSTAT 203B (Introduction to Data Science) in winter quarter focuses on some software tools for data scientists.
This course focuses on algorithms, mostly those in numerical linear algebra and numerical optimization.

Learning objectives

Be highly appreciative of this quote by James Gentle:

The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.

Examples: $\mathbf{X}^T \mathbf{W} \mathbf{X}$, $\operatorname{tr} (\mathbf{A} \mathbf{B})$, $\operatorname{diag}(\mathbf{A} \mathbf{B})$, multivariate normal density,...
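As a minimal sketch of Gentle's point, the snippet below computes $\operatorname{tr}(\mathbf{A} \mathbf{B})$ and $\operatorname{diag}(\mathbf{A} \mathbf{B})$ two ways. The matrices `A` and `B` are hypothetical toy values, not taken from the course material; the idea is only that neither quantity requires forming the full product.

```julia
using LinearAlgebra

# Hypothetical small matrices, chosen only for illustration.
A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]

# Literal translation: form the full product (O(n^3) flops), then sum its diagonal.
tr_naive = tr(A * B)

# Same quantity without forming A * B:
# tr(AB) = sum over i, j of A[i, j] * B[j, i], which costs O(n^2) flops.
tr_fast = sum(A[i, j] * B[j, i] for i in axes(A, 1), j in axes(A, 2))

# diag(AB): entry i is the inner product of row i of A with column i of B.
diag_fast = [dot(view(A, i, :), view(B, :, i)) for i in axes(A, 1)]
```

For large $n$ the second form avoids both the cubic cost and the temporary $n \times n$ product.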
Become memory-conscious. You care about looping order. You benchmark hot functions fanatically to make sure they are not allocating.
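To illustrate why looping order matters, here is a small sketch that assumes Julia's column-major array layout. The function names `sum_colmajor` and `sum_rowmajor` are hypothetical, made up for this example.

```julia
# Inner loop runs over the row index i, so memory is visited contiguously
# (Julia stores matrices column by column).
function sum_colmajor(A::AbstractMatrix)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)   # j is the outer loop, i the inner loop
        s += A[i, j]
    end
    return s
end

# Inner loop runs over the column index j, so consecutive accesses are
# a full column apart in memory.
function sum_rowmajor(A::AbstractMatrix)
    s = zero(eltype(A))
    for i in axes(A, 1), j in axes(A, 2)   # i is the outer loop, j the inner loop
        s += A[i, j]
    end
    return s
end

A = [1.0 2.0 3.0; 4.0 5.0 6.0]
total_col = sum_colmajor(A)
total_row = sum_rowmajor(A)
```

Both functions return the same value; on large matrices the column-major version is typically faster. Timing them, for instance with `@time` or the BenchmarkTools package, is one way to check this on your own machine.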

No inversion mentality. Whenever you see a matrix inverse in a mathematical expression, your brain reacts with matrix decompositions, iterative solvers, etc. For R users, that means you almost never use the solve(M) function to obtain the inverse of a matrix $\mathbf{M}$.

Examples: $(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$, $\mathbf{y}^T \boldsymbol{\Sigma}^{-1} \mathbf{y}$, Newton-Raphson algorithm, ...
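The following sketch shows what avoiding the inverse can look like in Julia for the first two examples. The data `X`, `y`, `Sigma`, and `v` are hypothetical toy values, and Cholesky and QR are just two of several reasonable factorizations.

```julia
using LinearAlgebra

# Hypothetical toy data: three points lying exactly on y = 1 + 2x.
X = [1.0 0.0; 1.0 1.0; 1.0 2.0]
y = [1.0, 3.0, 5.0]

# Literal translation of the formula: forms an explicit inverse (discouraged).
beta_inv = inv(X' * X) * (X' * y)

# Cholesky on the normal equations: solve (X'X) beta = X'y by triangular solves.
beta_chol = cholesky(Symmetric(X' * X)) \ (X' * y)

# Backslash on a rectangular matrix solves the least squares problem via QR.
beta_qr = X \ y

# Quadratic form v' * inv(Sigma) * v without the inverse:
# with Sigma = L * L', it equals the squared norm of z, where L * z = v.
Sigma = [4.0 2.0; 2.0 3.0]
v = [2.0, 1.0]
z = cholesky(Sigma).L \ v
quad = dot(z, z)
```

All three estimates agree on this tiny problem; the factorization-based versions are generally cheaper and numerically safer than forming the inverse.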

Know some basic strategies to solve big data problems.

Examples: how Google solves the PageRank problem with $10^{9}$ webpages, linear regression with $10^7$ observations, etc.
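One commonly described strategy for PageRank-type problems is power iteration on a sparse link matrix, so that each step costs time proportional to the number of links rather than $n^2$. The sketch below is an illustration under assumptions: the 3-page graph, the function name `pagerank_power`, the damping value 0.85, and the tolerance are all hypothetical choices, and every page is assumed to have at least one outgoing link.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical 3-page web: 1 -> 2, 2 -> 3, 3 -> 1.
# A[i, j] = 1 if page i links to page j.
A = sparse([1, 2, 3], [2, 3, 1], [1.0, 1.0, 1.0], 3, 3)

function pagerank_power(A; damping = 0.85, tol = 1e-10, maxiter = 1000)
    n = size(A, 1)
    outdeg = vec(sum(A, dims = 2))    # out-degree of each page (assumed > 0)
    x = fill(1 / n, n)                # start from the uniform distribution
    for _ in 1:maxiter
        # Follow a link with probability `damping`, otherwise jump to a
        # uniformly chosen page. Only sparse matrix-vector products are used.
        xnew = damping .* (A' * (x ./ outdeg)) .+ (1 - damping) / n
        norm(xnew - x, 1) < tol && return xnew
        x = xnew
    end
    return x
end

r = pagerank_power(A)
```

On this symmetric cycle every page receives the same rank; the point is that the dense $n \times n$ transition matrix is never formed.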
Do not be afraid of optimization; treat it as a technology. Be able to recognize some major classes of optimization problems and choose the best solver(s) for each.
Be immune to the language fight.

Course logistics

Course webpage: https://ucla-biostat-257.github.io/2022spring or http://ucla-biostat-257.com.
Syllabus.
Check the Schedule page frequently.
Jupyter notebooks will be posted/updated before each lecture.

How to get started

All course materials are in GitHub repo https://github.com/ucla-biostat-257/2022spring. Lecture notes are Jupyter Notebooks (.ipynb files) under the slides folder. It is a good idea to learn by running through the code examples. You can do this in several ways.

Run Jupyter Notebook in Binder

A quick and easy way to run the Jupyter Notebooks is Binder, a free service that allows us to run Jupyter Notebooks in the cloud. Simply follow the Binder link at https://ucla-biostat-257.github.io/2022spring/schedule/schedule.html.

If you want the JupyterLab interface, replace tree with lab in the URL.

Run Jupyter Notebook locally on your own computer

Download and install Julia v1.7.2 from https://julialang.org/downloads/. On Mac, use the Bash command
```
sudo ln -s /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia /usr/local/bin/julia
```
to create a symbolic link so that the julia command is available anywhere in the terminal.
Git clone the course material.
```
git clone https://github.com/ucla-biostat-257/2022spring.git biostat-257-2022spring
```
You can change biostat-257-2022spring to another directory name you prefer.
Enter the folder biostat-257-2022spring in the terminal.
Open Julia within that folder, type ] to enter the package mode, then type
```
activate .
instantiate
```
to install necessary packages.

Back in the Julia REPL (press Backspace to leave package mode), type

```
using IJulia
jupyterlab(dir = pwd())
```

to open JupyterLab in the browser, or

```
using IJulia
notebook(dir = pwd())
```

to open a Jupyter notebook.
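The package-mode step above can also be written with the functional `Pkg` API, which may be convenient in a script. This is a sketch that assumes Julia was started inside the cloned course folder, so that `"."` refers to the directory containing the course's project files.

```julia
using Pkg

# Equivalent of typing `activate .` in package mode:
# use the project in the current directory as the active environment.
Pkg.activate(".")

# Equivalent of typing `instantiate` in package mode:
# download and install the packages the project records.
Pkg.instantiate()
```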

Course material is updated frequently. Remember to git pull to obtain the most recent material.