Introduction to Julia

Biostat/Biomath M257

Author

Dr. Hua Zhou @ UCLA

Published

April 11, 2023

System information (for reproducibility)

versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

Load packages:

using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
  Activating project at `~/Documents/github.com/ucla-biostat-257/2023spring/slides/03-juliaintro`
Status `~/Documents/github.com/ucla-biostat-257/2023spring/slides/03-juliaintro/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [31c24e10] Distributions v0.25.87
  [6f49c342] RCall v0.13.14
  [37e2e46d] LinearAlgebra
  [9abbd945] Profile
  [2f01184e] SparseArrays
using BenchmarkTools, Distributions, RCall
using LinearAlgebra, Profile, SparseArrays

1 What’s Julia?

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

  • History:

    • Project started in 2009. First public release in 2012
    • Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
    • v1.0, the first major release, came out on Aug 8, 2018
    • Current stable release v1.8.5 (as of Apr 6, 2023)
  • Aims to solve the notorious two-language problem: prototype code goes into high-level languages like R/Python, while production code goes into low-level languages like C/C++.

    Julia's aim: “Walks like Python. Runs like C.”

See https://julialang.org/benchmarks/ for benchmark details.

  • Write high-level, abstract code that closely resembles mathematical formulas
    • yet produces fast, low-level machine code that has traditionally only been generated by static languages.
  • Julia is more than just “Fast R” or “Fast Matlab”.
    • Performance comes from features that work well together.
    • You can’t just take the magic dust that makes Julia fast and sprinkle it on [language of choice].

2 Learning resources

  1. The (free) online course Introduction to Julia, by Jane Herriman.

  2. Cheat sheet: The Fast Track to Julia.

  3. Browse the Julia documentation.

  4. For Matlab users, read Noteworthy Differences From Matlab.

    For R users, read Noteworthy Differences From R.

    For Python users, read Noteworthy Differences From Python.

  5. The Learning page on Julia’s website has pointers to many other learning resources.

3 Julia REPL (Read-Eval-Print Loop)

The Julia REPL, or Julia shell, has several modes.

  1. Default mode is the Julian prompt julia>. Type backspace in other modes to enter default mode.

  2. Help mode help?>. Type ? to enter help mode. ?search_term does a fuzzy search for search_term.

  3. Shell mode shell>. Type ; to enter shell mode.

  4. Package mode (@v1.8) pkg>. Type ] to enter package mode for managing Julia packages (install, uninstall, update, …).

  5. Search mode (reverse-i-search). Press Ctrl+R to enter search mode.

  6. With the RCall.jl package installed, we can enter R mode by typing $ (shift+4) at the Julia REPL.

Some survival commands in Julia REPL:

  1. exit() or Ctrl+D: exit Julia.

  2. Ctrl+C: interrupt execution.

  3. Ctrl+L: clear screen.

  4. Append ; (semi-colon) to suppress displaying output from a command. Same as Matlab.

  5. include("filename.jl") to source a Julia code file.
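
For example, a short REPL session using these commands (the file name is hypothetical):

x = randn(1000);       # the trailing ; suppresses printing of the result
include("mycode.jl")   # run the code in mycode.jl as if typed at the REPL
exit()                 # quit Julia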

4 Seek help

5 Which IDE?

  • Julia homepage lists many choices: Juno, VS Code, Vim, …

  • Unfortunately, there is no mature RStudio- or Matlab-like IDE for Julia yet.

  • For dynamic documents, e.g., homework, I recommend Jupyter Notebook or JupyterLab.

  • For extensive Julia coding, I have been happily using the editor VS Code with the extensions Julia and VS Code Jupyter Notebook Previewer installed.

6 Julia package system

  • Each Julia package is a Git repository. Each Julia package name ends with .jl. E.g., Distributions.jl package lives at https://github.com/JuliaStats/Distributions.jl.
    Google search with PackageName.jl usually leads to the package on github.com.

  • The package ecosystem is rapidly maturing; a complete list of registered packages (which are required to have a certain level of testing and documentation) is at http://pkg.julialang.org/.

  • For example, the package called Distributions.jl is added with

# in Pkg mode
(@v1.8) pkg> add Distributions

and “removed” (although not completely deleted) with

# in Pkg mode
(@v1.8) pkg> rm Distributions
  • The package manager provides a dependency solver that determines which packages are actually required to be installed.

  • Non-registered packages are added by cloning the relevant Git repository. E.g.,

# in Pkg mode
(@v1.8) pkg> add https://github.com/OpenMendel/SnpArrays.jl
  • A package needs to be added only once, at which point it is downloaded into the .julia/packages directory in your home directory.
readdir(Sys.islinux() ? ENV["JULIA_PATH"] * "/pkg/packages" : ENV["HOME"] * "/.julia/packages")
419-element Vector{String}:
 "AMD"
 "ASL_jll"
 "AbstractFFTs"
 "AbstractTrees"
 "Adapt"
 "AlgebraicMultigrid"
 "Animations"
 "ArnoldiMethod"
 "Arpack"
 "Arpack_jll"
 "ArrayInterface"
 "ArrayInterfaceCore"
 "Arrow"
 ⋮
 "fzf_jll"
 "isoband_jll"
 "libaom_jll"
 "libass_jll"
 "libfdk_aac_jll"
 "libpng_jll"
 "libsixel_jll"
 "libsodium_jll"
 "libvorbis_jll"
 "x264_jll"
 "x265_jll"
 "xkbcommon_jll"
  • The directory of a specific package can be queried by pathof():
pathof(Distributions)
"/Users/huazhou/.julia/packages/Distributions/Spcmv/src/Distributions.jl"
  • If you start having problems with packages that seem to be unsolvable, you may try just deleting your .julia directory and reinstalling all your packages.

  • Periodically, one should run update in Pkg mode, which checks for, downloads and installs updated versions of all the packages you currently have installed.

  • status lists the status of all installed packages.
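
For example, a periodic maintenance session in package mode:

# in Pkg mode
(@v1.8) pkg> update
(@v1.8) pkg> status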

  • Using functions in a package.

using Distributions

This pulls all of the exported functions in the module into your local namespace, as you can check using the varinfo() command. An alternative is

import Distributions

Now, the functions from the Distributions package are available only using

Distributions.<FUNNAME>

All functions, not only exported functions, are always available like this.
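
For example, with the Distributions.jl package loaded both ways (a minimal sketch; Normal and pdf are names exported by Distributions):

import Distributions
d = Distributions.Normal(0.0, 1.0)  # must qualify names with the module
Distributions.pdf(d, 0.0)           # 0.3989422804014327

using Distributions
pdf(Normal(0.0, 1.0), 0.0)          # exported names now directly available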

7 Calling R from Julia

  • The RCall.jl package allows us to embed R code inside of Julia.

  • There are also PyCall.jl, MATLAB.jl, JavaCall.jl, CxxWrap.jl packages for interfacing with other languages.

# x is in Julia workspace
x = randn(1000)

# $ is the interpolation operator
R"""
hist($x, main = "I'm plotting a Julia vector")
""";

R"""
library(ggplot2)

qplot($x)
"""
┌ Warning: RCall.jl: Warning: `qplot()` was deprecated in ggplot2 3.4.0.
└ @ RCall ~/.julia/packages/RCall/Wyd74/src/io.jl:172
┌ Warning: RCall.jl: `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
└ @ RCall ~/.julia/packages/RCall/Wyd74/src/io.jl:172

RObject{VecSxp}
x = R"""
rnorm(10)
"""
RObject{RealSxp}
 [1]  0.2247420  1.2273733  0.9709629 -1.1576598 -3.9420234  0.1714377
 [7] -0.3351332  1.7797962  0.6225098  0.7374851
# collect R variable into Julia workspace
y = collect(x)
10-element Vector{Float64}:
  0.22474202705810406
  1.2273733413730559
  0.9709629236192919
 -1.1576597849952914
 -3.9420234390343123
  0.1714377233212713
 -0.33513322488310543
  1.7797962093218176
  0.6225097916808172
  0.7374851278811361
  • Access Julia variables in R REPL mode:
julia> x = rand(5) # Julia variable
R> y <- $x
  • Pass Julia expression in R REPL mode:
R> y <- $(rand(5))
  • Put Julia variable into R environment:
julia> @rput x
R> x
  • Get R variable into Julia environment:
R> r <- 2
julia> @rget r
  • If you want to call Julia within R, check out the JuliaCall package.

8 Some basic Julia code

# an integer, same as int in R
y = 1
1
# query type of a Julia object
typeof(y)
Int64
# a Float64 number, same as double in R
y = 1.0
1.0
typeof(y) 
Float64
# Greek letters:  `\pi<tab>`
π
π = 3.1415926535897...
typeof(π)
Irrational{:π}
# Greek letters:  `\theta<tab>`
θ = y + π
4.141592653589793
# emoji! `\:kissing_cat:<tab>`
😽 = 5.0
😽 + 1
6.0
# `\alpha<tab>\hat<tab>`
α̂ = π
π = 3.1415926535897...

For a list of unicode symbols that can be tab-completed, see https://docs.julialang.org/en/v1/manual/unicode-input/. Or in the help mode, type ? followed by the unicode symbol you want to input.

# vector of Float64 0s
x = zeros(5)
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
# vector of Int64 0s
x = zeros(Int, 5)
5-element Vector{Int64}:
 0
 0
 0
 0
 0
# matrix of Float64 0s
x = zeros(5, 3)
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
# matrix of Float64 1s
x = ones(5, 3)
5×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
# define array without initialization
x = Matrix{Float64}(undef, 5, 3)
5×3 Matrix{Float64}:
 5.6442e-314   5.6442e-314   5.6442e-314
 2.34883e-314  5.6442e-314   5.64432e-314
 2.34883e-314  2.35609e-314  5.6442e-314
 2.35609e-314  2.34883e-314  5.64414e-314
 2.34883e-314  5.6442e-314   2.34883e-314
# fill a matrix by 0s
fill!(x, 0)
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
# initialize an array to be constant 2.5
fill(2.5, (5, 3))
5×3 Matrix{Float64}:
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
# rational number
a = 3//5
3//5
typeof(a)
Rational{Int64}
b = 3//7
3//7
a + b
36//35
# uniform [0, 1) random numbers
x = rand(5, 3)
5×3 Matrix{Float64}:
 0.322242  0.378362  0.337076
 0.146328  0.12133   0.750103
 0.426817  0.71952   0.800841
 0.166777  0.013656  0.136591
 0.985461  0.221333  0.904977
# uniform random numbers (in half precision)
x = rand(Float16, 5, 3)
5×3 Matrix{Float16}:
 0.9473   0.0542  0.4438
 0.85     0.5513  0.3623
 0.0752   0.8257  0.127
 0.7637   0.5176  0.883
 0.05664  0.1768  0.755
# random numbers from {1,...,5}
x = rand(1:5, 5, 3)
5×3 Matrix{Int64}:
 1  4  1
 1  1  3
 3  1  4
 5  3  1
 4  1  4
# standard normal random numbers
x = randn(5, 3)
5×3 Matrix{Float64}:
  0.694874   0.550705   1.8547
  1.21514   -1.24211    1.06645
  1.33659    1.77713   -0.749662
  0.57968   -0.482531   0.0723377
 -2.71671   -0.130033  -2.12771
# range
1:10
1:10
typeof(1:10)
UnitRange{Int64}
1:2:10
1:2:9
typeof(1:2:10)
StepRange{Int64, Int64}
# integers 1-10
x = collect(1:10)
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
# or equivalently
[1:10...]
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
# Float64 numbers 1-10
x = collect(1.0:10)
10-element Vector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
# convert to a specific type
convert(Vector{Float64}, 1:10)
10-element Vector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0

9 Matrices and vectors

9.1 Dimensions

x = randn(5, 3)
5×3 Matrix{Float64}:
 -1.5906      0.656171   -0.089139
  1.14808     1.51043     1.56051
  0.0364995   0.326349   -0.768385
  0.0774726   0.0958234   0.662401
  0.138165   -0.585027    1.09452
size(x)
(5, 3)
size(x, 1) # nrow() in R
5
size(x, 2) # ncol() in R
3
# total number of elements
length(x)
15

9.2 Indexing

# 5 × 5 matrix of random Normal(0, 1)
x = randn(5, 5)
5×5 Matrix{Float64}:
 -0.968619    -0.984721   0.135944   0.100963  -0.47371
 -1.73654     -0.185011   0.773381   1.54203    0.121822
 -0.00253971   1.54365   -0.94251   -0.131932   0.650161
  1.11881      0.4261     1.22545    0.836901  -0.849982
 -0.0956107   -1.9232     0.599138   1.29311   -0.495559
# first column
x[:, 1]
5-element Vector{Float64}:
 -0.968618977805755
 -1.7365352826751739
 -0.002539710924920733
  1.1188073427240657
 -0.0956107142743949
# first row
x[1, :]
5-element Vector{Float64}:
 -0.968618977805755
 -0.9847212735235253
  0.13594383451312017
  0.1009634255812595
 -0.4737095014635574
# sub-array
x[1:2, 2:3]
2×2 Matrix{Float64}:
 -0.984721  0.135944
 -0.185011  0.773381
# getting a subset of a matrix creates a copy, but you can also create "views"
z = view(x, 1:2, 2:3)
2×2 view(::Matrix{Float64}, 1:2, 2:3) with eltype Float64:
 -0.984721  0.135944
 -0.185011  0.773381
# same as
@views z = x[1:2, 2:3]
2×2 view(::Matrix{Float64}, 1:2, 2:3) with eltype Float64:
 -0.984721  0.135944
 -0.185011  0.773381
# change in z (view) changes x as well
z[2, 2] = 0.0
x
5×5 Matrix{Float64}:
 -0.968619    -0.984721   0.135944   0.100963  -0.47371
 -1.73654     -0.185011   0.0        1.54203    0.121822
 -0.00253971   1.54365   -0.94251   -0.131932   0.650161
  1.11881      0.4261     1.22545    0.836901  -0.849982
 -0.0956107   -1.9232     0.599138   1.29311   -0.495559
# y points to same data as x
y = x
5×5 Matrix{Float64}:
 -0.968619    -0.984721   0.135944   0.100963  -0.47371
 -1.73654     -0.185011   0.0        1.54203    0.121822
 -0.00253971   1.54365   -0.94251   -0.131932   0.650161
  1.11881      0.4261     1.22545    0.836901  -0.849982
 -0.0956107   -1.9232     0.599138   1.29311   -0.495559
# x and y point to same data
pointer(x), pointer(y)
(Ptr{Float64} @0x00000002a8aed240, Ptr{Float64} @0x00000002a8aed240)
# changing y also changes x
y[:, 1] .= 0
x
5×5 Matrix{Float64}:
 0.0  -0.984721   0.135944   0.100963  -0.47371
 0.0  -0.185011   0.0        1.54203    0.121822
 0.0   1.54365   -0.94251   -0.131932   0.650161
 0.0   0.4261     1.22545    0.836901  -0.849982
 0.0  -1.9232     0.599138   1.29311   -0.495559
# create a new copy of data
z = copy(x)
5×5 Matrix{Float64}:
 0.0  -0.984721   0.135944   0.100963  -0.47371
 0.0  -0.185011   0.0        1.54203    0.121822
 0.0   1.54365   -0.94251   -0.131932   0.650161
 0.0   0.4261     1.22545    0.836901  -0.849982
 0.0  -1.9232     0.599138   1.29311   -0.495559
pointer(x), pointer(z)
(Ptr{Float64} @0x00000002a8aed240, Ptr{Float64} @0x00000002a8aee940)

9.3 Concatenate matrices

# 3-by-1 vector
[1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3
# 1-by-3 array
[1 2 3]
1×3 Matrix{Int64}:
 1  2  3
# multiple assignment by tuple
x, y, z = randn(5, 3), randn(5, 2), randn(3, 5)
([0.5306543995224662 0.6225280514660663 1.2133609465917021; 0.41657006575817723 0.22723307250962527 -0.6013668732864648; … ; 0.360471843227362 -2.903969729042324 -1.9629100092835445; -0.11438750857156317 -0.31151629821350835 0.1547439387296229], [-1.4294058388229003 -0.612772946204093; 0.4898587486724813 -1.0476567706838051; … ; 1.7391075040425321 -0.09138048776930512; -0.6356868940346705 0.07317488855899447], [1.8537699309740638 -1.2982432434937796 … 1.109301388144398 0.5917650992614132; -0.18832956147463467 0.6964635437280152 … -0.0604228120617397 1.812333003737048; -0.17667008467249978 -0.5388403172431345 … 1.2254921452083942 0.6360285170465801])
[x y] # 5-by-5 matrix
5×5 Matrix{Float64}:
  0.530654    0.622528   1.21336   -1.42941   -0.612773
  0.41657     0.227233  -0.601367   0.489859  -1.04766
  0.0139967  -1.28484   -0.163066  -1.25654    0.922533
  0.360472   -2.90397   -1.96291    1.73911   -0.0913805
 -0.114388   -0.311516   0.154744  -0.635687   0.0731749
[x y; z] # 8-by-5 matrix
8×5 Matrix{Float64}:
  0.530654    0.622528   1.21336   -1.42941    -0.612773
  0.41657     0.227233  -0.601367   0.489859   -1.04766
  0.0139967  -1.28484   -0.163066  -1.25654     0.922533
  0.360472   -2.90397   -1.96291    1.73911    -0.0913805
 -0.114388   -0.311516   0.154744  -0.635687    0.0731749
  1.85377    -1.29824    0.31308    1.1093      0.591765
 -0.18833     0.696464   0.146122  -0.0604228   1.81233
 -0.17667    -0.53884   -0.882946   1.22549     0.636029

9.4 Dot operation (broadcasting)

The dot operation in Julia is an elementwise operation, similar to Matlab's.

x = randn(5, 3)
5×3 Matrix{Float64}:
 -0.429393   0.998931   1.55727
  1.0829    -0.615571   0.888358
  0.367197   0.840352  -0.169557
  0.1464    -0.880181  -1.33113
 -0.780918   2.18358    0.0114738
y = ones(5, 3)
5×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
x .* y # same as x * y in R
5×3 Matrix{Float64}:
 -0.429393   0.998931   1.55727
  1.0829    -0.615571   0.888358
  0.367197   0.840352  -0.169557
  0.1464    -0.880181  -1.33113
 -0.780918   2.18358    0.0114738
x .^ (-2)
5×3 Matrix{Float64}:
  5.42364   1.00214     0.412355
  0.852754  2.63903     1.26714
  7.41653   1.41605    34.7833
 46.6573    1.29079     0.56436
  1.63979   0.20973  7595.99
sin.(x)
5×3 Matrix{Float64}:
 -0.416319   0.840893   0.999909
  0.883321  -0.577425   0.776037
  0.359001   0.744878  -0.168745
  0.145877  -0.770854  -0.971418
 -0.703932   0.81805    0.0114736

9.5 Basic linear algebra

x = randn(5)
5-element Vector{Float64}:
  1.1760304644549
  0.7777437061689307
 -1.7129081275488882
 -0.5976027966837995
 -0.18908673782377325
# vector L2 norm
norm(x)
2.305400198720293
# same as
sqrt(sum(abs2, x))
2.305400198720293
y = randn(5) # another vector
# dot product
dot(x, y) # x' * y
1.3995421365381335
# same as
x'y
1.3995421365381335
x, y = randn(5, 3), randn(3, 2)
# matrix multiplication, same as %*% in R
x * y
5×2 Matrix{Float64}:
 -0.498013   1.92185
  0.386115  -1.89236
 -1.88883   -1.10231
 -1.63281    1.73977
  2.78465   -0.930618
x = randn(3, 3)
3×3 Matrix{Float64}:
 -1.65844    0.262945  -0.974491
 -0.288683  -0.501152   1.26319
  0.542274   1.54997    1.57951
# conjugate transpose
x'
3×3 adjoint(::Matrix{Float64}) with eltype Float64:
 -1.65844   -0.288683  0.542274
  0.262945  -0.501152  1.54997
 -0.974491   1.26319   1.57951
b = rand(3)
x'b # same as x' * b
3-element Vector{Float64}:
 -0.7993221665746972
  1.216665030124029
  1.8209266280308594
# trace
tr(x)
-0.5800867393022509
det(x)
5.031071806986817
rank(x)
3

9.6 Sparse matrices

# 10-by-10 sparse matrix with sparsity 0.1
X = sprandn(10, 10, .1)
10×10 SparseMatrixCSC{Float64, Int64} with 9 stored entries:
  ⋅    ⋅   1.75062   ⋅    ⋅    1.61172     ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅   -0.799106    ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅        -1.20504    ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅       -0.185157   ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅   1.94306   ⋅    ⋅     ⋅          ⋅       -0.753204  0.402747   ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅    1.32575     ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
# dump() in Julia is like str() in R
dump(X)
SparseMatrixCSC{Float64, Int64}
  m: Int64 10
  n: Int64 10
  colptr: Array{Int64}((11,)) [1, 1, 1, 3, 3, 3, 6, 7, 9, 10, 10]
  rowval: Array{Int64}((9,)) [1, 8, 1, 2, 9, 3, 5, 8, 8]
  nzval: Array{Float64}((9,)) [1.7506183231853552, 1.9430610241472452, 1.6117170105281784, -0.7991060806252267, 1.3257450414910152, -1.2050423821162402, -0.18515702068129933, -0.7532036472736939, 0.4027467431289392]
# convert to dense matrix; be cautious when dealing with big data
Xfull = convert(Matrix{Float64}, X)
10×10 Matrix{Float64}:
 0.0  0.0  1.75062  0.0  0.0   1.61172    0.0       0.0       0.0       0.0
 0.0  0.0  0.0      0.0  0.0  -0.799106   0.0       0.0       0.0       0.0
 0.0  0.0  0.0      0.0  0.0   0.0       -1.20504   0.0       0.0       0.0
 0.0  0.0  0.0      0.0  0.0   0.0        0.0       0.0       0.0       0.0
 0.0  0.0  0.0      0.0  0.0   0.0        0.0      -0.185157  0.0       0.0
 0.0  0.0  0.0      0.0  0.0   0.0        0.0       0.0       0.0       0.0
 0.0  0.0  0.0      0.0  0.0   0.0        0.0       0.0       0.0       0.0
 0.0  0.0  1.94306  0.0  0.0   0.0        0.0      -0.753204  0.402747  0.0
 0.0  0.0  0.0      0.0  0.0   1.32575    0.0       0.0       0.0       0.0
 0.0  0.0  0.0      0.0  0.0   0.0        0.0       0.0       0.0       0.0
# convert a dense matrix to sparse matrix
sparse(Xfull)
10×10 SparseMatrixCSC{Float64, Int64} with 9 stored entries:
  ⋅    ⋅   1.75062   ⋅    ⋅    1.61172     ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅   -0.799106    ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅        -1.20504    ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅       -0.185157   ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅   1.94306   ⋅    ⋅     ⋅          ⋅       -0.753204  0.402747   ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅    1.32575     ⋅         ⋅         ⋅         ⋅ 
  ⋅    ⋅    ⋅        ⋅    ⋅     ⋅          ⋅         ⋅         ⋅         ⋅ 
# syntax for sparse linear algebra is same as dense linear algebra
β = ones(10)
X * β
10-element Vector{Float64}:
  3.3623353337135335
 -0.7991060806252267
 -1.2050423821162402
  0.0
 -0.18515702068129933
  0.0
  0.0
  1.5926041200024907
  1.3257450414910152
  0.0
# many functions apply to sparse matrices as well
sum(X)
4.091379011784274

10 Control flow and loops

  • if-elseif-else-end
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
  • for loop
for i in 1:10
    println(i)
end
  • Nested for loop:
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end

Same as

for i in 1:10, j in 1:5
    println(i * j)
end
  • Exit loop:
for i in 1:10
    # do something
    if condition1
        break # exit the loop early
    end
end
  • Skip to the next iteration:
for i in 1:10
    # do something
    if condition1
        continue # skip to next iteration
    end
    # do something
end
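
A small concrete example combining break and continue (my own illustration): print the odd numbers up to 7.

for i in 1:10
    iseven(i) && continue # skip even numbers
    i > 7 && break        # exit the loop once past 7
    println(i)
end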

11 Functions

  • In Julia, all arguments to functions are passed by reference (more precisely, by sharing: the callee sees the caller’s objects, with no copying), in contrast to R and Matlab.

  • A function name ending with ! indicates that the function mutates at least one of its arguments, typically the first.

sort!(x) # vs sort(x)
  • Function definition
function func(req1, req2; key1=dflt1, key2=dflt2)
    # do stuff
    return out1, out2, out3
end

  • Required arguments are separated with commas and use positional notation.
  • Optional (keyword) arguments need a default value in the signature.
  • The semicolon is not required in the function call.
  • The return statement is optional.
  • Multiple outputs can be returned as a tuple, e.g., return out1, out2, out3.
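
A minimal concrete instance of this pattern (the function summarize and its arguments are made up for illustration):

function summarize(x; digits = 2)
    m = round(sum(x) / length(x); digits = digits) # mean, rounded
    lo, hi = extrema(x)                            # minimum and maximum
    return m, lo, hi
end

m, lo, hi = summarize([1.0, 2.0, 4.0]) # (2.33, 1.0, 4.0)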

  • Anonymous functions, e.g., x -> x^2, are commonly used in collection functions or list comprehensions.
map(x -> x^2, y) # square each element in y
  • Functions can be nested:
function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
  • Functions can be vectorized using the Dot syntax:
# defined for scalar
function myfunc(x)
    return sin(x^2)
end

x = randn(5, 3)
myfunc.(x)
5×3 Matrix{Float64}:
 0.0200593   0.578907  0.447498
 0.546491    0.208195  0.444306
 0.0254301  -0.931594  0.983389
 0.98522     0.183089  0.345044
 0.398691    0.999248  0.95928
  • Collection functions (think of these as the apply family of functions in R).

    Apply a function to each element of a collection:

map(f, coll) # or
map(coll) do elem
    # do stuff with elem
    # the last expression is returned (an explicit return also works)
end
map(x -> sin(x^2), x)
5×3 Matrix{Float64}:
 0.0200593   0.578907  0.447498
 0.546491    0.208195  0.444306
 0.0254301  -0.931594  0.983389
 0.98522     0.183089  0.345044
 0.398691    0.999248  0.95928
map(x) do elem
    elem = elem^2
    return sin(elem)
end
5×3 Matrix{Float64}:
 0.0200593   0.578907  0.447498
 0.546491    0.208195  0.444306
 0.0254301  -0.931594  0.983389
 0.98522     0.183089  0.345044
 0.398691    0.999248  0.95928
# Mapreduce
mapreduce(x -> sin(x^2), +, x)
6.193253945555338
# same as
sum(x -> sin(x^2), x)
6.193253945555338
  • List comprehension
[sin(2i + j) for i in 1:5, j in 1:3] # similar to Python
5×3 Matrix{Float64}:
  0.14112   -0.756802  -0.958924
 -0.958924  -0.279415   0.656987
  0.656987   0.989358   0.412118
  0.412118  -0.544021  -0.99999
 -0.99999   -0.536573   0.420167

12 Type system

  • Every variable in Julia has a type.

  • When thinking about types, think about sets.

  • Everything is a subtype of the abstract type Any.

  • An abstract type defines a set of types.

    • Consider, for example, the types in Julia that are a Number.

  • We can explore type hierarchy with typeof(), supertype(), and subtypes().
# 1.0: double precision, 1: 64-bit integer
typeof(1.0), typeof(1)
(Float64, Int64)
supertype(Float64)
AbstractFloat
subtypes(AbstractFloat)
4-element Vector{Any}:
 BigFloat
 Float16
 Float32
 Float64
# Is Float64 a subtype of AbstractFloat?
Float64 <: AbstractFloat
true
# On 64bit machine, Int == Int64
Int == Int64
true
# convert to Float64
convert(Float64, 1)
1.0
# same as casting
Float64(1)
1.0
# Float32 vector
x = randn(Float32, 5)
5-element Vector{Float32}:
  0.6958228
  1.7147661
 -1.2331108
  2.317374
 -0.16774198
# convert to Float64
convert(Vector{Float64}, x)
5-element Vector{Float64}:
  0.6958227753639221
  1.7147661447525024
 -1.233110785484314
  2.3173739910125732
 -0.16774198412895203
# same as broadcasting (dot operation)
Float64.(x)
5-element Vector{Float64}:
  0.6958227753639221
  1.7147661447525024
 -1.233110785484314
  2.3173739910125732
 -0.16774198412895203
# convert Float64 to Int64
convert(Int, 1.0)
1
convert(Int, 1.5) # errors; use round(Int, 1.5) instead
LoadError: InexactError: Int64(1.5)
round(Int, 1.5)
2
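Related conversions (my addition): floor, ceil, and trunc accept the same target-type form as round.

floor(Int, 1.5), ceil(Int, 1.5), trunc(Int, -1.5)
(1, 2, -1)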

13 Multiple dispatch

  • Multiple dispatch lies at the core of Julia's design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.

  • Let’s consider a simple “doubling” function:

g(x) = x + x
g (generic function with 1 method)
g(1.5)
3.0

This definition is too broad, since some things, e.g., strings, can’t be added.

g("hello world")
LoadError: MethodError: no method matching +(::String, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:146
  +(::ChainRulesCore.AbstractThunk, ::Any) at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:122
  ...
  • This definition is correct but too restrictive, since any Number can be added.
g(x::Float64) = x + x
g (generic function with 2 methods)
  • This definition will automatically work on the entire type tree above!
g(x::Number) = x + x
g (generic function with 3 methods)

This is a lot nicer than

function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
  • The methods(func) function displays all methods defined for func.
methods(g)
# 3 methods for generic function g:
  • g(x::Float64) in Main at In[109]:1
  • g(x::Number) in Main at In[110]:1
  • g(x) in Main at In[106]:1
  • When calling a function with multiple definitions, Julia dispatches to the method with the narrowest (most specific) matching signature.

  • The @which func(x) macro tells which method is being dispatched to for the argument signature of x.

# an Int64 input
@which g(1)
g(x::Number) in Main at In[110]:1
# a Vector{Float64} input
@which g(randn(5))
g(x) in Main at In[106]:1

14 Just-in-time compilation (JIT)

The following figures are taken from Arch D. Robison’s slides Introduction to Writing High Performance Julia.

(Figures: the Julia compilation toolchain)
  • Julia’s efficiency results from its capability to infer the types of all variables within a function and then call the LLVM compiler to generate optimized machine code at run time.

Consider the g (doubling) function defined earlier. This function works on any type that has a method for +.

g(2), g(2.0)
(4, 4.0)

Step 1: Parse Julia code into an abstract syntax tree (AST).

@code_lowered g(2)
CodeInfo(
1 ─ %1 = x + x
└──      return %1
)

Step 2: Type inference according to input type.

# type inference for integer input
@code_warntype g(2)
MethodInstance for g(::Int64)
  from g(x::Number) in Main at In[110]:1
Arguments
  #self#::Core.Const(g)
  x::Int64
Body::Int64
1 ─ %1 = (x + x)::Int64
└──      return %1
# type inference for Float64 input
@code_warntype g(2.0)
MethodInstance for g(::Float64)
  from g(x::Float64) in Main at In[109]:1
Arguments
  #self#::Core.Const(g)
  x::Float64
Body::Float64
1 ─ %1 = (x + x)::Float64
└──      return %1

Step 3: Compile into LLVM bitcode (equivalent of R bytecode generated by the compiler package).

# LLVM bitcode for integer input
@code_llvm g(2)
;  @ In[110]:1 within `g`
define i64 @julia_g_7542(i64 signext %0) #0 {
top:
; ┌ @ int.jl:87 within `+`
   %1 = shl i64 %0, 1
; └
  ret i64 %1
}
# LLVM bitcode for Float64 input
@code_llvm g(2.0)
;  @ In[109]:1 within `g`
define double @julia_g_7565(double %0) #0 {
top:
; ┌ @ float.jl:383 within `+`
   %1 = fadd double %0, %0
; └
  ret double %1
}

We didn’t provide a type annotation. But different LLVM bitcodes were generated depending on the argument type!

In R or Python, g(2) and g(2.0) would run the same code.

In Julia, g(2) and g(2.0) dispatch to code optimized for Int64 and Float64, respectively.

For an integer input x, the LLVM compiler is smart enough to know that x + x is simply shifting x left by 1 bit, which is faster than addition.

Step 4: The lowest level is the assembly code, which is machine dependent.
# Assembly code for integer input
@code_native g(2)
    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 13, 0
    .globl  _julia_g_7581                   ; -- Begin function julia_g_7581
    .p2align    2
_julia_g_7581:                          ; @julia_g_7581
; ┌ @ In[110]:1 within `g`
    .cfi_startproc
; %bb.0:                                ; %top
; │┌ @ int.jl:87 within `+`
    lsl x0, x0, #1
; │└
    ret
    .cfi_endproc
; └
                                        ; -- End function
.subsections_via_symbols
# Assembly code for Float64 input
@code_native g(2.0)
    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 13, 0
    .globl  _julia_g_7593                   ; -- Begin function julia_g_7593
    .p2align    2
_julia_g_7593:                          ; @julia_g_7593
; ┌ @ In[109]:1 within `g`
    .cfi_startproc
; %bb.0:                                ; %top
; │┌ @ float.jl:383 within `+`
    fadd    d0, d0, d0
; │└
    ret
    .cfi_endproc
; └
                                        ; -- End function
.subsections_via_symbols

15 Profiling Julia code

Julia has several built-in tools for profiling. The @time macro outputs run time and heap allocation. Note the first call of a function incurs (substantial) compilation time.

# the workload function: sum all elements of an array
function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

using Random
Random.seed!(257)
a = rand(20_000_000)
@time tally(a) # first run includes compilation time
  0.020385 seconds (4.10 k allocations: 200.732 KiB, 10.51% compilation time)
1.000097950627383e7
@time tally(a)
  0.017411 seconds (1 allocation: 16 bytes)
1.000097950627383e7

For more robust benchmarking, the BenchmarkTools.jl package is highly recommended.

@benchmark tally($a)
BenchmarkTools.Trial: 307 samples with 1 evaluation.
 Range (min … max):  16.247 ms … 16.662 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.307 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.333 ms ± 91.066 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  ▇█▄  ▁ ▁       ▁                                             
  ██████▇█▃▇▄▄▇█▇▅▅▃▆▃▃▃▁▃▁▂▄▂▁▁▂▃▃▁▂▃▂▃▁▂▁▁▃▃▂▄▃▁▃▃▃▂▁▂▁▁▂ ▃
  16.2 ms         Histogram: frequency by time        16.6 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

The Profile module gives line-by-line profile results.

Profile.clear()
@profile tally(a)
Profile.print(format=:flat)
 Count  Overhead File            Line Function
 =====  ======== ====            ==== ========
     6         0 In[122]            5 tally(x::Vector{Flo...
     7         0 In[122]            6 tally(x::Vector{Flo...
     7         7 @Base/array.jl   924 getindex
     7         0 @Base/array.jl   898 iterate
     6         6 @Base/float.jl   383 +
Total snapshots: 112 (12% utilization across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task)

One can use the ProfileView.jl package for better visualization of profile data:

using ProfileView

ProfileView.view()
# check type stability
@code_warntype tally(a)
MethodInstance for tally(::Vector{Float64})
  from tally(x::Array) in Main at In[122]:2
Arguments
  #self#::Core.Const(tally)
  x::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  v::Float64
Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = x::Vector{Float64}
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (v = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│         (s = s + v)
│         (@_3 = Base.iterate(%3, %10))
│   %13 = (@_3 === nothing)::Bool
│   %14 = Base.not_int(%13)::Bool
└──       goto #4 if not %14
3 ─       goto #2
4 ┄       return s
# check LLVM bitcode
@code_llvm tally(a)
;  @ In[122]:2 within `tally`
define double @julia_tally_8368({}* nonnull align 16 dereferenceable(40) %0) #0 {
top:
;  @ In[122]:4 within `tally`
; ┌ @ array.jl:898 within `iterate` @ array.jl:898
; │┌ @ array.jl:215 within `length`
    %1 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
    %2 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %1, i64 0, i32 1
    %3 = load i64, i64* %2, align 8
; │└
; │┌ @ int.jl:487 within `<` @ int.jl:480
    %.not = icmp eq i64 %3, 0
; │└
   br i1 %.not, label %L44, label %L18

L18:                                              ; preds = %top
; │┌ @ array.jl:924 within `getindex`
    %4 = bitcast {}* %0 to double**
    %5 = load double*, double** %4, align 8
    %6 = load double, double* %5, align 8
; └└
;  @ In[122]:5 within `tally`
; ┌ @ float.jl:383 within `+`
   %7 = fadd double %6, 0.000000e+00
; └
;  @ In[122]:6 within `tally`
; ┌ @ array.jl:898 within `iterate`
; │┌ @ int.jl:487 within `<` @ int.jl:480
    %.not1317.not = icmp eq i64 %3, 1
; │└
   br i1 %.not1317.not, label %L44, label %L38

L38:                                              ; preds = %L38, %L18
   %8 = phi i64 [ %value_phi418, %L38 ], [ 1, %L18 ]
   %9 = phi double [ %13, %L38 ], [ %7, %L18 ]
   %value_phi418 = phi i64 [ %12, %L38 ], [ 2, %L18 ]
; │┌ @ array.jl:924 within `getindex`
    %10 = getelementptr inbounds double, double* %5, i64 %8
    %11 = load double, double* %10, align 8
; │└
; │┌ @ int.jl:87 within `+`
    %12 = add nuw nsw i64 %value_phi418, 1
; └└
;  @ In[122]:5 within `tally`
; ┌ @ float.jl:383 within `+`
   %13 = fadd double %11, %9
; └
;  @ In[122]:6 within `tally`
; ┌ @ array.jl:898 within `iterate`
; │┌ @ int.jl:487 within `<` @ int.jl:480
    %exitcond.not = icmp eq i64 %value_phi418, %3
; │└
   br i1 %exitcond.not, label %L44, label %L38

L44:                                              ; preds = %L38, %L18, %top
   %value_phi9 = phi double [ 0.000000e+00, %top ], [ %7, %L18 ], [ %13, %L38 ]
; └
;  @ In[122]:7 within `tally`
  ret double %value_phi9
}
@code_native tally(a)
    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 13, 0
    .globl  _julia_tally_8370               ; -- Begin function julia_tally_8370
    .p2align    2
_julia_tally_8370:                      ; @julia_tally_8370
; ┌ @ In[122]:2 within `tally`
    .cfi_startproc
; %bb.0:                                ; %top
; │ @ In[122]:4 within `tally`
; │┌ @ array.jl:898 within `iterate` @ array.jl:898
; ││┌ @ array.jl:215 within `length`
    ldr x8, [x0, #8]
; ││└
    cbz x8, LBB0_5
; %bb.1:                                ; %L18
; ││┌ @ array.jl:924 within `getindex`
    ldr x9, [x0]
    ldr d0, [x9]
    movi    d1, #0000000000000000
; │└└
; │ @ In[122]:5 within `tally`
; │┌ @ float.jl:383 within `+`
    fadd    d0, d0, d1
; │└
; │ @ In[122]:6 within `tally`
; │┌ @ array.jl:898 within `iterate`
    subs    x8, x8, #1                      ; =1
    b.eq    LBB0_4
; %bb.2:                                ; %L38.preheader
    add x9, x9, #8                      ; =8
LBB0_3:                                 ; %L38
                                        ; =>This Inner Loop Header: Depth=1
; ││┌ @ array.jl:924 within `getindex`
    ldr d1, [x9], #8
; │└└
; │ @ In[122]:5 within `tally`
; │┌ @ float.jl:383 within `+`
    fadd    d0, d1, d0
; │└
; │ @ In[122]:6 within `tally`
; │┌ @ array.jl:898 within `iterate`
; ││┌ @ int.jl:487 within `<` @ int.jl:480
    subs    x8, x8, #1                      ; =1
; ││└
    b.ne    LBB0_3
LBB0_4:                                 ; %L44
; │└
; │ @ In[122]:7 within `tally`
    ret
LBB0_5:
    movi    d0, #0000000000000000
    ret
    .cfi_endproc
; └
                                        ; -- End function
.subsections_via_symbols

Exercise: Annotate the loop in the tally function with @simd and look for the differences in the LLVM bitcode and machine code.
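
A sketch of one way to start (the name tally_simd and the eachindex/@inbounds rewrite are my additions; @simd wants a loop over an index range):

function tally_simd(x::Array)
    s = zero(eltype(x))
    @simd for i in eachindex(x)
        @inbounds s += x[i] # allow reassociation so the loop can vectorize
    end
    s
end

@code_llvm tally_simd(a) # compare against @code_llvm tally(a)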

16 Memory profiling

Detailed memory profiling requires a detour. First let’s write a script bar.jl, which contains the workload function tally and a wrapper for profiling.

;cat bar.jl
using Profile

function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

# call workload from wrapper to avoid misattribution bug
function wrapper()
    y = rand(10000)
    # force compilation
    println(tally(y))
    # clear allocation counters
    Profile.clear_malloc_data()
    # run compiled workload
    println(tally(y))
end

wrapper()

Next, in the terminal, we run the script with the --track-allocation=user option.

#;julia --track-allocation=user bar.jl

The profiler outputs a file bar.jl.51116.mem.

;cat bar.jl.51116.mem
        - using Profile
        - 
        - function tally(x::Array)
        -     s = zero(eltype(x))
        -     for v in x
        -         s += v
        -     end
        -     s
        - end
        - 
        - # call workload from wrapper to avoid misattribution bug
        - function wrapper()
        0     y = rand(10000)
        -     # force compilation
        0     println(tally(y))
        -     # clear allocation counters
        0     Profile.clear_malloc_data()
        -     # run compiled workload
      528     println(tally(y))
        - end
        - 
        - wrapper()
        -