Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments
History:
Project started in 2009. First public release in 2012
Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
First major release v1.0 was released on Aug 8, 2018
Current stable release v1.8.5 (as of Apr 6, 2023)
Julia aims to solve the notorious two-language problem: prototype code is written in high-level languages like R/Python, while production code is rewritten in a low-level language like C/C++.
For extensive Julia coding, I have been happily using the VS Code editor with the Julia and VS Code Jupyter Notebook Previewer extensions installed.
6 Julia package system
Each Julia package is a Git repository. Each Julia package name ends with .jl. E.g., Distributions.jl package lives at https://github.com/JuliaStats/Distributions.jl.
Google search with PackageName.jl usually leads to the package on github.com.
The package ecosystem is rapidly maturing; a complete list of registered packages (which are required to have a certain level of testing and documentation) is at http://pkg.julialang.org/.
For example, the package called Distributions.jl is added with
# in Pkg mode
(@v1.8) pkg> add Distributions
and “removed” (although not completely deleted) with
# in Pkg mode
(@v1.8) pkg> rm Distributions
The package manager provides a dependency solver that determines which packages are actually required to be installed.
Non-registered packages are added by cloning the relevant Git repository. E.g.,
# in Pkg mode
(@v1.8) pkg> add https://github.com/OpenMendel/SnpArrays.jl
A package only needs to be added once, at which point it is downloaded into the .julia/packages directory in your home directory.
If you start having problems with packages that seem to be unsolvable, you may try just deleting your .julia directory and reinstalling all your packages.
Periodically, one should run update in Pkg mode, which checks for, downloads and installs updated versions of all the packages you currently have installed.
status lists the status of all installed packages.
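The same operations are also available programmatically through the standard-library Pkg API, which can be handy in scripts; a minimal sketch (the network-touching calls are left commented out):

```julia
using Pkg

Pkg.status()                # list installed packages, like `status` in Pkg mode
# Pkg.update()              # check for, download, and install updated versions
# Pkg.add("Distributions")  # same as `add Distributions` in Pkg mode
```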
Using functions in a package:
using Distributions
This pulls all of the exported functions in the module into your local namespace, as you can check using the varinfo() command. An alternative is
import Distributions
Now, the functions from the Distributions package are available only using
Distributions.<FUNNAME>
All functions, not only exported functions, are always available like this.
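To illustrate the difference, here is a sketch using the standard-library Statistics package (any package behaves the same way):

```julia
import Statistics            # names require qualification
Statistics.mean([1, 2, 3])   # works: 2.0
# mean([1, 2, 3])            # would error: `mean` is not in scope

using Statistics             # pulls exported names into scope
mean([1, 2, 3])              # now works: 2.0
```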
7 Calling R from Julia
The RCall.jl package allows us to embed R code inside of Julia.
There are also PyCall.jl, MATLAB.jl, JavaCall.jl, CxxWrap.jl packages for interfacing with other languages.
# x is in Julia workspacex =randn(1000)# $ is the interpolation operatorR"""hist($x, main = "I'm plotting a Julia vector")""";
R"""library(ggplot2)qplot($x)"""
┌ Warning: RCall.jl: Warning: `qplot()` was deprecated in ggplot2 3.4.0.
└ @ RCall ~/.julia/packages/RCall/Wyd74/src/io.jl:172
┌ Warning: RCall.jl: `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
└ @ RCall ~/.julia/packages/RCall/Wyd74/src/io.jl:172
# many functions apply to sparse matrices as well
sum(X)
4.091379011784274
10 Control flow and loops
if-elseif-else-end
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
for loop
for i in 1:10
    println(i)
end
Nested for loop:
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end
Same as
for i in 1:10, j in 1:5
    println(i * j)
end
Exit loop:
for i in 1:10
    # do something
    if condition1
        break # skip remaining loop
    end
end
Exit iteration:
for i in 1:10
    # do something
    if condition1
        continue # skip to next iteration
    end
    # do something
end
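Putting break and continue together, a small runnable example that collects the even numbers up to 6:

```julia
evens = Int[]
for i in 1:10
    if isodd(i)
        continue  # skip odd numbers
    end
    if i > 6
        break     # stop once past 6
    end
    push!(evens, i)
end
evens  # [2, 4, 6]
```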
11 Functions
In Julia, all arguments to functions are passed by reference (more precisely, by sharing), in contrast to R and Matlab.
A function name ending with ! indicates that the function mutates at least one of its arguments, typically the first.
sort!(x) # vs sort(x)
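A minimal sketch of the convention: sort returns a new sorted vector, while sort! reorders its argument in place (possible precisely because arguments are passed by reference):

```julia
x = [3, 1, 2]
y = sort(x)   # new vector; x is unchanged
@show x       # x = [3, 1, 2]
sort!(x)      # reorders x in place
@show x       # x = [1, 2, 3]
```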
Function definition
function func(req1, req2; key1=dflt1, key2=dflt2)
    # do stuff
    return out1, out2, out3
end
Required arguments are separated with commas and use positional notation. Optional (keyword) arguments need a default value in the signature. The semicolon is not required in the function call. The return statement is optional.
Multiple outputs can be returned as a tuple, e.g., return out1, out2, out3.
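Multiple returned values can be destructured at the call site; a hypothetical extrema-style function as a sketch:

```julia
function lohi(v)
    return minimum(v), maximum(v)  # returned as a tuple
end

lo, hi = lohi([3, 1, 4, 1, 5])
# lo == 1, hi == 5
```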
Anonymous functions, e.g., x -> x^2, are commonly used with collection functions or in list comprehensions.
map(x -> x^2, y) # square each element in y
Functions can be nested:
function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
Functions can be vectorized using the Dot syntax:
# defined for scalar
function myfunc(x)
    return sin(x^2)
end

x = randn(5, 3)
myfunc.(x)
Multiple dispatch lies in the core of Julia design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.
Let’s consider a simple “doubling” function:
g(x) = x + x
g (generic function with 1 method)
g(1.5)
3.0
This definition is too broad, since some things, e.g., strings, can’t be added
g("hello world")
LoadError: MethodError: no method matching +(::String, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:146
  +(::ChainRulesCore.AbstractThunk, ::Any) at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:122
  ...
This definition is correct but too restrictive, since any Number can be added.
g(x::Float64) = x + x
g (generic function with 2 methods)
This definition will automatically work on the entire type tree above!
g(x::Number) = x + x
g (generic function with 3 methods)
This is a lot nicer than
function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
The methods(func) function displays all methods defined for func.
methods(g)
# 3 methods for generic function g:
g(x::Float64) in Main at In[109]:1
g(x::Number) in Main at In[110]:1
g(x) in Main at In[106]:1
When calling a function with multiple definitions, Julia dispatches to the method with the narrowest signature that matches the argument types.
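A small sketch (with a hypothetical function h) showing that the most specific matching method wins:

```julia
h(x) = "anything"
h(x::Number) = "number"
h(x::Float64) = "float64"

h(2.0)      # "float64": most specific match
h(2)        # "number": Int64 <: Number, and there is no Int64-specific method
h("hello")  # "anything": fallback
```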
The @which func(x) macro tells which method is being used for argument signature x.
Julia’s efficiency results from its capability to infer the types of all variables within a function and then call the LLVM compiler to generate optimized machine code at run time.
Consider the g (doubling) function defined earlier. This function will work on any type which has a method for +.
# type inference for integer input
@code_warntype g(2)
MethodInstance for g(::Int64)
from g(x::Number) in Main at In[110]:1
Arguments
#self#::Core.Const(g)
x::Int64
Body::Int64
1 ─ %1 = (x + x)::Int64
└── return %1
# type inference for Float64 input
@code_warntype g(2.0)
MethodInstance for g(::Float64)
from g(x::Float64) in Main at In[109]:1
Arguments
#self#::Core.Const(g)
x::Float64
Body::Float64
1 ─ %1 = (x + x)::Float64
└── return %1
Step 3: Compile into LLVM bitcode (inspect with the @code_llvm macro; the equivalent of R bytecode generated by the compiler package).
; @ In[109]:1 within `g`
define double @julia_g_7565(double %0) #0 {
top:
; ┌ @ float.jl:383 within `+`
%1 = fadd double %0, %0
; └
ret double %1
}
We didn’t provide a type annotation. But different LLVM bitcodes were generated depending on the argument type!
In R or Python, g(2) and g(2.0) would run the same code.
In Julia, g(2) and g(2.0) dispatch to code optimized for Int64 and Float64, respectively.
For integer input x, the LLVM compiler is smart enough to know that x + x is simply left-shifting x by 1 bit, which is faster than addition.
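The equivalence the compiler exploits can be checked directly in Julia:

```julia
x = 5
x + x == x << 1  # true: left-shifting an integer by 1 bit doubles it
```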
Step 4: Lowest level is the assembly code, which is machine dependent.
# Assembly code for integer input
@code_native g(2)
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 13, 0
.globl _julia_g_7581 ; -- Begin function julia_g_7581
.p2align 2
_julia_g_7581: ; @julia_g_7581
; ┌ @ In[110]:1 within `g`
.cfi_startproc
; %bb.0: ; %top
; │┌ @ int.jl:87 within `+`
lsl x0, x0, #1
; │└
ret
.cfi_endproc
; └
; -- End function
.subsections_via_symbols
# Assembly code for Float64 input
@code_native g(2.0)
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 13, 0
.globl _julia_g_7593 ; -- Begin function julia_g_7593
.p2align 2
_julia_g_7593: ; @julia_g_7593
; ┌ @ In[109]:1 within `g`
.cfi_startproc
; %bb.0: ; %top
; │┌ @ float.jl:383 within `+`
fadd d0, d0, d0
; │└
ret
.cfi_endproc
; └
; -- End function
.subsections_via_symbols
15 Profiling Julia code
Julia has several built-in tools for profiling. The @time macro outputs run time and heap allocation. Note that the first call of a function incurs (substantial) compilation time.
# a function defined earlier
function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

using Random
Random.seed!(257)
a = rand(20_000_000)
@time tally(a) # first run: includes compile time
0.020385 seconds (4.10 k allocations: 200.732 KiB, 10.51% compilation time)
1.000097950627383e7
@time tally(a)
0.017411 seconds (1 allocation: 16 bytes)
1.000097950627383e7
For more robust benchmarking, the BenchmarkTools.jl package is highly recommended.
@benchmark tally($a)
BenchmarkTools.Trial: 307 samples with 1 evaluation.
Range (min … max):  16.247 ms … 16.662 ms  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     16.307 ms              ┊ GC (median):    0.00%
Time  (mean ± σ):   16.333 ms ± 91.066 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
▇█▄ ▁ ▁ ▁
██████▇█▃▆▇▄▄▆▇█▇▅▅▃▆▃▃▃▁▃▁▂▄▂▁▁▂▃▃▁▂▃▂▃▁▂▁▁▃▃▂▄▃▁▃▃▃▂▁▂▁▁▂ ▃
16.2 ms Histogram: frequency by time 16.6 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
The Profile module gives line-by-line profile results.
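A minimal sketch of using the Profile module on the tally function from above (redefined here so the snippet is self-contained):

```julia
using Profile

function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

a = rand(10_000_000)
tally(a)           # force compilation first
@profile tally(a)  # collect samples while the workload runs
Profile.print()    # report where the samples landed, line by line
```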
Exercise: Annotate the loop in the tally function with @simd and look for the difference in LLVM bitcode and machine code.
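A possible starting point for the exercise, as a sketch with a hypothetical tally_simd variant:

```julia
# same loop as tally, but annotated with @simd so the compiler may vectorize it
function tally_simd(x::Array)
    s = zero(eltype(x))
    @simd for v in x
        s += v
    end
    s
end

# compare, e.g., @code_llvm tally_simd(rand(10)) against the plain tally version
```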
16 Memory profiling
Detailed memory profiling requires a detour. First let’s write a script bar.jl, which contains the workload function tally and a wrapper for profiling.
;cat bar.jl
using Profile
function tally(x::Array)
s = zero(eltype(x))
for v in x
s += v
end
s
end
# call workload from wrapper to avoid misattribution bug
function wrapper()
y = rand(10000)
# force compilation
println(tally(y))
# clear allocation counters
Profile.clear_malloc_data()
# run compiled workload
println(tally(y))
end
wrapper()
Next, in terminal, we run the script with --track-allocation=user option.
#;julia --track-allocation=user bar.jl
The profiler outputs a file bar.jl.51116.mem.
;cat bar.jl.51116.mem
- using Profile
-
- function tally(x::Array)
- s = zero(eltype(x))
- for v in x
- s += v
- end
- s
- end
-
- # call workload from wrapper to avoid misattribution bug
- function wrapper()
0 y = rand(10000)
- # force compilation
0 println(tally(y))
- # clear allocation counters
0 Profile.clear_malloc_data()
- # run compiled workload
528 println(tally(y))
- end
-
- wrapper()
-