Introduction

Author

Phillip Alday

MixedModels.jl is part of the JuliaStats ecosystem and so shares a number of interface and design elements with GLM.jl. MixedModels.jl can also be viewed as the next step in the research programme behind the R package lme4 (and further back, nlme). The focus of development is on linear mixed effects models with unconstrained covariance matrices, with a secondary focus on generalized linear mixed effects models. We’ll come back to this focus later, when we discuss limitations of the software. For now, let us start off with a simple mixed model.

A first model

MixedModels.jl ships with a number of sample and testing datasets, with varying levels of documentation.

using MixedModels
MixedModels.datasets()
17-element Vector{String}:
 "cbpp"
 "contra"
 "d3"
 "dyestuff"
 "dyestuff2"
 "grouseticks"
 "insteval"
 "kb07"
 "machines"
 "ml1m"
 "mmec"
 "mrk17_exp1"
 "oxide"
 "pastes"
 "penicillin"
 "sleepstudy"
 "verbagg"

Let’s take a look at insteval, which is the same dataset as the InstEval data documented in lme4.

insteval = MixedModels.dataset("insteval")
Arrow.Table with 73421 rows, 7 columns, and schema:
 :s        String
 :d        String
 :dept     String
 :studage  String
 :lectage  String
 :service  String
 :y        Int8
  • s: individual students (2972 values)
  • d: individual instructors (from German Dozent; 1128 values, with labels running up to I2160)
  • studage: student “age”, measured as their current semester number
  • lectage: lecture “age”, measuring how many semesters back the rated lecture had taken place (this was part of a retrospective study, so some ratings were from a few years back)
  • service: whether or not the lecture was held as a “service” in a different department
  • dept: department (14 unique values, labeled D01–D15 with D13 absent)
fm1 = fit(MixedModel,
          @formula(y ~ 1 + studage + lectage + service + (1|s) + (1|d) + (1|dept)),
          insteval; progress=false)
                Est.      SE       z       p     σ_s     σ_d  σ_dept
(Intercept)   3.2908  0.0324  101.45  <1e-99  0.3264  0.5106  0.0787
studage: 4    0.0519  0.0232    2.24  0.0249
studage: 6    0.0721  0.0240    3.01  0.0026
studage: 8    0.1363  0.0264    5.17  <1e-06
lectage: 2   -0.0808  0.0154   -5.25  <1e-06
lectage: 3   -0.1102  0.0167   -6.59  <1e-10
lectage: 4   -0.1892  0.0196   -9.65  <1e-21
lectage: 5   -0.1644  0.0214   -7.68  <1e-13
lectage: 6   -0.2460  0.0205  -12.01  <1e-32
service: Y   -0.0727  0.0135   -5.40  <1e-07
Residual      1.1762

Note that we don’t need to convert the sample dataset into a DataFrame: there is a standard for tabular data in Julia (the Tables.jl interface) and MixedModels.jl can consume any table meeting that standard. This can be very useful for large datasets, when keeping an additional copy of the data in memory can be costly, and it also allows for using memory-mapped tabular structures.
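
For instance, any Tables.jl-compatible column table works directly. Here is a minimal sketch using a toy table (the columns y and g are invented for illustration, not part of insteval):

using Random
# a NamedTuple of vectors is a valid column table
tbl = (; y = randn(MersenneTwister(1), 12),
         g = repeat(["a", "b", "c"]; inner=4))
fit(MixedModel, @formula(y ~ 1 + (1 | g)), tbl; progress=false)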

For display in the Quarto notebook, we set progress=false, but the progress meter is enabled by default. Even so, you may not see the progress meter for a given model if it finishes fitting before the first progress update would be displayed.

Examining model output

We note that MixedModels.jl has a different display in the REPL than it does in a Markdown/HTML/LaTeX document. Packages are free to define show methods that display their output differently for different output venues (i.e. for different MIME types). We can force the plaintext output with println:

println(fm1)
Linear mixed model fit by maximum likelihood
 y ~ 1 + studage + lectage + service + (1 | s) + (1 | d) + (1 | dept)
    logLik     -2 logLik       AIC         AICc          BIC     
 -118776.9929  237553.9858  237581.9858  237581.9915  237710.8413

Variance components:
            Column    Variance  Std.Dev. 
s        (Intercept)  0.1065189 0.3263723
d        (Intercept)  0.2607185 0.5106060
dept     (Intercept)  0.0061926 0.0786929
Residual              1.3833370 1.1761535
 Number of obs: 73421; levels of grouping factors: 2972, 1128, 14

  Fixed-effects parameters:
─────────────────────────────────────────────────────
                  Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept)   3.29078     0.0324363  101.45    <1e-99
studage: 4    0.0519478   0.0231671    2.24    0.0249
studage: 6    0.0721226   0.0239627    3.01    0.0026
studage: 8    0.136285    0.0263767    5.17    <1e-06
lectage: 2   -0.080753    0.0153803   -5.25    <1e-06
lectage: 3   -0.110187    0.0167235   -6.59    <1e-10
lectage: 4   -0.189168    0.0196016   -9.65    <1e-21
lectage: 5   -0.16443     0.0214044   -7.68    <1e-13
lectage: 6   -0.245996    0.020484   -12.01    <1e-32
service: Y   -0.0727167   0.0134729   -5.40    <1e-07
─────────────────────────────────────────────────────
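
Alternatively, we can request a particular rendering explicitly via its MIME type. A brief sketch (the exact set of supported MIME types, such as text/markdown here, depends on the installed MixedModels.jl version):

show(stdout, MIME("text/markdown"), fm1)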

The default pretty printing method for HTML output shows less information than the default output in the REPL, but represents a compact way to display many elements in a single table. We can also extract the individual elements:

println(coeftable(fm1)) # the fixed effects
─────────────────────────────────────────────────────
                  Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept)   3.29078     0.0324363  101.45    <1e-99
studage: 4    0.0519478   0.0231671    2.24    0.0249
studage: 6    0.0721226   0.0239627    3.01    0.0026
studage: 8    0.136285    0.0263767    5.17    <1e-06
lectage: 2   -0.080753    0.0153803   -5.25    <1e-06
lectage: 3   -0.110187    0.0167235   -6.59    <1e-10
lectage: 4   -0.189168    0.0196016   -9.65    <1e-21
lectage: 5   -0.16443     0.0214044   -7.68    <1e-13
lectage: 6   -0.245996    0.020484   -12.01    <1e-32
service: Y   -0.0727167   0.0134729   -5.40    <1e-07
─────────────────────────────────────────────────────
println(VarCorr(fm1)) # the variance-covariance, i.e. random effect, estimates
Variance components:
            Column    Variance  Std.Dev. 
s        (Intercept)  0.1065189 0.3263723
d        (Intercept)  0.2607185 0.5106060
dept     (Intercept)  0.0061926 0.0786929
Residual              1.3833370 1.1761535

Notably, each of these components also has a pretty printing method defined for rich displays.

Optimization results

For more technically oriented users and debugging problematic models, the fitted model also includes information about its fit:

fm1.optsum
Initialization
Initial parameter vector [1.0, 1.0, 1.0]
Initial objective value 241905.85361894374
Optimizer settings
Optimizer (from NLopt) LN_BOBYQA
Lower bounds [0.0, 0.0, 0.0]
ftol_rel 1.0e-12
ftol_abs 1.0e-8
xtol_rel 0.0
xtol_abs [1.0e-10, 1.0e-10, 1.0e-10]
initial_step [0.75, 0.75, 0.75]
maxfeval -1
maxtime -1.0
Result
Function evaluations 97
Final parameter vector [0.2775, 0.4341, 0.0669]
Final objective value 237553.9858
Return code FTOL_REACHED
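
The fields of optsum can also be adjusted on an unfitted model before calling fit!, e.g. to change tolerances or evaluation limits. A brief sketch (maxfeval is the evaluation cap shown in the display above):

# construct without fitting, tweak the optimizer settings, then fit
m = LinearMixedModel(@formula(y ~ 1 + studage + lectage + service +
                              (1 | s) + (1 | d) + (1 | dept)), insteval)
m.optsum.maxfeval = 500  # cap the number of objective evaluations
fit!(m; progress=false)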

Best linear unbiased predictions

We can also examine the best linear unbiased predictions (BLUPs, i.e. conditional modes) of the fitted model:

# gives a compact mathematical representation
ranef(fm1)
3-element Vector{Matrix{Float64}}:
 [0.16774140235737575 -0.04482701285162444 … 0.1382442938500027 0.2650774864010803]
 [0.3871977014122401 -0.4543515141805431 … 0.5663070246200219 -0.2322550813384579]
 [0.01925209823588375 -0.03142469083851408 … -0.03943347505732836 0.010869167979893487]
# gives a NamedTuple of tables for each grouping variable
re = raneftables(fm1)
(s = @NamedTuple{s::String, (Intercept)::Float64}[(s = "S0001", var"(Intercept)" = 0.16774140235737575), (s = "S0002", var"(Intercept)" = -0.04482701285162444), (s = "S0003", var"(Intercept)" = 0.31575935321258014), (s = "S0004", var"(Intercept)" = 0.24940732776784375), (s = "S0005", var"(Intercept)" = 0.050863352996870996), (s = "S0006", var"(Intercept)" = 0.10070590858725184), (s = "S0007", var"(Intercept)" = 0.49346379654842415), (s = "S0008", var"(Intercept)" = 0.21230071437874554), (s = "S0009", var"(Intercept)" = 0.4332979595948472), (s = "S0010", var"(Intercept)" = 0.34743864199091984)  …  (s = "S2963", var"(Intercept)" = -0.0749327557261207), (s = "S2964", var"(Intercept)" = -0.07639856541116313), (s = "S2965", var"(Intercept)" = -0.012697609893877363), (s = "S2966", var"(Intercept)" = -0.059683628597548756), (s = "S2967", var"(Intercept)" = -0.4776209481575346), (s = "S2968", var"(Intercept)" = -0.038080766921188454), (s = "S2969", var"(Intercept)" = 0.013164291227622239), (s = "S2970", var"(Intercept)" = 0.15284748072859305), (s = "S2971", var"(Intercept)" = 0.1382442938500027), (s = "S2972", var"(Intercept)" = 0.2650774864010803)], d = @NamedTuple{d::String, (Intercept)::Float64}[(d = "I0001", var"(Intercept)" = 0.3871977014122401), (d = "I0006", var"(Intercept)" = -0.4543515141805431), (d = "I0007", var"(Intercept)" = 0.6941271319185779), (d = "I0008", var"(Intercept)" = -0.6556985316701274), (d = "I0012", var"(Intercept)" = 0.2232270929725692), (d = "I0013", var"(Intercept)" = 0.35851765095513616), (d = "I0014", var"(Intercept)" = 0.36727982802972875), (d = "I0015", var"(Intercept)" = -0.37841262336526776), (d = "I0017", var"(Intercept)" = 0.4490431092879482), (d = "I0018", var"(Intercept)" = -0.2130290212029939)  …  (d = "I2143", var"(Intercept)" = 0.46453563052639246), (d = "I2145", var"(Intercept)" = -0.5103159237956026), (d = "I2146", var"(Intercept)" = 0.24093840057454008), (d = "I2147", var"(Intercept)" = -0.609469079727866), (d = "I2149", var"(Intercept)" = -0.2569369866698443), (d = "I2152", var"(Intercept)" = -0.14863445582416115), (d = "I2153", var"(Intercept)" = -0.45824502947271933), (d = "I2156", var"(Intercept)" = -0.5981638095970262), (d = "I2157", var"(Intercept)" = 0.5663070246200219), (d = "I2160", var"(Intercept)" = -0.2322550813384579)], dept = @NamedTuple{dept::String, (Intercept)::Float64}[(dept = "D01", var"(Intercept)" = 0.01925209823588375), (dept = "D02", var"(Intercept)" = -0.03142469083851408), (dept = "D03", var"(Intercept)" = 0.026990446729129435), (dept = "D04", var"(Intercept)" = 0.08635016621351274), (dept = "D05", var"(Intercept)" = 0.04253962436019779), (dept = "D06", var"(Intercept)" = -0.06123971074400431), (dept = "D07", var"(Intercept)" = 0.03799512248138028), (dept = "D08", var"(Intercept)" = 0.10707218145634924), (dept = "D09", var"(Intercept)" = -0.031304654966880036), (dept = "D10", var"(Intercept)" = -0.1307501138466036), (dept = "D11", var"(Intercept)" = -0.05345402697344361), (dept = "D12", var"(Intercept)" = 0.01653786497082539), (dept = "D14", var"(Intercept)" = -0.03943347505732836), (dept = "D15", var"(Intercept)" = 0.010869167979893487)])
re[:dept]
Table with 2 columns and 14 rows:
      dept  (Intercept)
    ┌──────────────────
 1  │ D01   0.0192521
 2  │ D02   -0.0314247
 3  │ D03   0.0269904
 4  │ D04   0.0863502
 5  │ D05   0.0425396
 6  │ D06   -0.0612397
 7  │ D07   0.0379951
 8  │ D08   0.107072
 9  │ D09   -0.0313047
 10 │ D10   -0.13075
 11 │ D11   -0.053454
 12 │ D12   0.0165379
 13 │ D14   -0.0394335
 14 │ D15   0.0108692

Similarly, condVar and condVartables provide the conditional variances, which can be used to construct prediction intervals. Note that this quantity is somewhat more challenging to compute, so the next code chunk can be quite slow for large and/or complex models.

cv = condVartables(fm1)
(s = (s = ["S0001", "S0002", "S0003", "S0004", "S0005", "S0006", "S0007", "S0008", "S0009", "S0010"  …  "S2963", "S2964", "S2965", "S2966", "S2967", "S2968", "S2969", "S2970", "S2971", "S2972"], σ = [(0.2858671228362245,), (0.3040165017502583,), (0.22821165533498675,), (0.2578004425103726,), (0.2860221041711748,), (0.25793963311878376,), (0.2649009415162111,), (0.27833696665010815,), (0.2522474519336828,), (0.22880839419265198,)  …  (0.20261432307540148,), (0.15345463114059987,), (0.18906069725437913,), (0.23644930010448192,), (0.16864269562394849,), (0.132482319658741,), (0.21568697330755854,), (0.15989935597851845,), (0.1970392691204506,), (0.17611124660034733,)], ρ = [(), (), (), (), (), (), (), (), (), ()  …  (), (), (), (), (), (), (), (), (), ()]), d = (d = ["I0001", "I0006", "I0007", "I0008", "I0012", "I0013", "I0014", "I0015", "I0017", "I0018"  …  "I2143", "I2145", "I2146", "I2147", "I2149", "I2152", "I2153", "I2156", "I2157", "I2160"], σ = [(0.29447193578517417,), (0.1998489863854152,), (0.19760667793253092,), (0.1527043969316495,), (0.20500621849854034,), (0.18323915120884238,), (0.20826361545342348,), (0.1379769401014668,), (0.0991637673510267,), (0.2589837320741506,)  …  (0.2709897676057369,), (0.27856001288516263,), (0.2642022156832947,), (0.1135259649977105,), (0.21483200256076004,), (0.28061051441897883,), (0.21640462394420396,), (0.26558786177554317,), (0.18875478926313752,), (0.12464418506317759,)], ρ = [(), (), (), (), (), (), (), (), (), ()  …  (), (), (), (), (), (), (), (), (), ()]), dept = (dept = ["D01", "D02", "D03", "D04", "D05", "D06", "D07", "D08", "D09", "D10", "D11", "D12", "D14", "D15"], σ = [(0.05342326269418921,), (0.056952844171391834,), (0.053094116815828214,), (0.041891800931267716,), (0.055366070778972126,), (0.04470931534818489,), (0.05357594671331021,), (0.05037357874667522,), (0.05202188180710211,), (0.04767962130821475,), (0.052241617344059924,), (0.04219066031585405,), (0.05298726732192016,), (0.04862598365500502,)], ρ = [(), (), (), (), (), (), (), (), (), (), (), (), (), ()]))
# this output still isn't pretty, but we're working on it!
cv[:dept]
(dept = ["D01", "D02", "D03", "D04", "D05", "D06", "D07", "D08", "D09", "D10", "D11", "D12", "D14", "D15"], σ = [(0.05342326269418921,), (0.056952844171391834,), (0.053094116815828214,), (0.041891800931267716,), (0.055366070778972126,), (0.04470931534818489,), (0.05357594671331021,), (0.05037357874667522,), (0.05202188180710211,), (0.04767962130821475,), (0.052241617344059924,), (0.04219066031585405,), (0.05298726732192016,), (0.04862598365500502,)], ρ = [(), (), (), (), (), (), (), (), (), (), (), (), (), ()])

At this point, it becomes convenient to place everything into a DataFrame so that we can easily manipulate the relevant quantities.

using DataFrames
dept = DataFrame(cv[:dept])
14×3 DataFrame
 Row │ dept    σ             ρ
     │ String  Tuple…        Tuple{}
─────┼───────────────────────────────
   1 │ D01     (0.0534233,)  ()
   2 │ D02     (0.0569528,)  ()
   3 │ D03     (0.0530941,)  ()
   4 │ D04     (0.0418918,)  ()
   5 │ D05     (0.0553661,)  ()
   6 │ D06     (0.0447093,)  ()
   7 │ D07     (0.0535759,)  ()
   8 │ D08     (0.0503736,)  ()
   9 │ D09     (0.0520219,)  ()
  10 │ D10     (0.0476796,)  ()
  11 │ D11     (0.0522416,)  ()
  12 │ D12     (0.0421907,)  ()
  13 │ D14     (0.0529873,)  ()
  14 │ D15     (0.048626,)   ()

Let’s construct prediction intervals:

select!(dept, :dept, :σ => ByRow(first) => :condvar)
leftjoin!(dept, DataFrame(re[:dept]); on=:dept)
14×3 DataFrame
 Row │ dept    condvar    (Intercept)
     │ String  Float64    Float64?
─────┼─────────────────────────────────
   1 │ D01     0.0534233    0.0192521
   2 │ D02     0.0569528   -0.0314247
   3 │ D03     0.0530941    0.0269904
   4 │ D04     0.0418918    0.0863502
   5 │ D05     0.0553661    0.0425396
   6 │ D06     0.0447093   -0.0612397
   7 │ D07     0.0535759    0.0379951
   8 │ D08     0.0503736    0.107072
   9 │ D09     0.0520219   -0.0313047
  10 │ D10     0.0476796   -0.13075
  11 │ D11     0.0522416   -0.053454
  12 │ D12     0.0421907    0.0165379
  13 │ D14     0.0529873   -0.0394335
  14 │ D15     0.048626     0.0108692
select!(dept, "dept", "(Intercept)" => "blup", "condvar")
transform!(dept,
           [:blup, :condvar] => ByRow((x,y) -> x - 1.96 * y) => :lower,
           [:blup, :condvar] => ByRow((x,y) -> x + 1.96 * y) => :upper)
14×5 DataFrame
 Row │ dept    blup        condvar    lower        upper
     │ String  Float64?    Float64    Float64      Float64
─────┼───────────────────────────────────────────────────────
   1 │ D01      0.0192521  0.0534233  -0.0854575    0.123962
   2 │ D02     -0.0314247  0.0569528  -0.143052     0.0802029
   3 │ D03      0.0269904  0.0530941  -0.077074     0.131055
   4 │ D04      0.0863502  0.0418918   0.00424224   0.168458
   5 │ D05      0.0425396  0.0553661  -0.0659779    0.151057
   6 │ D06     -0.0612397  0.0447093  -0.14887      0.0263905
   7 │ D07      0.0379951  0.0535759  -0.0670137    0.143004
   8 │ D08      0.107072   0.0503736   0.00833997   0.205804
   9 │ D09     -0.0313047  0.0520219  -0.133268     0.0706582
  10 │ D10     -0.13075    0.0476796  -0.224202    -0.0372981
  11 │ D11     -0.053454   0.0522416  -0.155848     0.0489395
  12 │ D12      0.0165379  0.0421907  -0.0661558    0.0992316
  13 │ D14     -0.0394335  0.0529873  -0.143289     0.0644216
  14 │ D15      0.0108692  0.048626   -0.0844378    0.106176
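
The multiplier 1.96 is the 97.5th percentile of the standard normal distribution; it can be computed exactly if preferred (this sketch assumes the Distributions.jl package is available):

using Distributions
quantile(Normal(), 0.975)  # ≈ 1.959964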

Measures of model fit

MixedModels.jl provides methods for the standard functions aic, aicc, bic, deviance, fitted, loglikelihood, nobs, and residuals.

The deviance is computed as -2 loglikelihood and is thus missing an additive constant for the saturated model. However, defining that constant is challenging for mixed models (what is the saturated model? do you saturate via the fixed or the random effects?) and that constant cancels out in the relevant computations.
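
Because the additive constant is dropped, the deviance here is exactly -2 times the log likelihood, and the information criteria are simple functions of it:

deviance(fm1) ≈ -2 * loglikelihood(fm1)  # true
aic(fm1) ≈ deviance(fm1) + 2 * dof(fm1)  # true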

MixedModels.jl intentionally does not provide methods for r2 and adjr2. These quantities are notoriously difficult to define in a completely satisfactory way for mixed models, and we, the developers, felt uncomfortable giving our implicit endorsement by defining them as part of the core package. That said, MixedModelsExtras.jl does provide an implementation of a naive definition of the coefficient of determination: it is a commonly requested measure, and I felt that it was better to have a well-tested implementation than to have users hand-roll their own buggy implementations of an already problematic measure.

Predicting new data

The predict function can be used to generate predictions on new data. As an initial sanity check, we can consider the case of predicting from the original data – this should yield the fitted values:

predict(fm1, insteval) ≈ fitted(fm1)
true

The predict function supports three different options for handling new levels of the grouping variable:

  • :population: return population values for the relevant grouping variable. In other words, treat the associated random effect as 0. If all grouping variables have new levels, then this is equivalent to just the fixed effects.
  • :missing: return missing.
  • :error: error on this condition.

For example, we can construct a novel dataset based on the first two rows of the insteval data.

df = first(DataFrame(insteval), 2)
df[!, :s] .= "new"
df
2×7 DataFrame
 Row │ s       d       dept    studage  lectage  service  y
     │ String  String  String  String   String   String   Int8
─────┼──────────────────────────────────────────────────────────
   1 │ new     I1002   D02     2        2        N           5
   2 │ new     I1050   D06     2        1        Y           2
predict(fm1, df; new_re_levels=:population)
2-element Vector{Float64}:
 2.9786264758074457
 2.996608353761869
predict(fm1, df; new_re_levels=:missing)
2-element Vector{Union{Missing, Float64}}:
 missing
 missing
predict(fm1, df; new_re_levels=:error)
ArgumentError: New level encountered in s
Stacktrace:
 [1] _predict(m::LinearMixedModel{Float64}, newdata::@NamedTuple{s::Vector{String}, d::Vector{String}, dept::Vector{String}, studage::Vector{String}, lectage::Vector{String}, service::Vector{String}, y::Vector{Int8}}, β::Vector{Float64}; new_re_levels::Symbol)
   @ MixedModels ~/.julia/packages/MixedModels/NBPpG/src/predict.jl:148
 [2] _predict
   @ ~/.julia/packages/MixedModels/NBPpG/src/predict.jl:91 [inlined]
 [3] predict(m::LinearMixedModel{Float64}, newdata::@NamedTuple{s::Vector{String}, d::Vector{String}, dept::Vector{String}, studage::Vector{String}, lectage::Vector{String}, service::Vector{String}, y::Vector{Int8}}; new_re_levels::Symbol)
   @ MixedModels ~/.julia/packages/MixedModels/NBPpG/src/predict.jl:73
 [4] predict(m::LinearMixedModel{Float64}, newdata::DataFrame; kwargs::@Kwargs{new_re_levels::Symbol})
   @ MixedModels ~/.julia/packages/MixedModels/NBPpG/src/predict.jl:220
 [5] top-level scope
   @ ~/Work/economics2024/01-intro.qmd:164

Similarly, the simulate function can be used to simulate new data and will draw a new sample from the estimated random effects distribution:

using Random
simulate(MersenneTwister(42), fm1, df)
2-element Vector{Float64}:
 3.435828747762102
 1.9761887463338355

Constructing more complex models

We now consider a few more complicated models to examine a few further extensions to the syntax. Including additional varying slopes is as easy as adding them before the relevant grouping variable:

fm2 = fit(MixedModel,
          @formula(y ~ 1 + studage + lectage + service +
                      (1 | s) +
                      (1 + service | d) +
                      (1 + service | dept)),
          insteval; progress=false)
                Est.      SE       z       p     σ_s     σ_d  σ_dept
(Intercept)   3.2985  0.0308  107.26  <1e-99  0.3242  0.5160  0.0642
studage: 4    0.0502  0.0232    2.16  0.0306
studage: 6    0.0573  0.0242    2.37  0.0180
studage: 8    0.1128  0.0268    4.21  <1e-04
lectage: 2   -0.0787  0.0156   -5.03  <1e-06
lectage: 3   -0.1036  0.0169   -6.14  <1e-09
lectage: 4   -0.1837  0.0199   -9.21  <1e-19
lectage: 5   -0.1503  0.0217   -6.94  <1e-11
lectage: 6   -0.2232  0.0209  -10.66  <1e-25
service: Y   -0.0281  0.0498   -0.56  0.5731          0.3906  0.1639
Residual      1.1698
println(fm2)
Linear mixed model fit by maximum likelihood
 y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) + (1 + service | dept)
    logLik     -2 logLik       AIC         AICc          BIC     
 -118578.6372  237157.2744  237193.2744  237193.2837  237358.9458

Variance components:
            Column    Variance  Std.Dev.   Corr.
s        (Intercept)  0.1051220 0.3242252
d        (Intercept)  0.2662917 0.5160346
         service: Y   0.1525493 0.3905756 -0.40
dept     (Intercept)  0.0041249 0.0642251
         service: Y   0.0268795 0.1639498 -0.71
Residual              1.3684551 1.1698098
 Number of obs: 73421; levels of grouping factors: 2972, 1128, 14

  Fixed-effects parameters:
─────────────────────────────────────────────────────
                  Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept)   3.2985      0.0307514  107.26    <1e-99
studage: 4    0.05022     0.0232291    2.16    0.0306
studage: 6    0.057262    0.0242028    2.37    0.0180
studage: 8    0.11277     0.0267844    4.21    <1e-04
lectage: 2   -0.0787343   0.0156461   -5.03    <1e-06
lectage: 3   -0.103602    0.0168684   -6.14    <1e-09
lectage: 4   -0.183663    0.0199353   -9.21    <1e-19
lectage: 5   -0.150343    0.0216722   -6.94    <1e-11
lectage: 6   -0.223191    0.0209329  -10.66    <1e-25
service: Y   -0.0280736   0.0498244   -0.56    0.5731
─────────────────────────────────────────────────────

Of course, we want to know whether this more complicated model has a better fit than our original model. To that end, we can use the lrtest function (same as in GLM.jl and originally part of the StatsAPI.jl specification):

using MixedModels: lrtest
lrtest(fm1, fm2)
Likelihood-ratio test: 2 models fitted on 73421 observations
──────────────────────────────────────────────────────────────
     DOF  ΔDOF        LogLik     Deviance     Chisq  p(>Chisq)
──────────────────────────────────────────────────────────────
[1]   14        -118776.9929  237553.9858                     
[2]   18     4  -118578.6372  237157.2744  396.7114     <1e-83
──────────────────────────────────────────────────────────────

The lrtest function is very general and as such has rather generic output. The checks it performs for nested models are also fairly conservative in the mixed-model case – determining nesting of mixed models is not always trivial. MixedModels.jl also provides a likelihoodratiotest function that is more specialized for mixed models and which performs a different, less conservative set of nesting checks:

using MixedModels: likelihoodratiotest
likelihoodratiotest(fm1, fm2)
                                                                      model-dof  deviance   χ²  χ²-dof  P(>χ²)
y ~ 1 + studage + lectage + service + (1 | s) + (1 | d) +
    (1 | dept)                                                               14    237554
y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) +
    (1 + service | dept)                                                     18    237157  397       4  <1e-83

We can see that the display is different but that the computed quantities are indeed identical.

We can also consider an even more complicated model, where there is a varying effect of studage by Dozent – perhaps a particular instructor has a teaching style that is better suited to students at the beginning or end of their studies. However, studage is a categorical variable with four levels, so including it adds three additional contrasts to the random effects. Together with the intercept and service, we would then have five terms and thus 5 * 4 / 2 = 10 correlation parameters to estimate, which dramatically increases model complexity. We can force the correlation parameters for a particular blocking variable to zero with zerocorr. Furthermore, if we include the same blocking variable multiple times, then the estimated correlations between the different occurrences are all forced to zero.

fm3 = fit(MixedModel,
          @formula(y ~ 1 + studage + lectage + service +
                      (1 | s) +
                      (1 + service | d) +
                      zerocorr(0 + studage | d) +
                      (1 + service | dept)),
          insteval; progress=false)
                Est.      SE       z       p     σ_d     σ_s  σ_dept
(Intercept)   3.3084  0.0290  114.23  <1e-99  0.5022  0.3235  0.0453
studage: 4    0.0550  0.0277    1.98  0.0473  0.2699
studage: 6    0.0544  0.0265    2.06  0.0398  0.2074
studage: 8    0.1097  0.0293    3.74  0.0002  0.2538
lectage: 2   -0.0856  0.0172   -4.98  <1e-06
lectage: 3   -0.0983  0.0188   -5.24  <1e-06
lectage: 4   -0.1996  0.0218   -9.15  <1e-19
lectage: 5   -0.1651  0.0231   -7.14  <1e-12
lectage: 6   -0.2478  0.0221  -11.21  <1e-28
service: Y   -0.0199  0.0504   -0.39  0.6930  0.3704          0.1671
Residual      1.1581

Note that correlations that are constrained to be zero are shown with a “.” in the output. The estimated between-department variance for the intercept term has also dropped to nearly zero, and the associated correlation with service: Y is estimated at the boundary (-1.00), indicating a degenerate fit in that direction.

println(fm3)
Linear mixed model fit by maximum likelihood
 y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) + zerocorr(0 + studage | d) + (1 + service | dept)
    logLik     -2 logLik       AIC         AICc          BIC     
 -118332.7153  236665.4305  236707.4305  236707.4431  236900.7138

Variance components:
            Column   Variance Std.Dev.   Corr.
d        (Intercept)  0.252223 0.502218
         service: Y   0.137161 0.370352 -0.40
         studage: 4   0.072836 0.269881   .     .  
         studage: 6   0.042995 0.207352   .     .     .  
         studage: 8   0.064423 0.253817   .     .     .     .  
s        (Intercept)  0.104624 0.323456
dept     (Intercept)  0.002055 0.045334
         service: Y   0.027939 0.167149 -1.00
Residual              1.341233 1.158116
 Number of obs: 73421; levels of grouping factors: 1128, 2972, 14

  Fixed-effects parameters:
─────────────────────────────────────────────────────
                  Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept)   3.30836     0.0289634  114.23    <1e-99
studage: 4    0.0549586   0.0277042    1.98    0.0473
studage: 6    0.0544289   0.0264724    2.06    0.0398
studage: 8    0.109672    0.0293385    3.74    0.0002
lectage: 2   -0.0855503   0.0171859   -4.98    <1e-06
lectage: 3   -0.0982976   0.0187591   -5.24    <1e-06
lectage: 4   -0.199561    0.0218111   -9.15    <1e-19
lectage: 5   -0.165114    0.0231269   -7.14    <1e-12
lectage: 6   -0.247818    0.0221157  -11.21    <1e-28
service: Y   -0.019882    0.050354    -0.39    0.6930
─────────────────────────────────────────────────────

This additional model complexity is warranted in terms of goodness of fit:

likelihoodratiotest(fm2, fm3)
                                                                      model-dof  deviance   χ²  χ²-dof  P(>χ²)
y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) +
    (1 + service | dept)                                                     18    237157
y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) +
    zerocorr(0 + studage | d) + (1 + service | dept)                         21    236665  492       3  <1e-99

As a final note, we can also examine the effective dimensionality of the random effects with PCA (Bates et al., 2018). The rePCA property displays the cumulative proportion of variance explained by the principal components of each variance component and thus provides an estimate of excess dimensionality:

fm2.rePCA
(s = [1.0], d = [0.6996051806016415, 1.0], dept = [0.8551042911816997, 1.0])
fm3.rePCA
(d = [0.27925260802249807, 0.4792526080224981, 0.6792526080224981, 0.879252608022498, 1.0], s = [1.0], dept = [1.0, 1.0])

Together with the estimated correlations, this suggests that we could reduce the complexity of the by-department random effects. (Given that there are only 14 levels of department, it is also not surprising that we are unable to estimate subtle between-department effects.)

fm4 = fit(MixedModel,
          @formula(y ~ 1 + studage + lectage + service +
                      (1 | s) +
                      (1 + service | d) +
                      zerocorr(0 + studage | d) +
                      (1 | dept)),
          insteval; progress=false)
println(fm4)
Linear mixed model fit by maximum likelihood
 y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) + zerocorr(0 + studage | d) + (1 | dept)
    logLik     -2 logLik       AIC         AICc          BIC     
 -118344.9806  236689.9611  236727.9611  236727.9715  236902.8365

Variance components:
            Column    Variance  Std.Dev.   Corr.
d        (Intercept)  0.2512074 0.5012060
         service: Y   0.1548050 0.3934526 -0.39
         studage: 4   0.0735059 0.2711197   .     .  
         studage: 6   0.0429213 0.2071747   .     .     .  
         studage: 8   0.0641994 0.2533760   .     .     .     .  
s        (Intercept)  0.1046981 0.3235708
dept     (Intercept)  0.0041447 0.0643792
Residual              1.3412751 1.1581343
 Number of obs: 73421; levels of grouping factors: 1128, 2972, 14

  Fixed-effects parameters:
─────────────────────────────────────────────────────
                  Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept)   3.30686     0.0315165  104.92    <1e-99
studage: 4    0.0569039   0.0277378    2.05    0.0402
studage: 6    0.0590462   0.02647      2.23    0.0257
studage: 8    0.116331    0.0293413    3.96    <1e-04
lectage: 2   -0.0851802   0.0172081   -4.95    <1e-06
lectage: 3   -0.0995893   0.0187627   -5.31    <1e-06
lectage: 4   -0.20152     0.0218215   -9.23    <1e-19
lectage: 5   -0.168801    0.023126    -7.30    <1e-12
lectage: 6   -0.253877    0.0221083  -11.48    <1e-29
service: Y   -0.0499557   0.0224017   -2.23    0.0257
─────────────────────────────────────────────────────
likelihoodratiotest(fm3, fm4)
                                                                      model-dof  deviance   χ²  χ²-dof  P(>χ²)
y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) +
    zerocorr(0 + studage | d) + (1 | dept)                                   19    236690
y ~ 1 + studage + lectage + service + (1 | s) + (1 + service | d) +
    zerocorr(0 + studage | d) + (1 + service | dept)                         21    236665   25       2  <1e-05
fm4.rePCA
(d = [0.278618228343286, 0.47861822834328593, 0.678618228343286, 0.878618228343286, 1.0], s = [1.0], dept = [1.0])

How big can we go?

The MovieLens data (Harper & Konstan, 2016) contain millions of observations and provide a good stress test for model size and complexity.

Note

The following models consume a large amount of memory because of the sheer size of the underlying dataset. Do not attempt to fit these models on a machine with less than 32GiB of memory.

See also the chapter “A large-scale observational study” in our online book Embrace Uncertainty.

Memory allocation vs. fitting

In Julia, constructing the model object (which allocates and assembles the model matrices) is a separate step from fitting it, so we can time the two phases individually:

using Econ2024
ratings = Econ2024.dataset("ratings")
# construction allocates the (large) model matrices
@time fm_ratings = LinearMixedModel(@formula(rating ~ 1 + (1|userId) + (1|movieId)), ratings)
# fit! then runs the optimization on the already-constructed model
@time fit!(fm_ratings)

To try on your own after the course

using Econ2024
ratings = DataFrame(Econ2024.dataset("ratings_genre"))
describe(ratings)
using StatsBase
# count the number of ratings per movie and per user
mcount = countmap(ratings.movieId)
ucount = countmap(ratings.userId)
# identify movies and users with fewer than 50 ratings to exclude
mexclude = Set(k for (k, v) in pairs(mcount) if v < 50)
uexclude = Set(k for (k, v) in pairs(ucount) if v < 50)
ratings = subset(ratings,
                 :movieId => ByRow(!in(mexclude)),
                 :userId => ByRow(!in(uexclude)))
# This takes about an hour on my home computer when using the full dataset
form1 = @formula(rating ~ 0 + Action + Adventure + Animation +
                             Children + Comedy + Crime +
                             Documentary + Drama +
                             Fantasy + Film_Noir +
                             Horror + IMAX +
                             Musical + Mystery + Romance +
                             Sci_Fi + Thriller + War + Western +
                             (1 | movieId) +
                             (1 | userId))
fit(MixedModel, form1, ratings)

Generalized Linear Mixed Effects Models

One of the test data sets from the Center for Multilevel Modelling, University of Bristol is derived from the 1989 Bangladesh Fertility Survey (Huq & Cleland, 1990). The data are a subsample of 1934 women selected from 60 of the 64 political districts or zila, available as the contra data set in the MixedModels package.

Variables in the dataset:

  • use, whether the woman chooses to use artificial contraception, with two possible values, N and Y
  • dist, district in which the woman resides
  • livch, the number of live children she currently has, coded as 0, 1, 2, and 3+
  • age, in years, but pre-centered and rounded, with the original center not available
  • urban, coded as N and Y, indicating rural or urban.
contra = MixedModels.dataset(:contra)
Arrow.Table with 1934 rows, 5 columns, and schema:
 :dist   String
 :urban  String
 :livch  String
 :age    Float64
 :use    String

In order to simplify the data a bit, we will also add a binary variable anych which indicates whether the woman has any children:

contra = DataFrame(contra)
contra[!, :anych] .= contra[!, :livch] .!= "0"
describe(contra)
6×7 DataFrame
 Row │ variable  mean        min     median  max    nmissing  eltype
     │ Symbol    Union…      Any     Union…  Any    Int64     DataType
─────┼──────────────────────────────────────────────────────────────────
   1 │ dist                  D01             D61           0  String
   2 │ urban                 N               Y             0  String
   3 │ livch                 0               3+            0  String
   4 │ age       0.00204757  -13.56  -1.56   19.44         0  Float64
   5 │ use                   N               Y             0  String
   6 │ anych     0.725957    false   1.0     true          0  Bool
Note

For a more principled examination of model building with this dataset, please refer to the chapter “Generalized linear mixed models for binary responses” of Embrace Uncertainty.

We set some appropriate contrasts:

contrasts = Dict(:livch => HelmertCoding(; base="0"),
                 :urban => EffectsCoding(),
                 :anych => EffectsCoding())
Dict{Symbol, StatsModels.AbstractContrasts} with 3 entries:
  :urban => EffectsCoding(nothing, nothing)
  :livch => HelmertCoding("0", nothing)
  :anych => EffectsCoding(nothing, nothing)

and fit a model:

gm1 = fit(MixedModel,
          @formula(use ~ 1 + urban + anych * age + abs2(age) + (1 | dist & urban)),
          contra,
          Bernoulli(),
          LogitLink(); # optional, defaults to canonical link
          nAGQ=1, # optional, defaults to 1
          fast=false, # optional, defaults to false, see the docs for more details.
          contrasts,
          progress=false)
                      Est.      SE      z       p  σ_dist & urban
(Intercept)        -0.3410  0.1265  -2.70  0.0070          0.5682
urban: Y            0.3934  0.0853   4.61  <1e-05
anych: true         0.6064  0.1045   5.80  <1e-08
age                -0.0129  0.0112  -1.16  0.2465
abs2(age)          -0.0056  0.0008  -6.67  <1e-10
anych: true & age   0.0332  0.0128   2.59  0.0095
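
Because the model uses a logit link, exponentiating the fixed-effects estimates yields odds ratios, which are often easier to communicate (a brief sketch):

# odds ratios for the fixed effects
exp.(fixef(gm1))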

Limitations of MixedModels.jl

We expect that MixedModels.jl will generally be best in class for the types of models that it can fit. We use cutting-edge algorithms based on penalized least squares and sparse matrix methods that take advantage of the particular sparsity and structure arising in linear mixed effects models with an unconstrained covariance structure. Glossing over a fair number of technical details, MixedModels.jl uses a novel formulation of the underlying numerical problem that tends to be much more computationally efficient and allows us to fit models with multiple crossed, partially crossed, or nested grouping variables without any special treatment.

Very few options for covariance structure

Nonetheless, there is no free lunch: the tradeoff we make is that it is much more difficult to formulate constraints on the covariance structure (whether of the random effects or of the response/residuals) in our formulation. MixedModels.jl currently supports precisely two covariance structures explicitly:

  1. unconstrained
  2. zero correlation (diagonal covariance structure)

It is also possible to express some models with compound symmetry by clever manipulation of the formula syntax: for a categorical variable c and grouping variable g, (1 + c | g) with compound symmetry is equivalent to (1 | g) + (1 | g & c).
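
For example, with a hypothetical response y, categorical covariate c, and grouping variable g, the compound-symmetry version could be written as:

# both random-effect terms share the grouping variable g; the implied
# covariance across the levels of c is then compound symmetric
form_cs = @formula(y ~ 1 + c + (1 | g) + (1 | g & c))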

MixedModels.jl does support constraining the residual variance to a known scalar value, which is useful in meta-analysis.

Metida.jl may provide an alternative if this functionality is required (not an endorsement).

No support for sandwich/robust variance-covariance estimators

This may change in the foreseeable future!

If this would be a valuable feature, then please file an issue. Issues are prioritized by the developers’ own needs and potential impact for users, so showing a large need for a feature will tend to increase its priority.

FixedEffectsModels.jl may be a viable alternative (not an endorsement). It provides “fast estimation of linear models with IV and high dimensional categorical variables” and provides similar functionality to Stata’s reghdfe and R’s lfe and fixest.

No support for generalized linear mixed models with a dispersion parameter

While MixedModels.jl nominally supports any GLM family and link function supported by GLM.jl, the results for model families with a dispersion parameter (normal with non-identity link, gamma, inverse Gaussian) are known to be incorrect. The package issues a warning if you attempt to fit such models.

No support for polytomous responses

Multinomial and ordered responses are not supported. I am unaware of a Julia package offering support for this.

No support for regularization of the fixed effects

HighDimMixedModels.jl may provide an alternative if this functionality is required (not an endorsement).

No support for generalized additive mixed models

Generalized additive models can be expressed as mixed models, so supporting this would require “only” adding a translation layer.

No support for nonlinear mixed effects models

Pumas.jl (commercial) provides this (not an endorsement).

References

Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2018). Parsimonious Mixed Models. arXiv:1506.04967 [Stat]. http://arxiv.org/abs/1506.04967
Harper, F. M., & Konstan, J. A. (2016). The MovieLens datasets. ACM Transactions on Interactive Intelligent Systems, 5(4), 1–19. https://doi.org/10.1145/2827872
Huq, N. M., & Cleland, J. (1990). Bangladesh fertility survey 1989 (main report). National Institute of Population Research; Training.