Statistics beyond classical balanced designs

The joy of mixed models

Phillip Alday

Beacon Biosignals

17 October 2025

Why do we care about balance?

Balance is a simplifying assumption

  • “Computer” used to refer to a person, not a machine
  • Balance is a special case of symmetry and symmetry is a very useful property for simplifying computations

What is the combined variance of several samples?

\[ \frac{\sum_i (n_i-1)s^2_i}{\sum_i (n_i - 1)}\]

When the samples are the same size, what is the combined variance of several samples?

\[ \frac{1}{N} \sum_i s^2_i \]

(where N is the number of samples)

When the samples are the same size and have the same variance, what is the combined variance of several samples?

\[ s^2 \]
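A quick numerical check of these simplifications (a minimal sketch; pooled_var is a helper of my own, not a library function):

using Statistics

# pooled variance of several samples (general, unbalanced case)
pooled_var(samples) = sum((length(s) - 1) * var(s) for s in samples) /
                      sum(length(s) - 1 for s in samples)

# with equal sample sizes, this reduces to the simple mean of the per-sample variances
samples = [2 .* randn(50) for _ in 1:3]
pooled_var(samples) ≈ mean(var.(samples))  # true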

Balance is a simplifying assumption

  • Balance is a special case of symmetry and symmetry is a very useful property for simplifying computations
  • Other types of symmetry assumptions are present everywhere (homoskedasticity, sphericity, etc.), and it’s often possible to relax them at the cost of great mathematical and thus computational complexity

Mixed models

  • Mixed models are an extension of the classical regression framework often used to analyze repeated-measures data.
  • They relax the independence assumption by decomposing the variance into multiple components.

\[ \begin{aligned} (Y|B=b)&\sim{}N\left(X\beta+Zb,\sigma^{2}I\right)\\ B&\sim{}N\left(0,\Sigma_\theta\right) . \end{aligned} \]
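As a concrete minimal instance of this model (a sketch using the sleepstudy dataset shipped with the MixedModels ecosystem and the standard fit(MixedModel, ...) interface; the larger worked examples come later in the deck):

using MixedModels
using MixedModelsDatasets: dataset

# by-subject random intercepts and slopes for days: the Zb term above captures
# each subject's deviation from the population-level (fixed) effects Xβ
sleepstudy = dataset(:sleepstudy)
fm = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days | subj)), sleepstudy)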

Balance is a simplifying assumption

  • Balance is a special case of symmetry and symmetry is a very useful property for simplifying computations
  • Other types of symmetry assumptions are present everywhere (homoskedasticity, sphericity, etc.), and it’s often possible to relax them at the cost of great mathematical and thus computational complexity
  • In mixed models, nesting and crossing are also axes of potential symmetry

Nesting and crossing

When examining “multi-level” data, we often consider whether the levels are nested or crossed.

A classical example of nesting is children within classes/grades, classes within schools, and schools within districts. In other words, each child appears in exactly one class and each class appears in exactly one school, so the grouping structure is like a set of Russian matryoshka dolls.

A classical example of crossing occurs in many experimental designs in psychology, where each participant sees each stimulus item. In some sense, the item grouping is nested within the participant grouping, but it would be equally true to say that the participant grouping is nested within the item grouping.
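The difference is easy to see in a toy example (a sketch; these tiny tables are made up purely for illustration):

using DataFrames

# nested: each child belongs to exactly one class, and each class to exactly one school
nested = DataFrame(child  = ["a", "b", "c", "d"],
                   class  = ["c1", "c1", "c2", "c2"],
                   school = ["s1", "s1", "s2", "s2"])

# crossed: every participant sees every item
crossed = DataFrame(vec([(participant = p, item = i)
                         for p in ["p1", "p2"], i in ["i1", "i2", "i3"]]))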

I’m not well balanced…

Leaving the nest breaks nesting

A classical example of nesting is children within classes/grades, classes within schools, and schools within districts…

What happens when we examine longitudinal data?

  • Students move between classes/grades. Students in the same cohort may no longer be lumped together at the next grade level.
  • Students move between schools and even districts, if they move homes.
  • Students may have to repeat a grade.

Crossing our fingers

A classical example of crossing occurs in many experimental designs in psychology, where each participant sees each stimulus item.

  • Often it is impractical to show every participant every stimulus, so each participant may see only a subset of items (e.g. in a Latin square design).
  • Mixed between-within designs, where some conditions are manipulated within participants and some between participants, defy classification into nesting or crossing.

It gets worse….

  • Even if we’re able to perfectly control our experimental design, such that the design achieves perfect symmetry and the underlying experimental phenomenon also displays the desired symmetry, we often won’t achieve the desired balance in practice.
  • Good data science means that we have to be concerned with the data we actually have, and ideal data doesn’t exist:
    • in longitudinal studies, it’s very common for some participants/groups to not be present for the entirety of the study: students move far away, participants fail to show up for a later appointment, etc.
    • what happens if the number of students isn’t balanced across classes?
    • individual observations may need to be excluded due to measurement artifacts or incorrect responses
    • etc…
  • And what happens if we want to look at observational data? 😱

The Sapir-Whorf Hypothesis for Software and Computation

The theory of mixed models doesn’t require perfect balance, so why does your software?

  • Historically, it was very difficult to fit a mixed model, even in the 1990s.
  • Every shortcut in the book was considered to make the process faster and amenable to non-trivial datasets, including requiring strictly nested or fully crossed grouping structures and placing constraints on the variance-covariance matrix of the random effects.
  • The usual expression of the model-fitting problem couldn’t automatically detect the assumed structures, so they had to be specified by the user.
  • Over time, the software influenced the thinking of the community until people struggled to think in terms beyond balance, nesting and crossing.

When all you have is a hammer, every problem looks like a nail

The theory of mixed models doesn’t require perfect balance, so why does your software?

  • The underlying computational issue is often expressed as a generalized least squares problem.
  • If we instead formulate it as a penalized least squares problem (sketched below), then it becomes trivial to handle arbitrary nesting, crossing or partial crossing with unstructured covariance matrices.
  • The research program behind R’s lme4 and Julia’s MixedModels.jl implements this approach.
  • MixedModels.jl is able to fit mixed models to very large datasets (largest so far: 32m movie ratings with partially crossed random effects representing 201k users and 84k movies).
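In the notation of the model statement above, the penalized least squares idea can be sketched as follows (this is the formulation developed by Bates and colleagues behind lme4 and MixedModels.jl; see their papers for the full derivation). The random effects are re-expressed through a relative covariance factor, and the heart of the fit is a penalized residual sum of squares:

\[ \begin{aligned} b &= \Lambda_\theta u, \qquad \Sigma_\theta = \sigma^2 \Lambda_\theta \Lambda_\theta^\top, \qquad u \sim N(0, \sigma^2 I) \\ r^2(\theta, \beta) &= \min_u \left\{ \lVert y - X\beta - Z\Lambda_\theta u \rVert^2 + \lVert u \rVert^2 \right\} \end{aligned} \]

The penalty term keeps the inner problem well-posed for any pattern of nesting, crossing or partial crossing, and it is solved via a sparse Cholesky factorization, which is what makes very large, unbalanced datasets tractable.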

Examples

Course ratings from ETH-Zurich (insteval)

# NB: lmm is assumed to be a thin convenience wrapper around fit(MixedModel, ...)
#     defined in setup code not shown in this deck
using MixedModels
using MixedModelsDatasets: dataset
@time fm1 = lmm(@formula(y ~ 1 + service + (1 | d) + (1 | s) + (1 + service | dept)),
                dataset(:insteval); progress=isinteractive())
println(fm1)

Course ratings from ETH-Zurich (insteval)

  5.942749 seconds (9.12 M allocations: 651.773 MiB, 1.82% gc time, 50.57% compilation time)
Linear mixed model fit by maximum likelihood
 y ~ 1 + service + (1 | d) + (1 | s) + (1 + service | dept)
    logLik     -2 logLik       AIC         AICc          BIC     
 -118823.5292  237647.0584  237663.0584  237663.0603  237736.6901

Variance components:
            Column    Variance  Std.Dev.   Corr.
s        (Intercept)  0.1052731 0.3244581
d        (Intercept)  0.2624306 0.5122798
dept     (Intercept)  0.0029308 0.0541366
         service: Y   0.0259425 0.1610668 -0.57
Residual              1.3849984 1.1768596
 Number of obs: 73421; levels of grouping factors: 2972, 1128, 14

  Fixed-effects parameters:
─────────────────────────────────────────────────────
                  Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept)   3.2785      0.024053   136.30    <1e-99
service: Y   -0.0496955   0.0460242   -1.08    0.2802
─────────────────────────────────────────────────────

insteval – Nesting

Cross-tabulating the grouping factors (a sketch of how to compute such tables follows) reveals the structure: each instructor (d) appears in exactly one department (dept), so instructors are nested within departments, whereas each student (s) rates instructors across several departments, so students are only partially crossed with instructors and with departments.
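One way to compute such cross-tabulations (not necessarily how these particular tables were produced; FreqTables.jl is just one of several options):

using DataFrames
using FreqTables: freqtable
using MixedModelsDatasets: dataset

insteval = DataFrame(dataset(:insteval))
freqtable(insteval.dept, insteval.d)  # one department per instructor: nesting
freqtable(insteval.dept, insteval.s)  # students spread across departments: crossing
freqtable(insteval.s, insteval.d)     # sparse student × instructor table: partial crossing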

2972×1128 adjoint(::Matrix{Int64}) with eltype Int64:
 5  2  5  3  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  2  4  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  4  5  5  4  4  4  4     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  4  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 5  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 ⋮              ⋮              ⋮        ⋱              ⋮              ⋮     
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  2  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 4  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  3  0  0  3     0  0  0  0  0  0  0  0  0  0  0  0
14×1128 adjoint(::Matrix{Int64}) with eltype Int64:
 617    0  122    0    0    0    0  …   0   0   0   0   0   0   0   0   0
   0  307    0    0    0    0    0      0   0   0   0   0   0   0   0   0
   0    0    0  148    0    0    0      0   0   0   0   0   0   0   0   0
   0    0    0    0  310  395    0      0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0  134      0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0    0  …   0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0    0      0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0    0      0   0   0   0   0   0   0   0  39
   0    0    0    0    0    0    0      0   0   0   0   0   0   0  33   0
   0    0    0    0    0    0    0     54  48  46  44  27  54  53   0   0
   0    0    0    0    0    0    0  …   0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0    0      0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0    0      0   0   0   0   0   0   0   0   0
   0    0    0    0    0    0    0      0   0   0   0   0   0   0   0   0
14×2972 adjoint(::Matrix{Int64}) with eltype Int64:
 10  0   0  25   0   0   5   0  26   0  …   1   0   9    6   0    0   4   6
  2  0   0   0   0   0   0   0   6   0     11   0   5   11  34   10   0   6
  3  0   0   0   0   0   0   0   0   0     42   0   4    0   0    0   3   9
  0  6   0   0  15   0   0   0   0   0      0   0   0    0   0    0   0   0
  0  0   4   8   0   0   0  15   0   0      8   4   9    6   0    1  14  22
  0  0  39   0   0   0   4   0   0  45  …   0   0   3    5   0    5   0  34
  0  0   6   0   0   0   0   4   0   0      5   0   0    1   0    0  11   7
  0  0   4   0   0   4   0   0   0   4      4   0   5   13  11   14   0   8
  0  0   0   0   0  28   0   0   0   0      0   3   0   14   0    3   0   0
  0  0   0   0   0   0  10   0   0   0      0   0   0  108   4  108   0  16
  0  0   0   0   0   0  14   0   0   0  …   0   0   0   10   0    0   0   2
  0  0   0   0   0   0   0   0   5   0      8  29   0    0   0    0  47   0
  0  0   0   0   0   0   0   0   0   3      0   3  59    5   5    0   0   0
  0  0   0   0   0   0   0   0   0   0      1   0   2   34   0   10   0   0

insteval – Balance

English Lexicon Project

# not shown: data wrangling and contrast creation
@time elm01 = lmm(@formula(1000 / rt ~ 1 + isword * wrdlen + (1 | item) + (1 | subj)),
                  pruned; contrasts, progress=isinteractive())
println(elm01)
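The contrast creation is not shown in the deck. Judging from the term labels in the output below (e.g. “wrdlen(centered: 8)”), the contrasts dictionary plausibly included something like the following; this is a guess based on StandardizedPredictors.jl, and the actual setup may differ or contain additional entries:

using StandardizedPredictors: Center

# hypothetical reconstruction: center word length at 8 characters so that the
# intercept and the isword effect are evaluated at a typical word length
contrasts = Dict(:wrdlen => Center(8))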

English Lexicon Project

  8.580446 seconds (9.97 M allocations: 1.859 GiB, 5.51% gc time, 40.40% compilation time: 17% of which was recompilation)
Linear mixed model fit by maximum likelihood
 :(1000 / rt) ~ 1 + isword + wrdlen(centered: 8) + isword & wrdlen(centered: 8) + (1 | item) + (1 | subj)
     logLik     -2 logLik        AIC           AICc          BIC      
 -1273728.2624  2547456.5247  2547470.5247  2547470.5248  2547560.2231

Variance components:
            Column   Variance Std.Dev. 
item     (Intercept)  0.014039 0.118487
subj     (Intercept)  0.065050 0.255049
Residual              0.142990 0.378140
 Number of obs: 2714311; levels of grouping factors: 80962, 814

  Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────
                                          Coef.  Std. Error        z  Pr(>|z|)
──────────────────────────────────────────────────────────────────────────────
(Intercept)                          1.37584     0.00895211   153.69    <1e-99
isword: true                         0.0624608   0.00047552   131.35    <1e-99
wrdlen(centered: 8)                 -0.0436209   0.00019354  -225.38    <1e-99
isword: true & wrdlen(centered: 8)  -0.00557854  0.00019353   -28.83    <1e-99
──────────────────────────────────────────────────────────────────────────────

ELP – Nesting

The subject-by-item indicator matrix below shows that the grouping factors are partially crossed: each of the 814 subjects responded to only a subset of the 80,962 items, and those subsets overlap across subjects.

814×80962 Matrix{Bool}:
 1  1  1  1  1  1  1  1  1  1  1  1  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 ⋮              ⋮              ⋮        ⋱  ⋮              ⋮              ⋮  
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  1  0  0  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0  0  0  0     0  0  1  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  1  0  1  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  1  0  0
 0  1  0  0  0  0  1  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  1

ELP – Balance

ELP – Excluding subjects

ELP – Updated model

@time elm02 = lmm(@formula(1000 / rt ~ 1 + isword * wrdlen + (1 | item) + (1 | subj)),
                  filter(:spropacc => >=(0.80), pruned); contrasts, progress=isinteractive())
println(elm02)

ELP – Updated model

  4.447658 seconds (216.04 k allocations: 1.126 GiB, 6.20% gc time, 1.91% compilation time)
Linear mixed model fit by maximum likelihood
 :(1000 / rt) ~ 1 + isword + wrdlen(centered: 8) + isword & wrdlen(centered: 8) + (1 | item) + (1 | subj)
    logLik     -2 logLik       AIC         AICc          BIC     
 -815182.9780 1630365.9561 1630379.9561 1630379.9561 1630468.5427

Variance components:
            Column   Variance Std.Dev. 
item     (Intercept)  0.015538 0.124651
subj     (Intercept)  0.053748 0.231837
Residual              0.111684 0.334191
 Number of obs: 2315715; levels of grouping factors: 80962, 690

  Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────
                                          Coef.   Std. Error        z  Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept)                          1.36113     0.0088395     153.98    <1e-99
isword: true                         0.0655624   0.000490278   133.73    <1e-99
wrdlen(centered: 8)                 -0.0444304   0.000199551  -222.65    <1e-99
isword: true & wrdlen(centered: 8)  -0.00573248  0.000199541   -28.73    <1e-99
───────────────────────────────────────────────────────────────────────────────

MovieLens Data
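The call that produced the output below is not shown in the deck; it presumably looked something like this (a hypothetical reconstruction based on the formula printed in the output; the name of the ratings table, ratings, the model object name mvm01, and the per-genre indicator columns are assumed to come from data wrangling not shown):

# hypothetical reconstruction of the fit; the ratings table and its genre
# indicator columns come from data wrangling not shown in the deck
@time mvm01 = lmm(@formula(rating ~ 1 + Action + Adventure + Animation + Children + Comedy +
                           Crime + Documentary + Drama + Fantasy + FilmNoir + Horror + IMAX +
                           Musical + Mystery + Romance + SciFi + Thriller + War + Western +
                           (1 | userId) + (1 | movieId)),
                  ratings; progress=isinteractive())
println(mvm01)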

272.020954 seconds (2.81 M allocations: 1.065 GiB, 0.52% gc time, 1.00% compilation time)
Linear mixed model fit by maximum likelihood
 rating ~ 1 + Action + Adventure + Animation + Children + Comedy + Crime + Documentary + Drama + Fantasy + FilmNoir + Horror + IMAX + Musical + Mystery + Romance + SciFi + Thriller + War + Western + (1 | userId) + (1 | movieId)
    logLik     -2 logLik       AIC         AICc          BIC     
 -139245.9467  278491.8934  278537.8934  278537.9045  278756.6907

Variance components:
            Column   Variance Std.Dev. 
userId   (Intercept)  0.205653 0.453489
movieId  (Intercept)  0.169397 0.411579
Residual              0.712472 0.844080
 Number of obs: 100000; levels of grouping factors: 54940, 10098

  Fixed-effects parameters:
──────────────────────────────────────────────────────
                   Coef.  Std. Error       z  Pr(>|z|)
──────────────────────────────────────────────────────
(Intercept)   3.3273       0.0187175  177.76    <1e-99
Action       -0.136        0.0186972   -7.27    <1e-12
Adventure     0.038103     0.0208094    1.83    0.0671
Animation     0.273746     0.0330253    8.29    <1e-15
Children     -0.274002     0.030205    -9.07    <1e-18
Comedy       -0.0777225    0.01618     -4.80    <1e-05
Crime         0.0872368    0.0205663    4.24    <1e-04
Documentary   0.392768     0.0394531    9.96    <1e-22
Drama         0.232496     0.0157985   14.72    <1e-48
Fantasy       0.023327     0.0245536    0.95    0.3421
FilmNoir      0.342308     0.0652081    5.25    <1e-06
Horror       -0.252196     0.0240813  -10.47    <1e-24
IMAX          0.0655499    0.0425256    1.54    0.1232
Musical       0.0859278    0.0354544    2.42    0.0154
Mystery       0.108849     0.026671     4.08    <1e-04
Romance       0.0303007    0.0179232    1.69    0.0909
SciFi        -0.0151515    0.0219067   -0.69    0.4892
Thriller     -0.00606902   0.0180137   -0.34    0.7362
War           0.200163     0.0324413    6.17    <1e-09
Western       0.14738      0.0470189    3.13    0.0017
──────────────────────────────────────────────────────

MovieLens Data – Balance

(Balance plot not reproduced here; note the y-scale!)

Thank you for your attention!

Work done in collaboration with Doug Bates.

All mistakes are my own, and opinions presented here do not represent the opinion of my employer.

References

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459. https://doi.org/10.3758/bf03193014
Harper, F. M., & Konstan, J. A. (2016). The MovieLens datasets. ACM Transactions on Interactive Intelligent Systems, 5(4), 1–19. https://doi.org/10.1145/2827872
Wilkinson, G. N., & Rogers, C. E. (1973). Symbolic Description of Factorial Models for Analysis of Variance. Applied Statistics, 22(3), 392. https://doi.org/10.2307/2346786