using BoxCox
using CairoMakie
using Random
CairoMakie.activate!(; type="svg")
x = abs2.(randn(MersenneTwister(42), 1000))
let f = Figure()
ax = Axis(f[1,1]; xlabel="x", ylabel="density")
density!(ax, x)
ax = Axis(f[1,2]; xlabel="theoretical quantiles", ylabel="observed values")
qqnorm!(ax, x)
colsize!(f.layout, 1, Aspect(1, 1.0))
colsize!(f.layout, 2, Aspect(1, 1.0))
resize_to_layout!(f)
f
endBoxCox.jl
A lightweight package with nice extensions
The Box-Cox Transformation
Box & Cox (1964)
\[ \begin{cases} \frac{x^{\lambda} - 1}{\lambda} &\quad \lambda \neq 0 \\ \log x &\quad \lambda = 0 \end{cases} \]
The denominator serves to normalize the transformation and preserve the original direction of the effect (the sign is flipped when \(\lambda < 0\)). In application, we may only care about the numerator (e.g. when it suggests using “speed” instead of “time”.)
Example: Square of a Normal Distribution
Fitting the transformation
bc = fit(BoxCoxTransformation, x)Box-Cox transformation
estimated λ: 0.2073
resultant transformation:
y^0.2 - 1
-----------
0.2
Examining the fitted transformation
If an appropriate Makie backend is loaded, then you can also do diagnostic plots.
let f = Figure(; size=(500, 300))
ax = Axis(f[1,1])
boxcoxplot!(ax, bc;
conf_level=0.95)
f
endApplying the fitted transformation
bc.(x)'1×1000 adjoint(::Vector{Float64}) with eltype Float64:
-1.04197 -1.37758 -3.74249 -1.89776 … -3.3449 -0.295085 -1.06768
let f = Figure(; size=(800, 300)), bcx = bc.(x)
ax = Axis(f[1,1]; xlabel="x", ylabel="density")
density!(ax, bcx)
ax = Axis(f[1,2]; xlabel="theoretical quantiles", ylabel="observed values")
qqnorm!(ax, bcx; qqline=:fitrobust)
colsize!(f.layout, 1, Aspect(1, 1.0))
colsize!(f.layout, 2, Aspect(1, 1.0))
resize_to_layout!(f)
f
endConditional Distributions: transforming the response of a regression model
Example: Tree Growth
using DataFrames
using RDatasets: dataset as rdataset
trees = rdataset("datasets", "trees")describe(trees)| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
| Symbol | Float64 | Real | Float64 | Real | Int64 | DataType | |
| 1 | Girth | 13.2484 | 8.3 | 12.9 | 20.6 | 0 | Float64 |
| 2 | Height | 76.0 | 63 | 76.0 | 87 | 0 | Int64 |
| 3 | Volume | 30.171 | 10.2 | 24.2 | 77.0 | 0 | Float64 |
Linear Regression
y = trees[!, :Volume]
X = hcat(ones(length(y)),
log.(trees[!, :Height]),
log.(trees[!, :Girth]))
bc_tree = fit(BoxCoxTransformation, X, y)Box-Cox transformation
estimated λ: -0.0673
resultant transformation:
y^-0.1 - 1
------------
-0.1
let f = Figure(; size=(400, 300))
boxcoxplot!(Axis(f[1,1]), bc_tree; conf_level=0.95)
f
end\(\lambda=0\) is well within the 95% CI and log fits in well with the rest of the model.
Linear Regression
using StatsModels
fit(BoxCoxTransformation,
@formula(Volume ~ 1 +
log(Height) +
log(Girth)),
trees)Box-Cox transformation
estimated λ: -0.0673
resultant transformation:
y^-0.1 - 1
------------
-0.1
If StatsModels.jl is loaded (even indirectly via e.g. GLM.jl), then you can also use the @formula macro to specify the regression model.
This also works for mixed models!
Reaction time in the sleep study
using MixedModels
model = fit(MixedModel,
@formula(reaction ~ 1 + days + (1 + days | subj)),
dataset(:sleepstudy))| Est. | SE | z | p | σ_subj | |
| (Intercept) | 251.4051 | 6.6323 | 37.91 | <1e-99 | 23.7805 |
| days | 10.4673 | 1.5022 | 6.97 | <1e-11 | 5.7168 |
| Residual | 25.5918 |
Fitting the transformation
bc_mixed = fit(BoxCoxTransformation,
model)Box-Cox transformation
estimated λ: -1.0747
resultant transformation:
y^-1.1 - 1
------------
-1.1
let f = Figure(; size=(400, 300))
boxcoxplot!(Axis(f[1,1]), bc_mixed; conf_level=0.95)
f
end\(\text{time}^{-1}\) has a natural interpretation as speed and -1 is nearly as good as the “optimal” transformation. We thus use our domain expertise and use speed (as responses per second) instead of the fitted result.
Speed in the sleep study
model_bc = fit(MixedModel,
@formula(1000 / reaction ~ 1 + days + (1 + days | subj)),
dataset(:sleepstudy))| Est. | SE | z | p | σ_subj | |
| (Intercept) | 3.9658 | 0.1056 | 37.55 | <1e-99 | 0.4190 |
| days | -0.1110 | 0.0151 | -7.37 | <1e-12 | 0.0566 |
| Residual | 0.2698 |
We use 1000 in the numerator to scale things back to seconds from milliseconds.
References
Version Info
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
JULIA_LOAD_PATH = @:@stdlib
This page was rendered from git revision 8c549ca .