spherical-distribution-example.Rmd
We present here some examples to highlight the differences between the gANOVA()
function and the built-in spherical distribution of lmer()
. Both functions estimate a spherical correlation structure. However the parametrization using gANOVA()
increases the parameter space and widen the range of the models that can be estimated without having estimates at the boundary of the parameter space (e.g. variances estimated at 0).
We need the following packages:
Then, we construct the design of a simple experiment. It contains 12 participants, a within-participant factor with 4 levels and 3 replications for each participant within each level. The design is simply:
set.seed(42) df <- expand.grid(pt = paste0("P",str_pad(1:12,width = 2,pad = "0")), A = paste0("A",1:4), r = 1:3, stringsAsFactors = F)%>% select(-r)
Then we simulate 2 different response variables. There are y_diff
and y_same
and they differ only by the variance of the random intercepts. The response y_diff
has a lower variance of the random intercepts compared to the one of the random interactions.
# add +(1|pt) random intercept df<- df%>% nest(data=-pt)%>% mutate(ranef_pt = rnorm(n(), sd = .8))%>% unnest(data) # add +(1|pt:A) random interaction df<- df%>% nest(data=-c(pt,A))%>% mutate(ranef_ptA = rnorm(n(), sd = 3))%>% unnest(data) # add an error term df<- df%>% mutate(err = rnorm(n(),sd = 4)) # create 2 dependant variables df <- df%>% transmute(pt = pt, A= A, y_diff = ranef_pt + ranef_ptA + err, y_same = ranef_pt*2 +ranef_ptA + err)
Finally, we estimate the model using y_same
as dependant variable:
lmer_same <- lmer(y_same ~ A+ (1|pt) + (1|pt:A), data=df, contrasts = list(A = contr.sum)) summary(lmer_same) #> Linear mixed model fit by REML ['lmerMod'] #> Formula: y_same ~ A + (1 | pt) + (1 | pt:A) #> Data: df #> #> REML criterion at convergence: 836.8 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -2.61123 -0.61168 -0.06369 0.59976 2.64576 #> #> Random effects: #> Groups Name Variance Std.Dev. #> pt:A (Intercept) 9.799 3.130 #> pt (Intercept) 2.169 1.473 #> Residual 13.744 3.707 #> Number of obs: 144, groups: pt:A, 48; pt, 12 #> #> Fixed effects: #> Estimate Std. Error t value #> (Intercept) 0.47727 0.69306 0.689 #> A1 0.36270 0.94803 0.383 #> A2 0.03153 0.94803 0.033 #> A3 -1.47134 0.94803 -1.552 #> #> Correlation of Fixed Effects: #> (Intr) A1 A2 #> A1 0.000 #> A2 0.000 -0.333 #> A3 0.000 -0.333 -0.333
and then gANOVA:
gANOVA_same <- gANOVA(y_same ~ A + (1|pt|A), data=df, contrasts = list(A = contr.sum)) summary(gANOVA_same) #> Linear mixed model fit by REML. t-tests use Satterthwaite's method [ #> lmerModgANOVA] #> Formula: y_same ~ A + (1 | pt | A) #> Data: df #> Control: structure(list(optimizer = "nloptwrap", restart_edge = TRUE, #> boundary.tol = 1e-05, calc.derivs = TRUE, use.last.params = FALSE, #> checkControl = list(check.nobs.vs.rankZ = "ignore", check.nobs.vs.nlev = "stop", #> check.nlev.gtreq.5 = "ignore", check.nlev.gtr.1 = "stop", #> check.nobs.vs.nRE = "stop", check.rankX = "message+drop.cols", #> check.scaleX = "warning", check.formula.LHS = "stop"), #> checkConv = list(check.conv.grad = list(action = "warning", #> tol = 0.002, relTol = NULL), check.conv.singular = list( #> action = "message", tol = 1e-04), check.conv.hess = list( #> action = "warning", tol = 1e-06)), optCtrl = list()), class = c("lmerControl", #> "merControl")) #> #> REML criterion at convergence: 836.8 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -2.61123 -0.61168 -0.06369 0.59976 2.64577 #> #> Random effects: #> Groups Name Variance Std.Dev. #> pt:A (Intercept) 9.799 3.130 #> pt (Intercept) 4.619 2.149 #> Residual 13.744 3.707 #> Number of obs: 144, groups: pt:A, 48; pt, 12 #> #> Fixed effects: #> Estimate Std. Error df t value Pr(>|t|) #> (Intercept) 0.47727 0.69306 10.99989 0.689 0.505 #> A1 0.36270 0.94803 33.00003 0.383 0.704 #> A2 0.03153 0.94803 33.00003 0.033 0.974 #> A3 -1.47134 0.94803 33.00003 -1.552 0.130 #> #> Correlation of Fixed Effects: #> (Intr) A1 A2 #> A1 0.000 #> A2 0.000 -0.333 #> A3 0.000 -0.333 -0.333
Using y_same
the 2 estimations are the same (likelihood, fixed parameters, etc). Here the reparametrizations of gANOVA()
does not improve the results as the variance of the random intercept is big enough with respect to the variance of the random interaction.
However, using y_diff
, the variability of the random intercepts is smaller. The estimation using the parametrization of lmer()
produces a singular fit:
lmer_diff <- lmer(y_diff~ A + (1|pt)+(1|pt:A),data=df,contrasts = list(A = contr.sum)) #> boundary (singular) fit: see ?isSingular summary(lmer_diff) #> Linear mixed model fit by REML ['lmerMod'] #> Formula: y_diff ~ A + (1 | pt) + (1 | pt:A) #> Data: df #> #> REML criterion at convergence: 831 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -2.62948 -0.60740 -0.07108 0.61599 2.68565 #> #> Random effects: #> Groups Name Variance Std.Dev. #> pt:A (Intercept) 9.581 3.095 #> pt (Intercept) 0.000 0.000 #> Residual 13.744 3.707 #> Number of obs: 144, groups: pt:A, 48; pt, 12 #> #> Fixed effects: #> Estimate Std. Error t value #> (Intercept) -0.12703 0.54318 -0.234 #> A1 0.36270 0.94081 0.386 #> A2 0.03153 0.94081 0.034 #> A3 -1.47134 0.94081 -1.564 #> #> Correlation of Fixed Effects: #> (Intr) A1 A2 #> A1 0.000 #> A2 0.000 -0.333 #> A3 0.000 -0.333 -0.333 #> convergence code: 0 #> boundary (singular) fit: see ?isSingular
Estimating the model using gANOVA()
reduces the constraint on the parameter space (of the variance of the intercept). The gANOVA
parametrization produces a non-zero estimate of the variance of the random intercept. It follows that the likelihood is smaller:
mod_gANOVA <- gANOVA(y_diff~ A+ (1|pt|A),data=df,contrasts = list(A = contr.sum)) summary(mod_gANOVA) #> Linear mixed model fit by REML. t-tests use Satterthwaite's method [ #> lmerModgANOVA] #> Formula: y_diff ~ A + (1 | pt | A) #> Data: df #> Control: structure(list(optimizer = "nloptwrap", restart_edge = TRUE, #> boundary.tol = 1e-05, calc.derivs = TRUE, use.last.params = FALSE, #> checkControl = list(check.nobs.vs.rankZ = "ignore", check.nobs.vs.nlev = "stop", #> check.nlev.gtreq.5 = "ignore", check.nlev.gtr.1 = "stop", #> check.nobs.vs.nRE = "stop", check.rankX = "message+drop.cols", #> check.scaleX = "warning", check.formula.LHS = "stop"), #> checkConv = list(check.conv.grad = list(action = "warning", #> tol = 0.002, relTol = NULL), check.conv.singular = list( #> action = "message", tol = 1e-04), check.conv.hess = list( #> action = "warning", tol = 1e-06)), optCtrl = list()), class = c("lmerControl", #> "merControl")) #> #> REML criterion at convergence: 830.9 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -2.61261 -0.61229 -0.07301 0.62167 2.68455 #> #> Random effects: #> Groups Name Variance Std.Dev. #> pt:A (Intercept) 9.799 3.130 #> pt (Intercept) 2.232 1.494 #> Residual 13.744 3.707 #> Number of obs: 144, groups: pt:A, 48; pt, 12 #> #> Fixed effects: #> Estimate Std. Error df t value Pr(>|t|) #> (Intercept) -0.12703 0.53048 10.99999 -0.239 0.815 #> A1 0.36270 0.94803 33.00006 0.383 0.704 #> A2 0.03153 0.94803 33.00006 0.033 0.974 #> A3 -1.47134 0.94803 33.00006 -1.552 0.130 #> #> Correlation of Fixed Effects: #> (Intr) A1 A2 #> A1 0.000 #> A2 0.000 -0.333 #> A3 0.000 -0.333 -0.333