Analysis of single-case experimental count data using the linear mixed effects model: A simulation study
Abstract
When (meta-)analyzing single-case experimental design (SCED) studies by means of hierarchical or multilevel modeling, applied researchers almost exclusively rely on the linear mixed model (LMM). This type of model assumes that the residuals are normally distributed. However, very often SCED studies consider outcomes of a discrete rather than a continuous nature, like counts, percentages or rates. In those cases the normality assumption does not hold. The LMM can be extended into a generalized linear mixed model (GLMM), which can account for the discrete nature of SCED count data. In this simulation study, we look at the effects of misspecifying an LMM for SCED count data simulated according to a GLMM. We compare the performance of a misspecified LMM and of a GLMM in terms of goodness of fit, fixed effect parameter recovery, type I error rate, and power. Because the LMM and the GLMM do not estimate identical fixed effects, we provide a transformation to compare the fixed effect parameter recovery. The results show that, compared to the GLMM, the LMM has worse performance in terms of goodness of fit and power. Performance in terms of fixed effect parameter recovery is equally good for both models, and in terms of type I error rate the LMM performs better than the GLMM. Finally, we provide some guidelines for applied researchers about aspects to consider when using an LMM for analyzing SCED count data.
Keywords
Generalized linear mixed model · Linear mixed model · Single-case experimental design · Monte Carlo simulation

Introduction
A single-case experimental design (SCED) is an experimental design in which one subject, participant, or case is observed repeatedly over time, resulting in a time series. During this time series, one or more dependent variables are measured under different levels of an independent variable in order to assess the effect of a particular treatment or intervention (Onghena & Edgington, 2005). The time series often includes at least one baseline phase and one treatment phase. SCED studies frequently report results for a small number of cases. When generalizing the results of several SCED studies in a meta-analysis, the data of interest are hierarchical in nature: measurements are nested within cases, which in turn are nested within studies. This hierarchical nesting can be accounted for elegantly by using hierarchical or multilevel modeling for the statistical analysis (Van den Noortgate & Onghena 2003a, 2003b, 2008).
In the basic multilevel model for meta-analysis of SCED data as proposed in previous research (Raudenbush & Bryk 2002; Moeyaert et al., 2014; Shadish et al., 2008, 2013; Van den Noortgate & Onghena 2007), the observed scores for each case are assumed to be normally distributed around their expected value. However, Shadish and Sullivan (2011) reported that the outcome variables measured in SCED studies are very often of a discrete rather than continuous nature, and for these discrete outcomes the assumption of conditional normality does not hold. To account for both the hierarchical and the count nature of SCED data, two frameworks can be combined: linear mixed modeling (LMM) (Hox, 2010; Gelman & Hill, 2009; Snijders & Bosker, 2012) and generalized linear modeling (GLM) (Gill, 2001; McCullagh & Nelder, 1999). Both frameworks have proven to be very flexible: from their most basic forms, they extend to more specialized models in a clear and simple manner. Combining the two frameworks results in a generalized linear mixed model (GLMM) (Hox, 2010; Gelman & Hill, 2009; Snijders & Bosker, 2012; Jiang, 2007), which is specified by (1) a distribution for the random effects, (2) a linear combination of predictor variables, (3) a function linking this linear predictor to the expected value of the response variable conditional on the random effects, and (4) a distribution for the response variable around this expected value. GLMMs can thus be tailored to the particular type of data at hand, such as the count data common in SCED meta-analyses (Shadish et al., 2013).
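The four GLMM components enumerated above can be made concrete with a small numerical sketch. The Python snippet below is our own illustration, not the paper's simulation code, and uses illustrative parameter values: it draws case-specific random effects, forms the linear predictor with a phase dummy, applies the inverse of the log link, and samples Poisson counts.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative values only (not the paper's): fixed effects on the log scale
# and random-effect standard deviations, for 4 cases and 8 measurements.
gamma_00, gamma_10 = np.log(4.0), np.log(1.5)
sigma_u0, sigma_u1 = 0.3, 0.3
n_cases, n_measurements = 4, 8

# (1) Distribution of the random effects: case-specific deviations for the
#     intercept and the treatment effect, here independent normals.
u0 = rng.normal(0.0, sigma_u0, size=n_cases)
u1 = rng.normal(0.0, sigma_u1, size=n_cases)

# (2) Linear predictor: intercept plus effect times a phase dummy
#     (0 = baseline, 1 = treatment), here switching halfway for every case.
phase = (np.arange(n_measurements) >= n_measurements // 2).astype(float)
eta = (gamma_00 + u0)[:, None] + (gamma_10 + u1)[:, None] * phase[None, :]

# (3) Link function: the log link connects the linear predictor to the
#     conditional expected count, so mu = exp(eta).
mu = np.exp(eta)

# (4) Response distribution: Poisson counts around the conditional mean.
y = rng.poisson(mu)
print(y.shape)  # (4, 8): cases x measurements
```

In the study itself, the treatment start is staggered across cases (multiple-baseline design) rather than fixed at the halfway point as in this simplified sketch.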
One downside of the GLMM framework is its relative complexity. Customizing a generalized linear mixed model requires a more general mathematical understanding of both the GLM and the LMM frameworks. Even though efficient estimation methods are available in many popular software packages (Zhou et al., 1999; Bates et al., 2015; Molenberghs et al., 2002), and even though these models have proven robust and powerful (Abad et al., 2010; Capanu et al., 2013; Yau & Kuk, 2002), they may be intimidating for social scientists to apply. A further difficulty of the GLMM framework is that the more sophisticated the model, the more information is needed for the GLMM estimation to converge (Li & Redden, 2015; Abad et al., 2010). However, in SCED contexts typically only a relatively small number of data points is available (Shadish & Sullivan, 2011), which can result in less reliable GLMM estimates (Nelson & Leroux, 2008).
For an assessment of the current use of GLMMs in SCED contexts, we draw on data collected for a recent review by the authors of this simulation study (Jamshidi et al., 2017). This systematic review covers 178 systematic reviews and meta-analyses of SCED studies from the last three decades and describes their study characteristics. Of the included studies, only 22 (12%) used hierarchical or mixed modeling, and 19 of those were published after 2010. Only about half of these studies reported the measurement scale of the dependent variable, but those that did reported almost exclusively rates, percentages, or counts. Yet all 22 studies used an LMM rather than a GLMM. Together with the aforementioned complexity of the GLMM, this observation motivates a closer look at the consequences of misspecifying SCED count data with an LMM (which assumes normally distributed outcomes).
To this end, a simulation study is conducted in which count data with a hierarchical structure are generated according to a two-level GLMM, assuming a Poisson distribution of scores within the phases. The simulated datasets are analyzed by fitting the GLMM used for data generation, as well as by fitting a two-level LMM that assumes normality of the scores within phases. The main aim of this study is to investigate whether the GLMM, as the theoretically correctly specified model, outperforms the LMM across all conditions, and, if not, in which conditions the LMM performs well enough (or better).
As to the conditions in which the LMM leads to acceptable performance, we have two hypotheses. First, if the expected count responses in the baseline and/or treatment phase are relatively high, the LMM might perform relatively better than when the expected counts are small, because Poisson distributions with larger expected values are better approximated by normal distributions (Stroup, 2013). Second, if the sample size is small, the LMM might perform relatively better than the GLMM, because the GLMM is too complex a model to estimate when information is sparse (Hembry et al., 2015).
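The first hypothesis rests on the classical result that a Poisson(λ) distribution is increasingly well approximated by a moment-matched normal distribution N(λ, λ) as λ grows. A quick numerical check (our own sketch; the λ values of 2 and 20 mirror the "highly discrete" and "approximately continuous" baseline levels used later):

```python
import math

def poisson_pmf(k, lam):
    # Evaluated on the log scale to avoid overflow for larger k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def total_variation(lam, k_max=200):
    """Half the summed absolute difference between a Poisson(lam) pmf and a
    moment-matched N(lam, lam) density evaluated at the integers."""
    return 0.5 * sum(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam))
                     for k in range(k_max))

# The discrepancy shrinks as the expected count grows, which is why high
# average counts should favor the normal-theory LMM.
print(total_variation(2.0) > total_variation(20.0))  # True
```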
Various simulation conditions are taken into account. These conditions differ in the number of cases, the number of measurements within cases, the average baseline response, the average effect size, and the true variance component values. To analyze the performance of the fitted models, we look at common goodness of fit criteria, fixed effect parameter recovery, the type I error rate, and the power. The goal is to provide applied researchers with recommendations on the conditions required (e.g., the required sample size or the required average count in the baseline and/or treatment phase) for reliable analysis of count data with the simpler LMM.
Methodology
Simulation conditions
Design parameters
Timing of intervention for simulated cases
I | J | Starting point values |
---|---|---|
8 | 4 | \(\left (3,4,5,6\right )\) |
12 | 4 | \(\left (3,6,6,9\right )\) |
20 | 4 | \(\left (5,10,10,15\right )\) |
8 | 8 | \(\left (2,2,3,3,5,5,6,6\right )\) |
12 | 8 | \(\left (3,3,5,5,7,7,9,9\right )\) |
20 | 8 | \(\left (5,5,8,8,12,12,15,15\right )\) |
8 | 10 | \(\left (2,2,3,3,4,4,5,5,6,6\right )\) |
12 | 10 | \(\left (3,3,5,5,6,6,7,7,9,9\right )\) |
20 | 10 | \(\left (5,5,8,8,10,10,12,12,15,15\right )\) |
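Each row above specifies, per case, the measurement occasion at which the treatment phase starts (staggered across cases, as in a multiple-baseline design). From such a row, a per-case phase dummy can be constructed; the helper below is our own sketch and assumes the starting point indexes the first treatment occasion.

```python
def phase_indicators(n_measurements, starting_points):
    """Return one 0/1 phase dummy per case: 0 before the case's starting
    point (baseline), 1 from the starting point onwards (treatment).
    Measurement occasions are numbered 1..n_measurements."""
    return [[1 if t >= start else 0 for t in range(1, n_measurements + 1)]
            for start in starting_points]

# Condition I = 8, J = 4 with staggered starting points (3, 4, 5, 6):
dummies = phase_indicators(8, (3, 4, 5, 6))
print(dummies[0])  # [0, 0, 1, 1, 1, 1, 1, 1]
```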
Model parameters
Categorization of the average baseline response and treatment response
Average Baseline Response | Average Treatment Response |
---|---|
Highly discrete (HD) | Highly discrete (HD) |
Highly discrete (HD) | Approximately continuous (AC) |
Approximately continuous (AC) | Approximately continuous (AC) |
Simulation conditions based on categorization of the average baseline response, treatment response, and effect size
\(\text {E}\left [\exp \left (\beta _{0j}\right )\right ]\) (average baseline response) | \(\text {E}\left [\exp \left (\beta _{1j}\right )\right ]\) (average treatment response) | \(\text {E}\left [\exp \left (\beta _{0j}\right )\exp \left (\beta _{1j}\right )\right ]\) | Baseline category | Treatment category | Multiplicative effect category |
---|---|---|---|---|---|
(a) σ_{u01} = 0 | |||||
4 | 1 | 4 | Highly discrete (HD) | Highly discrete (HD) | Zero |
4 | 1.5 | 6 | Highly discrete (HD) | Highly discrete (HD) | Small |
2 | 3.5 | 7 | Highly discrete (HD) | Highly discrete (HD) | Large |
4 | 3.5 | 14 | Highly discrete (HD) | Approximately continuous (AC) | Large |
20 | 1 | 20 | Approximately continuous (AC) | Approximately continuous (AC) | Zero |
30 | 1.6 | 48 | Approximately continuous (AC) | Approximately continuous (AC) | Small |
20 | 3.5 | 70 | Approximately continuous (AC) | Approximately continuous (AC) | Large |
(b) \(\sigma _{u01} = \log (1.05)\) | |||||
4 | 1 | 4.2 | Highly discrete (HD) | Highly discrete (HD) | Zero |
4 | 1.5 | 6.3 | Highly discrete (HD) | Highly discrete (HD) | Small |
2 | 3.5 | 7.35 | Highly discrete (HD) | Highly discrete (HD) | Large |
4 | 3.5 | 14.7 | Highly discrete (HD) | Approximately continuous (AC) | Large |
20 | 1 | 21 | Approximately continuous (AC) | Approximately continuous (AC) | Zero |
30 | 1.6 | 50.4 | Approximately continuous (AC) | Approximately continuous (AC) | Small |
20 | 3.5 | 73.5 | Approximately continuous (AC) | Approximately continuous (AC) | Large |
Analysis
Goodness of fit
Fixed effect parameter recovery
This observation leads to the following complication in this simulation study. Data are generated from the GLMM as defined in Eq. 1, with a nominal value for γ_{10}. Afterwards, the LMM as defined in Eq. 2 is fitted, which yields an estimate \(\hat {\gamma }_{10}^{*}\). However, \(\hat {\gamma }_{10}^{*}\) is not comparable with the nominal γ_{10}, since γ_{10} and \(\gamma _{10}^{*}\) are different parameters and do not express the same concept.
To address this complication, two approaches are proposed. Both approaches provide a transformation of the parameters of one of the models into a new parameter. This new parameter is comparable to the fixed effect parameter of the other model and therefore a fixed parameter recovery assessment can be conducted based on the new parameter estimate from the first model and the fixed effect parameter estimate from the second model. Note that a general investigation of transformations of effect sizes based on the LMM to effect sizes based on the GLMM and vice versa is not within the scope of this paper, though this might be interesting for future research.
The first approach consists of a transformation of the GLMM parameters into a new parameter Δ_{G}, which expresses an effect size comparable to the fixed effect \(\gamma _{10}^{\ast }\) of the LMM. By comparing the estimate of Δ_{G} from the GLMM with the estimate of \(\gamma _{10}^{\ast }\) from the LMM, we can assess the fixed effect parameter recovery. The second approach is analogous, but uses the LMM as its starting point instead. Based on a transformation of the LMM parameters, it introduces a new fixed effect parameter Γ_{L}, and this Γ_{L} is subsequently compared to γ_{10} to assess fixed effect parameter recovery.
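The exact transformations are given in equations not reproduced in this excerpt, but the idea behind Δ_{G} can be sketched from the lognormal mean identity E[exp(X)] = exp(μ + σ²/2) for X ~ N(μ, σ²): with bivariate-normal random effects, the additive effect comparable to \(\gamma _{10}^{\ast }\) is the difference between the expected treatment-phase and baseline-phase counts. The function below is our own reconstruction under that assumption, not the paper's exact formula.

```python
import math

def delta_from_glmm(gamma_00, gamma_10, var_u0, var_u1, cov_u01):
    """Additive effect implied by a Poisson GLMM with bivariate-normal
    random intercept beta_0j and slope beta_1j (our reconstruction based
    on the lognormal mean identity E[exp(X)] = exp(mu + var/2))."""
    baseline_mean = math.exp(gamma_00 + var_u0 / 2)
    treatment_mean = math.exp(gamma_00 + gamma_10
                              + (var_u0 + var_u1) / 2 + cov_u01)
    return treatment_mean - baseline_mean

# Example mirroring a Table 3 row: E[exp(beta_0j)] = 4, E[exp(beta_1j)] = 1.5,
# so the expected treatment-phase count is 6 and the additive effect is 2.
var_u0 = var_u1 = math.log(1.35) ** 2
g00 = math.log(4) - var_u0 / 2
g10 = math.log(1.5) - var_u1 / 2
print(round(delta_from_glmm(g00, g10, var_u0, var_u1, 0.0), 2))  # 2.0
```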
Inference
The p values for the LMM are computed based on the approximate Wald F-test with Satterthwaite denominator degrees of freedom (Gumedze & Dunne, 2011; Satterthwaite, 1946). The p values for the GLMM are computed based on an approximate Wald Z-test. The choice of the Z-test for inference based on the GLMM was due to practical constraints of lme4 (Bates et al., 2015), the R package we used (R Core Team, 2017). We elaborate on this further in Appendix B. The significance level α is set to .05.
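The two Wald procedures differ only in the reference distribution for the ratio of estimate to standard error: a t distribution with Satterthwaite degrees of freedom for the LMM, and a standard normal for the GLMM. A minimal sketch of the GLMM-side computation (our own illustration; lme4 reports this z test directly in its model summary):

```python
import math

def wald_z_p_value(estimate, std_error):
    """Two-sided p value for H0: parameter = 0 from an approximate Wald
    z test: refer estimate / SE to a standard normal distribution."""
    z = abs(estimate / std_error)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal.
    return math.erfc(z / math.sqrt(2))

# A fixed effect estimated at 0.50 with standard error 0.20 (z = 2.5)
# is significant at the alpha = .05 level used in the study:
print(wald_z_p_value(0.50, 0.20) < 0.05)  # True
```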
Simulation conditions
Simulation condition factors summary
Parameter | Value | Motivation |
---|---|---|
γ _{00} | \(\log \left (2\right ) - \frac {\sigma _{u0}^{2}}{2}\) | Average baseline response highly discrete: \(\text {E}\left [\exp \left (\beta _{0j}\right )\right ] = 2\) |
 | \(\log \left (4\right ) - \frac {\sigma _{u0}^{2}}{2}\) | Average baseline response highly discrete: \(\text {E}\left [\exp \left (\beta _{0j}\right )\right ] = 4\) |
 | \(\log \left (20\right ) - \frac {\sigma _{u0}^{2}}{2}\) | Average baseline response approximately continuous: \(\text {E}\left [\exp \left (\beta _{0j}\right )\right ] = 20\) |
γ _{10} | 0 | To test \(H_{0} : \gamma _{10} = 0\) |
 | \(-\frac {\sigma _{u1}^{2}}{2}\) | To test \(H_{0} : \gamma _{10}^{*} = 0 \Leftrightarrow \left [\left (\gamma _{10} = -\frac {\sigma _{u1}^{2}}{2}\right ) \wedge \left (\sigma _{u01} = 0\right )\right ]\) |
 | \(\log \left (3.5\right ) - \frac {\sigma _{u1}^{2}}{2}\) | Larger average multiplicative effect: \(\text {E}\left [\exp \left (\beta _{1j}\right )\right ] = 3.5\) |
\(\sigma _{u0}^{2}\) | \(\left [\log \left (1.35\right )\right ]^{2}\) | \(\text {Var}\left [\exp \left (\beta _{0j}\right )\right ] = 35\%\cdot \text {E}\left [\exp \left (\beta _{0j}\right )\right ]\) |
 | \(\left [\log \left (1.50\right )\right ]^{2}\) | \(\text {Var}\left [\exp \left (\beta _{0j}\right )\right ] = 50\%\cdot \text {E}\left [\exp \left (\beta _{0j}\right )\right ]\) |
\(\sigma _{u1}^{2}\) | \(\left [\log \left (1.35\right )\right ]^{2}\) | \(\text {Var}\left [\exp \left (\beta _{1j}\right )\right ] = 35\%\cdot \text {E}\left [\exp \left (\beta _{1j}\right )\right ]\) |
 | \(\left [\log \left (1.50\right )\right ]^{2}\) | \(\text {Var}\left [\exp \left (\beta _{1j}\right )\right ] = 50\%\cdot \text {E}\left [\exp \left (\beta _{1j}\right )\right ]\) |
σ _{u01} | 0 | To test \(H_{0} : \gamma _{10}^{*} = 0 \Leftrightarrow \left [\left (\gamma _{10} = -\frac {\sigma _{u1}^{2}}{2}\right ) \wedge \left (\sigma _{u01} = 0\right )\right ]\) |
 | \(\log \left (1.05\right )\) | Small influence on multiplicative effect: \(\exp \left (\sigma _{u01}\right ) = 1.05\) |
I | 8, 12, 20 | Common SCED values |
J | 4, 8, 10 | Common SCED values |
Summary of results
Eta-squared values (η^{2}) for association of design factors with outcomes
S _{AIC} | S _{BIC} | MSE Δ | MSE Γ | RB Δ | RB Γ | Type I error rate | Power | |
---|---|---|---|---|---|---|---|---|
Model | .0004 | .0002 | .0853 | .0833 | .3909 | .0325 | ||
I | .0230 | .0141 | .0099 | .0177 | .0045 | .0442 | .0296 | .0003 |
J | .0572 | .0396 | .0976 | .0809 | .0080 | .0050 | .0478 | .0137 |
Baseline-treatment category | .6188 | .6510 | .0304 | .0553 | .0011 | .0388 | .2487 | .0022 |
Effect size | .0645 | .0589 | .2453 | .3288 | .0438 | .2464 | .8835 | |
Model:I | .0001 | .0001 | .0004 | .0007 | .0034 | .0000 | ||
Model:J | .0005 | .0001 | .0136 | .0005 | .0658 | .0121 | ||
Model:(Baseline-treatment category) | .0001 | .0002 | .0021 | .0039 | .0421 | .0000 | ||
Model:(Effect size) | .0002 | .0001 | .0306 | .0910 | .0072 | |||
.7635 | .7636 | .3844 | .4833 | .1892 | .5139 | .8283 | .9515 |
For graphical purposes, the baseline-treatment categories from Table 3 are denoted as follows in the graphical results. A highly discrete average baseline response combined with a highly discrete average treatment response is denoted as category 'HD-HD' (from 'highly discrete - highly discrete'). A highly discrete average baseline response combined with an approximately continuous average treatment response is denoted as category 'HD-AC' (from 'highly discrete - approximately continuous'). Finally, an approximately continuous average baseline response combined with an approximately continuous average treatment response is denoted as category 'AC-AC' (from 'approximately continuous - approximately continuous').
Software
We use the open-source R software (R Core Team, 2017) to generate and analyze the SCED count data. The LMM and the GLMM are estimated through the lmer() and glmer() functions, respectively, both available in the lme4 package (Bates et al., 2015). Using the default argument settings, the lmer() function provides restricted maximum likelihood (REML) estimates for the LMM parameters and the glmer() function provides estimates based on a Gauss–Hermite quadrature approximation of the log-likelihood function. In Appendix B, we provide some R code samples and explain how we obtained and analyzed the LMM and GLMM estimates.
Results
Goodness of fit criteria
Fixed effect parameter recovery
For the statistics Δ_{G} (Eq. 14), Δ_{L} (Eq. 16), Γ_{G} (Eq. 19) and Γ_{L} (Eq. 21), a simple linear regression analysis is conducted to study the relation between the LMM and the GLMM estimators of Δ and Γ. The fitted model predicts the LMM estimate from the GLMM estimate. A significant regression equation is found for both the Δ and the Γ estimates, with an R^{2} of .9986 (Δ_{L} = 0.0225 + 1.0042 ⋅Δ_{G}) and .9963 (Γ_{L} = − 0.0066 + 1.0088 ⋅Γ_{G}), respectively. This is an important result because it allows comparison between the GLMM and the LMM based on their parameter estimates. Given that the fixed effect estimates of the two models can be compared in this way, the next step is to assess which model provides the better fixed effect estimator. To assess the quality of Δ_{G}, Δ_{L}, Γ_{G} and Γ_{L} as estimators, the relative bias and the relative MSE of all four are analyzed. Note that conditions where Δ = 0 or Γ = 0 are left out so that a finite relative bias and relative MSE can be calculated.
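The calibration check above is an ordinary least-squares fit of the LMM estimates on the GLMM estimates: a slope near 1, an intercept near 0, and an R² near 1 together indicate that the two models recover essentially the same effect. A self-contained sketch with synthetic paired estimates (the numbers are made up; the paper's fit used the actual simulation output):

```python
import random

random.seed(1)

# Synthetic paired effect estimates: each "LMM" value is the "GLMM" value
# plus small noise, mimicking two estimators of the same quantity.
glmm_est = [0.5 * k for k in range(20)]
lmm_est = [g + random.gauss(0.0, 0.05) for g in glmm_est]

# Ordinary least squares of lmm_est on glmm_est, plus R squared.
n = len(glmm_est)
mean_x = sum(glmm_est) / n
mean_y = sum(lmm_est) / n
sxx = sum((x - mean_x) ** 2 for x in glmm_est)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(glmm_est, lmm_est))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((y - (intercept + slope * x)) ** 2
             for x, y in zip(glmm_est, lmm_est))
ss_tot = sum((y - mean_y) ** 2 for y in lmm_est)
r_squared = 1 - ss_res / ss_tot

print(round(slope, 2), round(r_squared, 3))
```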
Overall mean relative bias and relative mean squared error for the Δ and Γ parameter estimates
Parameter | Model | Relative MSE | Relative bias |
---|---|---|---|
\(\hat {\Delta }\) ^{a} | GLMM | 7.5580 | .0434 |
LMM | 7.1689 | .0025 | |
\(\hat {\Gamma }\) ^{b} | GLMM | 6.2523 | .0246 |
LMM | 5.9945 | .0825 |
Inference
Type I error rates
Baseline-treatment category | Type I error rate (GLMM^{a}) | Type I error rate (LMM^{b}) |
---|---|---|
HD - HD | .07 | .04 |
AC - AC | .12 | .05 |
Proportion of rejections as a function of Γ
Γ | GLMM^{a} (J = 4) | LMM^{b} (J = 4) | GLMM^{a} (J = 8) | LMM^{b} (J = 8) | GLMM^{a} (J = 10) | LMM^{b} (J = 10) |
---|---|---|---|---|---|---|
− 0.0822 | .13 | .04 | .11 | .05 | .11 | .05 |
− 0.045 | .10 | .03 | .09 | .04 | .08 | .04 |
1.1706 | .99 | .43 | 1.00 | .89 | 1.00 | .91 |
1.2077 | 1.00 | .56 | 1.00 | .93 | 1.00 | .95 |
Proportion of rejections as a function of Δ
Δ | GLMM^{a} (J = 4) | LMM^{b} (J = 4) | GLMM^{a} (J = 8) | LMM^{b} (J = 8) | GLMM^{a} (J = 10) | LMM^{b} (J = 10) |
---|---|---|---|---|---|---|
0.0921 | .06 | .02 | .06 | .03 | .06 | .03 |
0.1 | .07 | .02 | .07 | .03 | .07 | .03 |
0.1713 | .09 | .03 | .07 | .04 | .07 | .04 |
0.1842 | .08 | .03 | .07 | .04 | .07 | .04 |
0.1967 | .06 | .02 | .05 | .03 | .05 | .04 |
0.2 | .09 | .03 | .09 | .04 | .08 | .04 |
0.2799 | .08 | .02 | .06 | .03 | .06 | .04 |
0.3427 | .11 | .03 | .09 | .04 | .08 | .04 |
0.3935 | .08 | .02 | .07 | .04 | .07 | .04 |
0.5598 | .11 | .03 | .08 | .04 | .08 | .05 |
0.9212 | .15 | .04 | .10 | .04 | .10 | .05 |
1 | .17 | .05 | .13 | .05 | .12 | .05 |
1.7135 | .17 | .04 | .11 | .05 | .10 | .05 |
1.9673 | .15 | .04 | .10 | .04 | .09 | .05 |
2.7992 | .16 | .04 | .11 | .05 | .09 | .05 |
5 | .98 | .46 | 1.00 | .87 | 1.00 | .90 |
5.35 | .98 | .38 | 1.00 | .87 | 1.00 | .91 |
10 | 1.00 | .54 | 1.00 | .92 | 1.00 | .93 |
10.7 | .99 | .44 | 1.00 | .90 | 1.00 | .93 |
50 | 1.00 | .63 | 1.00 | .95 | 1.00 | .96 |
53.5 | 1.00 | .50 | 1.00 | .93 | 1.00 | .96 |
Discussion
With this simulation study, we wanted to see whether the GLMM consistently outperforms the LMM, and, if not, in which cases the LMM has an acceptable performance. Three aspects of both models have been considered to assess their performance: goodness of fit, fixed effect parameter recovery, and inference.
In terms of goodness of fit, the LMM generally does not perform as well as the GLMM. In Fig. 1, the vast majority of the S_{AIC} scores lie above 0, indicating that the AIC of the GLMM is generally lower than the AIC of the LMM according to Eq. 11. Only when the baseline and treatment average responses are relatively high and when the number of cases is very small (J = 4) does the LMM achieve a goodness of fit comparable to that of the GLMM. In conditions with very sparse information, the more complex GLMM is at a disadvantage relative to the LMM. Additionally, when the baseline and treatment phase averages of the underlying count data are high, the LMM benefits because it then provides a good normal approximation of the data.
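For reference, the AIC underlying these comparisons is computed as AIC = 2k − 2 log L̂, where k is the number of estimated parameters and log L̂ the maximized log-likelihood (Akaike, 1998); lower values indicate a better trade-off between fit and complexity. The sketch below uses made-up log-likelihoods to illustrate the comparison logic only; the S_{AIC} score itself is defined in Eq. 11, which is not reproduced in this excerpt.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2*logL; smaller is better."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical fits of the same dataset (illustrative numbers only).
# The LMM has one extra parameter (the residual variance) relative to
# a Poisson GLMM with the same fixed and random effects.
aic_glmm = aic(log_likelihood=-150.0, n_parameters=6)
aic_lmm = aic(log_likelihood=-162.0, n_parameters=7)
print(aic_glmm < aic_lmm)  # True: here the GLMM would be preferred
```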
To assess the performance of both models in terms of fixed effect parameter recovery, we compared their parameter estimators \(\hat {\Delta }_{G}\) vs. \(\hat {\Delta }_{L}\) and \(\hat {\Gamma }_{G}\) vs. \(\hat {\Gamma }_{L}\). The most important measure of the quality of an estimator is the MSE, because it encompasses both the bias and the variance. A good estimator should have an MSE as small as possible, i.e., a bias of zero and a small variance. From Table 6 and from Figs. 5 and 6 it is clear that the MSEs of the estimators of both models are on average very similar, with a slight advantage for the model that estimates the parameter directly (i.e., the LMM for Δ and the GLMM for Γ).
In terms of inference, the first step in comparing the performance of the LMM with that of the GLMM is to look at the type I error rate. As seen in Table 7, the type I error rate of the LMM is better under control than that of the GLMM. Although this might seem surprising, similarly good behavior of less complex, albeit misspecified, (generalized) linear mixed models on small-sample data has been observed before (Bell et al., 2014). More complex models, even though theoretically better suited to the data, may function poorly when too many parameters have to be estimated from too few pieces of information (Muth et al., 2016). Since the type I error rate of the LMM is under control, the next step is to look at its power. From Tables 8 and 9 it is clear that the LMM does not attain the same power as the GLMM, not even for large effects. Only when the effect size and the number of cases J are large (Δ ≥ 5 or Γ ≥ 1.1706, and J ≥ 8) does the power of the LMM reach a level of 80%. This holds for all values of I (the number of measurements) considered in our simulation.
For applied research, a crucial next question is: when is it acceptable to use an LMM to analyze single-case count data? In terms of goodness of fit, the LMM only yields acceptable AICs (i.e., AICs as low as or lower than those of the GLMM) if the count data are well approximated by a normal distribution in both the baseline and the treatment phase and if the sample size (and especially the number of cases J) is very small. However, even when these requirements are not met, the goodness of fit of the LMM is only about 10% worse than that of the GLMM (Fig. 1). If this is considered acceptable, we recommend using the LMM in situations where the estimated effect size and the number of cases are reasonably large (J ≥ 8), to ensure acceptable power and unbiased fixed effect estimates.
When it comes to selecting an effect size to express the fixed effect, applied researchers need to determine whether they are specifically interested in the additive effect expressed by Δ or in the effect expressed by Γ. It makes sense to opt for the additive effect expressed by Δ because it is more easily interpretable. Moreover, its estimate \(\hat {\Delta }\) is readily available from the fitted LMM, as it does not need any transformation. Since this simulation study has provided quantitative evidence of the good performance of the Δ_{L} estimator in terms of relative bias and relative MSE, using the LMM to model single-case count data in order to obtain an estimate of Δ is not discouraged, even though the LMM is an overly simplified model for such data. Inference based on Δ_{L} is valid, because the type I error rate of Δ_{L} is under control and behaves well in all conditions. Again, caution is advised when basing inference on Δ_{L} if the effect size or the number of cases is small, since the power might then not be acceptable.
When practitioners want to estimate the effect size Γ, it is preferable to use the GLMM, to avoid the manipulations required to obtain the Γ_{L} estimate from the LMM (as illustrated in Appendix B). The GLMM results in a slightly higher bias for Γ, but a lower MSE, compared to fitting an LMM and estimating Δ. Even when using the GLMM to estimate Γ, however, there might be up to 10% relative bias in the estimates for some conditions, particularly when the effect size is not large and the amount of data available for estimation is limited. When the sample sizes increase (i.e., the measurement series get longer and the number of participants grows), this bias disappears.
If practitioners decide to model SCED count data using the GLMM with a Poisson distribution (Eq. 1), they need to be aware of the assumptions associated with the Poisson distribution (Winkelmann, 2008). First of all, the length of the time intervals or sessions during which the counts are measured has to be the same across the entire time series. Applied researchers might already do this intuitively to make counts comparable over sessions, or based on good practices recommended by single-case handbooks (Ayres & Gast, 2010). If the time series includes sessions of different lengths, the GLMM (Eq. 1) can be adjusted to account for this by including an offset (Casals et al., 2015). The outcome modeled is then a rate rather than a count.
Another assumption to be taken into account when modeling a Poisson distribution is that the rate of occurrence across each time interval or session has to be constant; that is, the probability of occurrence of the measured event should be constant throughout each time interval. This assumption might be violated when an observed participant is disturbed by an external event or factor during a measuring session and when this disturbance has a temporary impact on the measured outcome. For example, when measuring problem behavior in a classroom environment, an observed participant might show temporarily increased problem behavior when a classmate initiates a fight with the participant during a measuring session. To lessen the likelihood of external factors impacting the rate of occurrence, practitioners can try to keep the length of measuring sessions short.
A final assumption of the Poisson distribution is that the events occurring in different time intervals should be independent. This assumption is violated when autocorrelation is present in the data. Practitioners can try to avoid this from happening by making sure their measuring sessions are far enough apart in time.
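A quick empirical screen for this independence assumption is the lag-1 autocorrelation of the counts within a phase: values far from zero suggest dependence between sessions. The diagnostic below is our own sketch, not part of the paper's procedure.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a list of counts; values near zero
    are consistent with independent events across sessions."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean)
              for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

trending = [1, 2, 3, 4, 5, 6, 7, 8]   # counts drifting upwards over sessions
rebound = [4, 6, 3, 7, 4, 6, 3, 7]    # high sessions followed by low ones
print(lag1_autocorrelation(trending) > 0)  # True: positive dependence
print(lag1_autocorrelation(rebound) < 0)   # True: negative dependence
```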
The results presented in this study have limitations inherent to all simulation studies, i.e., they are conditional on the simulation design and parameter values used. Because this study is the first of its kind, we have used the most basic GLMM design to simulate data and all simulation conditions were exclusively based on sample size and nominal parameter values. Naturally, there is much more to a GLMM design than these two aspects, and the many GLMM design extensions could all provide starting points for further exploration of the impact of model misspecifications for count data. These extensions include: (1) using alternative probability distributions to sample the dependent variable from, such as the binomial or other discrete distributions (to model discrete proportions or percentages), zero-inflated distributions and distributions fit for over-dispersed data; (2) specifying a specific covariance structure and as such modeling autocorrelation, rather than using an unstructured covariance matrix like in this study; (3) simulating data with variable exposure (i.e., the frequency of the behavior of interest is not tallied across the same period of time at each measurement occasion); (4) including linear or non-linear trends in the simulated data and in the fitted models; (5) using different single-case design types, e.g., alternating treatments or phase changes with reversal, rather than the multiple baseline AB design used in this study; and (6) simulating unbalanced data.
We focused mainly on the average treatment effect when comparing the results of the LMM and the GLMM estimations. This is in line with common practice, where applied researchers who are combining SCED count data are usually primarily interested in the average treatment effects (as expressed by Γ and Δ in this study), rather than in the individual treatment effects or the variance components. Moreover, just like average treatment effect estimations, individual effect and variance component estimations are not comparable between the LMM and the GLMM. Attempting to compare them would involve a similar and arguably even more complex method of transformation as illustrated for Γ and Δ. This is beyond the scope of this study.
Finally, we want to point out that inference results of the GLMM are based on an approximate Wald Z-test, which is likely to misspecify the sampling distribution of the Wald statistic as normal, especially in small samples. As explained in Appendix B, this was due to a lack of available procedures in the lme4 package in R. In SAS, the PROC GLIMMIX procedure does include the option to set different degrees of freedom approximations to adjust for small sample sizes. It would be very useful to reanalyze our simulated datasets in SAS to see whether the inference results lead to substantially different conclusions from the conclusions we drew based on the R p values.
Conclusions
This simulation study showed that the GLMM in general does not substantially outperform the LMM, except in terms of the goodness of fit criteria. For the small sample sizes that we considered, and which are common in SCED count datasets, the LMM performs as well as the GLMM in terms of fixed effect parameter recovery. In terms of inference, the type I error rates of the LMM are better controlled than those of the GLMM. The power of the LMM is generally lower than that of the GLMM, but the LMM can provide acceptable power for SCED samples with a sufficient number of cases. This simulation provided some evidence that the GLMM might not necessarily be the better choice for very sparse SCED count data, because the model is then too complex to estimate reliably. Evidence for relatively better performance of the LMM when the expected count responses in the baseline and/or treatment phases are relatively high was less clear. Based on our results, we have provided some guidelines for applied researchers. Reviewers or meta-analysts using mixed modeling to combine SCED studies should be well aware of the effects of misspecifying their mixed model for discrete data. Their model choice should be well considered, based on the type of raw data included and on the sample sizes.
References
- Abad, A. A., Litière, S., & Molenberghs, G. (2010). Testing for misspecification in generalized linear mixed models. Biostatistics, 11(4), 771–786. https://doi.org/10.1093/biostatistics/kxq019
- Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In E. Parzen, K. Tanabe, & G. Kitagawa (Eds.), Selected papers of Hirotugu Akaike (pp. 199–213). New York: Springer. https://doi.org/10.1007/978-1-4612-1694-0_15
- Ayres, K., & Gast, D. L. (2010). Dependent measures and measurement procedures. In D. L. Gast (Ed.), Single subject research methodology in behavioral sciences (pp. 129–165). New York: Routledge.
- Barlow, D. H., & Hersen, M. (1984). Single-case experimental designs: Strategies for studying behavior change (2nd ed.). New York: Pergamon.
- Bates, D., et al. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
- Bell, B. A., et al. (2014). How low can you go? An investigation of the influence of sample size and model complexity on point and interval estimates in two-level linear models. Methodology, 10(1), 1–11. https://doi.org/10.1027/1614-2241/a000062
- Capanu, M., Gönen, M., & Begg, C. B. (2013). An assessment of estimation methods for generalized linear mixed models with binary outcomes. Statistics in Medicine, 32(26), 4550–4566. https://doi.org/10.1002/sim.5866
- Casals, M., et al. (2015). Parameter estimation of Poisson generalized linear mixed models based on three different statistical principles: A simulation study. Statistics and Operations Research Transactions, 39(2), 281–308.
- Claeskens, G., & Jansen, M. (2015). Model selection and model averaging. In International encyclopedia of the social & behavioral sciences (2nd ed., pp. 647–652). Oxford: Elsevier.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Erlbaum.
- Ferron, J. M., Farmer, J. L., & Owens, C. M. (2010). Estimating individual treatment effects from multiple-baseline data: A Monte Carlo study of multilevel-modeling approaches. Behavior Research Methods, 42(4), 930–943. https://doi.org/10.3758/BRM.42.4.930
- Gelman, A., & Hill, J. (2009) Data analysis using regression and multilevel/hierarchical models, (11th edn.) Cambridge: Cambridge University Press.Google Scholar
- Gill, J. (2001) Generalized linear models: A unified approach. Thousand Oaks (CA): Sage Publications.CrossRefGoogle Scholar
- Gumedze, F. N., & Dunne, T. T. (2011). Parameter estimation and inference in the linear mixed model. Linear Algebra and its Applications, 435(8), 1920–1944. https://doi.org/10.1016/j.laa.2011.04.015.CrossRefGoogle Scholar
- Hembry, I., et al. (2015). Estimation of a nonlinear intervention phase trajectory for multiple-baseline design data. Journal of Experimental Education, 83(4), 514–546. https://doi.org/10.1080/00220973.2014.907231.CrossRefGoogle Scholar
- Hox, J. J. (2010) Multilevel analysis: techniques and applications, (2nd edn.) New York: Routledge.Google Scholar
- Jamshidi, L., et al. (2017). Review of single-subject experimental design meta-analyses and reviews: 1985-2015. Manuscript submitted for publication.Google Scholar
- Jiang, J. (2007) Linear and generalized linear mixed models and their applications. New York: Springer.Google Scholar
- Johnson, N. L., Kemp, A. W., & Kotz, S. (2005) Univariate discrete distributions, (3rd edn.), (p. 646). New York: Wiley. https://doi.org/10.1002/0471715816.CrossRefGoogle Scholar
- Kazdin, A. E., & Kopel, S. A. (1975). On resolving ambiguities of the multiple-baseline design: Problems and recommendations. Behavior Therapy, 6(5), 601–608. https://doi.org/10.1016/S0005-7894(75)80181-X.CrossRefGoogle Scholar
- Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13.CrossRefGoogle Scholar
- Li, P., & Redden, D. T. (2015). Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials. BMC Medical Research Methodology, 15(1), 38. https://doi.org/10.1186/s12874-015-0026-x.CrossRefPubMedPubMedCentralGoogle Scholar
- McCullagh, P., & Nelder, J. A. (1999) Generalized linear models, (2nd edn.) London: Chapman and Hall.Google Scholar
- Moeyaert, M., et al. (2013). The three-level synthesis of standardized single-subject experimental data: A Monte Carlo simulation study. Multivariate Behavioral Research, 48(5), 719–748. https://doi.org/10.1080/00273171.2013.816621.CrossRefPubMedGoogle Scholar
- Moeyaert, M., et al. (2014). Three-level analysis of single-case experimental data: Empirical validation. The Journal of Experimental Education, 82(1), 1–21. https://doi.org/10.1080/00220973.2012.745470.CrossRefGoogle Scholar
- Molenberghs, G., Renard, D., & Verbeke, G. (2002). A review of generalized linear mixed models. Journal de la Société Française de statistique, 143(1), 53–78.Google Scholar
- Muth, C., et al. (2016). Alternative models for small samples in psychological research: Applying linear mixed effects models and generalized estimating equations to repeated measures data. Educational and Psychological Measurement, 76(1), 64–87. https://doi.org/10.1177/0013164415580432. .CrossRefPubMedGoogle Scholar
- Nelson, K. P., & Leroux, B. G. (2008). Properties and comparison of estimation methods in a log-linear generalized linear mixed model. Journal of Statistical Computation and Simulation, 78(3), 367–384. https://doi.org/10.1080/10629360601023599.CrossRefGoogle Scholar
- Onghena, P., & Edgington, E. S. (2005). Customization of pain treatments: Single-case design and analysis. The Clinical Journal of Pain, 21(1), 56–68. https://doi.org/10.1097/00002508-200501000-00007.CrossRefPubMedGoogle Scholar
- R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. https://www.R-project.org/.
- Raudenbush, S. W., & Bryk, A. S. (2002) Hierarchical linear models: Applications and data analysis methods, (2nd edn.), (p. 485). Thousand Oaks: Sage Publications.Google Scholar
- Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6), 110–114.CrossRefPubMedGoogle Scholar
- Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.CrossRefGoogle Scholar
- Shadish, W. R., Kyse, E. N., & Rindskopf, D. M. (2013). Analyzing data from single-case designs using multilevel models: New applications and some agenda items for future research. Psychological Methods, 18(3), 385–405. https://doi.org/10.1037/a0032964.CrossRefPubMedGoogle Scholar
- Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment & Intervention, 2(3), 188–196. https://doi.org/10.1080/17489530802581603.CrossRefGoogle Scholar
- Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods, 43(4), 971–980. https://doi.org/10.3758/s13428-011-0111-y.CrossRefPubMedGoogle Scholar
- Snijders, T. A. B., & Bosker, R. J. (2012) Multilevel analysis: An introduction to basic and advanced multilevel modeling, (2nd edn.) London: Sage Publications.Google Scholar
- Stroup, Walter W. (2013) Generalized linear mixed models: Modern concepts, methods and applications. Boca Raton: CRC Press.Google Scholar
- Swanson, H., & Sachse-Lee, C. (2000). A meta-analysis of single-subject-design intervention research for students with LD. Journal of Learning Disabilities, 33(2), 114–136.CrossRefPubMedGoogle Scholar
- Van den Noortgate, W., & Onghena, P. (2003a). Combining single-case experimental data using hierarchical linear models. School Psychology Quarterly, 18(3), 325–346. https://doi.org/10.1521/scpq.18.3.325.22577.
- Van den Noortgate, W., & Onghena, P. (2003b). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers, 35(1), 1–10. https://doi.org/10.3758/BF03195492.
- Van den Noortgate, W., & Onghena, P. (2007). The aggregation of single-case results using hierarchical linear models. Behavior Analyst Today, 8(2), 52–57. https://doi.org/10.1037/h0100613.Google Scholar
- Van den Noortgate, W., & Onghena, P. (2008). A multilevel meta-analysis of single-subject experimental design studies. Evidence-Based Communication Assessment and Intervention, 2(3), 142–151. https://doi.org/10.1080/17489530802505362.CrossRefGoogle Scholar
- Winkelmann, R. (2008) Econometric analysis of count data, (4th edn.) Berlin: Springer. isbn: 9783540776482.Google Scholar
- Yau, Kelvin K. W., & Kuk, Anthony Y. C. (2002). Robust estimation in generalized linear mixed models. Journal of the Royal Statistical Society Series B (Statistical Methodology), 64(1), 101–117.CrossRefGoogle Scholar
- Zhou, X. -H., Perkins, A. J., & Hui, S. L. (1999). Comparisons of software packages for generalized linear multilevel models. The American Statistician, 53(3), 282–290. https://doi.org/10.2307/2686112.Google Scholar