, Volume 68, Issue 3, pp 309–329

Selection bias in linear mixed models



The paper investigates the consequences of sample selection in multilevel (mixed) models, focusing on the two-level random intercept linear model under a selection mechanism that acts at both hierarchical levels. The behavior of sample selection and the resulting biases in the regression coefficients and in the variance components are studied both theoretically and through a simulation study. Most of the theoretical results rely on properties of the Normal and Skew-Normal distributions. The analysis yields a taxonomy of sample selection in the multilevel framework, which can support both the qualitative assessment of the problem in specific applications and the development of suitable techniques for diagnosis and correction.
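The abstract does not spell out the selection mechanism, so the following Python sketch assumes one purely for illustration: a probit-type rule at each level, where cluster j is observed only if a standard Normal index s_j with correlation rho2 to the random intercept u_j exceeds a threshold a2, and unit i in an observed cluster is kept only if an index r_ij with correlation rho1 to the level-1 error e_ij exceeds a1. All parameter names and values are hypothetical. Under these assumptions, standard truncated-Normal results give E[u_j | s_j > a2] = rho2·σ_u·λ(a2) and Var(u_j | s_j > a2) = σ_u²[1 − rho2²(λ(a2)² − a2·λ(a2))], where λ(a) = φ(a)/(1 − Φ(a)) is the inverse Mills ratio, with analogous expressions for e_ij. For a2 = 0 the selected u_j/σ_u follows a Skew-Normal distribution with shape rho2/√(1 − rho2²), one way Skew-Normal distributions arise in this setting. The script compares these formulas with a naive analysis of one simulated sample; it is not the paper's estimator or simulation design.

```python
import math
import random
import statistics

# Hypothetical parameter values chosen for illustration only.
PARAMS = dict(beta0=0.0, sigma_u=1.0, sigma_e=2.0,
              rho2=0.6, a2=0.0,   # level-2 (cluster) selection
              rho1=0.5, a1=0.0)   # level-1 (unit) selection


def inverse_mills(a):
    """Inverse Mills ratio phi(a) / (1 - Phi(a)) of a standard Normal."""
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(a / math.sqrt(2.0))
    return pdf / upper_tail


def theory(beta0, sigma_u, sigma_e, rho2, a2, rho1, a1):
    """Moments implied by Normal truncation under the assumed mechanism."""
    lam2, lam1 = inverse_mills(a2), inverse_mills(a1)
    intercept = beta0 + rho2 * sigma_u * lam2 + rho1 * sigma_e * lam1
    var_e = sigma_e ** 2 * (1.0 - rho1 ** 2 * (lam1 ** 2 - a1 * lam1))
    var_u = sigma_u ** 2 * (1.0 - rho2 ** 2 * (lam2 ** 2 - a2 * lam2))
    return intercept, var_e, var_u


def simulate(beta0, sigma_u, sigma_e, rho2, a2, rho1, a1,
             n_clusters=20000, n_per_cluster=10, seed=1):
    """y_ij = beta0 + u_j + e_ij, observed only if both selection rules pass."""
    rng = random.Random(seed)
    kept_u, clusters = [], []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sigma_u)
        # Standard Normal selection index with corr(s, u) = rho2.
        s = rho2 * u / sigma_u + math.sqrt(1.0 - rho2 ** 2) * rng.gauss(0.0, 1.0)
        if s <= a2:
            continue  # level-2 selection drops the whole cluster
        ys = []
        for _ in range(n_per_cluster):
            e = rng.gauss(0.0, sigma_e)
            # Standard Normal selection index with corr(r, e) = rho1.
            r = rho1 * e / sigma_e + math.sqrt(1.0 - rho1 ** 2) * rng.gauss(0.0, 1.0)
            if r > a1:
                ys.append(beta0 + u + e)
        if ys:  # a cluster enters the sample only with >= 1 selected unit
            kept_u.append(u)
            clusters.append(ys)

    # Naive intercept: grand mean of the observed outcomes.
    all_y = [y for ys in clusters for y in ys]
    naive_intercept = statistics.fmean(all_y)
    # Pooled within-cluster variance estimates Var(e | selected).
    ss_within = 0.0
    for ys in clusters:
        m = statistics.fmean(ys)
        ss_within += sum((y - m) ** 2 for y in ys)
    within_var = ss_within / (len(all_y) - len(clusters))
    # Spread of the random intercepts that survive selection.
    return naive_intercept, within_var, statistics.variance(kept_u)


if __name__ == "__main__":
    labels = ("intercept", "Var(e | sel)", "Var(u | sel)")
    for name, sim, th in zip(labels, simulate(**PARAMS), theory(**PARAMS)):
        print(f"{name:>13}: simulated {sim:.3f}, theory {th:.3f}")
```

Under these assumed settings the formulas predict a naive intercept of about 1.28 instead of beta0 = 0, a level-1 variance of about 3.36 instead of 4, and a level-2 variance of about 0.77 instead of 1. Setting rho1 and rho2 to zero makes selection independent of the errors, and the predicted biases vanish.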

Key Words

Clustered data; Multilevel model; Random effects; Sample selection; Skew-Normal distributions; Truncation





Copyright information

© Sapienza Università di Roma 2010

Authors and Affiliations

  1. Dipartimento di Statistica “G. Parenti”, Università di Firenze, Florence, Italy
