Selection bias in linear mixed models

METRON

Summary

The paper investigates the consequences of sample selection in multilevel or mixed models, focusing on the two-level random intercept linear model under a selection mechanism acting at both hierarchical levels. The way sample selection operates and the resulting biases in the regression coefficients and in the variance components are studied both theoretically and by simulation. Most theoretical results exploit the properties of the Normal and Skew-Normal distributions. The analysis makes it possible to outline a taxonomy of sample selection in the multilevel framework, which can support the qualitative assessment of the problem in specific applications as well as the development of suitable techniques for diagnosis and correction.
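A minimal simulation sketch can make the two-level selection mechanism concrete. It is not the authors' simulation design. It assumes, for illustration only, that each level has a standard normal selection index correlated with the corresponding random term (s_j with the random intercept u_j, correlation rho_u; s_ij with the level-1 error e_ij, correlation rho_e), and that a cluster or unit is observed when its index exceeds a threshold (-a2 or -a1). Under these assumptions the selected term has closed-form moments from the truncated bivariate Normal: E[u | s > -a] = rho sigma lambda(a) and Var[u | s > -a] = sigma^2 [1 - rho^2 lambda(a)(lambda(a) + a)], where lambda(a) = phi(a)/Phi(a) is the inverse Mills ratio. With a = 0 the conditional law of u is Skew-Normal with shape rho/sqrt(1 - rho^2) (Extended Skew-Normal for general a). All function names, parameter values and thresholds below are hypothetical choices; the code uses only NumPy and the standard library.

```python
import math

import numpy as np


def inverse_mills(a: float) -> float:
    """lambda(a) = phi(a) / Phi(a) for a N(0, 1) index selected when s > -a."""
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return pdf / cdf


def selected_moments(sigma: float, rho: float, a: float) -> tuple[float, float]:
    """Mean and variance of a N(0, sigma^2) term given a correlated N(0, 1) index s > -a."""
    lam = inverse_mills(a)
    return rho * sigma * lam, sigma**2 * (1.0 - rho**2 * lam * (lam + a))


def simulate(n_clusters=20_000, n_per=20, beta0=1.0, beta1=0.5,
             sigma_u=1.0, sigma_e=2.0, rho_u=0.6, rho_e=0.4,
             a2=0.0, a1=0.5, seed=1):
    """Hypothetical design: y_ij = beta0 + beta1*x_ij + u_j + e_ij, selection at both levels."""
    rng = np.random.default_rng(seed)

    # Level 2: index s_j ~ N(0, 1) with corr(s_j, u_j) = rho_u; cluster kept if s_j > -a2.
    s2 = rng.standard_normal(n_clusters)
    u = sigma_u * (rho_u * s2 + math.sqrt(1 - rho_u**2) * rng.standard_normal(n_clusters))
    keep_cluster = s2 > -a2

    # Level 1: index s_ij ~ N(0, 1) with corr(s_ij, e_ij) = rho_e; unit kept if s_ij > -a1.
    shape = (n_clusters, n_per)
    s1 = rng.standard_normal(shape)
    e = sigma_e * (rho_e * s1 + math.sqrt(1 - rho_e**2) * rng.standard_normal(shape))
    keep_unit = s1 > -a1

    # Covariate drawn independently of both selection indices.
    x = rng.standard_normal(shape)
    y = beta0 + beta1 * x + u[:, None] + e

    # A unit is observed only if both its cluster and the unit itself are selected.
    observed = keep_cluster[:, None] & keep_unit
    return {
        "u_sel": u[keep_cluster],
        "e_sel": e[observed],
        "x_obs": x[observed],
        "y_obs": y[observed],
    }


if __name__ == "__main__":
    sim = simulate()
    m_u, v_u = selected_moments(1.0, 0.6, 0.0)
    m_e, v_e = selected_moments(2.0, 0.4, 0.5)
    print(f"u | selected: mean {sim['u_sel'].mean():.3f} (theory {m_u:.3f}), "
          f"var {sim['u_sel'].var():.3f} (theory {v_u:.3f}, unselected 1.0)")
    print(f"e | selected: mean {sim['e_sel'].mean():.3f} (theory {m_e:.3f}), "
          f"var {sim['e_sel'].var():.3f} (theory {v_e:.3f}, unselected 4.0)")
    # Pooled OLS on the observed sample; np.polyfit returns [slope, intercept].
    slope, intercept = np.polyfit(sim["x_obs"], sim["y_obs"], 1)
    print(f"intercept {intercept:.3f} vs beta0 = 1.0 (predicted {1.0 + m_u + m_e:.3f})")
    print(f"slope {slope:.3f} vs beta1 = 0.5")
```

In this deliberately simple design the selection indices are independent of x, so the slope is recovered while the intercept is shifted by about m_u + m_e and both variance components shrink. Selection mechanisms that also involve the covariates can distort the slopes as well, which the sketch does not explore. The pooled OLS fit via np.polyfit ignores the clustering and is used only to avoid a mixed-model dependency; it is enough to show the intercept shift.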

Author information

Correspondence to Leonardo Grilli.

About this article

Cite this article

Grilli, L., Rampichini, C. Selection bias in linear mixed models. METRON 68, 309–329 (2010). https://doi.org/10.1007/BF03263542

  • DOI: https://doi.org/10.1007/BF03263542