Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

  • Jee-Seon Kim
  • Peter M. Steiner
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 140)


Despite their appeal, randomized experiments cannot always be conducted, for example, for ethical or practical reasons. To remove selection bias and draw causal inferences from observational data, propensity score matching techniques have gained increasing popularity over the past three decades. Although propensity score methods have been studied extensively for single-level data, the additional assumptions and necessary modifications for applications with multilevel data remain understudied. This gap is troublesome given the abundance of nested structures and multilevel data in the social sciences. This study summarizes issues and challenges for causal inference with observational multilevel data in comparison with single-level data, and discusses strategies for multilevel matching. We investigate within- and across-cluster matching strategies and emphasize the importance of examining both overlap within clusters and potential heterogeneity in the data before pooling cases across clusters. We introduce a multilevel latent class logit model approach that combines the strengths of within- and across-cluster matching techniques. Simulation results support the effectiveness of our method in estimating treatment effects with multilevel data, even when selection processes vary across clusters and overlap is lacking within clusters.
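The contrast between within- and across-cluster matching described above can be illustrated with a minimal sketch. It assumes propensity scores have already been estimated (for instance, with a logit model that includes cluster effects) and uses 1:1 nearest-neighbor matching with replacement inside a caliper to estimate the average treatment effect on the treated (ATT). The names `Unit`, `match_att`, `within_cluster_att`, and `across_cluster_att`, the caliper of 0.05, and the toy data are hypothetical illustrations; this is not the multilevel latent class logit approach the chapter introduces.

```python
from collections import defaultdict, namedtuple

# Hypothetical record: cluster id, treatment indicator (1/0),
# propensity score (assumed already estimated), outcome.
Unit = namedtuple("Unit", ["cluster", "treated", "score", "y"])


def match_att(units, caliper):
    """1:1 nearest-neighbor matching with replacement on the propensity score.

    Returns (ATT or None, matched treated count, unmatched treated count).
    Treated units with no control within `caliper` are dropped (no overlap).
    """
    treated = [u for u in units if u.treated == 1]
    controls = [u for u in units if u.treated == 0]
    diffs, unmatched = [], 0
    for t in treated:
        best = min(controls, key=lambda c: abs(c.score - t.score), default=None)
        if best is None or abs(best.score - t.score) > caliper:
            unmatched += 1
            continue
        diffs.append(t.y - best.y)
    att = sum(diffs) / len(diffs) if diffs else None
    return att, len(diffs), unmatched


def within_cluster_att(units, caliper):
    """Match only inside each cluster, then pool matched pairs across clusters.

    Also returns the ids of clusters where some treated unit found no match.
    """
    by_cluster = defaultdict(list)
    for u in units:
        by_cluster[u.cluster].append(u)
    total, n_matched, n_unmatched, no_overlap = 0.0, 0, 0, []
    for cid, members in sorted(by_cluster.items()):
        att, m, unm = match_att(members, caliper)
        if m:
            total += att * m  # weight each cluster by its matched treated count
        n_matched += m
        n_unmatched += unm
        if unm:
            no_overlap.append(cid)
    att = total / n_matched if n_matched else None
    return att, n_matched, n_unmatched, no_overlap


def across_cluster_att(units, caliper):
    """Ignore cluster membership: any control may serve as a match."""
    return match_att(units, caliper)


data = [
    Unit("A", 1, 0.60, 12.0), Unit("A", 0, 0.58, 10.0),
    Unit("A", 1, 0.40, 9.0), Unit("A", 0, 0.42, 8.0),
    Unit("A", 0, 0.88, 11.0),
    Unit("B", 1, 0.90, 15.0), Unit("B", 0, 0.20, 7.0),
]
print(within_cluster_att(data, caliper=0.05))  # -> (1.5, 2, 1, ['B'])
print(across_cluster_att(data, caliper=0.05))  # ATT = 7/3 from 3 matched treated, 0 unmatched
```

In the toy data, cluster B's treated unit has no comparable control in its own cluster, so within-cluster matching drops it and flags B for lack of overlap, while across-cluster matching borrows a control from cluster A. That borrowed match is only defensible if the selection process in B resembles the one in A, which is the kind of heterogeneity the abstract recommends examining before pooling cases across clusters.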


Keywords: Propensity score matching; Multilevel models; Hierarchical linear models; Latent class analysis; Finite mixture models; Causal inference



This research was supported in part by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D120005. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Wisconsin-Madison, Madison, USA
