Measuring the Heterogeneity of Treatment Effects with Multilevel Observational Data

  • Youmi Suk
  • Jee-Seon Kim
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 265)


Multilevel latent class analysis and mixture propensity score models have been implemented to account for heterogeneous selection mechanisms and to support proper causal inference with observational multilevel data (Kim & Steiner in Quantitative Psychology Research. Springer, Cham, pp. 293–306, 2015). These scenarios imply the existence of multiple selection classes, and when class membership is unknown, homogeneous classes can usually be identified via multilevel logistic latent class models. Although latent class random-effects logistic models are frequently used, linear models and fixed-effects models are alternatives for identifying multiple selection classes and estimating class-specific treatment effects (Kim & Suk in Specifying Multilevel Mixture Models in Propensity Score Analysis. International Meeting of the Psychometric Society, New York, 2018). Using the Korea TIMSS 2015 eighth-grade student data, this study examined the potentially heterogeneous treatment effects of private science lessons by inspecting multiple selection classes (e.g., different motivations for receiving the lessons) using four types of selection models: random-effects logistic, random-effects linear, fixed-effects logistic, and fixed-effects linear models. Implications of identifying selection classes for causal inference with multilevel assessment data are discussed.
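The core idea of the abstract (class-specific selection models followed by class-specific treatment effect estimates) can be illustrated with a minimal sketch. The simulated data, variable names, and effect sizes below are illustrative assumptions, not the paper's TIMSS analysis; class membership is treated as known here, whereas the paper identifies it via latent class models, and the sketch compares only the logistic versus linear probability propensity models, not the random- versus fixed-effects distinction.

```python
# Hypothetical sketch: within each selection class, fit a propensity score
# model (logistic vs. linear probability) and compute an inverse-propensity-
# weighted (IPW) estimate of the class-specific average treatment effect.
# All data below are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
cls = rng.integers(0, 2, n)            # selection class (assumed known here)
x = rng.normal(size=n)                 # one student-level covariate
# Heterogeneous selection: the two classes select into treatment differently
logit = np.where(cls == 0, 0.8 * x - 0.5, -0.8 * x + 0.5)
z = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # treatment indicator
# Heterogeneous treatment effects: 0.5 in class 0, 1.5 in class 1
tau = np.where(cls == 0, 0.5, 1.5)
y = 1.0 * x + tau * z + rng.normal(size=n)      # outcome

def logistic_ps(x, z):
    """Propensity scores from a one-covariate logistic model (Newton steps)."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(25):
        p = 1 / (1 + np.exp(-X @ b))
        w = p * (1 - p)
        b += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (z - p))
    return 1 / (1 + np.exp(-X @ b))

def linear_ps(x, z):
    """Propensity scores from a linear probability model (OLS), clipped."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, z.astype(float), rcond=None)
    return np.clip(X @ b, 0.01, 0.99)

def ipw_ate(y, z, ps):
    """Inverse-propensity-weighted average treatment effect estimate."""
    return np.mean(z * y / ps) - np.mean((1 - z) * y / (1 - ps))

for c in (0, 1):
    m = cls == c
    for name, fit in (("logistic", logistic_ps), ("linear", linear_ps)):
        print(f"class {c}, {name} PS model: ATE = {ipw_ate(y[m], z[m], fit(x[m], z[m])):.2f}")
```

In this setup both propensity specifications recover the class-specific effects reasonably well; the substantive question in the paper is how such specification choices behave when the selection classes themselves must first be identified from multilevel data.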


Causal inference · Multilevel propensity score matching · Finite mixture modeling · Latent class analysis · Selection bias · Balancing scores · Heterogeneous selection processes · Heterogeneous treatment effects · Hierarchical linear modeling



Support for this research was provided by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison with funding from the Wisconsin Alumni Research Foundation.


  1. Clogg, C. C. (1995). Latent class models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 311–359). Boston, MA: Springer.
  2. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.
  3. Hong, G., & Hong, Y. (2009). Reading instruction time and homogeneous grouping in kindergarten: An application of marginal mean weighting through stratification. Educational Evaluation and Policy Analysis, 31(1), 54–81.
  4. Hong, G., & Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association, 101, 901–910.
  5. Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A, 171(2), 481–502.
  6. Kim, J.-S., & Steiner, P. M. (2015). Multilevel propensity score methods for estimating causal effects: A latent class modeling strategy. In L. van der Ark, D. Bolt, W. C. Wang, J. Douglas, & S. M. Chow (Eds.), Quantitative psychology research (pp. 293–306). Springer proceedings in mathematics & statistics. Cham: Springer.
  7. Kim, J.-S., Steiner, P. M., & Lim, W.-C. (2016). Mixture modeling strategies for causal inference with multilevel data. In J. R. Harring, L. M. Stapleton, & S. Natasha Beretvas (Eds.), Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications (pp. 335–359). Charlotte, NC: Information Age Publishing.
  8. Kim, J.-S., & Suk, Y. (2018, July). Specifying multilevel mixture models in propensity score analysis. Paper presented at the International Meeting of the Psychometric Society, New York City, NY, US.
  9. Leite, W. (2016). Practical propensity score methods using R. Sage Publications.
  10. Martin, M. O., Mullis, I. V. S., & Hooper, M. (Eds.). (2016). Methods and procedures in TIMSS 2015. Retrieved from the Boston College, TIMSS & PIRLS International Study Center website.
  11. McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
  12. Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
  13. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
  14. Steiner, P. M., & Cook, D. (2013). Matching and propensity scores. In T. Little (Ed.), The Oxford handbook of quantitative methods (pp. 236–258). Oxford, England: Oxford University Press.
  15. Suk, Y., & Kim, J.-S. (2018, April). Linear probability models as alternatives to logistic regression models for multilevel propensity score analysis. Paper presented at the American Educational Research Association annual meeting, New York City, NY, US.
  16. Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 1–15.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Educational Psychology, Educational Sciences Building, University of Wisconsin–Madison, Madison, USA
