Doubly Robust Estimation of Treatment Effects from Observational Multilevel Data

  • Courtney E. Hall
  • Peter M. Steiner
  • Jee-Seon Kim
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 140)


When randomized experiments cannot be conducted, propensity score (PS) matching and regression techniques are frequently used to estimate causal treatment effects from observational data. These methods aim to remove bias caused by baseline differences between the treatment and control groups. Instead of using a PS technique or an outcome regression alone, one might use a doubly robust estimator, which combines a PS technique (matching, stratification, or inverse propensity weighting) with an outcome regression in an attempt to address bias more effectively. Theoretically, if either the PS model or the outcome model is correctly specified, a doubly robust estimator will produce an unbiased estimate of the average treatment effect (ATE). Doubly robust estimators are not yet well studied for multilevel data in which selection into treatment takes place among level-one units within clusters. Using four simulated multilevel populations, we compare doubly robust estimators to standard PS and regression estimators and investigate their relative performance with respect to bias reduction.
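The double-robustness property described above can be illustrated with a minimal sketch. It is not the chapter's simulation design: it assumes a single-level population with one binary confounder `x`, a true ATE of 2, and an augmented inverse-propensity-weighted (AIPW) estimator as one common doubly robust form. All function names, parameter values, and the data-generating model are hypothetical choices made for illustration. A "wrong" model here simply ignores `x`.

```python
import random

def simulate(n, seed=0):
    """Assumed toy population: x confounds treatment z and outcome y; true ATE = 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x == 1 else 0.2          # selection depends on x
        z = 1 if rng.random() < p_treat else 0
        y = 2.0 * z + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, z, y))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fit_ps(rows, use_x):
    """PS model: treated share per stratum of x (correct) or overall (misspecified)."""
    if use_x:
        by_x = {x: mean(z for xi, z, _ in rows if xi == x) for x in (0, 1)}
        return lambda x: by_x[x]
    marginal = mean(z for _, z, _ in rows)
    return lambda x: marginal

def fit_outcome(rows, use_x):
    """Outcome model: mean of y per (z, x) cell (correct) or per arm only (misspecified)."""
    if use_x:
        cell = {(z, x): mean(y for xi, zi, y in rows if xi == x and zi == z)
                for z in (0, 1) for x in (0, 1)}
        return lambda z, x: cell[(z, x)]
    arm = {z: mean(y for _, zi, y in rows if zi == z) for z in (0, 1)}
    return lambda z, x: arm[z]

def aipw_ate(rows, ps, outcome):
    """AIPW: outcome-model contrast plus inverse-propensity-weighted residuals."""
    terms = []
    for x, z, y in rows:
        e = ps(x)
        m1, m0 = outcome(1, x), outcome(0, x)
        terms.append(m1 - m0 + z * (y - m1) / e - (1 - z) * (y - m0) / (1 - e))
    return mean(terms)

rows = simulate(20000, seed=1)
estimates = {
    "both_ok": aipw_ate(rows, fit_ps(rows, True), fit_outcome(rows, True)),
    "ps_ok_outcome_wrong": aipw_ate(rows, fit_ps(rows, True), fit_outcome(rows, False)),
    "ps_wrong_outcome_ok": aipw_ate(rows, fit_ps(rows, False), fit_outcome(rows, True)),
    "both_wrong": aipw_ate(rows, fit_ps(rows, False), fit_outcome(rows, False)),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the estimate should stay near the true ATE whenever at least one of the two models uses `x`, and should drift toward the naive difference in means only when both ignore it. Extending the sketch to multilevel data would require cluster-level terms in both models, which this example deliberately omits.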


Propensity score; Observational study; Doubly robust estimator; Multilevel modeling



The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D120005. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Courtney E. Hall (1)
  • Peter M. Steiner (1)
  • Jee-Seon Kim (1)
  1. University of Wisconsin-Madison, Madison, USA
