Neural Networks for Propensity Score Estimation: Simulation Results and Recommendations

  • Bryan Keller
  • Jee-Seon Kim
  • Peter M. Steiner
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 140)


Neural networks have been noted as promising for propensity score estimation because they algorithmically handle nonlinear relationships and interactions. We examine the performance of neural networks as compared with main-effects logistic regression for propensity score estimation via a simulation study. When the main-effects logistic propensity score model is correctly specified, the two approaches yield almost identical mean square error. When the logistic propensity score model is misspecified because quadratic terms and interactions have been added to the data-generating propensity score model, neural networks perform better in terms of bias and mean square error. We link the performance results to balance on observed covariates and demonstrate that our results underscore the importance of checking balance on higher-order covariate terms.
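The recommendation to check balance on higher-order covariate terms can be illustrated with a minimal sketch. It is not the authors' simulation code: the data-generating coefficients, the sample size, the inverse-probability weighting, and the helper names `expand_terms` and `standardized_mean_difference` are all assumptions made for illustration. The "main-effects" score below simply omits the quadratic term rather than being a fitted model.

```python
import itertools
import math
import random
import statistics


def expand_terms(covariates):
    """Return main effects plus squares and pairwise products of each covariate."""
    terms = dict(covariates)
    for name, col in covariates.items():
        terms[f"{name}^2"] = [v * v for v in col]
    for (a, col_a), (b, col_b) in itertools.combinations(covariates.items(), 2):
        terms[f"{a}*{b}"] = [u * v for u, v in zip(col_a, col_b)]
    return terms


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, treated, weights):
    """Weighted treated-minus-control mean difference over the pooled unweighted SD."""
    x1 = [v for v, t in zip(x, treated) if t == 1]
    x0 = [v for v, t in zip(x, treated) if t == 0]
    w1 = [w for w, t in zip(weights, treated) if t == 1]
    w0 = [w for w, t in zip(weights, treated) if t == 0]
    pooled_sd = math.sqrt((statistics.variance(x1) + statistics.variance(x0)) / 2)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data: treatment assignment depends on a quadratic term in x1.
random.seed(2015)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
true_ps = [logistic(0.5 * a + 0.5 * b + 0.6 * a * a - 0.6) for a, b in zip(x1, x2)]
main_ps = [logistic(0.5 * a + 0.5 * b) for a, b in zip(x1, x2)]  # omits x1^2
treated = [1 if random.random() < p else 0 for p in true_ps]


def ipw(ps):
    """Inverse-probability weights for the average treatment effect."""
    return [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps, treated)]


# Compare balance on every term under the two candidate scores.
balance_terms = expand_terms({"x1": x1, "x2": x2})
for name, col in balance_terms.items():
    smd_main = standardized_mean_difference(col, treated, ipw(main_ps))
    smd_true = standardized_mean_difference(col, treated, ipw(true_ps))
    print(f"{name:6s} main-effects score: {smd_main:+.3f}   true score: {smd_true:+.3f}")
```

A balance table restricted to `x1` and `x2` would not include the row for `x1^2`, which is where residual imbalance from the omitted quadratic term would be expected to appear in a setup like this one.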


Keywords: Propensity score analysis; Neural networks; Logistic regression; Data mining; Covariate balance



This research was supported in part by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D120005. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Teachers College, Columbia University, New York, USA
  2. University of Wisconsin-Madison, Madison, USA
