Optimizing variance-bias trade-off in the TWANG package for estimation of propensity scores

  • Layla Parast
  • Daniel F. McCaffrey
  • Lane F. Burgette
  • Fernando Hoces de la Guardia
  • Daniela Golinelli
  • Jeremy N. V. Miles
  • Beth Ann Griffin


Propensity score weighting has been shown to reduce bias in treatment effect estimation when selection bias is present, but it can perform poorly when the estimated propensity score weights are highly variable. Various approaches have been proposed to reduce the variability of the weights and the risk of poor performance, particularly approaches based on machine learning methods. In this study, we closely examine ways to fine-tune one machine learning technique, generalized boosted models (GBM), so that the selected propensity scores optimize the variance-bias trade-off inherent in most propensity score analyses. Specifically, we propose and evaluate three approaches for selecting the optimal number of trees for the GBM in the twang package in R. Normally, the twang package iteratively selects the number of trees that maximizes balance between the treatment groups being considered. Because this number of trees may lead to highly variable propensity score weights, we examine alternative ways to tune the number of trees so that some balance on the pre-treatment covariates is sacrificed in exchange for less variable weights. We use simulation studies to illustrate these methods and to describe the potential advantages and disadvantages of each. We apply the methods to two case studies: one examining the effect of dog ownership on the owner’s general health using data from a large, population-based survey in California, and a second investigating the relationship between abstinence and a long-term economic outcome among a sample of high-risk youth.
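
To make the trade-off concrete, the following is a minimal sketch in plain Python. It is not the twang implementation and not any of the three approaches evaluated in the paper. It assumes ATT-style weights (1 for treated units, p/(1 − p) for comparison units), uses Kish's effective sample size as a stand-in measure of weight variability, and applies a hypothetical selection rule (`choose_candidate`, with an invented `tolerance` parameter) that accepts slightly worse balance in exchange for a larger effective sample size. All function names and the candidate numbers are illustrative assumptions.

```python
from math import sqrt


def att_weights(treated, scores):
    """ATT weights: 1 for treated units, p / (1 - p) for comparison units."""
    return [1.0 if t else p / (1.0 - p) for t, p in zip(treated, scores)]


def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum(w^2).

    Equal weights give ESS = n; highly variable weights give a much smaller ESS.
    """
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def abs_std_mean_diff(x, treated, weights):
    """Absolute standardized mean difference of one covariate.

    Compares the treated mean with the weighted comparison mean,
    scaled by the treated-group standard deviation.
    """
    xt = [xi for xi, t in zip(x, treated) if t]
    mean_t = sum(xt) / len(xt)
    sd_t = sqrt(sum((xi - mean_t) ** 2 for xi in xt) / (len(xt) - 1))
    xc = [(xi, w) for xi, t, w in zip(x, treated, weights) if not t]
    mean_c = sum(xi * w for xi, w in xc) / sum(w for _, w in xc)
    return abs(mean_t - mean_c) / sd_t


def choose_candidate(candidates, tolerance):
    """Hypothetical rule (an assumption, not the paper's method).

    Among candidates whose imbalance is within `tolerance` of the best
    achievable imbalance, return the one with the largest ESS.
    """
    best = min(c["imbalance"] for c in candidates)
    eligible = [c for c in candidates if c["imbalance"] <= best + tolerance]
    return max(eligible, key=lambda c: c["ess"])


# Invented summaries for three candidate numbers of trees (illustration only).
candidates = [
    {"n_trees": 5000, "imbalance": 0.02, "ess": 40.0},
    {"n_trees": 2000, "imbalance": 0.04, "ess": 75.0},
    {"n_trees": 500, "imbalance": 0.15, "ess": 110.0},
]

# Pure balance maximization (tolerance 0) picks 5000 trees;
# allowing 0.03 of extra imbalance picks 2000 trees with a larger ESS.
print(choose_candidate(candidates, 0.0)["n_trees"])   # 5000
print(choose_candidate(candidates, 0.03)["n_trees"])  # 2000
```

With a tolerance of zero the rule reduces to pure balance maximization; a positive tolerance gives up some balance for less variable weights, which is the kind of exchange the abstract describes.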


Keywords: Causal inference; Propensity score; Machine learning



This study was funded by National Institutes of Health grant 1R01DA034065-01A1 and National Institute of Child Health and Human Development grant R01HD066591.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval and informed consent

This study used only secondary de-identified datasets.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Layla Parast (1), corresponding author
  • Daniel F. McCaffrey (3)
  • Lane F. Burgette (2)
  • Fernando Hoces de la Guardia (1)
  • Daniela Golinelli (4)
  • Jeremy N. V. Miles (1)
  • Beth Ann Griffin (2)

  1. RAND Corporation, Santa Monica, USA
  2. RAND Corporation, Arlington, USA
  3. Educational Testing Service, Princeton, USA
  4. Mathematica Policy Research, Washington, USA
