Assessing the Generalizability of Randomized Trial Results to Target Populations

Stuart, Elizabeth A.; Bradshaw, Catherine P.; Leaf, Philip J.

doi:10.1007/s11121-014-0513-z

Published: 12 October 2014

Prevention Science, Volume 16, pages 475–485 (2015)

Elizabeth A. Stuart¹,
Catherine P. Bradshaw² &
Philip J. Leaf¹

Abstract

Recent years have seen increasing interest in and attention to evidence-based practices, where the “evidence” generally comes from well-conducted randomized trials. However, while those trials yield accurate estimates of the effect of the intervention for the participants in the trial (known as “internal validity”), they do not always yield relevant information about the effects in a particular target population (known as “external validity”). This may be due to a lack of specification of a target population when designing the trial, difficulties recruiting a sample that is representative of a prespecified target population, or interest in a target population somewhat different from the population directly targeted by the trial. This paper first provides an overview of existing design and analysis methods for assessing and enhancing the ability of a randomized trial to estimate treatment effects in a target population. It then provides a case study using one particular method, which weights the subjects in a randomized trial to match the population on a set of observed characteristics. The case study uses data from a randomized trial of school-wide positive behavioral interventions and supports (PBIS); our interest is in generalizing the results to the state of Maryland. In the case of PBIS, after weighting, estimated effects in the target population were similar to those observed in the randomized trial. The paper illustrates that statistical methods can be used to assess and enhance the external validity of randomized trials, making the results more applicable to policy and clinical questions. However, there are also many open research questions; future research should focus on questions of treatment effect heterogeneity and on further developing these methods for enhancing external validity. Researchers should think carefully about the external validity of randomized trials and be cautious about extrapolating results to specific populations unless they are confident of the similarity between the trial sample and that target population.
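
To make the weighting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single categorical covariate and uses simple post-stratification weights (the population share of a stratum divided by its share of the trial sample); a case study matching on a set of observed characteristics may instead derive weights from a model of trial participation. The function names, the urban/rural strata, and all data values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def poststratification_weights(strata, population_shares):
    """Weight each trial unit by (population share) / (trial share) of its stratum."""
    n = len(strata)
    counts = Counter(strata)
    return [population_shares[s] / (counts[s] / n) for s in strata]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def effect_estimate(outcomes, treated, weights):
    """Weighted difference in mean outcomes: treated minus control."""
    y1 = [y for y, z in zip(outcomes, treated) if z == 1]
    w1 = [w for w, z in zip(weights, treated) if z == 1]
    y0 = [y for y, z in zip(outcomes, treated) if z == 0]
    w0 = [w for w, z in zip(weights, treated) if z == 0]
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Hypothetical trial of 8 schools: 6 urban, 2 rural (illustrative values only).
strata = ["urban"] * 6 + ["rural"] * 2
treated = [1, 1, 1, 0, 0, 0, 1, 0]
outcomes = [3, 3, 3, 1, 1, 1, 5, 1]

# Hypothetical target population: half urban, half rural.
population_shares = {"urban": 0.5, "rural": 0.5}

unweighted = effect_estimate(outcomes, treated, [1.0] * len(outcomes))
weights = poststratification_weights(strata, population_shares)
weighted = effect_estimate(outcomes, treated, weights)

print(f"Trial-sample estimate: {unweighted:.2f}")
print(f"Population-weighted estimate: {weighted:.2f}")
```

In this toy data the effect is larger in rural schools (4 versus 2), and rural schools make up 25% of the trial but 50% of the hypothetical population, so weighting moves the estimate from 2.50 to 3.00. When effects do not vary across the weighted characteristics, the two estimates coincide, which is the pattern the abstract reports for PBIS.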

Conflict of Interest

The authors declare they have no conflicts of interest.

Funding Source

This project was supported by grants from the Centers for Disease Control and Prevention (R49/CCR318627, 1U49CE 000728, and K01CE001333), the National Institute of Mental Health (1R01MH67948; K25 MH083846), the National Science Foundation (DRL-1335843), and the Institute of Education Sciences (R305A090307).

Author information

Authors and Affiliations

Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
Elizabeth A. Stuart & Philip J. Leaf
University of Virginia, Charlottesville, VA, USA
Catherine P. Bradshaw

Corresponding author

Correspondence to Elizabeth A. Stuart.

Additional information

Clinical Trial Registry Number: NCT01583127

About this article

Cite this article

Stuart, E.A., Bradshaw, C.P. & Leaf, P.J. Assessing the Generalizability of Randomized Trial Results to Target Populations. Prev Sci 16, 475–485 (2015). https://doi.org/10.1007/s11121-014-0513-z

Published: 12 October 2014
Issue Date: April 2015
DOI: https://doi.org/10.1007/s11121-014-0513-z