Dynamic Treatment Regimes

Part of the Applied Bioinformatics and Biostatistics in Cancer Research book series (ABB)


Recent research (see Lavori and Dawson 2000, 2004) stresses that intervention programs should account for heterogeneity in patients' need for treatment. To improve patient care, the type of treatment and the dosage should vary from patient to patient. Moreover, in many cases a patient's need for treatment changes over time, which yields repeated opportunities to adapt the intervention.


Keywords: Decision Point; Approximation Space; Randomization Probability; Primary Research Question; Sample Size Formula



We acknowledge support for this work from NIH grants R01 MH080015 and P50 DA10075.


  1. Alcoholics Anonymous (2001) Chapter 5: how it works. Alcoholics Anonymous, 4th edn. Alcoholics Anonymous World Services, New York
  2. Baird L (1994) Reinforcement learning in continuous time: advantage updating. IEEE Internat Conf Neural Networks 4:2448–2453
  3. Banerjee A, Tsiatis A (2006) Adaptive two-stage designs in phase II clinical trials. Stat Med 25(19):3382–3395
  4. Bellman RE (1957) Dynamic programming. Princeton University Press, New Jersey
  5. Berry D, Mueller P, Grieve A, Smith M, Parke T, Blazek R, Mitchard N, Krams M (2001) Adaptive Bayesian designs for dose-ranging drug trials. In: Gatsonis C, Carlin B, Carriquiry A (eds) Case studies in Bayesian statistics, vol V. Springer, New York, pp 99–181
  6. Berry DA (2002) Adaptive clinical trials and Bayesian statistics (with discussion). Pharmaceutical Report, American Statistical Association, Alexandria, VA
  7. Berry D (2004) Bayesian statistics and the efficiency and ethics of clinical trials. Stat Sci 19:175–187
  8. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore, MD
  9. Chakraborty B, Murphy SA (2009) Inference for nonregular parameters in optimal dynamic treatment regimes. Stat Meth Med Res 19(3):317–343. Available online: 16 July 2009. DOI: 10.1177/0962280209105013
  10. Chow SC, Chang M (2008) Adaptive design methods in clinical trials—a review. Orphanet J Rare Diseases 3:11
  11. Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Inc., Hillsdale, NJ
  12. Collins LM, Murphy SA, Nair V, Strecher V (2005) A strategy for optimizing and evaluating behavioral intervention. Ann Behav Med 30:65–73
  13. COMBINE Study Research Group (2003) Testing combined pharmacotherapies and behavioral interventions in alcohol dependence: rationale and methods. Alcohol Clin Exp Res 27:1107–1122
  14. Dragalin V (2006) Adaptive designs: terminology and classification. Drug Inform J 40:425–435
  15. Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503–556
  16. Fava M, Rush AJ, Trivedi MH, Nierenberg AA, Thase ME, Sackeim HA, Quitkin FM, Wisniewski S, Lavori PW, Rosenbaum JF, Kupfer DJ (2003) Background and rationale for the sequenced treatment alternatives to relieve depression (STAR*D) study. Psychiatr Clin North Am 26(2):457–494
  17. Gunter LL, Zhu J, Murphy SA (2007) Variable selection for optimal decision making. Proceedings of the 11th conference on artificial intelligence in medicine, AIME 2007, Lecture notes in computer science/Lecture notes in artificial intelligence, vol 4594. pp 149–154
  18. Hoel P (1984) Introduction to mathematical statistics, 5th edn. John Wiley and Sons, New York
  19. Jennison C, Turnbull B (2000) Group sequential methods with applications to clinical trials. Chapman & Hall, Boca Raton, FL
  20. Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107–1149
  21. Lavori PW, Dawson R (2000) A design for testing clinical strategies: biased individually tailored within-subject randomization. J Roy Stat Soc A 163:29–38
  22. Lavori PW, Rush AJ, Wisniewski SR, Alpert J, Fava M, Kupfer DJ, Nierenberg A, Quitkin FM, Sackeim HA, Thase ME, Trivedi M (2001) Strengthening clinical effectiveness trials: equipoise-stratified randomization. Biol Psychiatr 48:605–614
  23. Lavori PW, Dawson R (2004) Dynamic treatment regimes: practical design considerations. Clin Trials 1:9–20
  24. Lizotte DJ, Laber E, Murphy SA (2009) Assessing confidence in policies learned from sequential randomized trials. Technical report, Department of Statistics, University of Michigan, Ann Arbor, Michigan
  25. Lunceford JK, Davidian M, Tsiatis AA (2002) Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics 58:48–57
  26. McLellan AT (2002) Have we evaluated addiction treatment correctly? Implications from a chronic care perspective. Addiction 97:249–252
  27. Miller WR (ed) (2004) COMBINE monograph series, combined behavioral intervention manual: a clinical research guide for therapists treating people with alcohol abuse and dependence. DHHS Publication No. (NIH) 04–5288, vol 1. National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD
  28. Moodie EEM, Richardson TS (2007) Bias correction in non-differentiable estimating equations for optimal dynamic regimes. COBRA Preprint Series, Article 17
  29. Murphy SA, van der Laan MJ, Robins JM, CPPRG (2001) Marginal mean models for dynamic regimes. J Am Stat Assoc 96:1410–1423
  30. Murphy SA (2003) Optimal dynamic treatment regimes. J Roy Stat Soc B 65(2):331–366
  31. Murphy SA (2005a) An experimental design for the development of adaptive treatment strategies. Stat Med 24:1455–1481
  32. Murphy SA (2005b) A generalization error for Q-learning. J Mach Learn Res 6:1073–1097
  33. Murphy SA, Lynch KG, Oslin D, McKay JR, TenHave T (2007) Developing adaptive treatment strategies in substance abuse research. Drug Alcohol Depend 88(2):s24–s30
  34. Murphy SA, Bingham D (2009) Screening experiments for developing dynamic treatment regimes. J Am Stat Assoc 104:391–408
  35. Neyman J (1923) On the application of probability theory to agricultural experiments. Stat Sci 5:465–480 (Translated in 1990)
  36. Oetting AI, Levy JA, Weiss RD, Murphy SA (2007) Statistical methodology for a SMART design in the development of adaptive treatment strategies. In: Shrout PE (ed) Causality and psychopathology: finding the determinants of disorders and their cures. American Psychiatric Publishing, Inc., Arlington, VA
  37. Ormoneit D, Sen S (2002) Kernel-based reinforcement learning. Mach Learn 49(2–3):161–178
  38. Oslin DW, Sayers S, Ross J, Kane V, TenHave T, Conigliaro J, Cornelius J (2003) Disease management for depression and at-risk drinking via telephone in an older population for veterans. Psychosom Med 65:931–937
  39. Pampallona S, Tsiatis AA (1994) Group sequential designs for one and two sided hypothesis testing with provision for early stopping in favour of the null hypothesis. J Stat Plann Infer 42:19–35
  40. Petersen ML, Deeks SG, van der Laan MJ (2007) Individualized treatment rules: generating candidate clinical trials. Stat Med 26(25):4578–4601
  41. Robins JM (1986) A new approach to causal inference in mortality studies with sustained exposure periods—application to control of the healthy worker survivor effect. Math Model 7:1393–1512
  42. Robins JM (1987) Addendum to “A new approach to causal inference in mortality studies with sustained exposure periods—application to control of the healthy worker survivor effect”. Comput Math App 14:923–945
  43. Robins JM, Wasserman L (1997) Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In: Geiger D, Shenoy P (eds) Proceedings of the thirteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann, San Francisco
  44. Robins JM (2000) Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American statistical association section on Bayesian statistical science 1999, pp 6–10
  45. Robins JM (2004) Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty P (eds) Proceedings of the second Seattle symposium on biostatistics. Lecture notes in statistics. Springer, New York
  46. Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Ann Stat 6:34–58
  47. Schneider LS, Tariot PN, Lyketsos CG, Dagerman KS, Davis KL, Davis S, Hsiao JK, Jeste DV, Katz IR, Olin JT, Pollock BG, Rabins PV, Rosenheck RA, Small GW, Lebowitz B, Lieberman JA (2001) National Institute of Mental Health clinical antipsychotic trials of intervention effectiveness (CATIE) Alzheimer disease trial methodology. Am J Geriatr Psychiatr 9(4):346–360
  48. Stone RM, Berg DT, George SL, Dodge RK, Paciucci PA, Schulman P, Lee EJ, Moore JO, Powell BL, Schiffer CA (1995) Granulocyte macrophage colony-stimulating factor after initial chemotherapy for elderly patients with primary acute myelogenous leukemia. New Engl J Med 332:1671–1677
  49. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT, Cambridge, MA
  50. TenHave TR, Coyne J, Salzer M, Katz I (2003) Research to improve the quality of care for depression: alternatives to the simple randomized clinical trial. Gen Hosp Psychiatr 25:115–123
  51. Thall PF, Millikan RE, Sung HG (2000) Evaluating multiple treatment courses in clinical trials. Stat Med 19:1011–1028
  52. Thall PF, Sung HG, Estey EH (2002) Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. J Am Stat Assoc 97:29–39
  53. Thall PF, Wathen JK (2005) Covariate-adjusted adaptive randomization in a sarcoma trial with multi-stage treatments. Stat Med 24:1947–1964
  54. Thall PF, Wooten LH, Logothetis CJ, Millikan R, Tannir NM (2007) Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Stat Med 26:4687–4702
  55. Thall PF, Nguyen H, Estey EH (2008) Patient-specific dose-finding based on bivariate outcomes and covariates. Biometrics 64(4):1126–1136
  56. Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J Roy Stat Soc B 58:267–288
  57. Tsitsiklis JN, Van Roy B (1996) Feature-based methods for large scale dynamic programming. Mach Learn 22:59–94
  58. Tsitsiklis JN, Van Roy B (1997) An analysis of temporal-difference learning with function approximation. IEEE Trans Automat Contr 42(5):674–690
  59. van der Laan MJ, Petersen ML, Joffe MM (2005) History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. Internat J Biostatistics 1(1):Article 4
  60. van der Laan MJ, Petersen ML (2007) Statistical learning of origin-specific statically optimal individualized treatment rules. Internat J Biostatistics 3(1)
  61. Wahed AS, Tsiatis AA (2004) Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics 60:124–133
  62. Wahed AS, Tsiatis AA (2006) Semiparametric efficient estimation of survival distribution for treatment policies in two-stage randomization designs in clinical trials with censored data. Biometrika 93:163–177
  63. Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University

Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Department of Biostatistics, Columbia University, New York, USA
  2. Institute for Social Research, University of Michigan, Ann Arbor, USA
  3. Department of Statistics, University of Michigan, Ann Arbor, USA
