Targeting a Simple Statistical Bandit Problem

  • Antoine Chambaz
  • Wenjing Zheng
  • Mark J. van der Laan
Part of the Springer Series in Statistics book series (SSS)


An infinite sequence of independent and identically distributed (i.i.d.) random variables \((W_n, Y_n(0), Y_n(1))_{n \geq 1}\) drawn from a common law \(Q_0\) is to be sequentially and partially disclosed during the course of a controlled experiment. The first component, \(W_n\), describes the \(n\)th context in which we will have to carry out one action out of two, denoted \(a = 0\) and \(a = 1\). The second and third components, \(Y_n(0)\) and \(Y_n(1)\), are the rewards that actions \(a = 0\) and \(a = 1\) would grant. The set \(\mathcal{W}\) of contexts may be high-dimensional. The rewards take their values in the open interval \(]0, 1[\).
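The disclosure mechanism described above can be illustrated with a minimal simulation sketch. Everything specific in it is an assumption made for illustration and does not come from the chapter: the three-dimensional uniform context, the Beta-distributed rewards standing in for the law \(Q_0\), the fair-coin choice of action, and the function names `draw_full_observation` and `run_experiment`. The point is only that, at each step, the context and the reward of the chosen action are revealed while the other potential reward stays hidden.

```python
import random


def draw_full_observation(rng):
    """Draw one triple (W, Y(0), Y(1)) from a hypothetical law Q0.

    Assumption: W is uniform on [0, 1]^3 and both rewards are Beta
    distributed, so they lie in ]0, 1[ (almost surely).
    """
    w = [rng.random() for _ in range(3)]
    y0 = rng.betavariate(2.0, 2.0)
    # The reward under a = 1 depends on the first context coordinate.
    y1 = rng.betavariate(2.0 + 2.0 * w[0], 2.0)
    return w, y0, y1


def run_experiment(n, seed=0):
    """Sequentially disclose (W_n, A_n, Y_n(A_n)) for n rounds.

    Assumption: actions are chosen by a fair coin, a placeholder for
    whatever design the experimenter actually uses.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        w, y0, y1 = draw_full_observation(rng)
        a = rng.randrange(2)          # action in {0, 1}
        y = y1 if a == 1 else y0      # only the chosen reward is revealed
        history.append((w, a, y))     # the other reward is never recorded
    return history


history = run_experiment(50)
```

Each entry of `history` holds a context, the action taken, and the single reward that was disclosed; the counterfactual reward is discarded, which is what makes the disclosure partial.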



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Antoine Chambaz (1)
  • Wenjing Zheng (2)
  • Mark J. van der Laan (3)
  1. MAP5 (UMR CNRS 8145), Université Paris Descartes, Paris cedex 06, France
  2. Netflix, Los Gatos, USA
  3. Division of Biostatistics and Department of Statistics, University of California, Berkeley, Berkeley, USA
