Targeted Learning in Data Science, pp. 437–451
Targeting a Simple Statistical Bandit Problem
Abstract
An infinite sequence of independent and identically distributed (i.i.d.) random variables \((W_n, Y_n(0), Y_n(1))_{n \geq 1}\) drawn from a common law \(Q_0\) is to be sequentially and partially disclosed during the course of a controlled experiment. The first component, \(W_n\), describes the \(n\)th context in which we will have to carry out one action out of two, denoted \(a = 0\) and \(a = 1\). The second and third components, \(Y_n(0)\) and \(Y_n(1)\), are the rewards that actions \(a = 0\) and \(a = 1\) would grant. The set \(\mathcal{W}\) of contexts may be high-dimensional. The rewards take their values in the open interval \(]0, 1[\).
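The partial disclosure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data-generating law below is a made-up stand-in for \(Q_0\), the five-dimensional uniform context, the Beta-distributed rewards, and the fair-coin choice of action are all hypothetical, and the function names `draw_full_data` and `run_experiment` are introduced here purely for illustration. It is not the design studied in the chapter.

```python
import random


def draw_full_data(rng, dim=5):
    """Draw one (W, Y(0), Y(1)) from a hypothetical law standing in for Q_0."""
    # Context W: a vector of `dim` uniform coordinates (assumption).
    w = [rng.random() for _ in range(dim)]
    # Hypothetical conditional mean rewards, both strictly inside (0, 1).
    mean0 = 0.2 + 0.6 * w[0]
    mean1 = 0.8 - 0.6 * w[0]
    concentration = 10.0
    eps = 1e-12
    rewards = []
    for mean in (mean0, mean1):
        # Beta draw with the requested mean.
        y = rng.betavariate(mean * concentration, (1.0 - mean) * concentration)
        # Guard against floating-point draws landing exactly on 0 or 1.
        rewards.append(min(max(y, eps), 1.0 - eps))
    return w, rewards[0], rewards[1]


def run_experiment(n, seed=0):
    """Disclose the data sequentially and partially: (W_n, A_n, Y_n(A_n))."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        w, y0, y1 = draw_full_data(rng)
        # Placeholder design (assumption): pick the action with a fair coin.
        a = rng.randint(0, 1)
        # Only the reward of the action actually carried out is revealed;
        # the other potential reward is never observed.
        y = y1 if a == 1 else y0
        observed.append((w, a, y))
    return observed


if __name__ == "__main__":
    for w, a, y in run_experiment(3):
        print(f"W[0]={w[0]:.3f}  action={a}  reward={y:.3f}")
```

Each observed triple contains the context, the chosen action, and a single reward, which is what makes this a bandit problem rather than a fully observed one. A response-adaptive design would replace the fair coin with a rule that depends on the context and on past observations.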
Copyright information
© Springer International Publishing AG 2018