Predictive Subgroup/Biomarker Identification and Machine Learning Methods

Abstract

Precision medicine, or patient tailoring, is very important for drug development because it can increase efficacy and/or reduce adverse reactions for the right patients (those identified by genomic or other types of biomarkers). The next two chapters discuss statistical methods for biomarkers, which are an essential ingredient in developing precision medicine.

Author information

Correspondence to M. Man.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Man, M., Nguyen, T.S., Battioui, C., Mi, G. (2019). Predictive Subgroup/Biomarker Identification and Machine Learning Methods. In: Fang, L., Su, C. (eds) Statistical Methods in Biomarker and Early Clinical Development. Springer, Cham. https://doi.org/10.1007/978-3-030-31503-0_1
