Identification and Clinical Translation of Biomarker Signatures: Statistical Considerations

Multiplex Biomarker Techniques

Part of the book series: Methods in Molecular Biology (MIMB, volume 1546)

Abstract

Powerful machine learning tools exist to extract biological patterns for diagnosis or prediction from high-dimensional datasets. Simultaneous advances in high-throughput profiling technologies have led to a rapid acceleration of biomarker discovery investigations across all areas of medicine. However, the translation of biomarker signatures into clinically useful tools has thus far been difficult. In this chapter, several important considerations are discussed that influence such translation in the context of classifier design. These include aspects of variable selection that go beyond classification accuracy, as well as effects of variability on assay stability and sample size. The consideration of such factors may lead to an adaptation of biomarker discovery approaches, aimed at an optimal balance of performance and clinical translatability.

Acknowledgments

This study was supported by the DFG Emmy-Noether-Program SCHW 1768/1-1.

Author information

Corresponding author

Correspondence to Emanuel Schwarz.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Schwarz, E. (2017). Identification and Clinical Translation of Biomarker Signatures: Statistical Considerations. In: Guest, P.C. (eds) Multiplex Biomarker Techniques. Methods in Molecular Biology, vol 1546. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-6730-8_6

  • DOI: https://doi.org/10.1007/978-1-4939-6730-8_6

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-6729-2

  • Online ISBN: 978-1-4939-6730-8

  • eBook Packages: Springer Protocols
