Identification and Clinical Translation of Biomarker Signatures: Statistical Considerations

Schwarz, Emanuel

doi:10.1007/978-1-4939-6730-8_6

Part of the book series: Methods in Molecular Biology (MIMB, volume 1546)

Abstract

Powerful machine learning tools exist to extract biological patterns for diagnosis or prediction from high-dimensional datasets. Simultaneous advances in high-throughput profiling technologies have led to a rapid acceleration of biomarker discovery investigations across all areas of medicine. However, the translation of biomarker signatures into clinically useful tools has thus far been difficult. In this chapter, several important considerations are discussed that influence such translation in the context of classifier design. These include aspects of variable selection that go beyond classification accuracy, as well as effects of variability on assay stability and sample size. The consideration of such factors may lead to an adaptation of biomarker discovery approaches, aimed at an optimal balance of performance and clinical translatability.

Acknowledgments

This study was supported by the DFG Emmy-Noether-Program SCHW 1768/1-1.

Author information

Authors and Affiliations

Department of Psychiatry and Psychotherapy, Medical Faculty Mannheim, Central Institute of Mental Health, Heidelberg University, Mannheim, Germany
Emanuel Schwarz

Corresponding author

Correspondence to Emanuel Schwarz.

Editor information

Editors and Affiliations

Laboratory of Neuroproteomics, University of Campinas (UNICAMP), Campinas, Brazil
Paul C. Guest

About this protocol

Cite this protocol

Schwarz, E. (2017). Identification and Clinical Translation of Biomarker Signatures: Statistical Considerations. In: Guest, P.C. (ed) Multiplex Biomarker Techniques. Methods in Molecular Biology, vol 1546. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-6730-8_6

DOI: https://doi.org/10.1007/978-1-4939-6730-8_6
Published: 29 November 2016
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-6729-2
Online ISBN: 978-1-4939-6730-8
eBook Packages: Springer Protocols
