High-Throughput Approaches to Biomarker Discovery and Challenges of Subsequent Validation

Veytsman, Boris; Baranova, Ancha

doi:10.1007/978-94-007-7696-8_20

Reference work entry
First Online: 01 January 2015

Part of the book series: Biomarkers in Disease: Methods, Discoveries and Applications

Abstract

Recently introduced high-throughput technologies are producing unprecedented volumes of biomedical data available for mining and analysis. However, early predictions of imminent breakthroughs in our understanding of human diseases, and of easy predictive diagnostics, have turned out to be largely overoptimistic.

We argue that this situation is not coincidental but is caused by the statistical properties of the data collected. A typical high-throughput biological dataset is deeply imbalanced: the data matrix includes many measured quantities, or “levels,” in a relatively small number of subjects. Thus, any attempt to analyze these datasets is undermined by the so-called “curse of dimensionality,” which may be addressed by removing a majority of the variables. Feature selection aimed at increasing classification power may be done using data mining or correlation-based approaches. In this chapter, both theory-driven and data-driven approaches to dealing with complexity in biological systems are discussed in detail.
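
The imbalance described above can be illustrated with a small simulation. The sketch below is not taken from the chapter: the function names, the sizes (20 subjects, 10 versus 5000 features), and the use of Gaussian noise are all assumptions chosen for illustration. It generates features that carry no information about a balanced two-class label and reports the largest absolute Pearson correlation found among them. With few subjects, simply measuring more quantities is enough to produce a feature that looks strongly associated with the outcome.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_abs_noise_correlation(n_subjects, n_features, seed=0):
    """Largest |correlation| between a class label and pure-noise features.

    Hypothetical helper for illustration: every feature is independent
    Gaussian noise, so any correlation with the label arises by chance.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_subjects)]  # balanced two-class label
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, labels)))
    return best


# Same 20 subjects, same seed: the 5000-feature run contains the
# 10-feature run as its first ten features, so its maximum cannot be lower.
few = max_abs_noise_correlation(n_subjects=20, n_features=10)
many = max_abs_noise_correlation(n_subjects=20, n_features=5000)

print(f"max |r| among   10 noise features: {few:.2f}")
print(f"max |r| among 5000 noise features: {many:.2f}")
```

Because the best-looking feature here is noise by construction, a selection step that ranks variables by their correlation with the outcome would still return a "top" candidate; this is one reason candidates found in a discovery dataset need validation on independent subjects.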

Acknowledgment

The authors gratefully acknowledge the general support provided by the College of Science, George Mason University; State Contract 14.607.21.0098, dated November 27, 2014 (Ministry of Science and Education, Russia); and the Human Proteome Scientific Program of the Federal Agency of Scientific Organizations, Russia.

Author information

Authors and Affiliations

Center for the Study of Chronic Metabolic Diseases, School of System Biology, George Mason University, MSN 3E1, 4400 University Dr, Fairfax, VA 22030-4444, USA
Boris Veytsman & Ancha Baranova
Research Centre for Medical Genetics, Russian Academy of Medical Sciences, 1, Moskvorechye, 115478, Moscow, Russia
Ancha Baranova

Corresponding author

Correspondence to Ancha Baranova.

Editor information

Editors and Affiliations

Department of Nutrition and Dietetics, Division of Diabetes & Nutritional Sciences, Faculty of Life Sciences & Medicine, King's College London, London, United Kingdom
Victor R. Preedy
Faculty of Science & Technology, Department of Biomedical Sciences, University of Westminster, London, United Kingdom
Vinood B. Patel

About this entry

Cite this entry

Veytsman, B., Baranova, A. (2015). High-Throughput Approaches to Biomarker Discovery and Challenges of Subsequent Validation. In: Preedy, V., Patel, V. (eds) General Methods in Biomarker Research and their Applications. Biomarkers in Disease: Methods, Discoveries and Applications. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7696-8_20

DOI: https://doi.org/10.1007/978-94-007-7696-8_20
Published: 19 June 2015
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7695-1
Online ISBN: 978-94-007-7696-8
eBook Packages: Biomedical and Life Sciences; Reference Module Biomedical and Life Sciences