Abstract
The advent of omics technologies such as genomics and proteomics has raised the hope of discovering novel biomarkers that can be used to diagnose, predict, and monitor the progress of disease. The importance of data mining for identifying biological markers for diagnostic classification and prognostic assessment in microarray and proteomic data is increasingly recognized. We present an overview of general data mining methods and their applications to biomarker discovery, with particular focus on genomic and proteomic data. Two case studies are presented as examples, and relevant data mining terminology and techniques are explained.
Acknowledgments
This work was supported in part by a grant from the National Cancer Institute (U24CA126480-01), part of NCI’s Clinical Proteomic Technologies Initiative (http://proteomics.cancer.gov), awarded to Dr. Fred Regnier (PI) and Dr. Jake Chen (co-PI). We thank Hui Huang and Jiao Li for providing a case study.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Zhang, F., Chen, J.Y. (2011). Data Mining Methods in Omics-Based Biomarker Discovery. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_24
DOI: https://doi.org/10.1007/978-1-61779-027-0_24
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols