Applied Data Mining: From Biomarker Discovery to Decision Support Systems

Chapter in: Computational Medicine

Abstract

This chapter provides an overview of emerging bioinformatics methods for biomarker discovery and medical decision support. It introduces study design considerations and bioanalytical concepts for generating biomedical data, followed by data mining and information retrieval procedures such as feature selection, classification, and statistical and clinical validation. The reviewed methods are illustrated with real examples from preclinical and clinical studies, and their application in medical decision making is discussed. The chapter is intended for readers with a bioinformatics background as well as biomedical researchers interested in applying computational methods to biomarker discovery and medical decision making.

Author information

Correspondence to C. Baumgartner.

Copyright information

© 2012 Springer-Verlag Wien

About this chapter

Cite this chapter

Osl, M., Netzer, M., Dreiseitl, S., Baumgartner, C. (2012). Applied Data Mining: From Biomarker Discovery to Decision Support Systems. In: Trajanoski, Z. (eds) Computational Medicine. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0947-2_10
