
Analyzing Support Vector Machine Overfitting on Microarray Data

  • Conference paper
Intelligent Computing in Bioinformatics (ICIC 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8590)

Abstract

Support vector machines (SVMs) are widely used, state-of-the-art classifiers in molecular diagnostics. However, little work has analyzed their overfitting, which can lead to deceptive diagnostic results. In this work, we investigate this important problem and prove that an SVM classifier inevitably overfits gene expression array data under a standard Gaussian kernel, owing to the large built-in data variations introduced by the DNA amplification mechanism in transcriptional profiling. We find that SVM exhibits its own characteristic overfitting behavior on array data. We also show that feature selection algorithms may not help overcome overfitting, and we discuss overfitting in biomarker discovery algorithms.
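The abstract does not include the proof, so the following is only a minimal sketch of one well-known way a Gaussian kernel can behave on widely spread data. It is not the paper's argument. It assumes the kernel form exp(-||x - y||^2 / (2 * sigma^2)) with sigma = 1 as the "standard" setting, and it uses hypothetical synthetic values (mean 5000, standard deviation 1000) in place of real array intensities. When pairwise distances are much larger than sigma, every off-diagonal kernel entry collapses to zero, the kernel matrix becomes close to the identity, and each training sample looks similar only to itself.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma=1.0):
    """Full pairwise kernel matrix as a list of lists."""
    return [[gaussian_kernel(u, v, sigma) for v in samples] for u in samples]


def max_off_diagonal(matrix):
    """Largest similarity between two distinct samples."""
    n = len(matrix)
    return max(matrix[i][j] for i in range(n) for j in range(n) if i != j)


# Hypothetical synthetic data: 20 samples, 200 "genes", large spread.
# These numbers are illustrative assumptions, not values from the paper.
rng = random.Random(0)
n_samples, n_genes = 20, 200
samples = [
    [rng.gauss(5000.0, 1000.0) for _ in range(n_genes)]
    for _ in range(n_samples)
]

# Assumed "standard" bandwidth sigma = 1: the matrix is numerically the identity.
K_standard = kernel_matrix(samples, sigma=1.0)
print("sigma = 1, max off-diagonal:", max_off_diagonal(K_standard))

# A bandwidth on the scale of typical pairwise distances keeps similarities informative.
sigma_wide = 1000.0 * math.sqrt(n_genes)
K_wide = kernel_matrix(samples, sigma=sigma_wide)
print("wide sigma, max off-diagonal:", max_off_diagonal(K_wide))
```

With a near-identity kernel matrix, a kernel classifier can fit the training labels exactly while its decision value at an unseen sample is driven almost entirely by the bias term, which is one common picture of overfitting in this setting.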

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Han, H. (2014). Analyzing Support Vector Machine Overfitting on Microarray Data. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_19

  • DOI: https://doi.org/10.1007/978-3-319-09330-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09329-1

  • Online ISBN: 978-3-319-09330-7

  • eBook Packages: Computer Science (R0)
