
Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5676)


Abstract

Supervised machine learning methods have been successfully applied to gene expression based cancer diagnosis. Characteristics intrinsic to cancer gene expression data sets, such as high dimensionality, a low number of samples and the presence of noise, make the classification task very difficult. Furthermore, limitations in classifier performance may often be attributed to characteristics intrinsic to a particular data set.

This paper presents an analysis of gene expression data sets for cancer diagnosis using classification complexity measures. Such measures consider data geometry, distribution and linear separability as indications of the complexity of the classification task. The results obtained indicate that the cancer data sets investigated are formed by mostly linearly separable, non-overlapping classes, which supports the good predictive performance of robust linear classifiers, such as SVMs, on the given data sets. Furthermore, we found two complexity indices that were good indicators of the difficulty of gene expression based cancer diagnosis.
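The abstract does not name the individual complexity measures used. As a rough illustration of what such a measure can look like, the sketch below computes the maximum Fisher's discriminant ratio over features, a measure commonly used in the classification complexity literature; whether it is among the indices used in the paper is an assumption. The function names and the toy expression values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Fisher's discriminant ratio of one feature for two classes:
    squared difference of class means over the sum of class variances."""
    gap = (mean(values_a) - mean(values_b)) ** 2
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        # Both classes are constant on this feature.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def max_fisher_ratio(samples, labels):
    """Largest per-feature Fisher ratio; higher values suggest that at
    least one feature separates the two classes well."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    rows_a = [s for s, y in zip(samples, labels) if y == classes[0]]
    rows_b = [s for s, y in zip(samples, labels) if y == classes[1]]
    n_features = len(samples[0])
    return max(
        fisher_ratio([r[j] for r in rows_a], [r[j] for r in rows_b])
        for j in range(n_features)
    )


# Hypothetical toy data: 4 samples, 2 "genes" each.
samples = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]]
labels = ["normal", "normal", "tumor", "tumor"]
score = max_fisher_ratio(samples, labels)
```

In this toy example the first feature separates the classes cleanly while the second carries no class information, so the maximum is driven entirely by the first feature.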




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Costa, I.G., Lorena, A.C., Peres, L.R.M.P.y., de Souto, M.C.P. (2009). Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets. In: Guimarães, K.S., Panchenko, A., Przytycka, T.M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2009. Lecture Notes in Computer Science, vol 5676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03223-3_5


  • DOI: https://doi.org/10.1007/978-3-642-03223-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03222-6

  • Online ISBN: 978-3-642-03223-3

  • eBook Packages: Computer Science, Computer Science (R0)
