Classification of Gene Expression Data with Genetic Programming

  • Chapter

Part of the book series: Genetic Programming Series (GPEM, volume 6)

Abstract

This paper summarizes the use of a genetic programming (GP) system to develop classification rules for gene expression data, rules that hold promise for the development of new molecular diagnostics. The work focuses on discovering simple, accurate rules that diagnose diseases from changes in gene expression profiles within a diseased cell. GP is shown to be a useful technique for discovering classification rules in a supervised learning mode, where the biological genotype is paired with a biological phenotype such as a disease state. Because the number of independent variables is large and the number of samples comparatively small, developing these rules makes it necessary to devise new techniques for establishing fitness and for interpreting the results of evolutionary runs. These techniques are described, and two further issues are discussed: overfitting caused by small sample sizes, and the behavior of the GP system when variables are missing from the samples.

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Driscoll, J.A., Worzel, B., MacLean, D. (2003). Classification of Gene Expression Data with Genetic Programming. In: Riolo, R., Worzel, B. (eds) Genetic Programming Theory and Practice. Genetic Programming Series, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8983-3_3

  • DOI: https://doi.org/10.1007/978-1-4419-8983-3_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-4747-7

  • Online ISBN: 978-1-4419-8983-3

  • eBook Packages: Springer Book Archive