Evolving Regular Expressions for GeneChip Probe Performance Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5199)

Abstract

Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure the expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI’s GEO database to indicate the quality of individual HG-U133A probes. Low concordance indicates a poor probe. Regular expressions can be data mined by a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming written in gawk and using egrep. The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine and Guanine, and mixtures of the two, should all be avoided.
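
To make the idea concrete, the following is a minimal sketch of deriving candidate regular expressions from a BNF-style grammar and scoring them against labelled probe sequences. It is not the authors' system: Python's `re` module stands in for gawk and egrep, and the toy grammar `GRAMMAR`, the probe list `PROBES` with its "poor" labels, and the agreement-rate `fitness` function are all hypothetical, chosen only for illustration.

```python
import random
import re

# Hypothetical toy grammar in BNF style: each nonterminal maps to a list of
# alternative productions. This is NOT the grammar used in the paper.
GRAMMAR = {
    "<re>": [["<elem>"], ["<elem>", "<re>"]],
    "<elem>": [["<base>"], ["<set>"], ["<base>", "<rep>"], ["<set>", "<rep>"]],
    "<set>": [["[CG]"], ["[AT]"]],
    "<base>": [["A"], ["C"], ["G"], ["T"]],
    "<rep>": [["{2}"], ["{3}"], ["{4}"], ["+"]],
}


def derive(symbol, rng, depth=0, max_depth=6):
    """Expand a grammar symbol into a regular expression string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # The first alternative of every rule is non-recursive, so forcing
        # it past max_depth guarantees that the expansion terminates.
        options = options[:1]
    production = rng.choice(options)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in production)


# Hypothetical labelled data: (probe sequence, is_poor). Made up for the
# example; real labels would come from the correlation analysis.
PROBES = [
    ("ATGGGGCTAGCTAGGATCATTAGCA", True),
    ("ATCGATTACGATAGCTAACTGATCA", False),
    ("TACCCCGGATTACATTGACATAGAT", True),
    ("ATAGCTAGATACTAGTCAGTACTGA", False),
]


def fitness(pattern, probes):
    """Fraction of probes where 'regex matches' agrees with the poor label."""
    compiled = re.compile(pattern)
    agree = sum(
        (compiled.search(seq) is not None) == poor for seq, poor in probes
    )
    return agree / len(probes)


if __name__ == "__main__":
    rng = random.Random(0)
    # Random search over grammar derivations; a genetic programming system
    # would instead breed new candidates from the fitter ones.
    candidates = [derive("<re>", rng) for _ in range(200)]
    best = max(candidates, key=lambda p: fitness(p, PROBES))
    print(best, fitness(best, PROBES))
```

On this toy data a motif such as `[CG]{4}` separates the two classes, while `G{4}` misses the probe whose run mixes Cytosine and Guanine. Generating candidates from a grammar means every candidate is a syntactically valid expression, so no evaluations are wasted on malformed patterns.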

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Langdon, W.B., Harrison, A.P. (2008). Evolving Regular Expressions for GeneChip Probe Performance Prediction. In: Rudolph, G., Jansen, T., Beume, N., Lucas, S., Poloni, C. (eds) Parallel Problem Solving from Nature – PPSN X. PPSN 2008. Lecture Notes in Computer Science, vol 5199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87700-4_105

  • DOI: https://doi.org/10.1007/978-3-540-87700-4_105

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87699-1

  • Online ISBN: 978-3-540-87700-4

  • eBook Packages: Computer Science, Computer Science (R0)