Skip to main content

A Clustering Genetic Algorithm for Genomic Data Mining

Chapter
Foundations of Computational Intelligence Volume 4

Part of the book series: Studies in Computational Intelligence (SCI, volume 204)

Abstract

In this chapter we summarize our work toward developing clustering algorithms based on evolutionary computing and their application to genomic data mining. We have focused on the reconstruction of protein-protein functional interactions from genomic data. The discovery of functional modules of proteins is formulated as an optimization problem in which proteins with similar genomic attributes are grouped together. By considering gene co-occurrence, gene directionality and gene proximity, clustering genetic algorithms can accurately predict functional associations. Moreover, clustering genetic algorithms eliminate the need to specify clustering parameters a priori (e.g., the number of clusters or the initial positions of the centroids). Several methods for the reconstruction of protein interactions are described, including single-objective and multi-objective clustering genetic algorithms. We present preliminary results on the reconstruction of bacterial operons and of protein associations as specified by the DIP (Database of Interacting Proteins) and EcoCyc databases.
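To make the idea of a clustering genetic algorithm that does not need the number of clusters in advance more concrete, here is a minimal sketch. It is not the chapter's algorithm. It assumes that each protein is described only by a binary phylogenetic profile (presence/absence of its gene across genomes, i.e. gene co-occurrence), that dissimilarity is the normalized Hamming distance, and that the fitness of a partition is its mean silhouette coefficient. Gene directionality and proximity are not modeled. The names `hamming`, `silhouette` and `cluster_ga`, as well as the operator choices (label-based encoding, binary tournament, uniform crossover, relabelling mutation, elitism), are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]  # assumed: 1 = gene present in a genome, 0 = absent


def hamming(a: Profile, b: Profile) -> float:
    """Fraction of genomes in which two presence/absence profiles disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouette(profiles: Sequence[Profile], labels: Sequence[int]) -> float:
    """Mean silhouette of a partition; -1.0 when there are fewer than 2 clusters."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # a singleton contributes 0 by convention
            continue
        # a: mean distance to the protein's own cluster
        a = sum(hamming(profiles[i], profiles[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(hamming(profiles[i], profiles[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def cluster_ga(
    profiles: Sequence[Profile],
    pop_size: int = 40,
    generations: int = 100,
    p_mut: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Evolve cluster labels for the proteins; the number of clusters is not fixed."""
    rng = random.Random(seed)
    n = len(profiles)

    def pick(pop: List[List[int]], scores: List[float]) -> List[int]:
        # Binary tournament selection on precomputed fitness values.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    # Label-based encoding: gene i is the cluster label of protein i. Labels
    # range over 0..n-1, so any number of clusters from 1 to n is reachable.
    pop = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [silhouette(profiles, ind) for ind in pop]
        elite = pop[max(range(pop_size), key=scores.__getitem__)]
        children = [elite[:]]  # elitism: the best partition always survives
        while len(children) < pop_size:
            p1, p2 = pick(pop, scores), pick(pop, scores)
            # Uniform crossover followed by per-gene relabelling mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            for k in range(n):
                if rng.random() < p_mut:
                    child[k] = rng.randrange(n)
            children.append(child)
        pop = children
    scores = [silhouette(profiles, ind) for ind in pop]
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    # Hypothetical toy profiles over six genomes: A/B co-occur, C/D co-occur.
    toy = [
        (1, 1, 1, 0, 0, 0),  # A
        (1, 1, 1, 0, 0, 0),  # B
        (0, 0, 0, 1, 1, 1),  # C
        (0, 0, 0, 1, 1, 1),  # D
    ]
    labels, score = cluster_ga(toy)
    print(labels, score)
```

Because the fitness scores whole partitions, the number of clusters emerges from the search rather than being supplied as a parameter. The label-based encoding is the simplest choice but is redundant (different label vectors can encode the same partition), which is one motivation for group-oriented encodings. A multi-objective variant would replace the single silhouette score with several objectives, for example one per genomic attribute.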




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tapia, J.J., Morett, E., Vallejo, E.E. (2009). A Clustering Genetic Algorithm for Genomic Data Mining. In: Abraham, A., Hassanien, AE., de Carvalho, A.P.d.L.F. (eds) Foundations of Computational Intelligence Volume 4. Studies in Computational Intelligence, vol 204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01088-0_11


  • DOI: https://doi.org/10.1007/978-3-642-01088-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01087-3

  • Online ISBN: 978-3-642-01088-0

  • eBook Packages: Engineering, Engineering (R0)
