Abstract
In this chapter we summarize our work toward developing clustering algorithms based on evolutionary computing and their application to genomic data mining. We have focused on the reconstruction of protein-protein functional interactions from genomic data. The discovery of functional modules of proteins is formulated as an optimization problem in which proteins with similar genomic attributes are grouped together. By considering gene co-occurrence, gene directionality and gene proximity, clustering genetic algorithms can predict functional associations accurately. Moreover, clustering genetic algorithms eliminate the need for the a priori specification of clustering parameters (e.g., the number of clusters or the initial position of centroids). Several methods for the reconstruction of protein interactions are described, including single-objective and multi-objective clustering genetic algorithms. We present our preliminary results on the reconstruction of bacterial operons and protein associations as specified by the DIP and ECOCYC databases.
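To make the optimization view concrete, the following is a minimal sketch of a clustering genetic algorithm. It is not the chapter's actual method. It assumes a label-per-protein encoding, binary phylogenetic profiles as the only genomic attribute (gene co-occurrence), Hamming distance, and a hypothetical pairwise fitness with an illustrative `threshold`. The names `hamming`, `fitness`, `evolve` and `PROFILES` are invented for this example. The number of clusters is not fixed in advance; it emerges from the labels that survive selection, although the threshold is itself a parameter in this simplified version.

```python
import random

# Toy presence/absence profiles (assumed data): one row per protein,
# one column per genome. Proteins 0-2 and 3-5 form two obvious groups.
PROFILES = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
]


def hamming(a, b):
    """Number of genomes in which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def fitness(labels, profiles, threshold=3):
    """Sum of (threshold - distance) over all protein pairs sharing a label.

    Similar pairs in the same cluster raise the score; dissimilar pairs
    in the same cluster lower it. This objective is an assumption.
    """
    score = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                score += threshold - hamming(profiles[i], profiles[j])
    return score


def evolve(profiles, pop_size=40, generations=100,
           mutation_rate=0.1, threshold=3, seed=0):
    """Evolve a list of cluster labels, one label per protein."""
    rng = random.Random(seed)
    n = len(profiles)

    def score(individual):
        return fitness(individual, profiles, threshold)

    # Labels range over 0..n-1, so anything from 1 to n clusters can emerge.
    # The all-singletons individual (score 0) is included as a baseline.
    population = [list(range(n))]
    population += [[rng.randrange(n) for _ in range(n)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        best = max(population, key=score)
        next_gen = [best[:]]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            # Uniform crossover, then per-gene random relabeling.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [rng.randrange(n) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen

    return max(population, key=score)


best = evolve(PROFILES)
print(best, fitness(best, PROFILES))
```

Because of elitism and the singleton baseline, the returned partition never scores below zero under this objective. A multi-objective variant would replace the single `fitness` value with several scores, for example one per genomic attribute, and select on Pareto dominance instead.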
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Tapia, J.J., Morett, E., Vallejo, E.E. (2009). A Clustering Genetic Algorithm for Genomic Data Mining. In: Abraham, A., Hassanien, AE., de Carvalho, A.P.d.L.F. (eds) Foundations of Computational Intelligence Volume 4. Studies in Computational Intelligence, vol 204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01088-0_11
DOI: https://doi.org/10.1007/978-3-642-01088-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01087-3
Online ISBN: 978-3-642-01088-0
eBook Packages: Engineering, Engineering (R0)