Abstract
Proteins can be grouped into families according to their biological functions. This paper presents a system, named GAMBIT, that discovers motifs (particular sequences of amino acids) that occur very often in proteins of a given family but rarely in proteins of other families. These motifs are used to classify unknown proteins, that is, to predict their function by analyzing their primary structure. To search for motifs in proteins, we developed a genetic algorithm (GA) with operators specially tailored to the problem. GAMBIT was compared with MEME, a web tool for finding motifs, on the TransMembrane Protein DataBase. Motifs found by both methods were used to build a decision tree and classification rules using the C4.5 and Prism algorithms, respectively. With both classification algorithms, the motifs found by GAMBIT led to significantly better results than those found by MEME.
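The idea of a motif that is frequent inside one family and rare outside it can be illustrated with a small Python sketch. The abstract does not give GAMBIT's motif representation or fitness function, so everything here is an assumption made for illustration: motifs are treated as exact contiguous substrings, the score is simply the difference between the two coverage fractions, and the function names and toy sequences are hypothetical. The last helper shows one plausible way motif occurrences could be turned into binary attributes for a classifier such as C4.5 or Prism.

```python
from typing import List


def contains(motif: str, sequence: str) -> bool:
    """True if the motif occurs as a contiguous substring of the sequence."""
    return motif in sequence


def coverage(motif: str, sequences: List[str]) -> float:
    """Fraction of sequences containing the motif (0.0 for an empty list)."""
    if not sequences:
        return 0.0
    return sum(contains(motif, s) for s in sequences) / len(sequences)


def discrimination(motif: str, family: List[str], others: List[str]) -> float:
    """Assumed score: coverage inside the family minus coverage outside it.

    This is NOT the fitness function from the paper, only a simple stand-in
    for "occurs often in the family but rarely elsewhere".
    """
    return coverage(motif, family) - coverage(motif, others)


def to_features(motifs: List[str], sequence: str) -> List[int]:
    """Encode a sequence as one 0/1 attribute per motif (present or absent)."""
    return [int(contains(m, sequence)) for m in motifs]


# Toy sequences in one-letter amino acid code (made up for this example).
family = ["MKTLLVAG", "AALLVAGK", "LLVAGPQR"]
others = ["MKTPQRST", "GGLLVAGA", "PQRSTWYV", "ACDEFGHI"]

score = discrimination("LLVAG", family, others)
print(score)
print(to_features(["LLVAG", "PQR"], "LLVAGPQR"))
```

A search procedure such as a genetic algorithm would then look for motifs that maximize a score of this kind, and the resulting 0/1 vectors would serve as the input attributes of the classifier.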
References
Abola, E.E., Sussman, J.L., Prilusky, J., Manning, N.O.: Protein data bank archives of three-dimensional macromolecular structures. Meth. Enzymol. 277, 556–571 (1997)
Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proc. 2nd Int. Conf. on Intelligent Systems for Molecular Biology, pp. 28–36 (1994)
Bojarczuk, C.C., Lopes, H.S., Freitas, A.A.: A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artif. Intell. Med. 30(1), 27–48 (2004)
Cendrowska, J.: Prism: an algorithm for inducing modular rules. Int. J. Man-Mach. Stud. 27, 349–370 (1987)
Hanke, J., Beckmann, G., Bork, P., Reich, J.G.: Self-organizing hierarchic networks for pattern recognition in protein sequence. Protein Sci. 5(1), 72–82 (1996)
Ikeda, M., Arai, M., Lao, D.M., Shimizu, T.: Transmembrane topology prediction methods: a re-assessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topologies. In Silico Biol. 2, 19–33 (2002)
Kihara, D., Shimizu, T., Kanehisa, M.: Prediction of membrane proteins based on classification of transmembrane segments. Protein Eng. 11, 961–970 (1998)
Lehninger, A.L., Nelson, D.L., Cox, M.M.: Principles of Biochemistry, 2nd edn., pp. 134–137. Worth Publishers, New York (1998)
Manning, A.M., Brass, A., Goble, C.A., Keane, J.A.: Clustering techniques in biological sequence analysis. In: Komorowski, J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 315–322. Springer, Heidelberg (1997)
Mathura, V.S., Schein, C.H., Braun, W.: Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases. Bioinformatics 19(11), 1381–1390 (2003)
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco (1993)
Shimizu, T., Nakai, K.: Construction of a membrane protein database and an evaluation of several prediction methods of transmembrane segments. In: Proc. Genome Informatics Workshop, pp. 148–149 (1994)
Tsunoda, D.F., Lopes, H.S.: Automatic motif discovery in an enzyme database using a genetic algorithm-based approach (2005) (to appear)
Weinert, W.R., Lopes, H.S.: Neural networks for protein classification. Appl. Bioinformatics 3, 41–48 (2004)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsunoda, D.F., Lopes, H.S., Freitas, A.A. (2005). An Evolutionary Approach for Motif Discovery and Transmembrane Protein Classification. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2005. Lecture Notes in Computer Science, vol 3449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32003-6_11
DOI: https://doi.org/10.1007/978-3-540-32003-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25396-9
Online ISBN: 978-3-540-32003-6
eBook Packages: Computer Science (R0)