An Evolutionary Approach for Motif Discovery and Transmembrane Protein Classification

Tsunoda, Denise F.; Lopes, Heitor S.; Freitas, Alex A.

doi:10.1007/978-3-540-32003-6_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3449)

Included in the following conference series:

Workshops on Applications of Evolutionary Computation

Abstract

Proteins can be grouped into families according to their biological functions. This paper presents a system, named GAMBIT, which discovers motifs (particular sequences of amino acids) that occur frequently in the proteins of a given family but rarely in proteins of other families. These motifs are used to classify proteins of unknown function, that is, to predict their function from their primary structure. To search for motifs in proteins, we developed a genetic algorithm (GA) with operators specially tailored to the problem. GAMBIT was compared with MEME, a web-based motif discovery tool, on proteins from the TransMembrane Protein DataBase. The motifs found by each method were used to build a decision tree with the C4.5 algorithm and a set of classification rules with the Prism algorithm. With both classification algorithms, the motifs found by GAMBIT led to significantly better results than those found by MEME.
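The abstract describes the motif search only at a high level. As a rough illustration of the underlying idea, the following sketch evolves a fixed-length motif with a simple genetic algorithm and scores it by how well it separates one family from the rest. The encoding (a fixed-length string over the 20 standard amino acids plus a hypothetical `*` wildcard), the fitness function (sensitivity multiplied by specificity), and the operators (tournament selection, one-point crossover, point mutation, elitism) are all assumptions chosen for brevity. They are not GAMBIT's specially tailored operators, which the abstract does not describe, and the toy sequences are invented.

```python
import random

# Assumed encoding: the 20 standard amino acids plus a hypothetical wildcard.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WILDCARD = "*"
SYMBOLS = AMINO_ACIDS + WILDCARD


def matches(motif: str, sequence: str) -> bool:
    """Return True if the motif occurs anywhere in the sequence.

    A wildcard position matches any residue.
    """
    k = len(motif)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if all(m == WILDCARD or m == s for m, s in zip(motif, window)):
            return True
    return False


def fitness(motif: str, family: list[str], others: list[str]) -> float:
    """Sensitivity x specificity (an assumed fitness, not GAMBIT's).

    High when the motif occurs in most target-family sequences and in few
    sequences from other families.
    """
    tp = sum(matches(motif, s) for s in family)
    fp = sum(matches(motif, s) for s in others)
    sensitivity = tp / len(family)
    specificity = (len(others) - fp) / len(others)
    return sensitivity * specificity


def random_motif(rng: random.Random, length: int) -> str:
    return "".join(rng.choice(SYMBOLS) for _ in range(length))


def mutate(rng: random.Random, motif: str, rate: float) -> str:
    # Point mutation: each position is redrawn at random with probability `rate`.
    return "".join(rng.choice(SYMBOLS) if rng.random() < rate else c for c in motif)


def crossover(rng: random.Random, a: str, b: str) -> str:
    # One-point crossover between two parent motifs of equal length.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def tournament(rng: random.Random, population: list[str],
               scores: list[float], size: int = 3) -> str:
    # Pick `size` distinct random individuals and return the fittest of them.
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def evolve_motif(family, others, length=4, pop_size=60, generations=40, seed=0):
    """Run a simple generational GA and return (best_motif, best_fitness)."""
    rng = random.Random(seed)
    population = [random_motif(rng, length) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(m, family, others) for m in population]
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        next_population = [elite]  # elitism: the best motif survives unchanged
        while len(next_population) < pop_size:
            parent_a = tournament(rng, population, scores)
            parent_b = tournament(rng, population, scores)
            child = crossover(rng, parent_a, parent_b)
            next_population.append(mutate(rng, child, rate=1.0 / length))
        population = next_population
    scores = [fitness(m, family, others) for m in population]
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


if __name__ == "__main__":
    # Invented toy sequences: the target family shares the segment "WCHM".
    family = ["AAWCHMGG", "LLWCHMPP", "WCHMSTVV"]
    others = ["AAGGLLPP", "STVVAAGG", "PPLLSTWC"]
    motif, score = evolve_motif(family, others)
    print(motif, score)
    # Each evolved motif becomes a presence/absence feature for a classifier.
    print([int(matches(motif, s)) for s in family + others])
```

Running the script prints the best motif found and its fitness, followed by a 0/1 vector marking which toy sequences contain it. A table of such presence/absence features, one column per evolved motif, is the kind of input a decision-tree or rule inducer such as C4.5 or Prism could then consume.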

References

Abola, E.E., Sussman, J.L., Prilusky, J., Manning, N.O.: Protein data bank archives of three-dimensional macromolecular structures. Meth. Enzymol. 277, 556–571 (1997)
Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proc. 2nd Int. Conf. on Intelligent Systems for Molecular Biology, pp. 28–36 (1994)
Bojarczuk, C.C., Lopes, H.S., Freitas, A.A.: A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artif. Intell. Med. 30(1), 27–48 (2004)
Cendrowska, J.: Prism: an algorithm for inducing modular rules. Int. J. Man-Mach. Stud. 27, 349–370 (1987)
Hanke, J., Beckmann, G., Bork, P., Reich, J.G.: Self-organizing hierarchic networks for pattern recognition in protein sequence. Protein Sci. 5(1), 72–82 (1996)
Ikeda, M., Arai, M., Lao, D.M., Shimizu, T.: Transmembrane topology prediction methods: a re-assessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topologies. In Silico Biol. 2, 19–33 (2002)
Kihara, D., Shimizu, T., Kanehisa, M.: Prediction of membrane proteins based on classification of transmembrane segments. Protein Eng. 11, 961–970 (1998)
Lehninger, A.L., Nelson, D.L., Cox, M.M.: Principles of Biochemistry, 2nd edn., pp. 134–137. Worth Publishers, New York (1998)
Manning, A.M., Brass, A., Goble, C.A., Keane, J.A.: Clustering techniques in biological sequence analysis. In: Komorowski, J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 315–322. Springer, Heidelberg (1997)
Mathura, V.S., Schein, C.H., Braun, W.: Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases. Bioinformatics 19(11), 1381–1390 (2003)
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco (1993)
Shimizu, T., Nakai, K.: Construction of a membrane protein database and an evaluation of several prediction methods of transmembrane segments. In: Proc. Genome Informatics Workshop, pp. 148–149 (1994)
Tsunoda, D.F., Lopes, H.S.: Automatic motif discovery in an enzyme database using a genetic algorithm-based approach (2005) (to appear)
Weinert, W.R., Lopes, H.S.: Neural networks for protein classification. Appl. Bioinformatics 3, 41–48 (2004)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)

Author information

Authors and Affiliations

Lab. Bioinformática, Centro Federal Educ. Tecnol. do Paraná, Curitiba, Brazil
Denise F. Tsunoda & Heitor S. Lopes
Computing Laboratory, University of Kent, Canterbury, UK
Alex A. Freitas

Editor information

Editors and Affiliations

Johannes Gutenberg University, Mainz, Germany
Franz Rothlauf
Institute AIFB, University of Karlsruhe, 76128, Karlsruhe, Germany
Jürgen Branke
Dipartimento di Ingegneria dell’Informazione, Università di Parma, Italy
Stefano Cagnoni
School of Mathematical and Computer Science, Heriot-Watt University, EH14 8AS, Edinburgh, UK
David Wolfe Corne
Institute of Computer Science, University of Bremen, 28359, Bremen, Germany
Rolf Drechsler
Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin
CISUC, Department of Informatics Engineering, University of Coimbra, Polo II of the University of Coimbra, 3030, Coimbra, Portugal
Penousal Machado
ICIS, Radboud University Nijmegen, The Netherlands
Elena Marchiori
Universidade de A Coruña, CP 15071, A Coruña, Spain
Juan Romero
School of Computing Sciences, University of East Anglia, UEA Norwich, NR4 7TJ, Norwich, UK
George D. Smith
Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy
Giovanni Squillero

About this paper

Cite this paper

Tsunoda, D.F., Lopes, H.S., Freitas, A.A. (2005). An Evolutionary Approach for Motif Discovery and Transmembrane Protein Classification. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2005. Lecture Notes in Computer Science, vol 3449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32003-6_11

DOI: https://doi.org/10.1007/978-3-540-32003-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25396-9
Online ISBN: 978-3-540-32003-6
eBook Packages: Computer Science, Computer Science (R0)