An Evolutionary Approach for Motif Discovery and Transmembrane Protein Classification

  • Conference paper
Applications of Evolutionary Computing (EvoWorkshops 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3449)

Abstract

Proteins can be grouped into families according to their biological functions. This paper presents a system, named GAMBIT, which discovers motifs (particular sequences of amino acids) that occur very often in proteins of a given family but rarely in proteins of other families. These motifs are used to classify unknown proteins, that is, to predict their function by analyzing their primary structure. To search for motifs in proteins, we developed a genetic algorithm (GA) with operators specially tailored to the problem. GAMBIT was compared with MEME, a web tool for finding motifs, on the TransMembrane Protein DataBase. Motifs found by both methods were used to build a decision tree and classification rules, using the C4.5 and Prism algorithms, respectively. With both classification algorithms, the motifs found by GAMBIT led to significantly better results than those found by MEME.
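
To make the idea of a family-specific motif concrete, the following is a minimal Python sketch. It is not GAMBIT's actual fitness function or encoding, which the abstract does not specify. It assumes that a motif is an exact substring and that a motif's quality is the product of its occurrence rate inside the family and its absence rate outside it. The function names (`occurrence_rate`, `discriminative_score`, `motif_features`) and the toy sequences are hypothetical.

```python
def occurrence_rate(motif, sequences):
    """Fraction of sequences that contain the motif as an exact substring."""
    if not sequences:
        return 0.0
    return sum(motif in seq for seq in sequences) / len(sequences)


def discriminative_score(motif, family, others):
    """Assumed score: high when the motif is common in the family and rare elsewhere."""
    return occurrence_rate(motif, family) * (1.0 - occurrence_rate(motif, others))


def motif_features(sequence, motifs):
    """Binary presence vector, one entry per motif, usable as classifier input."""
    return [int(m in sequence) for m in motifs]


# Toy amino-acid sequences, invented purely for illustration.
family = ["MKTLLVAG", "GGLLVAKT", "LLVAMMKT"]
others = ["MKTAAGGH", "PPGHHKTA"]

score_llva = discriminative_score("LLVA", family, others)  # present only in the family
score_kt = discriminative_score("KT", family, others)      # present in every sequence
features = motif_features("GGLLVAKT", ["LLVA", "KT", "WW"])
```

Under these assumptions, a search procedure such as a GA could use a score of this kind to rank candidate motifs. The presence vectors would then serve as attributes for a rule or tree learner such as C4.5 or Prism.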

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsunoda, D.F., Lopes, H.S., Freitas, A.A. (2005). An Evolutionary Approach for Motif Discovery and Transmembrane Protein Classification. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2005. Lecture Notes in Computer Science, vol 3449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32003-6_11

  • DOI: https://doi.org/10.1007/978-3-540-32003-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25396-9

  • Online ISBN: 978-3-540-32003-6

  • eBook Packages: Computer Science, Computer Science (R0)
