Protein Motif Prediction by Grammatical Inference

  • Conference paper
Grammatical Inference: Algorithms and Applications (ICGI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4201)

Abstract

The rapid growth of protein sequence databases is outpacing the capacity to characterize new proteins biochemically and structurally. It is therefore very important to develop tools that locate, within protein sequences, those subsequences with an associated function or specific feature. In this work, we propose a method to predict one such functional motif, the coiled coil, which is related to protein interaction. Our approach uses even linear language inference to obtain a transducer, which is then used to label unknown sequences. The experiments carried out show that our method outperforms previous approaches.
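The abstract does not spell out how even linear language inference is carried out. As a minimal sketch, and only under the assumption that the usual reduction from even linear to regular languages applies here, the step below pairs the i-th symbol from the left of a sequence with the i-th symbol from the right, leaving a lone middle symbol when the length is odd. A regular-language learner could then be run on the folded strings. The function name `fold_even_linear` and the sample peptide are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def fold_even_linear(seq: str) -> List[Tuple[str, ...]]:
    """Fold a sequence by pairing symbols from both ends toward the middle.

    Assumption (not from the abstract): this mirrors the shape of even
    linear productions A -> a B b, which consume one symbol on each side.
    """
    folded: List[Tuple[str, ...]] = []
    i, j = 0, len(seq) - 1
    while i < j:
        # One paired symbol per left/right step.
        folded.append((seq[i], seq[j]))
        i += 1
        j -= 1
    if i == j:
        # Odd length: the middle symbol stands alone.
        folded.append((seq[i],))
    return folded


if __name__ == "__main__":
    # Hypothetical five-residue fragment, used only for illustration.
    print(fold_even_linear("LEKAL"))
```

The folded string lives over an alphabet of pairs (plus single symbols), so its size grows quadratically with the original alphabet; for the 20 amino acids that is up to 400 paired symbols, which is worth keeping in mind when choosing the regular learner.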

Work supported by the CICYT TIC2000-1153 and the Generalitat Valenciana GV06/068.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peris, P., López, D., Campos, M., Sempere, J.M. (2006). Protein Motif Prediction by Grammatical Inference. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_15

  • DOI: https://doi.org/10.1007/11872436_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45264-5

  • Online ISBN: 978-3-540-45265-2

  • eBook Packages: Computer Science (R0)