Structural Analysis of Promoter Sequences Using Grammar Inference and Support Vector Machine

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5177)

Abstract

Promoters are short regulatory DNA sequences located upstream of a gene. Structural analysis of promoter sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes the structural analysis of promoters a complex problem. Grammar rules can be used to describe the structure of promoter sequences; however, deriving such rules is not trivial. In this paper, stochastic L-grammar rules are derived automatically from known Drosophila and vertebrate promoter and non-promoter sequences using genetic programming. The fitness of the grammar rules is evaluated using a machine learning technique called the Support Vector Machine (SVM). The SVM is trained on the known promoter sequences to obtain a discriminating function, which then evaluates a candidate grammar (a set of rules) by determining the percentage of sequences generated by that grammar that are classified correctly. Combining SVM with grammar rule inference can mitigate the lack of structural insight in machine learning approaches such as SVM.
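The evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rule set `RULES`, the axiom, the number of rewriting steps, and the function names are all hypothetical, and `toy_classifier` is only a stand-in for the trained SVM discriminating function. The genetic programming search that would propose and mutate candidate rule sets is left out; the sketch shows only how one candidate stochastic L-grammar could be scored by the fraction of its generated sequences that a classifier accepts.

```python
import random

# Hypothetical stochastic L-grammar over the DNA alphabet. Each symbol maps
# to a list of (probability, replacement) alternatives. Illustrative only.
RULES = {
    "A": [(0.5, "A"), (0.5, "TA")],
    "T": [(0.7, "T"), (0.3, "TAT")],
    "C": [(1.0, "C")],
    "G": [(1.0, "G")],
}


def rewrite(sequence, rules, rng):
    """One L-system step: every symbol is rewritten in parallel."""
    out = []
    for symbol in sequence:
        # Symbols without a rule are copied unchanged.
        alternatives = rules.get(symbol, [(1.0, symbol)])
        weights = [p for p, _ in alternatives]
        bodies = [body for _, body in alternatives]
        out.append(rng.choices(bodies, weights=weights, k=1)[0])
    return "".join(out)


def generate(axiom, rules, steps, rng):
    """Derive one sequence by applying `steps` rewriting steps to the axiom."""
    sequence = axiom
    for _ in range(steps):
        sequence = rewrite(sequence, rules, rng)
    return sequence


def fitness(rules, classifier, axiom="GCTAAG", steps=3, samples=200, seed=0):
    """Fraction of generated sequences that the classifier accepts."""
    rng = random.Random(seed)  # fixed seed keeps the score reproducible
    hits = sum(
        bool(classifier(generate(axiom, rules, steps, rng)))
        for _ in range(samples)
    )
    return hits / samples


def toy_classifier(sequence):
    """Stand-in for the SVM decision function (assumption, not the paper's)."""
    return "TATA" in sequence


if __name__ == "__main__":
    print(fitness(RULES, toy_classifier))
```

In a fuller setup, `toy_classifier` would be replaced by a function that encodes the sequence as a feature vector and returns the sign of the trained SVM's decision value, and `fitness` would serve as the objective that the genetic programming search maximizes.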

Editor information

Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Damaševičius, R. (2008). Structural Analysis of Promoter Sequences Using Grammar Inference and Support Vector Machine. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85563-7_18

  • DOI: https://doi.org/10.1007/978-3-540-85563-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85562-0

  • Online ISBN: 978-3-540-85563-7

  • eBook Packages: Computer Science, Computer Science (R0)
