Abstract
Promoters are short regulatory DNA sequences located upstream of a gene. Structural analysis of promoter sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but the many exceptions make structural analysis of promoters a complex problem. Grammar rules can describe the structure of promoter sequences; however, deriving such rules is not trivial. In this paper, stochastic L-grammar rules are derived automatically from known Drosophila and vertebrate promoter and non-promoter sequences using genetic programming. The fitness of the grammar rules is evaluated using a machine learning technique, the Support Vector Machine (SVM). The SVM is trained on known promoter sequences to obtain a discriminating function, which is then used to evaluate a candidate grammar (a set of rules) by determining the percentage of generated sequences that are classified correctly. Combining SVM with grammar rule inference can mitigate the lack of structural insight in machine learning approaches such as SVM.
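The fitness evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rule encoding, the function names (`rewrite`, `generate`, `fitness`), the example rules, and `toy_classifier` are all hypothetical, and the classifier is a trivial stand-in for a trained SVM decision function. The genetic programming search that would propose candidate grammars is omitted.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding of a stochastic L-grammar: each symbol maps to a list
# of (weight, replacement) alternatives. Symbols with no rule are copied as-is.
Rules = Dict[str, List[Tuple[float, str]]]


def rewrite(sequence: str, rules: Rules, rng: random.Random) -> str:
    """One parallel rewriting step: every symbol is replaced simultaneously."""
    out = []
    for symbol in sequence:
        alternatives = rules.get(symbol)
        if not alternatives:
            out.append(symbol)
            continue
        weights = [w for w, _ in alternatives]
        bodies = [b for _, b in alternatives]
        # Pick one replacement at random, in proportion to its weight.
        out.append(rng.choices(bodies, weights=weights, k=1)[0])
    return "".join(out)


def generate(axiom: str, rules: Rules, steps: int, rng: random.Random) -> str:
    """Derive a sequence by applying `steps` rewriting steps to the axiom."""
    seq = axiom
    for _ in range(steps):
        seq = rewrite(seq, rules, rng)
    return seq


def fitness(rules: Rules, axiom: str, steps: int,
            classifier: Callable[[str], bool],
            n_samples: int = 100, seed: int = 0) -> float:
    """Fraction of generated sequences that the classifier accepts."""
    rng = random.Random(seed)
    hits = sum(bool(classifier(generate(axiom, rules, steps, rng)))
               for _ in range(n_samples))
    return hits / n_samples


def toy_classifier(seq: str) -> bool:
    # Stand-in for an SVM discriminating function (assumption, not the paper's).
    return "TATA" in seq


# Illustrative rules only; they are not taken from the paper.
example_rules: Rules = {"T": [(0.5, "TA"), (0.5, "T")], "A": [(1.0, "A")]}
score = fitness(example_rules, "TT", steps=2, classifier=toy_classifier)
```

In a fuller pipeline, `toy_classifier` would be replaced by the decision function of an SVM trained on promoter and non-promoter sequences, and a genetic programming loop would use `fitness` to rank candidate rule sets.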
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Damaševičius, R. (2008). Structural Analysis of Promoter Sequences Using Grammar Inference and Support Vector Machine. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85563-7_18
DOI: https://doi.org/10.1007/978-3-540-85563-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85562-0
Online ISBN: 978-3-540-85563-7
eBook Packages: Computer Science, Computer Science (R0)