Abstract
We extend stochastic context-free grammars (SCFGs) so that the probability of applying a production can depend on the length of the subword generated by that application. We show that existing algorithms for training and for determining the most probable parse tree can easily be adapted to the extended model without loss of performance. Furthermore, we show that the extended model is suited to improving the quality of RNA secondary structure predictions.
The extended model may also be applied to other fields where SCFGs are used, such as natural language processing. Additionally, some interesting questions in the field of formal languages arise from it.
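To illustrate the idea of length-dependent production probabilities, the following is a minimal sketch of a CYK-style search for the most probable parse. It is not the authors' implementation. It assumes a grammar in Chomsky normal form, and the function name `viterbi_length_dependent` and the callback `prob(rule, length)` are hypothetical names introduced here. The only change compared with the usual SCFG version is that the rule probability is looked up with the length of the generated subword, so the cubic running time in the word length is unchanged.

```python
import math

NEG_INF = -math.inf


def viterbi_length_dependent(word, start, unary, binary, prob):
    """Log-probability of the most probable parse of `word`, or -inf if none.

    unary:  list of (A, a) tuples for rules A -> a
    binary: list of (A, B, C) tuples for rules A -> B C
    prob:   hypothetical callback prob(rule, length) giving the probability
            of applying `rule` when it generates a subword of that length
    """
    n = len(word)
    best = {}  # (A, i, j) -> best log-probability of deriving word[i:j] from A

    # Subwords of length 1 can only come from terminal rules.
    for i, symbol in enumerate(word):
        for rule in unary:
            lhs, terminal = rule
            p = prob(rule, 1)
            if terminal == symbol and p > 0:
                key = (lhs, i, i + 1)
                best[key] = max(best.get(key, NEG_INF), math.log(p))

    # Longer subwords: combine two shorter spans with a binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for rule in binary:
                lhs, left_nt, right_nt = rule
                p = prob(rule, length)  # the length-dependent lookup
                if p <= 0:
                    continue
                for k in range(i + 1, j):
                    left = best.get((left_nt, i, k), NEG_INF)
                    right = best.get((right_nt, k, j), NEG_INF)
                    if left == NEG_INF or right == NEG_INF:
                        continue
                    candidate = math.log(p) + left + right
                    key = (lhs, i, j)
                    if candidate > best.get(key, NEG_INF):
                        best[key] = candidate

    return best.get((start, 0, n), NEG_INF)
```

A caller would supply `prob` as, for example, a lookup into a table indexed by rule and length. How such a table is estimated from training data, and how lengths are grouped to keep the number of parameters manageable, is outside the scope of this sketch.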
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Weinberg, F., Nebel, M.E. (2010). Extending Stochastic Context-Free Grammars for an Application in Bioinformatics. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_49
DOI: https://doi.org/10.1007/978-3-642-13089-2_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13088-5
Online ISBN: 978-3-642-13089-2
eBook Packages: Computer Science (R0)