Extending Stochastic Context-Free Grammars for an Application in Bioinformatics

  • Frank Weinberg
  • Markus E. Nebel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6031)

Abstract

We extend stochastic context-free grammars (SCFGs) so that the probability of applying a production can depend on the length of the subword generated from that application. We show that existing algorithms for training and for determining the most probable parse tree can easily be adapted to the extended model without loss of performance. Furthermore, we show that the extended model is suited to improving the quality of RNA secondary structure predictions.

The extended model may also be applied in other fields where SCFGs are used, such as natural language processing. Additionally, it raises some interesting questions in the field of formal languages.
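
To make the idea of length-dependent rule probabilities concrete, the following is a minimal sketch, not the authors' implementation. It assumes a grammar in Chomsky normal form and adapts a standard CYK-style Viterbi recursion so that each rule's probability is looked up together with the length of the subword it generates. The function name `viterbi_log_prob`, the callback `rule_prob`, and the toy grammar with its numbers are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def viterbi_log_prob(word, binary_rules, terminal_rules, rule_prob, start="S"):
    """Log-probability of the most probable parse of `word` (-inf if none).

    binary_rules:   iterable of (A, B, C) for productions A -> B C
    terminal_rules: iterable of (A, a)    for productions A -> a
    rule_prob(rule, length): hypothetical callback returning the probability
        of applying `rule` when it generates a subword of the given length.
    """
    n = len(word)
    # best[(i, j)][A] = best log-probability that A derives word[i:j]
    best = defaultdict(dict)

    # Subwords of length 1 come from terminal rules.
    for i, symbol in enumerate(word):
        for lhs, terminal in terminal_rules:
            p = rule_prob((lhs, terminal), 1)
            if terminal == symbol and p > 0:
                cell = best[(i, i + 1)]
                cell[lhs] = max(cell.get(lhs, -math.inf), math.log(p))

    # Longer subwords: the only change from the usual recursion is that the
    # rule probability is queried with the current subword length.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for lhs, left_sym, right_sym in binary_rules:
                p = rule_prob((lhs, left_sym, right_sym), length)
                if p <= 0:
                    continue
                for k in range(i + 1, j):
                    left = best[(i, k)].get(left_sym)
                    right = best[(k, j)].get(right_sym)
                    if left is None or right is None:
                        continue
                    score = math.log(p) + left + right
                    if score > best[(i, j)].get(lhs, -math.inf):
                        best[(i, j)][lhs] = score

    return best[(0, n)].get(start, -math.inf)


# Toy grammar (assumption): S -> S S | a, with made-up probabilities.
TOY_BINARY = [("S", "S", "S")]
TOY_TERMINAL = [("S", "a")]


def toy_rule_prob(rule, length):
    if rule == ("S", "a"):
        return 1.0 if length == 1 else 0.0
    if rule == ("S", "S", "S"):
        return 1.0 if length == 2 else 0.5
    return 0.0


score = viterbi_log_prob("aaa", TOY_BINARY, TOY_TERMINAL, toy_rule_prob)
```

In this sketch the loop structure is the same as for an ordinary SCFG; only the probability lookup takes the subword length as an extra argument, which is one way to read the abstract's claim that the adaptation need not cost performance.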

Keywords

Secondary Structure, Prediction Quality, Derivation Tree, Rule Probability, Partial Derivation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Frank Weinberg (1)
  • Markus E. Nebel (1)
  1. Department of Computer Sciences, University of Kaiserslautern, Kaiserslautern, Germany
