Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling

  • Liang Ding
  • Abdul Samad
  • Xingran Xue
  • Xiuzhen Huang
  • Russell L. Malmberg
  • Liming Cai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8370)


Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurrences of terminals, and thus are ideal for modeling the canonical nucleotide base pairs that constitute RNA secondary structure. Stochastic grammars that can adequately model biomolecular tertiary structures, which are beyond context-free, have long been sought. Some of the existing linguistic grammars, developed mostly for natural language processing, appear insufficient to account for the crossing relationships incurred by distant interactions of bioresidues, while others are overly powerful and cause excessive computational complexity. This paper introduces a novel stochastic grammar, called stochastic k-tree grammar (SkTG), for the analysis of context-sensitive languages. With the new grammar rules, co-occurrences of distant terminals are characterized and recursively organized into k-tree graphs. The new grammar offers a viable approach to modeling context-sensitive interactions between bioresidues because such relationships are often constrained by k-trees, for small values of k, as demonstrated by earlier investigations. In this paper it is shown, for the first time, that probabilistic analysis of k-trees over strings is computable in polynomial time n^{O(k)}. Hence, SkTG permits not only modeling of biomolecular tertiary structures but also efficient analysis and prediction of such structures.
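To make the notion of "recursively organized into k-tree graphs" concrete, the following minimal Python sketch builds a k-tree using the standard graph-theoretic definition: start from a (k+1)-clique, then repeatedly add a new vertex adjacent to every vertex of an existing k-clique. It illustrates the graph class only, not the SkTG rules or the paper's algorithm; the function name `build_k_tree` and the `attachments` parameter are hypothetical names chosen for this example, and vertices are taken to be string positions purely for illustration.

```python
from itertools import combinations


def build_k_tree(k, attachments):
    """Build a k-tree by the standard recursive definition.

    Vertices 0..k form the initial (k+1)-clique. Each entry of
    `attachments` (a hypothetical parameter for this sketch) is a
    k-clique of the graph built so far; a new vertex is created and
    joined to every vertex of that clique.
    """
    vertices = list(range(k + 1))
    edges = {frozenset(pair) for pair in combinations(vertices, 2)}

    for clique in attachments:
        clique = tuple(clique)
        # The attachment site must consist of k distinct existing vertices.
        if len(set(clique)) != k:
            raise ValueError(f"{clique} does not have {k} distinct vertices")
        if any(u < 0 or u >= len(vertices) for u in clique):
            raise ValueError(f"{clique} contains an unknown vertex")
        # ... and those vertices must be pairwise adjacent (a clique).
        if any(frozenset(p) not in edges for p in combinations(clique, 2)):
            raise ValueError(f"{clique} is not a clique in the current graph")

        new_vertex = len(vertices)
        vertices.append(new_vertex)
        edges.update(frozenset((new_vertex, u)) for u in clique)

    return vertices, edges


# A 2-tree on five positions: clique {0, 1, 2}, then vertex 3 attached
# to {1, 2}, then vertex 4 attached to {2, 3}.
vertices, edges = build_k_tree(2, [(1, 2), (2, 3)])
```

Every k-tree on n vertices built this way has k·n − k(k+1)/2 edges, so the graph stays sparse for small k, which is one reason small-k k-trees are attractive as a constraint on residue interactions.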


Keywords: stochastic grammar; context-sensitive language; k-tree; dynamic programming; biomolecule; RNA tertiary structure





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Liang Ding (1)
  • Abdul Samad (1)
  • Xingran Xue (1)
  • Xiuzhen Huang (4)
  • Russell L. Malmberg (2, 3)
  • Liming Cai (1, 2)

  1. Department of Computer Science, University of Georgia, USA
  2. Institute of Bioinformatics, University of Georgia, USA
  3. Department of Plant Biology, University of Georgia, USA
  4. Dept. of Computer Science, Arkansas State University, Jonesboro, USA
