A Sparsified Four-Russian Algorithm for RNA Folding

  • Conference paper
Algorithms in Bioinformatics (WABI 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9289)

Abstract

The basic RNA secondary structure prediction problem, also called the Single Sequence Folding (SSF) problem, was solved thirty-five years ago by a now well-known \(O(n^3)\)-time dynamic programming method.
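For reference, the cubic-time dynamic program can be sketched for the base-pair maximization variant as follows. This is a minimal illustration rather than the authors' implementation: the function names `can_pair` and `max_base_pairs` are hypothetical, the pairing rule (Watson-Crick plus G-U wobble) is an assumed scoring scheme, and no minimum hairpin-loop length is enforced.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if bases a and b may pair under the assumed rule."""
    return (a, b) in PAIRS


def max_base_pairs(seq: str) -> int:
    """Maximum number of non-crossing base pairs in seq, in O(n^3) time."""
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] holds the best score for seq[i..j]; entries with j <= i stay 0.
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position i pairs with position j around seq[i+1..j-1].
            inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = inner + (1 if can_pair(seq[i], seq[j]) else 0)
            # Case 2: split into seq[i..k-1] and seq[k..j] at every k.
            for k in range(i + 1, j + 1):
                best = max(best, S[i][k - 1] + S[k][j])
            S[i][j] = best
    return S[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))  # prints 3
```

The inner loop over split points `k` is the \(O(n)\) work per table cell that speedup techniques for this problem aim to reduce.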

Recently, three methodologies (Valiant, Four-Russians, and Sparsification) have been applied to speed up RNA secondary structure prediction.

In this paper we combine the previously independent speedups of Sparsification and Four-Russians.

The Sparsification method exploits two properties of the input: the number Z of subsequences whose endpoints belong to the optimal folding set, and the maximum number L of base pairs. These sparsity properties satisfy \(0 \le L \le n / 2\) and \(n \le Z \le n^2 / 2\), and the method reduces the running time to \(O(LZ)\). In this paper, we first reformulate the SSF Four-Russians \(\varTheta (\frac{n^3}{\log ^2 n})\)-time algorithm, implied by Pinhas et al. [24], to use an on-demand lookup table. This formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but also lets us take advantage of the sparsity properties.
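The general idea behind sparsification can be sketched with a candidate list for base-pair maximization. This is a simplified illustration under stated assumptions, not the algorithm analyzed in the paper: `sparse_max_base_pairs` and `can_pair` are hypothetical names, the pairing rule is assumed, and the candidate count it returns is only an informal stand-in for the kind of sparsity that Z measures. A start position `k` is kept as a candidate for end position `j` only when pairing `k` with `j` strictly beats every way of splitting `seq[k..j]`; all other split points are dominated and can be skipped.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if bases a and b may pair under the assumed rule."""
    return (a, b) in PAIRS


def sparse_max_base_pairs(seq: str) -> tuple[int, int]:
    """Return (max base pairs, total number of candidates examined as split points)."""
    n = len(seq)
    if n == 0:
        return 0, 0
    # S[i][j] holds the best score for seq[i..j]; entries with j <= i stay 0.
    S = [[0] * n for _ in range(n)]
    total_candidates = 0
    for j in range(n):
        # Start positions k for which pairing (k, j) strictly beats any split.
        candidates: list[int] = []
        for i in range(j - 1, -1, -1):
            # Option A: position j is unpaired.
            best_split = S[i][j - 1]
            # Option B: split at a candidate k (every stored k exceeds i here).
            for k in candidates:
                best_split = max(best_split, S[i][k - 1] + S[k][j])
            # Option C: position i pairs with position j.
            closed = -1
            if can_pair(seq[i], seq[j]):
                inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
                closed = inner + 1
            S[i][j] = max(best_split, closed)
            if closed > best_split:
                candidates.append(i)
        total_candidates += len(candidates)
    return S[0][n - 1], total_candidates


if __name__ == "__main__":
    score, cands = sparse_max_base_pairs("GGGAAACCC")
    print(score, cands)
```

The table is filled column by column so that, for a fixed `j`, the candidate list grows as `i` decreases; the inner loop then costs time proportional to the number of candidates rather than to `j - i`.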

Our main result is a framework that combines the fastest Sparsification and fastest Four-Russians methods. For SSF, this combined method has a worst-case running time of \(O(\tilde{L}\tilde{Z})\), where \(\frac{L}{\log n} \le \tilde{L} \le \min(L,\frac{n}{\log n})\) and \(\frac{Z}{\log n} \le \tilde{Z} \le \min(Z,\frac{n^2}{\log n})\).
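As a worked check of how these ranges relate the combined bound to the two individual methods, the upper ends can be multiplied out. For nonnegative quantities, \(\min(a,b)\cdot\min(c,d)\) is at most both \(ac\) and \(bd\), so

\[
\tilde{L}\tilde{Z} \;\le\; \min\!\left(L,\frac{n}{\log n}\right)\cdot\min\!\left(Z,\frac{n^2}{\log n}\right) \;\le\; \min\!\left(LZ,\;\frac{n^3}{\log^2 n}\right).
\]

Hence \(O(\tilde{L}\tilde{Z})\) is bounded by both the \(O(LZ)\) Sparsification bound and the \(\frac{n^3}{\log^2 n}\) Four-Russians bound. The lower ends of the same ranges give \(\tilde{L}\tilde{Z} \ge \frac{LZ}{\log^2 n}\), so under these bounds the gain over Sparsification alone is at most a factor of \(\log^2 n\).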

Through asymptotic analysis and empirical testing on the base-pair maximization variant, we show that this framework achieves a speedup on every problem instance that is asymptotically never worse, and empirically better, than that achieved by the minimum of the two methods alone.


Notes

  1. Or close to optimal.

  2. Using some word tricks, the dot product could be computed in \(O(1)\) time.

References

  1. Akutsu, T.: Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages. J. Comb. Optim. 3(2–3), 321–336 (1999)

  2. Andronescu, M., Condon, A., Hoos, H.H., Mathews, D.H., Murphy, K.P.: Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 23(13), i19–i28 (2007)

  3. Backofen, R., Tsur, D., Zakov, S., Ziv-Ukelson, M.: Sparse RNA folding: time and space efficient algorithms. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 249–262. Springer, Heidelberg (2009)

  4. Backofen, R., Tsur, D., Zakov, S., Ziv-Ukelson, M.: Sparse RNA folding: time and space efficient algorithms. J. Discrete Algorithms 9(1), 12–31 (2011)

  5. Chan, T.: Speeding up the Four Russians algorithm by about one more logarithmic factor. In: SODA, pp. 212–217 (2015)

  6. Do, C.B., Woods, D.A., Batzoglou, S.: Contrafold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14), e90–e98 (2006)

  7. Dowell, R., Eddy, S.: Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinform. 5(1), 71 (2004)

  8. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)

  9. Frid, Y., Gusfield, D.: A simple, practical and complete \(O(\frac{n^3}{ \log n})\)-time algorithm for RNA folding using the Four-Russians speedup. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 97–107. Springer, Heidelberg (2009)

  10. Frid, Y., Gusfield, D.: A simple, practical and complete \(O(\frac{n^3}{\log n})\)-time algorithm for RNA folding using the Four-Russians speedup. Algorithms Mol. Biol. 5(1), 13 (2010)

  11. Frid, Y., Gusfield, D.: A worst-case and practical speedup for the RNA co-folding problem using the Four-Russians idea. In: Moulton, V., Singh, M. (eds.) WABI 2010. LNCS, vol. 6293, pp. 1–12. Springer, Heidelberg (2010)

  12. Frid, Y., Gusfield, D.: Speedup of RNA pseudoknotted secondary structure recurrence computation with the Four-Russians method. In: Lin, G. (ed.) COCOA 2012. LNCS, vol. 7402, pp. 176–187. Springer, Heidelberg (2012)

  13. Juan, V., Wilson, C.: RNA secondary structure prediction based on free energy and phylogenetic analysis. J. Mol. Biol. 289(4), 935–947 (1999)

  14. Leontis, N.B., Westhof, E.: RNA 3D Structure Analysis and Prediction, vol. 27. Springer, Heidelberg (2012)

  15. Markham, N.R., Zuker, M.: UNAFold. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology, vol. 453, pp. 3–31. Humana Press, New York (2008)

  16. Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., Turner, D.H.: Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 101(19), 7287–7292 (2004)

  17. Mathews, D.H., Sabina, J., Zuker, M., Turner, D.H.: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288(5), 911–940 (1999)

  18. Mathews, D.H., Andre, T.C., Kim, J., Turner, D.H., Zuker, M.: An updated recursive algorithm for RNA secondary structure prediction with improved thermodynamic parameters. Mol. Model. Nucleic Acids 682, 246–257 (1998)

  19. McCaskill, J.S.: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29(6–7), 1105–1119 (1990)

  20. Möhl, M., Salari, R., Will, S., Backofen, R., Sahinalp, S.C.: Sparsification of RNA structure prediction including pseudoknots. Algorithms Mol. Biol. 5, 39 (2010)

  21. Möhl, M., Salari, R., Will, S., Backofen, R., Sahinalp, S.C.: Sparsification of RNA structure prediction including pseudoknots. In: Moulton, V., Singh, M. (eds.) WABI 2010. LNCS, vol. 6293, pp. 40–51. Springer, Heidelberg (2010)

  22. Nussinov, R., Jacobson, A.B.: Fast algorithm for predicting the secondary structure of single-stranded RNA. PNAS 77(11), 6309–6313 (1980)

  23. Nussinov, R., Pieczenik, G., Griggs, J.R., Kleitman, D.J.: Algorithms for loop matchings. SIAM J. Appl. Math. 35(1), 68–82 (1978)

  24. Pinhas, T., Zakov, S., Tsur, D., Ziv-Ukelson, M.: Efficient edit distance with duplications and contractions. Algorithms Mol. Biol. 8, 27 (2013)

  25. Reuter, J., Mathews, D.H.: RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinform. 11(1), 129 (2010)

  26. Salari, R., Möhl, M., Will, S., Sahinalp, S.C., Backofen, R.: Time and space efficient RNA-RNA interaction prediction via sparse folding. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 473–490. Springer, Heidelberg (2010)

  27. Sankoff, D., Kruskal, J.B., Mainville, S., Cedergren, R.J.: Fast algorithms to determine RNA secondary structures containing multiple loops. In: Sankoff, D., Kruskal, J.B. (eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pp. 93–120. Addison-Wesley, Reading (1983)

  28. Tinoco, I., Borer, P.N., Dengler, B., Levine, M.D., Uhlenbeck, O.C., Crothers, D.M., Gralla, J.: Improved estimation of secondary structure in ribonucleic-acid. Nat. New Biol. 246(150), 40–41 (1973)

  29. Waterman, M.S., Smith, T.F.: RNA secondary structure: a complete mathematical analysis. Math. Biosci. 42, 257–266 (1978)

  30. Wexler, Y., Zilberstein, C.: A study of accessible motifs and RNA folding complexity. J. Comput. Biol. 14(6), 856–872 (2007)

  31. Williams, R.: Matrix-vector multiplication in sub-quadratic time: (some preprocessing required). In: Bansal, N., Pruhs, K., Stein, C. (eds.) SODA, pp. 995–1001. SIAM (2007)

  32. Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31–June 03 2014, pp. 664–673 (2014)

  33. Xia, T., SantaLucia, J., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., Turner, D.H.: Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37(42), 14719–14735 (1998)

  34. Zakov, S., Tsur, D., Ziv-Ukelson, M.: Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach. In: Moulton, V., Singh, M. (eds.) WABI 2010. LNCS, vol. 6293, pp. 65–77. Springer, Heidelberg (2010)

  35. Ziv-Ukelson, M., Gat-Viks, I., Wexler, Y., Shamir, R.: A faster algorithm for RNA co-folding. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 174–185. Springer, Heidelberg (2008)

  36. Zuker, M.: The use of dynamic programming algorithms in RNA secondary structure prediction. In: Waterman, M.S. (ed.) Mathematical Methods for DNA Sequences, pp. 159–184. CRC Press Inc., Boca Raton (1989). Chapter 7

  37. Zuker, M.: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31(13), 3406–3415 (2003)

  38. Zuker, M., Sankoff, D.: RNA secondary structures and their prediction. Bull. Math. Biol. 46(4), 591–621 (1984)

  39. Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9(1), 133–148 (1981)


Acknowledgement

We would like to sincerely thank Shay Zakov and Michal Ziv-Ukelson for their many helpful comments and suggestions. This research was partially supported by the IIS-1219278 grant.

Author information

Corresponding author

Correspondence to Yelena Frid.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Frid, Y., Gusfield, D. (2015). A Sparsified Four-Russian Algorithm for RNA Folding. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_20

  • DOI: https://doi.org/10.1007/978-3-662-48221-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48220-9

  • Online ISBN: 978-3-662-48221-6

  • eBook Packages: Computer Science, Computer Science (R0)
