Abstract
The basic RNA Secondary Structure Prediction problem or Single Sequence Folding Problem (SSF) was solved thirty-five years ago by a now well-known \(O(n^3)\)-time dynamic programming method.
Recently, three methodologies (Valiant, Four-Russians, and Sparsification) have been applied to speed up RNA Secondary Structure prediction.
In this paper we combine the previously independent speedups of Sparsification and Four-Russians.
The Sparsification method exploits two properties of the input: the number \(Z\) of subsequences whose endpoints belong to the optimal folding set, and the maximum number \(L\) of base pairs. These sparsity properties satisfy \(0 \le L \le n / 2\) and \(n \le Z \le n^2 / 2\), and the method reduces the algorithmic running time to \(O(LZ)\). In this paper, we first reformulate the SSF Four-Russians \(\varTheta (\frac{n^3}{\log ^2 n})\)-time algorithm, implied by Pinhas et al. [24], to utilize an on-demand lookup table. This formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but also lets us take advantage of the sparsity properties.
Our main result is a framework that combines the fastest Sparsification and fastest Four-Russians methods. For SSF, this combined method has a worst-case running time of \(O(\tilde{L}\tilde{Z})\), where \(\frac{L}{\log n} \le \tilde{L} \le \min(L,\frac{n}{\log n})\) and \(\frac{Z}{\log n} \le \tilde{Z} \le \min(Z,\frac{n^2}{\log n})\).
Through asymptotic analysis and empirical testing on the base-pair maximization variant, we show that this framework achieves a speedup on every problem instance: it is asymptotically never worse, and empirically better, than the faster of the two methods alone.
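For orientation, the cubic-time baseline that these speedups improve on can be sketched for the base-pair maximization variant as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `max_base_pairs`, the table name `S`, and the restriction to Watson-Crick pairs (A-U, G-C) with no minimum loop length are all choices made for this example.

```python
def max_base_pairs(seq):
    """Cubic-time base-pair maximization over a single RNA sequence.

    Assumption for this sketch: only A-U and G-C pairs are allowed and
    there is no minimum hairpin-loop length.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] = maximum number of non-crossing base pairs in seq[i..j].
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):              # subsequence length minus one
        for i in range(n - span):
            j = i + span
            best = 0
            # Case 1: i pairs with j, enclosing seq[i+1..j-1].
            if (seq[i], seq[j]) in pairs:
                inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = inner + 1
            # Case 2: split into two independent subproblems at k.
            # This inner loop is the O(n) step per cell that both
            # Sparsification and Four-Russians aim to shorten.
            for k in range(i, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

There are \(O(n^2)\) table cells and each takes \(O(n)\) work in the split loop, which gives the \(O(n^3)\) bound; the result can never exceed \(n/2\), consistent with the bound on \(L\) stated above.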
Notes
1. Or close to optimal.
2. Using some word tricks the dot product could be computed in O(1)-time.
References
Akutsu, T.: Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages. J. Comb. Optim. 3(2–3), 321–336 (1999)
Andronescu, M., Condon, A., Hoos, H.H., Mathews, D.H., Murphy, K.P.: Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 23(13), i19–i28 (2007)
Backofen, R., Tsur, D., Zakov, S., Ziv-Ukelson, M.: Sparse RNA folding: time and space efficient algorithms. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 249–262. Springer, Heidelberg (2009)
Backofen, R., Tsur, D., Zakov, S., Ziv-Ukelson, M.: Sparse RNA folding: time and space efficient algorithms. J. Discrete Algorithms 9(1), 12–31 (2011)
Chan, T.: Speeding up the Four Russians algorithm by about one more logarithmic factor. In: SODA, pp. 212–217 (2015)
Do, C.B., Woods, D.A., Batzoglou, S.: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14), e90–e98 (2006)
Dowell, R., Eddy, S.: Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinform. 5(1), 71 (2004)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)
Frid, Y., Gusfield, D.: A simple, practical and complete \(O(\frac{n^3}{ \log n})\)-time algorithm for RNA folding using the Four-Russians speedup. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 97–107. Springer, Heidelberg (2009)
Frid, Y., Gusfield, D.: A simple, practical and complete \(O(\frac{n^3}{\log n})\)-time algorithm for RNA folding using the Four-Russians speedup. Algorithms Mol. Biol. 5(1), 13 (2010)
Frid, Y., Gusfield, D.: A worst-case and practical speedup for the RNA co-folding problem using the Four-Russians idea. In: Moulton, V., Singh, M. (eds.) WABI 2010. LNCS, vol. 6293, pp. 1–12. Springer, Heidelberg (2010)
Frid, Y., Gusfield, D.: Speedup of RNA pseudoknotted secondary structure recurrence computation with the Four-Russians method. In: Lin, G. (ed.) COCOA 2012. LNCS, vol. 7402, pp. 176–187. Springer, Heidelberg (2012)
Juan, V., Wilson, C.: RNA secondary structure prediction based on free energy and phylogenetic analysis. J. Mol. Biol. 289(4), 935–947 (1999)
Leontis, N.B., Westhof, E.: RNA 3D Structure Analysis and Prediction, vol. 27. Springer, Heidelberg (2012)
Markham, N.R., Zuker, M.: UNAFold. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology, vol. 453, pp. 3–31. Humana Press, New York (2008)
Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., Turner, D.H.: Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 101(19), 7287–7292 (2004)
Mathews, D.H., Sabina, J., Zuker, M., Turner, D.H.: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288(5), 911–940 (1999)
Mathews, D.H., Andre, T.C., Kim, J., Turner, D.H., Zuker, M.: An updated recursive algorithm for RNA secondary structure prediction with improved thermodynamic parameters. Mol. Model. Nucleic Acids 682, 246–257 (1998)
McCaskill, J.S.: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29(6–7), 1105–1119 (1990)
Möhl, M., Salari, R., Will, S., Backofen, R., Sahinalp, S.C.: Sparsification of RNA structure prediction including pseudoknots. Algorithms Mol. Biol. 5, 39 (2010)
Möhl, M., Salari, R., Will, S., Backofen, R., Sahinalp, S.C.: Sparsification of RNA structure prediction including pseudoknots. In: Moulton, V., Singh, M. (eds.) WABI 2010. LNCS, vol. 6293, pp. 40–51. Springer, Heidelberg (2010)
Nussinov, R., Jacobson, A.B.: Fast algorithm for predicting the secondary structure of single-stranded RNA. PNAS 77(11), 6309–6313 (1980)
Nussinov, R., Pieczenik, G., Griggs, J.R., Kleitman, D.J.: Algorithms for loop matchings. SIAM J. Appl. Math. 35(1), 68–82 (1978)
Pinhas, T., Zakov, S., Tsur, D., Ziv-Ukelson, M.: Efficient edit distance with duplications and contractions. Algorithms Mol. Biol. 8, 27 (2013)
Reuter, J., Mathews, D.H.: RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinform. 11(1), 129 (2010)
Salari, R., Möhl, M., Will, S., Sahinalp, S.C., Backofen, R.: Time and space efficient RNA-RNA interaction prediction via sparse folding. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 473–490. Springer, Heidelberg (2010)
Sankoff, D., Kruskal, J.B., Mainville, S., Cedergren, R.J.: Fast algorithms to determine RNA secondary structures containing multiple loops. In: Sankoff, D., Kruskal, J.B. (eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pp. 93–120. Addison-Wesley, Reading (1983)
Tinoco, I., Borer, P.N., Dengler, B., Levine, M.D., Uhlenbeck, O.C., Crothers, D.M., Gralla, J.: Improved estimation of secondary structure in ribonucleic-acid. Nat. New Biol. 246(150), 40–41 (1973)
Waterman, M.S., Smith, T.F.: RNA secondary structure: a complete mathematical analysis. Math. Biosci. 42, 257–266 (1978)
Wexler, Y., Zilberstein, C.: A study of accessible motifs and RNA folding complexity. J. Comput. Biol. 14(6), 856–872 (2007)
Williams, R.: Matrix-vector multiplication in sub-quadratic time: (some preprocessing required). In: Bansal, N., Pruhs, K., Stein, C. (eds.) SODA, pp. 995–1001. SIAM (2007)
Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31–June 03 2014, pp. 664–673 (2014)
Xia, T., SantaLucia, J., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., Turner, D.H.: Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37(42), 14719–14735 (1998)
Zakov, S., Tsur, D., Ziv-Ukelson, M.: Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach. In: Moulton, V., Singh, M. (eds.) WABI 2010. LNCS, vol. 6293, pp. 65–77. Springer, Heidelberg (2010)
Ziv-Ukelson, M., Gat-Viks, I., Wexler, Y., Shamir, R.: A faster algorithm for RNA co-folding. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 174–185. Springer, Heidelberg (2008)
Zuker, M.: The use of dynamic programming algorithms in RNA secondary structure prediction. In: Waterman, M.S. (ed.) Mathematical Methods for DNA Sequences, pp. 159–184. CRC Press Inc., Boca Raton (1989). Chapter 7
Zuker, M.: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31(13), 3406–3415 (2003)
Zuker, M., Sankoff, D.: RNA secondary structures and their prediction. Bull. Math. Biol. 46(4), 591–621 (1984)
Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9(1), 133–148 (1981)
Acknowledgement
We would like to sincerely thank Shay Zakov and Michal Ziv-Ukelson for their many helpful comments and suggestions. This research was partially supported by the IIS-1219278 grant.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Frid, Y., Gusfield, D. (2015). A Sparsified Four-Russian Algorithm for RNA Folding. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_20
DOI: https://doi.org/10.1007/978-3-662-48221-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48220-9
Online ISBN: 978-3-662-48221-6
eBook Packages: Computer Science (R0)