Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers

  • Conference paper
Algorithms in Bioinformatics (WABI 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9289)

Abstract

Current microarray technologies for determining RNA structure or measuring protein-RNA interactions rely on single-stranded, unstructured RNA probes on a chip that together cover all k-mers. Because space on the array is limited, the problem is to efficiently design a compact library of unstructured \(\ell \)-long RNA probes in which each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, \(\ell \) and p using ad hoc rules. To our knowledge, no general method exists to date for solving this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by \(\ell \)-long sequences with the desired properties for any values of k, \(\ell \) and p. Because we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second is a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes whose size is only \(\sim 1.1\) times the theoretical lower bound. We present results for typical values of k and probe length \(\ell \) and show that our algorithm generates a library significantly smaller than that of Ray et al.; moreover, we show that it outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available at curlcake.csail.mit.edu.
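
To make the covering problem in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple counting bound (an \(\ell \)-long probe contains at most \(\ell - k + 1\) k-mers, so at least \(\lceil p \cdot 4^k / (\ell - k + 1) \rceil \) probes are needed), which may differ from the bound used in the paper, and it runs a textbook greedy set-multicover over a caller-supplied candidate pool. The function names `kmers_of`, `lower_bound` and `greedy_multicover` are hypothetical, and the requirement that probes be unstructured is not modelled here; in practice it would need an RNA folding tool to filter candidates.

```python
from itertools import product
from math import ceil

ALPHABET = "ACGU"  # RNA alphabet


def kmers_of(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def lower_bound(k, length, p):
    """Counting bound (an assumption of this sketch): each probe of the
    given length holds at most length - k + 1 k-mers."""
    return ceil(p * len(ALPHABET) ** k / (length - k + 1))


def greedy_multicover(candidates, k, p):
    """Textbook greedy set multicover: repeatedly pick the unused candidate
    that covers the most k-mers still needing coverage, until every k-mer
    has been covered at least p times. Each candidate is used at most once."""
    need = {"".join(t): p for t in product(ALPHABET, repeat=k)}
    pool = {c: kmers_of(c, k) for c in candidates}
    chosen = []
    while any(need.values()):
        if not pool:
            raise ValueError("candidates cannot cover every k-mer p times")
        # Gain of a candidate = number of its k-mers that are still needed.
        best = max(pool, key=lambda c: sum(1 for m in pool[c] if need[m] > 0))
        if all(need[m] == 0 for m in pool[best]):
            raise ValueError("candidates cannot cover every k-mer p times")
        for m in pool[best]:
            if need[m] > 0:
                need[m] -= 1
        chosen.append(best)
        del pool[best]
    return chosen


if __name__ == "__main__":
    # Toy instance: cover all 2-mers once using 4-long probes.
    toy_candidates = ["".join(t) for t in product(ALPHABET, repeat=4)]
    library = greedy_multicover(toy_candidates, k=2, p=1)
    print(len(library), "probes; counting bound:", lower_bound(2, 4, 1))
```

The candidate pool is enumerated exhaustively here only because the toy instance is tiny; for realistic k and \(\ell \) the pool is far too large to enumerate, which is presumably what motivates a heuristic such as the random-walk approach described in the abstract.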

References

  1. Kudla, G., Granneman, S., Hahn, D., Beggs, J.D., Tollervey, D.: Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proc. Natl. Acad. Sci. 108, 10010–10015 (2011)

  2. Rinn, J.L., Ule, J.: Oming in on RNA-protein interactions. Genome Biol. 15, 401 (2014)

  3. Wan, Y., Kertesz, M., Spitale, R.C., Segal, E., Chang, H.Y.: Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011)

  4. Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., Segal, E.: The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007)

  5. Steffen, P., Voß, B., Rehmsmeier, M., Reeder, J., Giegerich, R.: RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 22, 500–503 (2006)

  6. Kertesz, M., Wan, Y., Mazor, E., Rinn, J.L., Nutter, R.C., Chang, H.Y., Segal, E.: Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010)

  7. Mandir, J.B., Lockett, M.R., Phillips, M.F., Allawi, H.T., Lyamichev, V.I., Smith, L.M.: Rapid determination of RNA accessible sites by surface plasmon resonance detection of hybridization to DNA arrays. Anal. Chem. 81, 8949–8956 (2009)

  8. Kierzek, E., Kierzek, R., Turner, D.H., Catrina, I.E.: Facilitating RNA structure prediction with microarrays. Biochemistry 45, 581–593 (2006)

  9. Kierzek, R., Turner, D.H., Kierzek, E.: Microarrays for identifying binding sites and probing structure of RNAs. Nucleic Acids Res. 43, 1–12 (2015)

  10. Gerstberger, S., Hafner, M., Tuschl, T.: A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014)

  11. König, J., Zarnack, K., Luscombe, N.M., Ule, J.: Protein-RNA interactions: new genomic technologies and perspectives. Nat. Rev. Genet. 13, 77–83 (2012)

  12. Fu, X.D., Ares Jr, M.: Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701 (2014)

  13. Ray, D., Kazan, H., Chan, E.T., Castillo, L.P., Chaudhry, S., Talukder, S., Blencowe, B.J., Morris, Q., Hughes, T.R.: Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27, 667–670 (2009)

  14. Lambert, N., Robertson, A., Jangi, M., McGeary, S., Sharp, P.A., Burge, C.B.: RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol. Cell 54, 887–900 (2014)

  15. Ray, D., Kazan, H., Cook, K.B., Weirauch, M.T., Najafabadi, H.S., Li, X., Gueroussov, S., Albu, M., Zheng, H., Yang, A., et al.: A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013)

  16. Stefl, R., Skrisovska, L., Allain, F.H.T.: RNA sequence- and shape-dependent recognition by proteins in the ribonucleoprotein particle. EMBO Rep. 6, 33–38 (2005)

  17. Berger, B., Peng, J., Singh, M.: Computational solutions for omics data. Nat. Rev. Genet. 14, 333–346 (2013)

  18. West, D.B., et al.: Introduction to Graph Theory, vol. 2. Prentice Hall, Upper Saddle River (2001)

  19. Lempel, A.: On a homomorphism of the de Bruijn graph and its applications to the design of feedback shift registers. IEEE Trans. Comput. 100, 1204–1209 (1970)

  20. Alhakim, A., Akinwande, M.: A recursive construction of nonbinary de Bruijn sequences. Des. Codes Crypt. 60, 155–169 (2011)

  21. Berger, M.F., Philippakis, A.A., Qureshi, A.M., He, F.S., Estep, P.W., Bulyk, M.L.: Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435 (2006)

  22. Philippakis, A.A., Qureshi, A.M., Berger, M.F., Bulyk, M.L.: Design of compact, universal DNA microarrays for protein binding microarray experiments. J. Comput. Biol. 15, 655–665 (2008)

  23. Orenstein, Y., Shamir, R.: Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers. Bioinformatics 29, i71–i79 (2013)

  24. Fordyce, P.M., Gerber, D., Tran, D., Zheng, J., Li, H., DeRisi, J.L., Quake, S.R.: De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis. Nat. Biotechnol. 28, 970–975 (2010)

  25. Berman, P., DasGupta, B., Sontag, E.D.: Randomized approximation algorithms for set multicover problems with applications to reverse engineering of protein and gene networks. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004 and APPROX 2004. LNCS, vol. 3122, pp. 39–50. Springer, Heidelberg (2004)

  26. Levin, A.: Approximating the unweighted k-set cover problem: greedy meets local search. SIAM J. Discrete Math. 23, 251–264 (2008)

  27. Grossman, T., Wool, A.: Computational experience with approximation algorithms for the set covering problem. Eur. J. Oper. Res. 101, 81–92 (1997)

  28. Lorenz, R., Bernhart, S.H., Zu Siederdissen, C.H., Tafer, H., Flamm, C., Stadler, P.F., Hofacker, I.L., et al.: ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011)

  29. MacWilliams, F.J., Sloane, N.J.: Pseudo-random sequences and arrays. Proc. IEEE 64, 1715–1729 (1976)

  30. de Bruijn, N.: A combinatorial problem. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen. Series A 49, 758 (1946)

  31. Hurd, W.J.: Efficient generation of statistically good pseudonoise by linearly interconnected shift registers. IEEE Trans. Comput. 100, 146–152 (1974)

  32. Churkin, A., Weinbrand, L., Barash, D.: Free energy minimization to predict RNA secondary structures and computational RNA design. In: Picardi, E. (ed.) RNA Bioinformatics, pp. 3–16. Springer, New York (2015)

  33. Burgess, D.J.: DNA elements: shaping up transcription factor binding. Nat. Rev. Genet. 16, 258–259 (2015)


Acknowledgments

This work was supported by NIH grant R01GM081871.

Author information

Corresponding author

Correspondence to Bonnie Berger.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Orenstein, Y., Berger, B. (2015). Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_23

  • DOI: https://doi.org/10.1007/978-3-662-48221-6_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48220-9

  • Online ISBN: 978-3-662-48221-6

  • eBook Packages: Computer Science, Computer Science (R0)
