Abstract
Current microarray technologies for determining RNA structure or measuring protein-RNA interactions rely on single-stranded, unstructured RNA probes on a chip that together cover all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured \(\ell \)-long RNA probes in which each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, \(\ell \) and p using ad hoc rules. To our knowledge, no general method for solving this problem exists to date. Here, we address the problem of finding a minimum-size covering of all k-mers by \(\ell \)-long sequences with the desired properties for any values of k, \(\ell \) and p. We prove that the problem is NP-hard and give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second is a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only \(\sim 1.1\) times larger than the theoretical lower bound. We present results for typical values of k and probe lengths \(\ell \) and show that our algorithm generates a library significantly smaller than that of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available at curlcake.csail.mit.edu.
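To make the covering problem concrete, the Python sketch below checks whether a probe library covers every k-mer at least p times, computes a simple counting lower bound, and runs the textbook greedy rule for set multicover over all candidate \(\ell \)-mers. It is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: a probe counts once toward each distinct k-mer it contains; each candidate probe may be selected at most once; and `is_unstructured` is a hypothetical placeholder for a real secondary-structure check (for example, a folding computation with a package such as ViennaRNA). The bound follows from the fact that an \(\ell \)-long probe contains at most \(\ell -k+1\) k-mers, so at least \(\lceil p\cdot 4^k/(\ell -k+1)\rceil \) probes are needed; we assume the theoretical lower bound mentioned above is of this counting type. The greedy rule repeatedly picks the probe covering the most still-needed k-mers, which is the standard greedy for set multicover with a ratio logarithmic in the maximum number of k-mers per probe.

```python
from itertools import product
from math import ceil
from typing import Callable, Dict, List, Optional, Set

ALPHABET = "ACGU"  # RNA nucleotides


def kmer_set(s: str, k: int) -> Set[str]:
    """Distinct k-mers of s (assumption: a probe counts once per distinct k-mer)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def all_words(n: int) -> List[str]:
    """All 4**n words of length n over the RNA alphabet, in lexicographic order."""
    return ["".join(t) for t in product(ALPHABET, repeat=n)]


def lower_bound(k: int, ell: int, p: int) -> int:
    """Counting bound: p * 4**k coverage units needed, at most ell-k+1 per probe."""
    return ceil(p * len(ALPHABET) ** k / (ell - k + 1))


def coverage_ok(probes: List[str], k: int, p: int) -> bool:
    """True if every k-mer occurs in at least p of the probes."""
    counts: Dict[str, int] = {}
    for probe in probes:
        for km in kmer_set(probe, k):
            counts[km] = counts.get(km, 0) + 1
    return all(counts.get(km, 0) >= p for km in all_words(k))


def greedy_library(k: int, ell: int, p: int,
                   is_unstructured: Callable[[str], bool] = lambda s: True) -> List[str]:
    """Greedy set multicover over all ell-mers passing the structure filter.

    Enumerates all 4**ell candidates, so it is only suitable for toy parameters.
    """
    demand = {km: p for km in all_words(k)}  # remaining coverage needed per k-mer
    # Candidate probes; is_unstructured is a hypothetical placeholder filter.
    candidates: Dict[str, Set[str]] = {}
    for probe in all_words(ell):
        if is_unstructured(probe):
            candidates[probe] = kmer_set(probe, k)

    library: List[str] = []
    while any(d > 0 for d in demand.values()):
        # Choose the candidate covering the most still-needed k-mers
        # (ties go to the lexicographically first candidate).
        best: Optional[str] = None
        best_gain = 0
        for probe, kms in candidates.items():
            gain = sum(1 for km in kms if demand[km] > 0)
            if gain > best_gain:
                best, best_gain = probe, gain
        if best is None:
            raise ValueError("some k-mers cannot be covered p times by the candidates")
        for km in candidates.pop(best):  # each candidate is used at most once
            if demand[km] > 0:
                demand[km] -= 1
        library.append(best)
    return library


if __name__ == "__main__":
    lib = greedy_library(k=2, ell=4, p=1)
    print(len(lib), lower_bound(2, 4, 1), coverage_ok(lib, 2, 1))
```

For k = 2 and \(\ell = 4\) the counting bound is 6 probes, and the greedy library can be checked against it with `coverage_ok`. Enumerating all \(4^{\ell}\) candidates quickly becomes infeasible as \(\ell \) grows, so this sketch serves only to illustrate the problem definition and the greedy rule.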
References
Kudla, G., Granneman, S., Hahn, D., Beggs, J.D., Tollervey, D.: Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proc. Natl. Acad. Sci. 108, 10010–10015 (2011)
Rinn, J.L., Ule, J.: Oming in on RNA-protein interactions. Genome Biol. 15, 401 (2014)
Wan, Y., Kertesz, M., Spitale, R.C., Segal, E., Chang, H.Y.: Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011)
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., Segal, E.: The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007)
Steffen, P., Voß, B., Rehmsmeier, M., Reeder, J., Giegerich, R.: RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 22, 500–503 (2006)
Kertesz, M., Wan, Y., Mazor, E., Rinn, J.L., Nutter, R.C., Chang, H.Y., Segal, E.: Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010)
Mandir, J.B., Lockett, M.R., Phillips, M.F., Allawi, H.T., Lyamichev, V.I., Smith, L.M.: Rapid determination of RNA accessible sites by surface plasmon resonance detection of hybridization to DNA arrays. Anal. Chem. 81, 8949–8956 (2009)
Kierzek, E., Kierzek, R., Turner, D.H., Catrina, I.E.: Facilitating RNA structure prediction with microarrays. Biochemistry 45, 581–593 (2006)
Kierzek, R., Turner, D.H., Kierzek, E.: Microarrays for identifying binding sites and probing structure of RNAs. Nucleic Acids Res. 43, 1–12 (2015)
Gerstberger, S., Hafner, M., Tuschl, T.: A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014)
König, J., Zarnack, K., Luscombe, N.M., Ule, J.: Protein-RNA interactions: new genomic technologies and perspectives. Nat. Rev. Genet. 13, 77–83 (2012)
Fu, X.D., Ares Jr, M.: Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701 (2014)
Ray, D., Kazan, H., Chan, E.T., Castillo, L.P., Chaudhry, S., Talukder, S., Blencowe, B.J., Morris, Q., Hughes, T.R.: Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27, 667–670 (2009)
Lambert, N., Robertson, A., Jangi, M., McGeary, S., Sharp, P.A., Burge, C.B.: RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol. Cell 54, 887–900 (2014)
Ray, D., Kazan, H., Cook, K.B., Weirauch, M.T., Najafabadi, H.S., Li, X., Gueroussov, S., Albu, M., Zheng, H., Yang, A., et al.: A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013)
Stefl, R., Skrisovska, L., Allain, F.H.T.: RNA sequence- and shape-dependent recognition by proteins in the ribonucleoprotein particle. EMBO Rep. 6, 33–38 (2005)
Berger, B., Peng, J., Singh, M.: Computational solutions for omics data. Nat. Rev. Genet. 14, 333–346 (2013)
West, D.B., et al.: Introduction to Graph Theory, vol. 2. Prentice Hall, Upper Saddle River (2001)
Lempel, A.: On a homomorphism of the de Bruijn graph and its applications to the design of feedback shift registers. IEEE Trans. Comput. 100, 1204–1209 (1970)
Alhakim, A., Akinwande, M.: A recursive construction of nonbinary de Bruijn sequences. Des. Codes Crypt. 60, 155–169 (2011)
Berger, M.F., Philippakis, A.A., Qureshi, A.M., He, F.S., Estep, P.W., Bulyk, M.L.: Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435 (2006)
Philippakis, A.A., Qureshi, A.M., Berger, M.F., Bulyk, M.L.: Design of compact, universal DNA microarrays for protein binding microarray experiments. J. Comput. Biol. 15, 655–665 (2008)
Orenstein, Y., Shamir, R.: Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers. Bioinformatics 29, i71–i79 (2013)
Fordyce, P.M., Gerber, D., Tran, D., Zheng, J., Li, H., DeRisi, J.L., Quake, S.R.: De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis. Nat. Biotechnol. 28, 970–975 (2010)
Berman, P., DasGupta, B., Sontag, E.D.: Randomized approximation algorithms for set multicover problems with applications to reverse engineering of protein and gene networks. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004 and APPROX 2004. LNCS, vol. 3122, pp. 39–50. Springer, Heidelberg (2004)
Levin, A.: Approximating the unweighted k-set cover problem: greedy meets local search. SIAM J. Discrete Math. 23, 251–264 (2008)
Grossman, T., Wool, A.: Computational experience with approximation algorithms for the set covering problem. Eur. J. Oper. Res. 101, 81–92 (1997)
Lorenz, R., Bernhart, S.H., Zu Siederdissen, C.H., Tafer, H., Flamm, C., Stadler, P.F., Hofacker, I.L., et al.: ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011)
MacWilliams, F.J., Sloane, N.J.: Pseudo-random sequences and arrays. Proc. IEEE 64, 1715–1729 (1976)
de Bruijn, N.: A combinatorial problem. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen. Series A 49, 758 (1946)
Hurd, W.J.: Efficient generation of statistically good pseudonoise by linearly interconnected shift registers. IEEE Trans. Comput. 100, 146–152 (1974)
Churkin, A., Weinbrand, L., Barash, D.: Free energy minimization to predict RNA secondary structures and computational RNA design. In: Picardi, E. (ed.) RNA Bioinformatics, pp. 3–16. Springer, New York (2015)
Burgess, D.J.: DNA elements: shaping up transcription factor binding. Nat. Rev. Genet. 16, 258–259 (2015)
Acknowledgments
This work was supported by NIH grant R01GM081871.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Orenstein, Y., Berger, B. (2015). Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_23
DOI: https://doi.org/10.1007/978-3-662-48221-6_23
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48220-9
Online ISBN: 978-3-662-48221-6
eBook Packages: Computer Science, Computer Science (R0)