Efficient Pattern Matching in Elastic-Degenerate Texts

  • Costas S. Iliopoulos
  • Ritu Kundu
  • Solon P. Pissis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10168)

Abstract

Motivated by applications in bioinformatics, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of solid (standard) strings interleaved by elastic-degenerate symbols; each such symbol corresponds to a set of two or more variable-length solid strings. In this article, we present an algorithm for solving the pattern matching problem with a solid pattern and an elastic-degenerate text, running in \(\mathcal {O}(N+\alpha \gamma mn)\) time, where m is the length of the pattern; n and N are the length and total size of the elastic-degenerate text, respectively; and \(\alpha \) and \(\gamma \) are parameters representing, respectively, the maximum number of strings in any elastic-degenerate symbol of the text and the maximum number of elastic-degenerate symbols spanned by any occurrence of the pattern in the text. The space used by the proposed algorithm is \(\mathcal {O}(N)\).
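To make the definitions above concrete, here is a minimal Python sketch. It assumes an elastic-degenerate (ED) text is represented as a list of segments, each a set of solid strings: a solid segment is a singleton set, and an ED symbol is a set of two or more strings, where the empty string may appear (an assumption). The occurrence semantics assumed here are: the pattern either lies inside one string of a segment, or it is spelled by a non-empty suffix of a string in segment i, then whole strings from segments i+1 to j-1, then a non-empty prefix of a string in segment j; the sketch reports every such end segment j. The names `ed_params` and `ed_match` are hypothetical, and `ed_match` is a naive baseline for illustration, not the algorithm of the paper.

```python
from typing import List, Set, Tuple

# An ED text is modelled as a list of segments, each a set of solid strings.
# A solid segment is a singleton set; an ED symbol holds two or more strings,
# and the empty string "" is allowed inside an ED symbol (an assumption).
EDText = List[Set[str]]


def ed_params(text: EDText) -> Tuple[int, int, int]:
    """Return (n, N, alpha) for an ED text.

    n: number of segments; N: total length of all strings (an assumed
    measure of "total size"); alpha: maximum number of strings in a segment.
    """
    n = len(text)
    total_size = sum(len(s) for segment in text for s in segment)
    alpha = max((len(segment) for segment in text), default=0)
    return n, total_size, alpha


def ed_match(pattern: str, text: EDText) -> List[int]:
    """Naive baseline (hypothetical helper, not the paper's algorithm).

    Returns the 0-based indices j of the segments in which at least one
    occurrence of `pattern` ends.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    ends: List[int] = []
    # active holds every k (1 <= k < m) such that pattern[:k] can be spelled
    # so that it ends exactly at the end of the previous segment.
    active: Set[int] = set()
    for j, segment in enumerate(text):
        found = False
        next_active: Set[int] = set()
        for s in segment:
            # Case 1: the whole occurrence lies inside a single string s.
            if pattern in s:
                found = True
            # Case 2: continue a partial match carried over from earlier segments.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):  # occurrence ends in segment j
                        found = True
                elif rest.startswith(s):  # s consumed entirely; "" keeps k alive
                    next_active.add(k + len(s))
            # Case 3: a non-empty suffix of s that equals a proper prefix of
            # the pattern starts a new partial match.
            for length in range(1, min(m - 1, len(s)) + 1):
                if s.endswith(pattern[:length]):
                    next_active.add(length)
        if found:
            ends.append(j)
        active = next_active
    return ends


if __name__ == "__main__":
    # Segments 1 and 3 are ED symbols; segment 1 contains the empty string.
    T = [{"AC"}, {"A", "", "GT"}, {"CA"}, {"G", "TT"}]
    print(ed_match("ACA", T))  # [1, 2]
    print(ed_params(T))        # (4, 10, 3)
```

In this example, one occurrence of ACA ends in segment 1 (AC followed by A) and another ends in segment 2 (A followed by CA). The sketch favours clarity over speed: every extension is checked by direct string comparison, whereas the paper attains the \(\mathcal {O}(N+\alpha \gamma mn)\) bound with a different, more efficient method.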

Keywords

String processing algorithms, Degenerate strings, Indeterminate strings, Elastic-degenerate strings, Gapped strings

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Costas S. Iliopoulos (1)
  • Ritu Kundu (1)
  • Solon P. Pissis (1)
  1. Department of Informatics, King’s College London, London, UK
