Resequencing a Set of Strings Based on a Target String

Abstract

Given a set S = {S_1, S_2, …, S_l} of l strings, a text T, and a natural number k, find a string M that is a concatenation of k strings from S (not necessarily distinct, i.e., a string in S may occur more than once in M) and whose longest common subsequence with T is as long as possible. Such a string is called a k-inlay. The resequencing longest common subsequence problem (resequencing LCS problem for short) is to find a k-inlay for each query with parameter k after T and S are given. In this paper, we propose an algorithm for solving this problem which takes O(nml) preprocessing time and O(ϑ_k k) query time for each query with parameter k, where n is the length of T, m is the maximal length of strings in S, and ϑ_k is the length of the longest common subsequence between a k-inlay and T.
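To make the problem statement concrete, the following Python sketch computes the k-inlay objective in two naive ways. It is not the algorithm proposed in the paper and does not achieve the stated O(nml) preprocessing or O(ϑ_k k) query bounds. The function names (`lcs_row`, `best_inlay_bruteforce`, `inlay_lcs_length`) are hypothetical and chosen for illustration. The first routine enumerates all l^k concatenations; the second assumes only the standard fact that the LCS of a concatenation with T decomposes over a split of T into k consecutive pieces, and runs a simple dynamic program in roughly O(l n^2 m + k n^2) time.

```python
from itertools import product


def lcs_row(a, b):
    """Return row[j] = LCS length of a and b[:j], for j = 0..len(b)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j in range(1, len(b) + 1):
            if ch == b[j - 1]:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def best_inlay_bruteforce(strings, text, k):
    """Try all l**k concatenations; return (LCS length, chosen indices)."""
    best = (-1, ())
    for choice in product(range(len(strings)), repeat=k):
        m = "".join(strings[i] for i in choice)
        score = lcs_row(m, text)[-1]
        if score > best[0]:
            best = (score, choice)
    return best


def inlay_lcs_length(strings, text, k):
    """Length of the LCS between a k-inlay and text (illustrative DP only)."""
    n = len(text)
    # piece[a][b] = max over s in strings of LCS(s, text[a:b])
    piece = [[0] * (n + 1) for _ in range(n + 1)]
    for a in range(n + 1):
        for s in strings:
            row = lcs_row(s, text[a:])
            for j, value in enumerate(row):
                piece[a][a + j] = max(piece[a][a + j], value)
    # f[b] = best LCS of a concatenation of the strings used so far with text[:b]
    f = [0] * (n + 1)
    for _ in range(k):
        f = [max(f[a] + piece[a][b] for a in range(b + 1)) for b in range(n + 1)]
    return f[n]


if __name__ == "__main__":
    S, T = ["ab", "c"], "abcab"
    print(best_inlay_bruteforce(S, T, 2))  # (4, (0, 0)): M = "abab"
    print(inlay_lcs_length(S, T, 3))       # 5: M = "ab" + "c" + "ab"
```

The brute-force version is useful only as a correctness check on tiny inputs; the dynamic program shows why the problem is tractable at all, since the choice of each of the k strings can be made independently once the split of T is fixed.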

Acknowledgements

The authors would like to thank anonymous referees for their careful reading with corrections and useful comments which helped to improve the paper.

Author information

Correspondence to Yue-Li Wang.

Additional information

This work was supported in part by the National Science Council of the Republic of China under contracts NSC 100-2221-E-011-067-MY3 and NSC 101-2221-E-011-038-MY3.

About this article

Cite this article

Kuo, C., Wang, Y., Liu, J. et al. Resequencing a Set of Strings Based on a Target String. Algorithmica 72, 430–449 (2015). https://doi.org/10.1007/s00453-013-9859-z

Keywords

  • Dynamic programming
  • Longest common subsequences
  • Resequencing
  • Inverted indexing
  • Totally monotone matrices