Weighted Shortest Common Supersequence Problem Revisited

  • Conference paper
String Processing and Information Retrieval (SPIRE 2019)

Abstract

A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings \(W_1\) and \(W_2\) and a threshold \(\tfrac{1}{z} \) on probability, and we are asked to compute the shortest (standard) string S such that both \(W_1\) and \(W_2\) match subsequences of S (not necessarily the same) with probability at least \(\tfrac{1}{z} \). Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold \(\tfrac{1}{z} \), are represented by their logarithms (encoded in binary).
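The matching condition in the problem statement can be made concrete with a small checker. The following is a minimal sketch, not the algorithm of the paper: it assumes a weighted string is represented as a list of dictionaries mapping letters to exact probabilities, and the names `best_subsequence_match`, `is_common_supersequence`, `W1` and `W2` are hypothetical, chosen only for illustration. It uses a standard dynamic program to find the highest probability with which a weighted string matches any subsequence of a candidate string S.

```python
from fractions import Fraction


def best_subsequence_match(weighted, s):
    """Maximum probability with which `weighted` matches a subsequence of `s`.

    `weighted` is a list of dicts (letter -> probability), one per position.
    The match probability of a subsequence is the product of the
    probabilities of its letters at the corresponding positions.
    """
    m = len(s)
    # prev[j]: best probability of matching the first i positions of
    # `weighted` against a subsequence of s[:j]; for i = 0 it is 1.
    prev = [Fraction(1)] * (m + 1)
    for dist in weighted:
        cur = [Fraction(0)] * (m + 1)
        for j in range(1, m + 1):
            # Either skip s[j-1], or align it with the current position.
            use = prev[j - 1] * dist.get(s[j - 1], Fraction(0))
            cur[j] = max(cur[j - 1], use)
        prev = cur
    return prev[m]


def is_common_supersequence(w1, w2, s, z):
    """True if both weighted strings match subsequences of `s`
    with probability at least 1/z."""
    threshold = Fraction(1, z)
    return (best_subsequence_match(w1, s) >= threshold
            and best_subsequence_match(w2, s) >= threshold)


# Illustrative (made-up) instance over the alphabet {a, b}.
W1 = [{"a": Fraction(1, 2), "b": Fraction(1, 2)}, {"a": Fraction(1)}]
W2 = [{"b": Fraction(1)}, {"a": Fraction(1, 4), "b": Fraction(3, 4)}]
```

In this made-up instance, S = "bab" is accepted for z = 2, while S = "ba" is accepted only once the threshold is relaxed to z = 4. Exact fractions are used so that comparisons against the threshold are not affected by floating-point rounding.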

We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in \(\mathcal {O}(n^2\sqrt{z} \log {z})\) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in \(\mathcal {O}(n^{2-\varepsilon })\) time or in \(\mathcal {O}^*(z^{0.5-\varepsilon })\) time, where the \(\mathcal {O}^*\) notation suppresses factors polynomial with respect to the instance size (with numeric values encoded in binary), unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively).
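For contrast with these bounds, a naive exponential-time baseline is easy to state and can serve as a reference on tiny inputs. The sketch below is an assumption-laden illustration and not the algorithm presented in the paper: it enumerates candidate strings in order of increasing length and returns the first one that both weighted strings match with probability at least \(\tfrac{1}{z}\). The representation (lists of letter-to-probability dictionaries) and the names `match_probability`, `brute_force_wscs`, `W1` and `W2` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def match_probability(weighted, s):
    """Best probability with which `weighted` matches a subsequence of `s`."""
    prev = [Fraction(1)] * (len(s) + 1)
    for dist in weighted:
        cur = [Fraction(0)]
        for j, letter in enumerate(s, start=1):
            cur.append(max(cur[-1], prev[j - 1] * dist.get(letter, Fraction(0))))
        prev = cur
    return prev[-1]


def brute_force_wscs(w1, w2, z, alphabet):
    """Return a shortest string matched by both w1 and w2 with probability
    at least 1/z, or None if no such string exists.

    If a solution exists, one of length at most len(w1) + len(w2) exists
    (concatenate a good match for w1 with a good match for w2), so the
    search can stop there. Running time is exponential in that length.
    """
    threshold = Fraction(1, z)
    shortest = max(len(w1), len(w2))
    for length in range(shortest, len(w1) + len(w2) + 1):
        for letters in product(alphabet, repeat=length):
            s = "".join(letters)
            if (match_probability(w1, s) >= threshold
                    and match_probability(w2, s) >= threshold):
                return s
    return None


# Illustrative (made-up) instance over the alphabet {a, b}.
W1 = [{"a": Fraction(1, 2), "b": Fraction(1, 2)}, {"a": Fraction(1)}]
W2 = [{"b": Fraction(1)}, {"a": Fraction(1, 4), "b": Fraction(3, 4)}]
```

On this made-up instance the search returns a string of length 3 for z = 2 and a string of length 2 for z = 4, which shows how relaxing the threshold can shorten the answer. For z = 1 it returns `None`, since no string matches `W1` with probability 1.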

We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in \(\mathcal {O}(n^{f(z)})\) time, for any function f(z), unless \(\mathrm {P}=\mathrm {NP}\).

Tomasz Kociumaka was supported by ISF grants no. 824/17 and 1278/16 and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).

Jakub Radoszewski and Juliusz Straszyński were supported by the “Algorithms for text processing with errors and uncertainties” project carried out within the HOMING program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.

Notes

  1. Note that in general \(z\notin \mathcal {O}^*(1)\) unless z is encoded in unary.

  2. We consider the case of \(|\varSigma |=\mathcal {O}(1)\) just for simplicity. For a general alphabet, our algorithm can be modified to work in \(\mathcal {O}(n^2 |\varSigma |\sqrt{z} \log {z})\) time.

  3. For any two integers \(\ell \le r\), we use \([\ell \mathinner {.\,.}r]\) to denote the integer range \(\{\ell ,\ldots ,r\}\).

References

  1. Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: Guruswami, V. (ed.) 56th IEEE Annual Symposium on Foundations of Computer Science, pp. 59–78. IEEE Computer Society (2015). https://doi.org/10.1109/FOCS.2015.14

  2. Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21(5), 609–623 (2009). https://doi.org/10.1109/TKDE.2008.190

  3. Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. Theor. Comput. Sci. 395(2–3), 298–310 (2008). https://doi.org/10.1016/j.tcs.2008.01.006

  4. Amir, A., Gotthilf, Z., Shalom, B.R.: Weighted LCS. J. Discrete Algorithms 8(3), 273–281 (2010). https://doi.org/10.1016/j.jda.2010.02.001

  5. Amir, A., Gotthilf, Z., Shalom, B.R.: Weighted shortest common supersequence. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 44–54. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24583-1_6

  6. Bansal, N., Garg, S., Nederlof, J., Vyas, N.: Faster space-efficient algorithms for subset sum, \(k\)-sum, and related problems. SIAM J. Comput. 47(5), 1755–1777 (2018). https://doi.org/10.1137/17M1158203

  7. Barton, C., Kociumaka, T., Liu, C., Pissis, S.P., Radoszewski, J.: Indexing weighted sequences: neat and efficient. Inf. Comput. (2019). https://doi.org/10.1016/j.ic.2019.104462

  8. Barton, C., Kociumaka, T., Pissis, S.P., Radoszewski, J.: Efficient index for weighted sequences. In: Grossi, R., Lewenstein, M. (eds.) 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016. LIPIcs, vol. 54, pp. 4:1–4:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2016). https://doi.org/10.4230/LIPIcs.CPM.2016.4

  9. Barton, C., Liu, C., Pissis, S.P.: Linear-time computation of prefix table for weighted strings & applications. Theor. Comput. Sci. 656, 160–172 (2016). https://doi.org/10.1016/j.tcs.2016.04.029

  10. Barton, C., Pissis, S.P.: Crochemore’s partitioning on weighted strings and applications. Algorithmica 80(2), 496–514 (2018). https://doi.org/10.1007/s00453-016-0266-0

  11. Charalampopoulos, P., Iliopoulos, C.S., Liu, C., Pissis, S.P.: Property suffix array with applications. In: Bender, M.A., Farach-Colton, M., Mosteiro, M.A. (eds.) LATIN 2018. LNCS, vol. 10807, pp. 290–302. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77404-6_22

  12. Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P., Radoszewski, J.: On-line weighted pattern matching. Inf. Comput. 266, 49–59 (2019). https://doi.org/10.1016/j.ic.2019.01.001

  13. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press (2009). https://mitpress.mit.edu/books/introduction-algorithms-third-edition

  14. Cygan, M., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Polynomial-time approximation algorithms for weighted LCS problem. Discrete Appl. Math. 204, 38–48 (2016). https://doi.org/10.1016/j.dam.2015.11.011

  15. Horowitz, E., Sahni, S.: Computing partitions with applications to the knapsack problem. J. ACM 21(2), 277–292 (1974). https://doi.org/10.1145/321812.321823

  16. Impagliazzo, R., Paturi, R.: On the complexity of \(k\)-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001). https://doi.org/10.1006/jcss.2000.1727

  17. Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001). https://doi.org/10.1006/jcss.2001.1774

  18. Jiang, T., Li, M.: On the approximation of shortest common supersequences and longest common subsequences. SIAM J. Comput. 24(5), 1122–1139 (1995). https://doi.org/10.1137/S009753979223842X

  19. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W. (eds.) Symposium on the Complexity of Computer Computations. pp. 85–103. The IBM Research Symposia Series, Plenum Press, New York (1972). https://doi.org/10.1007/978-1-4684-2001-2_9

  20. Kipouridis, E., Tsichlas, K.: Longest common subsequence on weighted sequences (2019). http://arxiv.org/abs/1901.04068

  21. Kociumaka, T., Pissis, S.P., Radoszewski, J.: Pattern matching and consensus problems on weighted sequences and profiles. Theory Comput. Syst. 63(3), 506–542 (2019). https://doi.org/10.1007/s00224-018-9881-2

  22. Lokshtanov, D., Marx, D., Saurabh, S.: Lower bounds based on the Exponential Time Hypothesis. Bull. EATCS 105, 41–72 (2011). http://eatcs.org/beatcs/index.php/beatcs/article/view/92

  23. Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322–336 (1978). https://doi.org/10.1145/322063.322075

  24. Radoszewski, J., Starikovskaya, T.: Streaming \(k\)-mismatch with error correcting and applications. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (eds.) Data Compression Conference, DCC 2017, pp. 290–299. IEEE (2017). https://doi.org/10.1109/DCC.2017.14

  25. Räihä, K., Ukkonen, E.: The shortest common supersequence problem over binary alphabet is NP-complete. Theor. Comput. Sci. 16, 187–198 (1981). https://doi.org/10.1016/0304-3975(81)90075-X

  26. Stormo, G.D., Schneider, T.D., Gold, L., Ehrenfeucht, A.: Use of the ‘perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucl. Acids Res. 10(9), 2997–3011 (1982). https://doi.org/10.1093/nar/10.9.2997

Corresponding author

Correspondence to Jakub Radoszewski.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Charalampopoulos, P. et al. (2019). Weighted Shortest Common Supersequence Problem Revisited. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_16

  • DOI: https://doi.org/10.1007/978-3-030-32686-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32685-2

  • Online ISBN: 978-3-030-32686-9

  • eBook Packages: Computer Science, Computer Science (R0)
