Abstract
A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings \(W_1\) and \(W_2\) and a threshold \(\tfrac{1}{z} \) on probability, and we are asked to compute the shortest (standard) string S such that both \(W_1\) and \(W_2\) match subsequences of S (not necessarily the same) with probability at least \(\tfrac{1}{z} \). Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold \(\tfrac{1}{z} \), are represented by their logarithms (encoded in binary).
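The matching condition in the problem statement can be made concrete with a small dynamic program. The sketch below is illustrative only and assumes a representation not given in the paper: a weighted string is a Python list of dicts, one per position, mapping each letter to its probability at that position. The probability that W matches a standard string X of the same length is the product of W[k][X[k]] over all positions k. Under these assumptions, the hypothetical helper `best_match_probability` maximizes this product over all subsequences of S of length |W| in O(|W|·|S|) time. A candidate S is then feasible for WSCS exactly when both W1 and W2 pass `matches_subsequence`.

```python
from typing import Dict, Sequence

# Assumed representation (hypothetical): one dict per position, letter -> probability.
WeightedString = Sequence[Dict[str, float]]


def best_match_probability(W: WeightedString, S: str) -> float:
    """Max of prod_k W[k][X[k]] over all length-|W| subsequences X of S (0 if none)."""
    n, m = len(W), len(S)
    # dp[i][j]: best probability of matching W[:i] to a subsequence of S[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[0][j] = 1  # the empty prefix of W matches with probability 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            skip = dp[i][j - 1]  # S[j-1] is not used by the match
            take = dp[i - 1][j - 1] * W[i - 1].get(S[j - 1], 0)  # S[j-1] aligned to W[i-1]
            dp[i][j] = max(skip, take)
    return dp[n][m]


def matches_subsequence(W: WeightedString, S: str, z: float) -> bool:
    """True iff W matches some subsequence of S with probability at least 1/z."""
    return best_match_probability(W, S) * z >= 1
```

For example, with W = [{"A": 0.5, "C": 0.5}, {"G": 1.0}] and S = "AGCG", the best subsequence (such as "AG") has probability 0.5, so the check succeeds for z = 2 and fails for z = 1. Integer initial values are used so that `fractions.Fraction` probabilities also work, which avoids floating-point error in the threshold test.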
We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in \(\mathcal {O}(n^2\sqrt{z} \log {z})\) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in \(\mathcal {O}(n^{2-\varepsilon })\) time or in \(\mathcal {O}^*(z^{0.5-\varepsilon })\) time, where the \(\mathcal {O}^*\) notation suppresses factors polynomial with respect to the instance size (with numeric values encoded in binary), unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively).
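The abstract does not describe how the \(\mathcal {O}(n^2\sqrt{z} \log {z})\) algorithm works, so the following is not the authors' method. It is a hypothetical brute-force baseline, useful only for checking tiny instances, built on one simple observation: in a shortest solution S, every letter of S is used by the match of \(W_1\), of \(W_2\), or of both (otherwise it could be deleted). The sketch runs a breadth-first search over the length of S with states (i, j) counting the matched positions of \(W_1\) and \(W_2\), keeps only Pareto-optimal pairs of match probabilities per state, and discards any pair with a probability below \(\tfrac{1}{z}\), which is safe because further letters can only lower it. It assumes a weighted string is a list of dicts (letter to probability) and that z ≥ 1; nothing here bounds the number of Pareto-optimal pairs, so its running time can be far worse than the stated bound.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed representation (hypothetical): one dict per position, letter -> probability.
WeightedString = Sequence[Dict[str, float]]
Entry = Tuple[float, float, str]  # (prob. of W1 match, prob. of W2 match, prefix of S)


def pareto(entries: List[Entry]) -> List[Entry]:
    """Keep only entries not dominated in both probabilities by another entry."""
    entries.sort(key=lambda e: (-e[0], -e[1]))
    kept: List[Entry] = []
    best_p2 = -1
    for e in entries:
        if e[1] > best_p2:  # every earlier entry already has p1 >= e[0]
            kept.append(e)
            best_p2 = e[1]
    return kept


def wscs_bruteforce(W1: WeightedString, W2: WeightedString,
                    z: float) -> Optional[Tuple[int, str]]:
    """Length and one witness of a shortest valid S, or None if infeasible (z >= 1)."""
    n1, n2 = len(W1), len(W2)
    alphabet = sorted(set().union(*W1, *W2))

    def meets(p: float) -> bool:
        return p * z >= 1  # probability threshold 1/z

    # layer: (i, j) -> Pareto set of entries for prefixes S of the current length
    layer: Dict[Tuple[int, int], List[Entry]] = {(0, 0): [(1, 1, "")]}
    for length in range(n1 + n2 + 1):
        if (n1, n2) in layer:  # BFS by length, so the first hit is shortest
            return length, layer[(n1, n2)][0][2]
        nxt: Dict[Tuple[int, int], List[Entry]] = defaultdict(list)
        for (i, j), entries in layer.items():
            for p1, p2, s in entries:
                for c in alphabet:
                    q1 = p1 * W1[i].get(c, 0) if i < n1 else 0
                    q2 = p2 * W2[j].get(c, 0) if j < n2 else 0
                    # The new letter c is matched to W1 only, W2 only, or both.
                    if i < n1 and meets(q1):
                        nxt[(i + 1, j)].append((q1, p2, s + c))
                    if j < n2 and meets(q2):
                        nxt[(i, j + 1)].append((p1, q2, s + c))
                    if i < n1 and j < n2 and meets(q1) and meets(q2):
                        nxt[(i + 1, j + 1)].append((q1, q2, s + c))
        layer = {key: pareto(v) for key, v in nxt.items()}
        if not layer:
            return None
    return None
```

For instance, for W1 = [{"A": 0.6, "B": 0.4}] and W2 = [{"A": 0.4, "B": 0.6}], a single letter suffices when z = 3, but with z = 2 the function returns length 2, since W1 then needs A and W2 needs B.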
We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in \(\mathcal {O}(n^{f(z)})\) time, for any function f(z), unless \(\mathrm {P}=\mathrm {NP}\).
Tomasz Kociumaka was supported by ISF grants no. 824/17 and 1278/16 and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).
Jakub Radoszewski and Juliusz Straszyński were supported by the “Algorithms for text processing with errors and uncertainties” project carried out within the HOMING program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.
Notes
1. Note that in general \(z\notin \mathcal {O}^*(1)\) unless z is encoded in unary.
2. We consider the case of \(|\varSigma |=\mathcal {O}(1)\) just for simplicity. For a general alphabet, our algorithm can be modified to work in \(\mathcal {O}(n^2 |\varSigma |\sqrt{z} \log {z})\) time.
3. For any two integers \(\ell \le r\), we use \([\ell \mathinner {.\,.}r]\) to denote the integer range \(\{\ell ,\ldots ,r\}\).
References
Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: Guruswami, V. (ed.) 56th IEEE Annual Symposium on Foundations of Computer Science, pp. 59–78. IEEE Computer Society (2015). https://doi.org/10.1109/FOCS.2015.14
Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21(5), 609–623 (2009). https://doi.org/10.1109/TKDE.2008.190
Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. Theor. Comput. Sci. 395(2–3), 298–310 (2008). https://doi.org/10.1016/j.tcs.2008.01.006
Amir, A., Gotthilf, Z., Shalom, B.R.: Weighted LCS. J. Discrete Algorithms 8(3), 273–281 (2010). https://doi.org/10.1016/j.jda.2010.02.001
Amir, A., Gotthilf, Z., Shalom, B.R.: Weighted shortest common supersequence. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 44–54. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24583-1_6
Bansal, N., Garg, S., Nederlof, J., Vyas, N.: Faster space-efficient algorithms for subset sum, \(k\)-sum, and related problems. SIAM J. Comput. 47(5), 1755–1777 (2018). https://doi.org/10.1137/17M1158203
Barton, C., Kociumaka, T., Liu, C., Pissis, S.P., Radoszewski, J.: Indexing weighted sequences: neat and efficient. Inf. Comput. (2019). https://doi.org/10.1016/j.ic.2019.104462
Barton, C., Kociumaka, T., Pissis, S.P., Radoszewski, J.: Efficient index for weighted sequences. In: Grossi, R., Lewenstein, M. (eds.) 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016. LIPIcs, vol. 54, pp. 4:1–4:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2016). https://doi.org/10.4230/LIPIcs.CPM.2016.4
Barton, C., Liu, C., Pissis, S.P.: Linear-time computation of prefix table for weighted strings & applications. Theor. Comput. Sci. 656, 160–172 (2016). https://doi.org/10.1016/j.tcs.2016.04.029
Barton, C., Pissis, S.P.: Crochemore’s partitioning on weighted strings and applications. Algorithmica 80(2), 496–514 (2018). https://doi.org/10.1007/s00453-016-0266-0
Charalampopoulos, P., Iliopoulos, C.S., Liu, C., Pissis, S.P.: Property suffix array with applications. In: Bender, M.A., Farach-Colton, M., Mosteiro, M.A. (eds.) LATIN 2018. LNCS, vol. 10807, pp. 290–302. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77404-6_22
Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P., Radoszewski, J.: On-line weighted pattern matching. Inf. Comput. 266, 49–59 (2019). https://doi.org/10.1016/j.ic.2019.01.001
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press (2009). https://mitpress.mit.edu/books/introduction-algorithms-third-edition
Cygan, M., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Polynomial-time approximation algorithms for weighted LCS problem. Discrete Appl. Math. 204, 38–48 (2016). https://doi.org/10.1016/j.dam.2015.11.011
Horowitz, E., Sahni, S.: Computing partitions with applications to the knapsack problem. J. ACM 21(2), 277–292 (1974). https://doi.org/10.1145/321812.321823
Impagliazzo, R., Paturi, R.: On the complexity of \(k\)-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001). https://doi.org/10.1006/jcss.2000.1727
Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001). https://doi.org/10.1006/jcss.2001.1774
Jiang, T., Li, M.: On the approximation of shortest common supersequences and longest common subsequences. SIAM J. Comput. 24(5), 1122–1139 (1995). https://doi.org/10.1137/S009753979223842X
Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W. (eds.) Symposium on the Complexity of Computer Computations, pp. 85–103. The IBM Research Symposia Series, Plenum Press, New York (1972). https://doi.org/10.1007/978-1-4684-2001-2_9
Kipouridis, E., Tsichlas, K.: Longest common subsequence on weighted sequences (2019). http://arxiv.org/abs/1901.04068
Kociumaka, T., Pissis, S.P., Radoszewski, J.: Pattern matching and consensus problems on weighted sequences and profiles. Theory Comput. Syst. 63(3), 506–542 (2019). https://doi.org/10.1007/s00224-018-9881-2
Lokshtanov, D., Marx, D., Saurabh, S.: Lower bounds based on the Exponential Time Hypothesis. Bull. EATCS 105, 41–72 (2011). http://eatcs.org/beatcs/index.php/beatcs/article/view/92
Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322–336 (1978). https://doi.org/10.1145/322063.322075
Radoszewski, J., Starikovskaya, T.: Streaming \(k\)-mismatch with error correcting and applications. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (eds.) Data Compression Conference, DCC 2017, pp. 290–299. IEEE (2017). https://doi.org/10.1109/DCC.2017.14
Räihä, K., Ukkonen, E.: The shortest common supersequence problem over binary alphabet is NP-complete. Theor. Comput. Sci. 16, 187–198 (1981). https://doi.org/10.1016/0304-3975(81)90075-X
Stormo, G.D., Schneider, T.D., Gold, L., Ehrenfeucht, A.: Use of the ‘perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucl. Acids Res. 10(9), 2997–3011 (1982). https://doi.org/10.1093/nar/10.9.2997
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Charalampopoulos, P. et al. (2019). Weighted Shortest Common Supersequence Problem Revisited. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol. 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_16
DOI: https://doi.org/10.1007/978-3-030-32686-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32685-2
Online ISBN: 978-3-030-32686-9
eBook Packages: Computer Science, Computer Science (R0)