
On Hardness of Several String Indexing Problems

Conference paper, Combinatorial Pattern Matching (CPM 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8486)

Abstract

Let \({\cal D} = \{d_1, d_2, \ldots, d_D\}\) be a collection of \(D\) string documents of \(n\) characters in total. The two-pattern matching problems ask to index \({\cal D}\) so that the following queries can be answered efficiently.

  • report/count the unique documents containing \(P_1\) and \(P_2\).

  • report/count the unique documents containing \(P_1\), but not \(P_2\).
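As a reference point for these definitions, the sketch below answers all four queries by brute force: it scans every document for each pattern using Python's substring test, so each query costs time proportional to the total text length rather than the sublinear bounds an index aims for. The function names are illustrative assumptions, not taken from the paper; a real index would replace `docs_containing` with a suffix-tree or suffix-array lookup.

```python
from typing import List, Set


def docs_containing(docs: List[str], pattern: str) -> Set[int]:
    """Indices of the documents that contain `pattern` as a substring (brute force)."""
    return {i for i, d in enumerate(docs) if pattern in d}


def report_and(docs: List[str], p1: str, p2: str) -> List[int]:
    """Unique documents containing both P1 and P2, as sorted indices."""
    return sorted(docs_containing(docs, p1) & docs_containing(docs, p2))


def report_and_not(docs: List[str], p1: str, p2: str) -> List[int]:
    """Unique documents containing P1 but not P2, as sorted indices."""
    return sorted(docs_containing(docs, p1) - docs_containing(docs, p2))


def count_and(docs: List[str], p1: str, p2: str) -> int:
    """Counting version: the output size k of report_and."""
    return len(report_and(docs, p1, p2))


def count_and_not(docs: List[str], p1: str, p2: str) -> int:
    """Counting version: the output size k of report_and_not."""
    return len(report_and_not(docs, p1, p2))


if __name__ == "__main__":
    docs = ["abracadabra", "cadabra", "banana", "abacus"]
    print(report_and(docs, "ab", "ra"))      # [0, 1]
    print(report_and_not(docs, "ab", "ra"))  # [3]
```

Each query here touches all \(n\) characters; the point of an index is to avoid that scan.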

Here \(P_1\) and \(P_2\) are input patterns of length \(p_1\) and \(p_2\), respectively. Linear-space data structures with \(O(p_1+p_2+\sqrt{nk}\log^{O(1)} n)\) query cost are already known for the reporting version, where \(k\) is the output size. For the counting version (i.e., reporting the value \(k\)), a simple linear-space index with \(O(p_1+p_2+\sqrt{n})\) query cost can be constructed in \(O(n^{3/2})\) time. However, it is still not known whether these are the best possible bounds for these problems. In this paper, we show a strong connection between these string indexing problems and the boolean matrix multiplication problem. Based on this, we argue that these results cannot be improved significantly using purely combinatorial techniques. We also provide an improved upper bound for a related problem known as two-dimensional substring indexing.
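One way to see why boolean matrix multiplication is relevant is a simple reduction, sketched here for intuition under our own assumptions; it is not claimed to be the construction used in the paper. To multiply an \(m \times t\) boolean matrix \(A\) by a \(t \times r\) matrix \(B\), create one document per inner index \(k\), holding a row token for each \(i\) with \(A[i][k]=1\) and a column token for each \(j\) with \(B[k][j]=1\). Then \((AB)[i][j]=1\) exactly when some document contains both the token for \(i\) and the token for \(j\), so \(m \cdot r\) two-pattern queries yield the whole product. The `#r{i}#` / `#c{j}#` token encoding and all helper names are hypothetical, and `count_both` is a brute-force stand-in for a real two-pattern index.

```python
from typing import List

Matrix = List[List[int]]


def build_documents(A: Matrix, B: Matrix) -> List[str]:
    """One document per inner index k (hypothetical encoding):
    a row token '#r{i}#' for every i with A[i][k] = 1 and
    a column token '#c{j}#' for every j with B[k][j] = 1."""
    docs = []
    for k in range(len(B)):
        rows = "".join(f"#r{i}#" for i in range(len(A)) if A[i][k])
        cols = "".join(f"#c{j}#" for j in range(len(B[k])) if B[k][j])
        docs.append(rows + cols)
    return docs


def count_both(docs: List[str], p1: str, p2: str) -> int:
    """Brute-force stand-in for a two-pattern counting index."""
    return sum(1 for d in docs if p1 in d and p2 in d)


def bmm_via_two_pattern(A: Matrix, B: Matrix) -> Matrix:
    """Boolean product A*B computed with m*r two-pattern counting queries."""
    docs = build_documents(A, B)
    m = len(A)
    r = len(B[0]) if B else 0
    return [[1 if count_both(docs, f"#r{i}#", f"#c{j}#") > 0 else 0
             for j in range(r)]
            for i in range(m)]


if __name__ == "__main__":
    A = [[1, 0], [0, 1]]
    B = [[0, 1], [1, 1]]
    print(bmm_via_two_pattern(A, B))  # [[0, 1], [1, 1]]
```

The counting query actually returns the integer product entry \(\sum_k A[i][k]\,B[k][j]\); the boolean entry is just whether it is nonzero. Because each token is closed by a delimiter, a pattern such as `#r1#` cannot match inside `#r10#`.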

Work supported in part by the Danish National Research Foundation grant DNRF84 through Center for Massive Data Algorithmics (MADALGO).




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Larsen, K.G., Munro, J.I., Nielsen, J.S., Thankachan, S.V. (2014). On Hardness of Several String Indexing Problems. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds) Combinatorial Pattern Matching. CPM 2014. Lecture Notes in Computer Science, vol 8486. Springer, Cham. https://doi.org/10.1007/978-3-319-07566-2_25


  • DOI: https://doi.org/10.1007/978-3-319-07566-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07565-5

  • Online ISBN: 978-3-319-07566-2

  • eBook Packages: Computer Science, Computer Science (R0)
