Abstract
We consider the problem of indexing a collection of documents (a.k.a. strings) of total length n such that the following kind of queries are supported: given two patterns P + and P −, list all \(\ensuremath{n_\text{match}}\) documents containing P + but not P −. This is a natural extension of the classic problem of document listing as considered by Muthukrishnan [SODA’02], where only the positive pattern P + is given. Our main solution is an index of size O(n 3/2) bits that supports queries in \(O(|\ensuremath{P^+}|+|\ensuremath{P^-}|+\ensuremath{n_\text{match}}+\sqrt{n})\) time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Chazelle, B.: Lower bounds for orthogonal range searching, I: The reporting case. J. ACM 37, 200–212 (1990)
Chien, Y.-F., Hon, W.-K., Shah, R., Vitter, J.S.: Geometric Burrows-Wheeler transform: Linking range searching and text indexing. In: Proc. DCC, pp. 252–261. IEEE Press (2008)
Cohen, H., Porat, E.: Fast set intersection and two-patterns matching. Theor. Comput. Sci. 411(40-42), 3795–3800 (2010)
Cole, R., Farach-Colton, M., Hariharan, R., Przytycka, T.M., Thorup, M.: An O(nlogn) algorithm for the maximum agreement subtree problem for binary trees. SIAM J. Comput. 30(5), 1385–1404 (2000)
Ferragina, P., Koudas, N., Muthukrishnan, S., Srivastava, D.: Two-dimensional substring indexing. J. Comput. Syst. Sci. 66(4), 763–774 (2003)
Fischer, J., Heun, V.: Space efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)
Gagie, T., Puglisi, S.J., Turpin, A.: Range Quantile Queries: Another Virtue of Wavelet Trees. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 1–6. Springer, Heidelberg (2009)
Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: String Retrieval for Multi-pattern Queries. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 55–66. Springer, Heidelberg (2010)
Hon, W.K., Shah, R., Vitter, J.S.: Space-efficient framework for top-k string retrieval problems. In: Proc. FOCS, pp. 713–722. IEEE Computer Society (2009)
Hui, L.C.K.: Color Set Size Problem with Application to String Matching. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992)
Karpinski, M., Nekrich, Y.: Top-k color queries for document retrieval. In: Proc. SODA, pp. 401–411. ACM/SIAM (2011)
Matias, Y., Muthukrishnan, S., Şahinalp, S.C., Ziv, J.: Augmenting Suffix Trees, with Applications. In: Bilardi, G., Pietracaprina, A., Italiano, G.F., Pucci, G. (eds.) ESA 1998. LNCS, vol. 1461, pp. 67–78. Springer, Heidelberg (1998)
Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comput. 31(3), 762–776 (2001)
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proc. SODA, pp. 657–666. ACM/SIAM (2002)
Navarro, G., Nekrich, Y.: Top-k document retrieval in optimal time and linear space. In: Proc. SODA. ACM/SIAM (to appear, 2012)
Sadakane, K.: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1), 12–22 (2007)
Välimäki, N., Mäkinen, V.: Space-Efficient Algorithms for Document Retrieval. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 205–215. Springer, Heidelberg (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fischer, J. et al. (2012). Forbidden Patterns. In: Fernández-Baca, D. (eds) LATIN 2012: Theoretical Informatics. LATIN 2012. Lecture Notes in Computer Science, vol 7256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29344-3_28
Download citation
DOI: https://doi.org/10.1007/978-3-642-29344-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29343-6
Online ISBN: 978-3-642-29344-3
eBook Packages: Computer ScienceComputer Science (R0)