Abstract
We address the problems of pattern matching and approximate pattern matching in the sketching model. We show that it is impossible to compress the text into a small sketch and use only the sketch to decide whether a given pattern occurs in the text. We also prove a sketch size lower bound for approximate pattern matching, and show it is tight up to a logarithmic factor.
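As a rough illustration of why small sketches cannot support exact pattern matching, the toy below uses a simple counting argument. It is not the paper's proof, and it covers only deterministic sketches. Any deterministic map from n-bit texts to fewer than n bits must send two distinct texts to the same sketch, and some pattern then occurs in one text but not in the other. The names `toy_sketch`, `find_collision` and `occurs` are hypothetical helpers introduced for this example only.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def occurs(pattern: str, text: str) -> bool:
    """Exact pattern matching: does `pattern` appear as a substring of `text`?"""
    return pattern in text


def toy_sketch(text: str) -> str:
    """A hypothetical lossy sketch: keep all but the last character."""
    return text[:-1]


def find_collision(n: int, sketch: Callable[[str], str]) -> Optional[Tuple[str, str]]:
    """Search all binary texts of length n for two distinct texts with equal sketches."""
    seen = {}
    for chars in product("01", repeat=n):
        text = "".join(chars)
        key = sketch(text)
        if key in seen:
            return seen[key], text
        seen[key] = text
    return None  # only possible if the sketch is injective on n-bit texts


pair = find_collision(4, toy_sketch)
if pair is not None:
    a, b = pair
    # Using text `a` itself as the pattern separates the two texts: it occurs in
    # `a`, but not in `b`, which has the same length and differs from `a`.
    print(a, b, occurs(a, a), occurs(a, b))
```

Because the two texts share a sketch, any procedure that sees only the sketch must give the same answer for both, so it errs on one of them for the pattern `a`. The paper's results concern the more general sketching model stated in the abstract; this example only conveys the intuition.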
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bar-Yossef, Z., Jayram, T.S., Krauthgamer, R., Kumar, R. (2004). The Sketching Complexity of Pattern Matching. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM APPROX 2004. Lecture Notes in Computer Science, vol 3122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27821-4_24
DOI: https://doi.org/10.1007/978-3-540-27821-4_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22894-3
Online ISBN: 978-3-540-27821-4
eBook Packages: Springer Book Archive