Shift-And Approach to Pattern Matching in LZW Compressed Text

  • Conference paper
Combinatorial Pattern Matching (CPM 1999)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1645))

Abstract

This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text and gives a new algorithm that solves it. The algorithm is fast in practice when the pattern length is at most 32, that is, the word length. After O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, r is the number of pattern occurrences, and Σ is the alphabet. Experimental results show that it runs approximately 1.5 times faster than decompression followed by a simple search using the Shift-And algorithm. Moreover, like the Shift-And algorithm, it can be extended to generalized pattern matching, to pattern matching with k mismatches, and to multiple pattern matching.
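For orientation, the baseline the abstract compares against (decompress first, then run a plain Shift-And search) can be sketched as follows. This is not the paper's algorithm, which scans the compressed text directly. It is a minimal illustration under stated assumptions: the LZW input is a list of integer codes, the initial dictionary holds the 256 single-character strings, and the function names `build_masks`, `shift_and_search`, and `lzw_decode` are hypothetical. Python integers are unbounded, so the 32-bit word limit mentioned above does not show up here.

```python
def build_masks(pattern):
    """Map each character c to a bitmask whose bit i is set iff pattern[i] == c."""
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    return masks


def shift_and_search(pattern, text):
    """Return the start indices of all occurrences of pattern in text (Shift-And)."""
    m = len(pattern)
    if m == 0:
        return []
    masks = build_masks(pattern)
    accept = 1 << (m - 1)  # bit m-1 set means the whole pattern has matched
    state = 0              # bit i set means pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, c in enumerate(text):
        # Extend every partial match by one character and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


def lzw_decode(codes, alphabet_size=256):
    """Decode a list of integer LZW codes (assumed format: no code-width packing,
    no reset codes, initial dictionary = single characters 0..alphabet_size-1)."""
    if not codes:
        return ""
    table = [chr(i) for i in range(alphabet_size)]
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        table.append(prev + entry[0])
        prev = entry
    return "".join(out)


def search_after_decompression(pattern, codes):
    """Baseline: decompress the whole text, then search it with Shift-And."""
    return shift_and_search(pattern, lzw_decode(codes))
```

For example, the code sequence `[97, 98, 256, 256]` decodes to `"ababab"` under these assumptions, and searching it for `"aba"` reports start positions 0 and 2. The preprocessing in `build_masks` touches each pattern character once; a table indexed by every alphabet symbol, rather than a dictionary, would account for the |Σ| term in the stated preprocessing bound.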

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kida, T., Takeda, M., Shinohara, A., Arikawa, S. (1999). Shift-And Approach to Pattern Matching in LZW Compressed Text. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_1

  • DOI: https://doi.org/10.1007/3-540-48452-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66278-5

  • Online ISBN: 978-3-540-48452-3

  • eBook Packages: Springer Book Archive
