Boyer-Moore String Matching over Ziv-Lempel Compressed Text

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1848)

Abstract

We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The key idea is that, although we cannot choose exactly which text characters to inspect, we can still use the characters explicitly represented in those formats to shift the pattern in the text. We present a basic approach and more advanced ones. Although the theoretical average-case complexity does not improve, because all the symbols in the compressed text still have to be scanned, our experiments show speedups of up to 30% over the fastest previous approaches. Moreover, we show that, with an encoding method that sacrifices some compression ratio, our method is twice as fast as decompressing and then searching with the best available algorithms.
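The key idea can be illustrated with a small sketch. It is not the algorithm of the paper, only a simplified illustration under stated assumptions: the text is parsed with a textbook LZ78 scheme in which every block is a pair (reference to an earlier block, one explicit character), and the function names `lz78_compress`, `lz78_decompress`, `explicit_positions`, and `candidate_starts` are hypothetical. The sketch computes where each explicit character falls in the uncompressed text using block lengths alone, and then discards every pattern window that covers an explicit character absent from the pattern, in the spirit of a Boyer-Moore bad-character rule.

```python
def lz78_compress(text):
    """Textbook LZ78 parse: a list of (reference, char) blocks.

    Reference 0 denotes the empty block; block i (1-based) is the
    string of block `reference` followed by `char`.
    """
    dictionary = {"": 0}
    blocks = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch
        else:
            blocks.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Trailing phrase already in the dictionary: emit it as
        # (its prefix, its last character). The dictionary is prefix-closed.
        blocks.append((dictionary[current[:-1]], current[-1]))
    return blocks


def lz78_decompress(blocks):
    """Inverse of lz78_compress; used here only to check the sketch."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


def explicit_positions(blocks):
    """0-based text position of each block's explicit final character,
    derived from block lengths without decompressing the text."""
    lengths = [0]  # length of the empty block
    end = 0
    result = []
    for ref, ch in blocks:
        length = lengths[ref] + 1
        lengths.append(length)
        end += length
        result.append((end - 1, ch))
    return result


def candidate_starts(blocks, pattern):
    """Window starts not ruled out by an explicit character that
    does not occur anywhere in the pattern."""
    m = len(pattern)
    pattern_chars = set(pattern)
    explicit = explicit_positions(blocks)
    n = explicit[-1][0] + 1 if explicit else 0
    ruled_out = [False] * max(n - m + 1, 0)
    for pos, ch in explicit:
        if ch not in pattern_chars:
            # Every window covering `pos` would mismatch at `pos`.
            for start in range(max(0, pos - m + 1), min(pos, n - m) + 1):
                ruled_out[start] = True
    return [s for s, bad in enumerate(ruled_out) if not bad]
```

Only the surviving candidates would need verification against the text. A real implementation would process blocks sequentially and derive shifts on the fly rather than marking windows in an array; the array is used here purely to keep the illustration short.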

Work developed during a postdoctoral stay at the University of Helsinki, partially supported by the Academy of Finland and Fundación Andes. Also supported by Fondecyt grant 1-990627.

Supported in part by the Academy of Finland.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navarro, G., Tarhio, J. (2000). Boyer-Moore String Matching over Ziv-Lempel Compressed Text. In: Giancarlo, R., Sankoff, D. (eds) Combinatorial Pattern Matching. CPM 2000. Lecture Notes in Computer Science, vol 1848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45123-4_16

  • DOI: https://doi.org/10.1007/3-540-45123-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67633-1

  • Online ISBN: 978-3-540-45123-5

  • eBook Packages: Springer Book Archive
