String Matching with Stopper Encoding and Code Splitting

  • Conference paper
Combinatorial Pattern Matching (CPM 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2373)

Abstract

We consider exact string searching in compressed texts. We utilize a semi-static compression scheme, where characters of the text are encoded as variable-length sequences of base symbols, each of which is represented by a fixed number of bits. In addition, we split the symbols into two parallel files in order to allow faster access. Our searching algorithm is a modification of the Boyer-Moore-Horspool algorithm. Our approach is practical and enables faster searching of string patterns than earlier character-based compression models and the best Boyer-Moore variants in uncompressed texts.
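The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes, going by the title, that every codeword is a run of "continuer" base symbols closed by exactly one "stopper" symbol, and that frequent characters receive the shortest codewords. The symbol sets, the function names, and the boundary check are all hypothetical choices made for illustration. Base symbols are kept as small integers rather than packed into bits, the search is the plain Boyer-Moore-Horspool algorithm rather than the modified version the paper describes, and the splitting of symbols into two parallel files is not shown.

```python
from collections import Counter
from itertools import count, product


def codewords(stoppers, continuers):
    """Yield codewords by increasing length: zero or more continuers, then one stopper."""
    for n in count(0):
        for prefix in product(continuers, repeat=n):
            for s in stoppers:
                yield prefix + (s,)


def build_code(text, stoppers, continuers):
    """Semi-static code: rank characters by frequency, shortest codewords first."""
    assert stoppers and continuers, "both symbol sets must be non-empty"
    ranked = [ch for ch, _ in Counter(text).most_common()]
    return dict(zip(ranked, codewords(stoppers, continuers)))


def encode(s, code):
    """Concatenate the codewords of the characters of s into one symbol list."""
    return [sym for ch in s for sym in code[ch]]


def horspool(text, pat):
    """Plain Boyer-Moore-Horspool over a list of symbols; returns all match offsets."""
    m = len(pat)
    if m == 0:
        return []
    # Shift for a symbol = distance from its rightmost occurrence
    # (excluding the last pattern position) to the end of the pattern.
    shift = {sym: m - 1 - i for i, sym in enumerate(pat[:-1])}
    hits, pos = [], 0
    while pos + m <= len(text):
        if text[pos:pos + m] == pat:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


def search_compressed(text, pattern, stoppers=(0, 1), continuers=(2, 3)):
    """Return character positions of pattern, searching only the encoded symbols."""
    code = build_code(text, stoppers, continuers)
    if any(ch not in code for ch in pattern):
        return []  # a character absent from the text cannot match
    enc_text, enc_pat = encode(text, code), encode(pattern, code)
    stop = set(stoppers)
    result = []
    for pos in horspool(enc_text, enc_pat):
        # Accept a hit only on a codeword boundary: the preceding symbol
        # must be a stopper (or the hit is at the very start).
        if pos == 0 or enc_text[pos - 1] in stop:
            # Each stopper closes one character, so counting the stoppers
            # before the hit recovers the character index.
            result.append(sum(sym in stop for sym in enc_text[:pos]))
    return result
```

Calling `search_compressed("abracadabra abracadabra", "abra")` encodes the pattern with the same code as the text and scans the symbol sequence directly, without decoding it. The boundary check is what makes the stopper property useful here: a candidate that starts in the middle of a codeword is rejected by inspecting a single preceding symbol.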

This work has been supported by the National Technology Agency (Tekes).

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rautio, J., Tanninen, J., Tarhio, J. (2002). String Matching with Stopper Encoding and Code Splitting. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_5

  • DOI: https://doi.org/10.1007/3-540-45452-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43862-5

  • Online ISBN: 978-3-540-45452-6

  • eBook Packages: Springer Book Archive
