Block Merging for Off-Line Compression

  • Conference paper
Combinatorial Pattern Matching (CPM 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2373)

Abstract

To bound memory consumption, most compression systems provide a facility that controls the amount of data that may be processed at once. In this work we consider the Re-Pair mechanism of Larsson and Moffat [2000], which processes large messages as disjoint blocks. We show that the blocks emitted by Re-Pair can be post-processed to yield further savings, and describe techniques that allow files of 500 MB or more to be compressed in a holistic manner using less than that much main memory. The block merging process we describe has the additional advantage of allowing new text to be appended to the end of the compressed file.
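
As a rough illustration of the setting the abstract describes, the sketch below applies a naive form of Re-Pair (repeatedly replacing the most frequent adjacent pair of symbols with a fresh symbol) to disjoint fixed-size blocks. It is not the authors' implementation and it does not attempt the block merging step itself; the function names `repair_block`, `expand`, and `compress_in_blocks`, the block size, and the choice of 256 as the first phrase symbol are all assumptions made for this example.

```python
from collections import Counter


def repair_block(symbols, next_symbol=256):
    """Naive Re-Pair on one block (hypothetical helper, quadratic time).

    Repeatedly replaces the most frequent adjacent pair with a new symbol
    until no pair occurs at least twice. Pair counts include overlapping
    occurrences, which a production implementation would handle more
    carefully.
    """
    seq = list(symbols)
    rules = {}  # new symbol -> (left, right)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace left to right, skipping both symbols of a matched pair.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
    return seq, rules


def expand(seq, rules):
    """Invert repair_block by expanding phrase symbols back to primitives."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)
            stack.append(left)
        else:
            out.append(sym)
    return out


def compress_in_blocks(data, block_size):
    """Compress each disjoint block independently (assumed block policy)."""
    return [
        repair_block(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


data = b"abababab" * 4
blocks = compress_in_blocks(data, block_size=16)
restored = b"".join(bytes(expand(seq, rules)) for seq, rules in blocks)
```

In this toy run both blocks derive the same phrase rules independently, which hints at why blocks processed in isolation might leave redundancy for a later post-processing pass to recover.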

References

  • A. Apostolico and S. Lonardi. Off-line compression by greedy textual substitution. Proc. IEEE, 88(11):1733–1744, Nov. 2000.

  • D. Bahle, H. E. Williams, and J. Zobel. Compaction techniques for nextword indexes. In G. Navarro, editor, Proc. 8th International Symposium on String Processing and Information Retrieval, pages 33–45. IEEE Computer Society Press, Los Alamitos, CA, Nov. 2001.

  • J. Bentley and D. McIlroy. Data compression using long common strings. In J. A. Storer and M. Cohn, editors, Proc. 1999 IEEE Data Compression Conference, pages 287–295. IEEE Computer Society Press, Los Alamitos, CA, Mar. 1999.

  • A. Cannane and H. E. Williams. A compression scheme for large databases. In M. E. Orlowska, editor, Proc. 11th Australasian Database Conference, pages 6–11, Canberra, Australia, 2000. IEEE Computer Society Press, Los Alamitos, CA.

  • A. Cannane and H. E. Williams. General-purpose compression for efficient retrieval. Journal of the American Society for Information Science and Technology, 52(5):430–437, Mar. 2001.

  • E. S. de Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2):113–139, 2000.

  • J. Katajainen and T. Raita. An approximation algorithm for space-optimal encoding of a text. The Computer Journal, 32(3):228–237, 1989.

  • S. T. Klein. Efficient optimal recompression. The Computer Journal, 40(2/3):117–126, 1997.

  • N. J. Larsson and A. Moffat. Offline dictionary-based compression. Proc. IEEE, 88(11):1722–1732, Nov. 2000.

  • U. Manber. A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems, 15(2):124–136, Apr. 1997.

  • A. Moffat and A. Turpin. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA, 2002.


Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wan, R., Moffat, A. (2002). Block Merging for Off-Line Compression. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_4

  • DOI: https://doi.org/10.1007/3-540-45452-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43862-5

  • Online ISBN: 978-3-540-45452-6

  • eBook Packages: Springer Book Archive