Abstract
To bound memory consumption, most compression systems provide a facility that controls the amount of data that may be processed at once. In this work we consider the Re-Pair mechanism of Larsson and Moffat [2000], which processes large messages as disjoint blocks. We show that the blocks emitted by Re-Pair can be post-processed to yield further savings, and describe techniques that allow files of 500 MB or more to be compressed in a holistic manner using less main memory than the size of the file. The block merging process we describe has the additional advantage of allowing new text to be appended to the end of the compressed file.
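To make the block-wise setting concrete, the following is a minimal sketch of Re-Pair-style compression applied to disjoint blocks. It is an illustration under stated assumptions, not the authors' implementation: it uses a naive quadratic pair-replacement loop rather than the data structures of the original algorithm, and the names `repair_block`, `compress_in_blocks`, and `expand` are hypothetical. Each block builds its own phrase dictionary, so a phrase that recurs in several blocks is rediscovered and stored once per block; redundancy of this kind is what a post-processing step over the emitted blocks could aim to remove.

```python
from collections import Counter


def repair_block(symbols, first_new_id=256):
    """Naive Re-Pair on one block (illustrative only).

    Repeatedly replaces the most frequent adjacent pair with a fresh
    symbol until no pair occurs at least twice. Overlapping occurrences
    are counted naively, which is acceptable for a sketch.
    """
    seq = list(symbols)
    rules = {}  # new symbol id -> (left symbol, right symbol)
    next_id = first_new_id
    while len(seq) >= 2:
        pair, freq = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if freq < 2:
            break
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            # Left-to-right, non-overlapping replacement of the chosen pair.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules


def compress_in_blocks(data, block_size):
    """Compress each disjoint block independently (hypothetical helper)."""
    return [
        repair_block(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def expand(seq, rules):
    """Expand a reduced sequence back to bytes using its block's rules."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)
        else:
            out.append(sym)
    return bytes(out)


data = b"abababab" * 4 + b"xyz"
blocks = compress_in_blocks(data, block_size=8)
restored = b"".join(expand(seq, rules) for seq, rules in blocks)
```

In this toy input the first four blocks are identical, so each of them independently derives the same two rules. The block size stands in for the memory bound: a smaller block needs less working memory but sees less of the message's repetition.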
References
A. Apostolico and S. Lonardi. Off-line compression by greedy textual substitution. Proc. IEEE, 88(11):1733–1744, Nov. 2000.
D. Bahle, H. E. Williams, and J. Zobel. Compaction techniques for nextword indexes. In G. Navarro, editor, Proc. 8th International Symposium on String Processing and Information Retrieval, pages 33–45. IEEE Computer Society Press, Los Alamitos, CA, Nov. 2001.
J. Bentley and D. McIlroy. Data compression using long common strings. In J. A. Storer and M. Cohn, editors, Proc. 1999 IEEE Data Compression Conference, pages 287–295. IEEE Computer Society Press, Los Alamitos, CA, Mar. 1999.
A. Cannane and H. E. Williams. A compression scheme for large databases. In M. E. Orlowska, editor, Proc. 11th Australasian Database Conference, pages 6–11, Canberra, Australia, 2000. IEEE Computer Society Press, Los Alamitos, CA.
A. Cannane and H. E. Williams. General-purpose compression for efficient retrieval. Journal of the American Society for Information Science and Technology, 52(5):430–437, Mar. 2001.
E. S. de Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2):113–139, 2000.
J. Katajainen and T. Raita. An approximation algorithm for space-optimal encoding of a text. The Computer Journal, 32(3):228–237, 1989.
S. T. Klein. Efficient optimal recompression. The Computer Journal, 40(2/3):117–126, 1997.
N. J. Larsson and A. Moffat. Offline dictionary-based compression. Proc. IEEE, 88(11):1722–1732, Nov. 2000.
U. Manber. A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems, 15(2):124–136, Apr. 1997.
A. Moffat and A. Turpin. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA, 2002.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, R., Moffat, A. (2002). Block Merging for Off-Line Compression. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_4
DOI: https://doi.org/10.1007/3-540-45452-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43862-5
Online ISBN: 978-3-540-45452-6
eBook Packages: Springer Book Archive