Abstract
Because current real-time data compression algorithms are not efficient enough, we propose a two-phase real-time data compression algorithm that compresses quickly while achieving a high compression ratio. The algorithm adapts to both text and binary files. The first phase replaces long common strings in the file with short forms. The second phase compresses the output of the first phase by exploiting short byte sequences that repeat, producing the final result. We call it the TDSC algorithm.
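The two-phase structure described above can be illustrated with a minimal Python sketch. It is not the authors' TDSC implementation, whose details are not given in this abstract. Phase 1 is assumed here to be a simplified long-common-string pass that looks up fixed-size blocks in a table, using the raw block contents as dictionary keys where a production version would more likely use rolling fingerprints. Phase 2 simply substitutes zlib's DEFLATE as a stand-in for a compressor of short repeats. The block size, the token format, and all function names are hypothetical.

```python
import struct
import zlib

BLOCK = 16  # hypothetical block size for the phase-1 lookup table


def _emit_literal(out: bytearray, chunk: bytes) -> None:
    """Append a literal token: tag 'L', 4-byte length, then the raw bytes."""
    if chunk:
        out.extend(b"L" + struct.pack("<I", len(chunk)) + chunk)


def phase1_compress(data: bytes, block: int = BLOCK) -> bytes:
    """Replace long repeated strings with (offset, length) back-references."""
    table = {}      # block contents -> earliest offset where that block starts
    out = bytearray()
    lit_start = 0   # start of the pending literal run
    indexed = 0     # next block-aligned offset to add to the table
    i = 0
    n = len(data)
    while i + block <= n:
        # Index every block-aligned chunk that ends at or before position i.
        while indexed + block <= i:
            table.setdefault(data[indexed:indexed + block], indexed)
            indexed += block
        src = table.get(data[i:i + block])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as the bytes keep agreeing.
        length = block
        while i + length < n and data[src + length] == data[i + length]:
            length += 1
        _emit_literal(out, data[lit_start:i])
        out.extend(b"C" + struct.pack("<II", src, length))  # copy token
        i += length
        lit_start = i
    _emit_literal(out, data[lit_start:])
    return bytes(out)


def phase1_decompress(blob: bytes) -> bytes:
    """Invert phase1_compress by replaying literal and copy tokens."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        tag = blob[pos:pos + 1]
        pos += 1
        if tag == b"L":
            (size,) = struct.unpack_from("<I", blob, pos)
            pos += 4
            out.extend(blob[pos:pos + size])
            pos += size
        else:  # b"C": copy byte by byte so overlapping matches work
            src, length = struct.unpack_from("<II", blob, pos)
            pos += 8
            for k in range(length):
                out.append(out[src + k])
    return bytes(out)


def compress(data: bytes) -> bytes:
    """Phase 1 (long repeats) followed by phase 2 (zlib as a stand-in)."""
    return zlib.compress(phase1_compress(data), 1)


def decompress(blob: bytes) -> bytes:
    """Undo phase 2, then phase 1."""
    return phase1_decompress(zlib.decompress(blob))
```

Calling `decompress(compress(data))` should return the original bytes for both text and binary input. Running the long-match pass first means the second compressor sees a much shorter stream when the input contains large duplicated regions, which is the motivation the abstract gives for splitting the work into two phases.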
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Z., You, J., Zhou, M. (2012). TDSC: A Two-phase Duplicate String Compression Algorithm. In: Wang, H., et al. Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29426-6_5
DOI: https://doi.org/10.1007/978-3-642-29426-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29425-9
Online ISBN: 978-3-642-29426-6
eBook Packages: Computer Science, Computer Science (R0)