Transforming the Natural Language Text for Improving Compression Performance

  • Ashutosh Gupta
  • Suneeta Agarwal
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 6)

In the last 20 years, we have seen a vast explosion of textual information flowing over the Web through electronic mail, Web browsing, information retrieval systems, and so on. The importance of data compression is likely to grow in the future, as there is a continuous increase in the amount of data that needs to be transformed or archived. In the field of data compression, researchers have developed various approaches such as Huffman encoding, arithmetic encoding, the Ziv-Lempel family, dynamic Markov compression, prediction with partial matching (PPM) [1], and Burrows–Wheeler transform (BWT) [2] based algorithms, among others. BWT permutes the symbols of a data sequence that share the same unbounded context by cyclic rotation followed by a lexicographic sort. BWT uses move-to-front and an entropy coder as the backend compressor. PPM is slow and consumes a large amount of memory to store context information, but it achieves better compression than almost all existing compression algorithms.
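The BWT and move-to-front steps mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `"\0"` end-of-string sentinel, and the naive rotation-sorting approach are assumptions chosen for readability, and practical compressors use suffix arrays and follow the move-to-front stage with an entropy coder.

```python
def bwt(text: str, eos: str = "\0") -> str:
    """Naive BWT: sort all cyclic rotations, keep the last column.

    `eos` is an assumed sentinel that must not occur in `text`.
    """
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, eos: str = "\0") -> str:
    """Invert the BWT by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


def move_to_front(data: str) -> list[int]:
    """Replace each symbol by its index in a recency-ordered alphabet."""
    alphabet = sorted(set(data))
    output = []
    for symbol in data:
        index = alphabet.index(symbol)
        output.append(index)
        # Move the symbol just seen to the front of the list.
        alphabet.insert(0, alphabet.pop(index))
    return output


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(move_to_front(transformed))
    print(inverse_bwt(transformed))
```

Because the sort groups symbols that precede similar contexts, the transformed string tends to contain runs of equal symbols, which move-to-front turns into small integers that an entropy coder can encode cheaply.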

Keywords

Data Compression; Compression Algorithm; Entropy Coder; Test Corpus; Natural Language Text

References

  1. A. Moffat (1990) Implementing the PPM data compression scheme. IEEE Transactions on Communications, 38(11):1917–1921.
  2. M. Burrows and D. Wheeler (1994) A block-sorting lossless data compression algorithm. Technical Report, SRC Research Report 124, Digital Systems Research Center, Palo Alto, CA.
  3. F.S. Awan and A. Mukherjee (2001) LIPT: A lossless text transform to improve compression. In Proceedings of International Conference on Information Technology: Coding and Computing, Las Vegas, Nevada, IEEE Computer Society.
  4. R. Franceschini and A. Mukherjee (1996) Data compression using encrypted text. In Proceedings of the Third Forum on Research and Technology, Advances on Digital Libraries, 130–138. ADL.
  5. J. Heaps (1978) Information Retrieval: Computational and Theoretical Aspects. Academic Press, New York.
  6. M.D. Araujo, G. Navarro, and N. Ziviani (1997) Large text searching allowing errors. In Proceedings of the 4th South American Workshop on String Processing. R. Baeza-Yates, Ed. Carleton University Press International Informatics Series, vol. 8. Carleton University Press, Ottawa, Canada, 2–20.
  7. E.S. Moura, G. Navarro, and N. Ziviani (1997) Indexing compressed text. In Proceedings of the 4th South American Workshop on String Processing. R. Baeza-Yates, Ed. Carleton University Press International Informatics Series, vol. 8. Carleton University Press, Ottawa, Canada, 95–111.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Ashutosh Gupta (1)
  • Suneeta Agarwal (2)
  1. Computer Science & Engineering Department, Institute of Engineering and Rural Technology, Allahabad, India
  2. Computer Science & Engineering Department, Motilal Nehru National Institute of Technology, Allahabad, India
