Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Text Compression

  • Paolo Ferragina
  • Igor Nitto
  • Rossano Venturini
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1151

Synonyms

Lossless data compression

Definition

Text compression changes the representation of a file so that the (binary) compressed output takes less space to store, or less time to transmit, while the original file can still be reconstructed exactly from its compressed representation.
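
The definition can be illustrated with a short round trip. The sketch below is only an illustration: it uses Python's standard `zlib` module (a compressor chosen here for convenience, not one named by the entry), and the helper `round_trip` and the sample string are hypothetical.

```python
import zlib
from typing import Tuple


def round_trip(text: str) -> Tuple[int, int, bool]:
    """Compress a text, decompress it, and report both sizes in bytes
    together with whether the original was reconstructed exactly."""
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, 9)      # 9 = strongest zlib level
    restored = zlib.decompress(compressed)  # lossless: must equal raw
    return len(raw), len(compressed), restored == raw


# Hypothetical, highly repetitive input chosen so that it compresses well.
sample = "abracadabra " * 200
raw_size, compressed_size, exact = round_trip(sample)
print(raw_size, compressed_size, exact)
```

On repetitive input such as this, the compressed size is far below the raw size, and the final flag confirms that decompression returns the original bytes unchanged.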

Key Points

The benefit of compressing texts in computer applications is threefold: it reduces the memory needed to store a text; it reduces the time needed to transmit the text over a computer network; and, more recently, it has been used to speed up algorithmic computations. Compressed data lets algorithms better exploit the memory hierarchy of modern PCs: it reduces disk access time and virtually increases the bandwidth and size of the disk (or memory, or cache), while the cost of decompression is negligible given the speed of current CPUs.

A text in uncompressed format, also called raw or plain text, is a sequence of symbols drawn from an alphabet Σ and represented in ⌈log₂ |Σ|⌉ bits each. Text...
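
To make the raw-text cost concrete, the following sketch reads the per-symbol cost as the ceiling of log2 |Σ| and computes the size of a fixed-length encoding. The function names are hypothetical, and taking the alphabet to be the set of distinct characters in the input is an assumption made for the example.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Return ceil(log2(alphabet_size)) using exact integer arithmetic.

    For n >= 1, (n - 1).bit_length() equals ceil(log2(n)), which avoids
    floating-point rounding. A one-symbol alphabet therefore costs 0 bits.
    """
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    return (alphabet_size - 1).bit_length()


def raw_size_in_bits(text: str) -> int:
    """Size of text under a fixed-length code over its own alphabet.

    Assumption: the alphabet is the set of distinct characters in text.
    """
    if not text:
        return 0
    return len(text) * bits_per_symbol(len(set(text)))


# "abracadabra" uses 5 distinct symbols, so each one needs 3 bits.
print(bits_per_symbol(5), raw_size_in_bits("abracadabra"))
```

A compressor aims to spend fewer bits than this fixed-length baseline by exploiting regularities in the text.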

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Paolo Ferragina (1)
  • Igor Nitto (1)
  • Rossano Venturini (1)
  1. Department of Computer Science, University of Pisa, Pisa, Italy

Section editors and affiliations

  • Mario A. Nascimento (1)
  1. Dept. of Computing Science, Univ. of Alberta, Edmonton, Canada