Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Text Compression

Lossless data compression


Text compression changes the representation of a file so that the (binary) compressed output takes less space to store, or less time to transmit, while still allowing the original file to be reconstructed exactly from its compressed representation.
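The lossless property can be illustrated with a minimal sketch. It assumes Python's standard-library zlib module (a DEFLATE implementation, chosen here only for illustration; the entry does not prescribe a particular compressor), and the sample text is hypothetical.

```python
import zlib

# Hypothetical sample: a highly repetitive text, encoded as bytes.
original = ("to be or not to be, that is the question. " * 20).encode("utf-8")

# Compress at the highest zlib level, then reconstruct.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# Lossless: the reconstruction is byte-for-byte identical to the input.
assert restored == original

print(f"raw: {len(original)} bytes, compressed: {len(compressed)} bytes")
```

The compressed size depends on the input; repetitive text such as this sample shrinks considerably, whereas short or random-looking inputs may not shrink at all.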

Key Points

The benefit of compressing texts in computer applications is threefold: it reduces the amount of memory needed to store a text; it reduces the time needed to transmit the text over a computer network; and, more recently, it has been used to speed up algorithmic computations. Compressed data lets algorithms better exploit the memory hierarchy of modern PCs: it reduces disk access time and virtually increases the bandwidth and size of the disk (or memory, or cache), at a negligible cost given the speed of current CPUs.

A text in uncompressed format, also called raw or plain text, is a sequence of symbols drawn from an alphabet Σ and represented in ⌈log2 |Σ|⌉ bits each. Text...
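As a worked illustration of the raw representation, the sketch below computes the fixed number of bits per symbol, taken to be the ceiling of log2 |Σ|, and the resulting raw size of a text. The function names and the sample string are hypothetical, and the alphabet is assumed to be exactly the set of distinct symbols occurring in the text.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Fixed-width code length: ceiling of log2(alphabet_size)."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)),
    # computed in exact integer arithmetic.
    return (alphabet_size - 1).bit_length()


def raw_size_bits(text: str) -> int:
    """Size of the text when every symbol uses the same fixed width."""
    alphabet = set(text)  # assumption: the alphabet is the set of symbols used
    return len(text) * bits_per_symbol(len(alphabet))


sample = "abracadabra"            # 11 symbols over the alphabet {a, b, c, d, r}
print(bits_per_symbol(5))         # 3
print(raw_size_bits(sample))      # 33
print(len(sample) * 8)            # 88, for comparison with one byte per symbol
```

With five distinct symbols, three bits per symbol suffice, so the 11-symbol sample occupies 33 bits in raw form rather than the 88 bits of a one-byte-per-symbol encoding.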

