Abstract
The goal here is to encode a sequence of symbols in such a way that it can be decoded perfectly (lossless coding) and sequentially (prefix coding). One may then relate codes to probabilities: this is the essence of the Kraft-McMillan inequalities. If one aims at minimizing codeword lengths, Shannon’s entropy gives an intrinsic limit when the word to be encoded is regarded as a random variable. When the distribution of this random variable is known, the optimal compression rate can be achieved (Shannon coding and Huffman coding). Moreover, since codeword lengths can be identified with probability distributions, one may, for any probability distribution, design a prefix code that encodes sequentially. This will be referred to as “coding according to this distribution”. Arithmetic coding, which is based on a probability distribution that is not necessarily that of the source, will be treated in particular detail. In this way, the algorithmic aspect of coding is separated from the modeling of the source distribution. Here the word “source” is used as a synonym for a random process. We finally point out some essential tools needed to quantify information, in particular the entropy rate of a process. This rate is an intrinsic lower bound on the asymptotic compression rate, for almost every source trajectory, as soon as the source is stationary and ergodic. This also shows that it is crucial to encode words in blocks. Arithmetic coding has the advantage of encoding in blocks and online. If arithmetic coding is devised with the source distribution, it asymptotically achieves the optimal compression rate. In the following chapters, we will be interested in adapting the code to an unknown source distribution, a fundamentally statistical problem.
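As a concrete illustration of the correspondence between codeword lengths and probabilities, here is a minimal Python sketch (the alphabet and probabilities are arbitrary toy choices, not taken from the chapter) that builds a Huffman code for a dyadic source and checks that the Kraft sum equals 1 and that the average codeword length attains the Shannon entropy:

```python
import heapq
from math import log2

def huffman_code(probs):
    """Binary Huffman code for a distribution {symbol: probability}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codewords.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next_id, merged))
        next_id += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}      # toy dyadic source
code = huffman_code(probs)
entropy = -sum(p * log2(p) for p in probs.values())        # H = 1.75 bits
avg_len = sum(probs[s] * len(w) for s, w in code.items())  # = 1.75 bits
kraft_sum = sum(2.0 ** -len(w) for w in code.values())     # = 1.0
print(code, entropy, avg_len, kraft_sum)
```

On this dyadic example Huffman coding is exactly optimal; for a general distribution the average codeword length exceeds the entropy by less than one bit per symbol, which is one reason why encoding words in blocks matters.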
References
D. Huffman, A method for the construction of minimum-redundancy codes. Proc. IRE 40, 1098–1101 (1952)
P. Algoet, T. Cover, A sandwich proof of the Shannon-McMillan-Breiman theorem. Ann. Probab. 16, 899–909 (1988)
K. Chung, A note on the ergodic theorem of information theory. Ann. Math. Stat. 32, 612–614 (1961)
R. Dudley, Real Analysis and Probability, 2nd edn. (Cambridge University Press, New York, 2002)
T.M. Cover, J.A. Thomas, Elements of Information Theory. Wiley Series in Telecommunications (Wiley, New York, 1991)
C. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
J. Rissanen, Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev. 20, 198–203 (1976)
R. Pasco, Source coding algorithms for fast data compression. Ph.D. thesis, Stanford University (1976)
A. Garivier, Codage universel: la méthode arithmétique. Texte de préparation à l’agrégation (2006)
B. McMillan, The basic theorems of information theory. Ann. Math. Stat. 24, 196–219 (1953)
L. Breiman, The individual ergodic theorem of information theory. Ann. Math. Stat. 28, 809–811 (1957)
A. Barron, The strong ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem. Ann. Probab. 13, 1292–1303 (1985)
J. Kieffer, A counterexample to Perez’s generalization of the Shannon-McMillan theorem. Ann. Probab. 1, 362–364 (1973)
J. Kieffer, Correction to “A counterexample to Perez’s generalization of the Shannon-McMillan theorem”. Ann. Probab. 4, 153–154 (1976)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Gassiat, É. (2018). Lossless Coding. In: Universal Coding and Order Identification by Model Selection Methods. Springer Monographs in Mathematics. Springer, Cham. https://doi.org/10.1007/978-3-319-96262-7_1
DOI: https://doi.org/10.1007/978-3-319-96262-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96261-0
Online ISBN: 978-3-319-96262-7
eBook Packages: Mathematics and Statistics