Universal Lossless Coding of Sources with Large and Unbounded Alphabets

  • En-hui Yang
  • Yunwei Jia


A multilevel arithmetic coding algorithm is proposed for encoding data sequences with large or unbounded source alphabets. The algorithm first converts the source alphabet into a dynamic tree, and then represents each symbol in the input sequence by its path in the tree and its index within the corresponding leaf. The input sequence is then encoded by encoding the path sequence and the index sequence conditionally. The proposed algorithm is shown to be universal in the sense that it asymptotically achieves the entropy rate of any independent and identically distributed integer source with a finite or infinite alphabet, provided that the mean value is finite. Its advantages over the traditional adaptive arithmetic coding algorithm are twofold: (1) the proposed algorithm can encode any data sequence regardless of whether the corresponding source alphabet is finite or infinite, whereas the traditional adaptive arithmetic coding algorithm works only for data sequences with bounded, small alphabets; (2) in situations where the traditional adaptive arithmetic coding algorithm does work, the proposed algorithm can reduce coding complexity and improve compression performance. The proposed algorithm is then used to implement the recent Multilevel Pattern Matching (MPM) algorithms. Simulation results show that, for a variety of files, combining the proposed algorithm with the MPM algorithms yields better compression performance than the UNIX Compress algorithm, which is based on the LZ78 algorithm. Other applications of the proposed algorithm are also discussed.
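The path-plus-index representation described above can be illustrated with a minimal sketch. The abstract does not specify how the dynamic tree is built, so this example assumes a fixed dyadic partition of the positive integers instead: leaf `level` holds the integers from 2^level to 2^(level+1) - 1, the "path" is just the leaf number, and the "index" is the offset inside the leaf. The names `split_symbol`, `join_symbol`, and `estimate_bits` are hypothetical, and the cost estimate is a simplification (empirical entropy for the path sequence, a uniform code for the index given the path), not the paper's conditional arithmetic coder.

```python
import math
from collections import Counter


def split_symbol(n: int) -> tuple[int, int]:
    """Map a positive integer n to (level, index) with n = 2**level + index.

    Assumption: a fixed dyadic partition stands in for the dynamic tree.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    level = n.bit_length() - 1        # which leaf the symbol falls into
    index = n - (1 << level)          # position inside that leaf
    return level, index


def join_symbol(level: int, index: int) -> int:
    """Inverse of split_symbol."""
    return (1 << level) + index


def estimate_bits(seq: list[int]) -> float:
    """Rough two-part cost of a sequence, in bits (illustrative only).

    Path part: empirical entropy of the level sequence times its length.
    Index part: `level` bits per symbol, i.e. a uniform code within the leaf.
    """
    levels = [split_symbol(n)[0] for n in seq]
    counts = Counter(levels)
    total = len(levels)
    path_bits = -sum(c * math.log2(c / total) for c in counts.values())
    index_bits = sum(levels)
    return path_bits + index_bits


if __name__ == "__main__":
    data = [1, 2, 3, 1, 1, 900, 2, 1, 65000, 1]
    print([split_symbol(n) for n in data])
    print(estimate_bits(data))
```

The point of the split is that the path alphabet stays small (about log2 of the largest value seen) even when the symbol alphabet is huge or unbounded, so an adaptive model over paths has few statistics to maintain; the index is then coded conditionally on the path.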


Keywords: Input Sequence, Compression Rate, Binary Search Tree, Integer Sequence, Arithmetic Code





Copyright information

© Springer Science+Business Media New York 2000

Authors and Affiliations

  • En-hui Yang (1)
  • Yunwei Jia (1)
  1. University of Waterloo, Waterloo, Canada
