Speech Compression

  • Manfred R. Schroeder
Part of the Springer Series in Information Sciences book series (SSINF, volume 35)


Speech compression, once an esoteric preoccupation of a few speech enthusiasts, has taken on a practical significance of singular proportion. As mentioned before, it all began in 1928 when Homer Dudley, an engineer at Bell Laboratories, had a brilliant idea for compressing a speech signal with a bandwidth of over 3000 Hz into the 100-Hz bandwidth of a new transatlantic telegraph cable. Instead of sending the speech signal itself, he thought it would suffice to transmit a description of the signal to the far end. This basic idea of substituting for the signal a sufficient specification from which it could be recreated is still with us in the latest linear prediction standards and other methods of speech compression for mobile phones, secure digital voice channels, compressed-speech storage for multimedia applications, and, last but not least, Internet telephony and broadcasting via the World Wide Web.


Speech Signal, Linear Prediction, Vocal Tract, Speech Sound, Speech Quality
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Chapter 4—Speech Compression

  [4.1] H.W. Dudley: Remaking speech. J. Acoust. Soc. Am. 11, 169–177 (1939)
  [4.2] M.D. Fagen (ed.): A History of Engineering and Science in the Bell System: National Service in War and Peace (1925–1975), Sect. IV. Secure Speech Transmission, pp. 291–317 (Bell Telephone Laboratories, Murray Hill, New Jersey 1978)
  [4.3] R.L. Miller: personal communication
  [4.4] B.M. Oliver, J.R. Pierce, C.E. Shannon: The philosophy of PCM. Proc. IEEE 36, 1324–1331 (1948)
  [4.5] N.J.A. Sloane, A.D. Wyner: Claude Elwood Shannon: Collected Papers (IEEE Press, New York 1993)
  [4.6] C.E. Shannon: Communication theory of secrecy systems. Bell Syst. Tech. J. 28, 656–715 (1949)
  [4.7] R.L. Miller: personal communication
  [4.8] L.R. Rabiner, M.J. Cheng, A.E. Rosenberg, C.A. McGonegal: A comparative performance study of several pitch detection algorithms. IEEE Trans. Acoust., Speech, Signal Processing ASSP-24, 399–418 (1976)
  [4.9] A.M. Noll, M.R. Schroeder: Short time 'cepstrum' pitch detection. J. Acoust. Soc. Am. 36, 1030 (1967)
  [4.9a] See also A.M. Noll, M.R. Schroeder: Real Time Cepstrum Analyzer (U.S. Patent 3,566,035, filed July 17, 1969, issued February 23, 1971)
  [4.10] M.R. Schroeder (unpublished)
  [4.11] M.R. Schroeder: Period histogram and product spectrum: New methods for fundamental frequency detection. J. Acoust. Soc. Am. 43, 829–834 (1968)
  [4.11a] See also R.L. Miller: Performance characteristic of an experimental harmonic identification pitch extraction (HIPEX) system. J. Acoust. Soc. Am. 47, 1593–1601 (1970)
  [4.12] J.L. Flanagan: Bandwidth and channel capacity necessary to transmit the formant information of speech. J. Acoust. Soc. Am. 28, 592–596 (1956)
  [4.13] M.R. Schroeder, B.F. Logan, A.J. Prestigiacomo: New methods for speech analysis-synthesis and bandwidth compression. Proc. Stockholm Speech Comm. Seminar, Royal Institute of Technology (KTH), Stockholm 1962
  [4.14] M.R. Schroeder: Correlation techniques for speech bandwidth compression. J. Audio Eng. Soc. 10, 163–166 (1962)
  [4.15] J.L. Flanagan, R.M. Golden: Phase vocoder. Bell Syst. Tech. J. 45, 1493–1509 (1966)
  [4.16] M.R. Schroeder: Vocoders: Analysis and synthesis of speech. Proc. IEEE 55, 396–401 (1967)
  [4.17] J.L. Flanagan: Speech Analysis, Synthesis and Perception, 2nd ed. (Springer, Berlin, Heidelberg 1972)
  [4.18] E.E. David Jr., M.V. Mathews, H.S. McDonald: Description of results of experiments with speech using digital computer simulation. Proc. Natl. Elect. Conf., pp. 766–775 (1958)
  [4.19] J.L. Kelly Jr., C. Lochbaum, V.A. Vyssotsky: A block diagram compiler. Bell Syst. Tech. J. 40, 669–676 (1961)
  [4.20] M.V. Mathews: Extremal coding for speech transmission. IRE Trans. Inform. Theory IT-5, 129–136 (1959)
  [4.21] M.R. Schroeder, B.S. Atal: Computer simulation of sound transmission in rooms. IEEE Internatl. Convention Record, Part 7 (1963)
  [4.22] B.S. Atal, M.R. Schroeder: Predictive coding of speech signals. Proc. Sixth Internatl. Congr. of Acoustics, Tokyo, paper C-5-4 (1968). Originally published in Proc. 1967 IEEE Conf. on Communication and Processing, pp. 360–361 (1967)
  [4.23] B.S. Atal, M.R. Schroeder: Adaptive predictive coding of speech signals. Bell Syst. Tech. J. 49, 1973–1986 (1970)
  [4.24] M.R. Schroeder, B.S. Atal, J.L. Hall: Optimizing digital speech coders by exploiting masking properties of the human ear. J. Acoust. Soc. Am. 66, 1647–1652 (1979)
  [4.25] B.S. Atal, M.R. Schroeder: Predictive coding of speech signals and subjective error criteria. IEEE Trans. Acoust., Speech, Signal Processing ASSP-27, 247–254 (1979)
  [4.26] B.S. Atal, M.R. Schroeder: Stochastic coding of speech signals at very low bit rates. Proc. Internatl. Conf. on Communication, pp. 1610–1613 (North-Holland, Amsterdam 1984)
  [4.26a] See also A. Gersho, R.M. Gray: Vector Quantization and Signal Compression (Kluwer Academic, Boston 1992)
  [4.27] D. Sinha, J.D. Johnston, S. Dorward, S.R. Quackenbush: The perceptual audio coder. In V.K. Madisetti, D.B. Williams (eds.): The Digital Signal Processing Handbook, pp. 42-1 to 42-17 (IEEE Press, New York 1998)
  [4.28] J.D. Markel, A.H. Gray Jr.: Linear Prediction of Speech (Springer, Berlin, Heidelberg 1976)
  [4.29] F. Itakura, S. Saito: Speech analysis-synthesis systems based on the partial correlation coefficients (Acoustic Soc. of Japan Meeting, Tokyo 1969)
  [4.30] B.S. Atal, S.L. Hanauer: Speech analysis and synthesis by linear prediction of the speech wave. J. Acoust. Soc. Am. 50, 637–655 (1971)
  [4.31] M.R. Schroeder, B.S. Atal: Rate distortion theory and predictive coding. Proc. IEEE Internatl. Conf. on Acoustics, Speech and Signal Processing, pp. 201–204 (Atlanta 1981)
  [4.32] W. Hess: Pitch Determination of Speech Signals (Springer, Berlin, Heidelberg 1983)
  [4.33] M.R. Schroeder, E.E. David Jr.: A vocoder for transmitting 10 kc/s speech over a 3.5 kc/s channel. Acustica 10, 35–43 (1960)
  [4.34] M.M. Sondhi: New methods for pitch extraction. Proc. Conf. on Speech Communication and Processing (IEEE Audio and Electroacoustics Group, Cambridge, Massachusetts 1967)
  [4.35] B.S. Atal, J.R. Remde: A new model of LPC excitation for producing natural-sounding speech at low bit rates. Proc. IEEE Internatl. Conf. on Acoustics, Speech and Signal Processing 1, 614–617 (1982)
  [4.36] M.R. Schroeder: Die statistischen Parameter der Frequenzkurven von grossen Räumen. Acustica 4, 594–600 (1954)
  [4.36a] English translation: M.R. Schroeder: Statistical parameters of the frequency response of large rooms. J. Audio Eng. Soc. 35, 299–306 (1987)
  [4.37] J.B. Anderson, J.B. Bodie: Tree encoding of speech. IEEE Trans. Inform. Theory IT-21, 379–387 (1975)
  [4.37a] See also [4.31] and M.R. Schroeder, B.S. Atal: Speech coding using efficient block codes. Proc. IEEE Internatl. Conf. on Acoustics, Speech and Signal Processing 3, 1668–1671 (1982)
  [4.38] M.R. Schroeder, B.S. Atal: Code-excited linear prediction (CELP): High quality speech at very low bit rates. Proc. IEEE Internatl. Conf. on Acoustics, Speech, and Signal Processing, pp. 937–940 (1985)
  [4.38a] See also M.R. Schroeder, B.S. Atal: Code-excited linear prediction. Speech Communication 4, 155–162 (1985)
  [4.39] M.R. Schroeder, N.J.A. Sloane: New permutation codes using Hadamard unscrambling. IEEE Trans. on Inform. Theory IT-33, 144–146 (1987)
  [4.40] J.L. Flanagan, M.R. Schroeder, B.S. Atal, R.E. Crochiere, N.S. Jayant, J.M. Tribolet: Speech coding. IEEE Trans. on Communications COM-27, No. 4 (1979)
  [4.41] J. Max: Quantizing for minimum distortion. IRE Trans. Inform. Theory IT-6, 7–12 (1960)
  [4.41a] See also S.P. Lloyd: Least squares quantization in PCM. IEEE Trans. on Information Theory IT-28, 127–135 (1982)
  [4.42] F. DeJager: Delta modulation: A method of PCM transmission using a one-unit code. Philips Res. Rep. 7, 442–466 (1952)
  [4.43] C.C. Cutler: Differential Pulse Code Modulation (U.S. Patent 2,605,361, filed June 29, 1950, patented July 29, 1952)
  [4.44] N.S. Jayant: Adaptive quantization with a one-word memory. Bell Syst. Tech. J. 52, 1119–1144 (1973)
  [4.45] D.J. Goodman, J.L. Flanagan: Direct digital conversion between linear and adaptive delta modulation formats. Proc. IEEE Int. Commun. Conf., Montreal, Canada (1971)
  [4.46] P. Cummiskey, N.S. Jayant, J.L. Flanagan: Adaptive quantization in differential PCM coding of speech. Bell Syst. Tech. J. 52, 1105–1118 (1973)
  [4.47] R.E. Crochiere, S.A. Webber, J.L. Flanagan: Digital coding of speech in subbands. Bell Syst. Tech. J. 55, 1069–1085 (1976)
  [4.48] M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagiri, H. Fuchs, M. Dietz, J. Herre, G. Davidson, Y. Oikawa: ISO/IEC MPEG-2 Advanced Audio Coding. J. Audio Eng. Soc. 45, 789–814 (1997)
  [4.49] J.S. Byrnes, B. Saffari, H.S. Shapiro: Energy spreading and data compression using the Prometheus orthogonal set. Proc. IEEE DSP Conf., Loen, Norway (1996)
  [4.50] M.R. Schroeder: Number Theory in Science and Communication, 3rd ed. (Springer, Berlin, Heidelberg 1997)
  [4.51] J.S. Byrnes: A low complexity energy spreading transform coder. Proc. Conf. Haifa (1995)
  [4.52] A. Gersho: Advances in Speech and Audio Compression. Proc. IEEE 82, 900–918 (1994)

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Manfred R. Schroeder, Drittes Physikalisches Institut, Universität Göttingen, Göttingen, Germany
