Speech Compression


Part of the book series: Springer Series in Information Sciences ((SSINF,volume 35))

Abstract

Speech compression, once an esoteric preoccupation of a few speech enthusiasts, has taken on a practical significance of singular proportion. As mentioned before, it all began in 1928 when Homer Dudley, an engineer at Bell Laboratories, had a brilliant idea for compressing a speech signal with a bandwidth of over 3000 Hz into the 100-Hz bandwidth of a new transatlantic telegraph cable. Instead of sending the speech signal itself, he thought it would suffice to transmit a description of the signal to the far end. This basic idea of substituting for the signal a sufficient specification from which it could be recreated is still with us in the latest linear prediction standards and other methods of speech compression for mobile phones, secure digital voice channels, compressed-speech storage for multimedia applications, and, last but not least, Internet telephony and broadcasting via the World Wide Web.

Speeches in our culture are the vacuum that fills a vacuum.

John Kenneth Galbraith (born 1908)

[British Prime Minister Ramsay] MacDonald has the gift of compressing the largest amount of words into the smallest amount of thought.

Winston Churchill (1874–1965)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schroeder, M.R. (2004). Speech Compression. In: Computer Speech. Springer Series in Information Sciences, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-06384-2_5

  • DOI: https://doi.org/10.1007/978-3-662-06384-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05956-8

  • Online ISBN: 978-3-662-06384-2

  • eBook Packages: Springer Book Archive
