Speech Compression

Schroeder, Manfred R.

doi:10.1007/978-3-662-06384-2_5

Part of the book series: Springer Series in Information Sciences (SSINF, volume 35)

Abstract

Speech compression, once an esoteric preoccupation of a few speech enthusiasts, has taken on a practical significance of singular proportion. As mentioned before, it all began in 1928 when Homer Dudley, an engineer at Bell Laboratories, had a brilliant idea for compressing a speech signal with a bandwidth of over 3000 Hz into the 100-Hz bandwidth of a new transatlantic telegraph cable. Instead of sending the speech signal itself, he thought it would suffice to transmit a description of the signal to the far end. This basic idea of substituting for the signal a sufficient specification from which it could be recreated is still with us in the latest linear prediction standards and other methods of speech compression for mobile phones, secure digital voice channels, compressed-speech storage for multimedia applications, and, last but not least, Internet telephony and broadcasting via the World Wide Web.

Speeches in our culture are the vacuum that fills a vacuum.

John Kenneth Galbraith (born 1908)

[British Prime Minister Ramsay] MacDonald has the gift of compressing the largest amount of words into the smallest amount of thought.

Winston Churchill (1874–1965)

Author information

Authors and Affiliations

Drittes Physikalisches Institut, Universität Göttingen, Bürgerstrasse 42-44, 37073, Göttingen, Germany
Professor Dr. Manfred R. Schroeder

Cite this chapter

Schroeder, M.R. (2004). Speech Compression. In: Computer Speech. Springer Series in Information Sciences, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-06384-2_5

DOI: https://doi.org/10.1007/978-3-662-06384-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05956-8
Online ISBN: 978-3-662-06384-2
eBook Packages: Springer Book Archive
