Abstract
Speech compression, once an esoteric preoccupation of a few speech enthusiasts, has taken on a practical significance of singular proportion. As mentioned before, it all began in 1928 when Homer Dudley, an engineer at Bell Laboratories, had a brilliant idea for compressing a speech signal with a bandwidth of over 3000 Hz into the 100-Hz bandwidth of a new transatlantic telegraph cable. Instead of sending the speech signal itself, he thought it would suffice to transmit a description of the signal to the far end. This basic idea of substituting for the signal a sufficient specification from which it could be recreated is still with us in the latest linear prediction standards and other methods of speech compression for mobile phones, secure digital voice channels, compressed-speech storage for multimedia applications, and, last but not least, Internet telephony and broadcasting via the World Wide Web.
Speeches in our culture are the vacuum that fills a vacuum.
John Kenneth Galbraith (born 1908)
[British Prime Minister Ramsay] MacDonald has the gift of compressing the largest amount of words into the smallest amount of thought.
Winston Churchill (1874–1965)
© 2004 Springer-Verlag Berlin Heidelberg
Schroeder, M.R. (2004). Speech Compression. In: Computer Speech. Springer Series in Information Sciences, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-06384-2_5
DOI: https://doi.org/10.1007/978-3-662-06384-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05956-8
Online ISBN: 978-3-662-06384-2
eBook Packages: Springer Book Archive