Abstract
Speech signal processing and feature extraction is the initial stage of any speech recognition system; it is through this component that the system views the speech signal itself. This chapter introduces general approaches to signal processing and feature extraction and surveys the techniques currently available in these areas.
Copyright information
© 1980 D. Reidel Publishing Company
About this paper
Cite this paper
Wolf, J.J. (1980). Speech Signal Processing and Feature Extraction. In: Simon, J.C. (ed.) Spoken Language Generation and Understanding. NATO Advanced Study Institutes Series, vol 59. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-9091-3_6
DOI: https://doi.org/10.1007/978-94-009-9091-3_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-009-9093-7
Online ISBN: 978-94-009-9091-3
eBook Packages: Springer Book Archive