Speech Signal Processing and Feature Extraction

Wolf, Jared J.

doi:10.1007/978-94-009-9091-3_6

Part of the book series: NATO Advanced Study Institutes Series (ASIC, volume 59)

Abstract

Speech signal processing and feature extraction is the initial stage of any speech recognition system; it is through this component that the system views the speech signal itself. This chapter introduces general approaches to signal processing and feature extraction and surveys the techniques currently available in these areas.
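The abstract describes feature extraction only in general terms. To make concrete what it means for a recognizer to "view" the signal through features, here is a minimal sketch of two classic frame-level measures that recur in the cited literature: short-time energy and zero-crossing rate (cf. Niederjohn, 1975, on zero-crossing analysis, and Rabiner and Sambur, 1975, on endpoint detection). It is not taken from the chapter. The function name `frame_features`, the 20 ms frame length, the 10 ms hop, and the rectangular window are illustrative assumptions.

```python
import numpy as np


def frame_features(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Per-frame short-time energy and zero-crossing rate.

    Illustrative sketch only: the function name, the 20 ms / 10 ms framing and
    the rectangular window are assumptions, not details from the chapter.
    Returns (energy, zcr), one value per frame; zcr is in crossings per sample pair.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(fs * frame_ms / 1000.0))   # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))   # samples between frame starts
    if n < 2 or hop < 1:
        raise ValueError("frame must span at least 2 samples and hop at least 1")
    if len(x) < n:
        return np.empty(0), np.empty(0)

    count = 1 + (len(x) - n) // hop
    energy = np.empty(count)
    zcr = np.empty(count)
    for i in range(count):
        frame = x[i * hop : i * hop + n]
        # Short-time energy: sum of squared samples (rectangular window).
        energy[i] = np.sum(frame ** 2)
        # Map samples to +1/-1 (zero counts as positive) and count sign changes.
        signs = np.where(frame >= 0, 1, -1)
        zcr[i] = np.count_nonzero(np.diff(signs)) / (n - 1)
    return energy, zcr


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    # Stand-in for a voiced sound: a strong periodic signal.
    tone = np.sin(2 * np.pi * 500 * t)
    # Stand-in for a weak fricative: low-level white noise.
    noise = 0.1 * np.random.default_rng(0).standard_normal(fs)
    for name, sig in (("tone", tone), ("noise", noise)):
        e, z = frame_features(sig, fs)
        print(f"{name}: mean energy {e.mean():.2f}, mean ZCR {z.mean():.3f}")
```

At an 8 kHz sampling rate, the 500 Hz tone gives a crossing rate close to 2 × 500 / 8000 = 0.125 per sample, while white noise gives roughly 0.5. Voiced sounds tend toward high energy and a low crossing rate, and unvoiced fricatives toward lower energy and a higher crossing rate. Endpoint and voiced/unvoiced detectors in the cited literature exploit contrasts of this kind.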

References

(1) J. L. Flanagan, “Speech Analysis, Synthesis and Perception,” 2nd ed. (Springer-Verlag, Berlin, Heidelberg, New York, 1972).
(2) R. W. Schafer, J. D. Markel, “Speech Analysis” (IEEE Press, New York, 1979).
(3) L. R. Rabiner, R. W. Schafer, “Digital Processing of Speech Signals” (Prentice-Hall, Englewood Cliffs, N.J., 1978).
(4) A. V. Oppenheim, R. W. Schafer, “Digital Signal Processing” (Prentice-Hall, Englewood Cliffs, N.J., 1975).
(5) L. R. Rabiner, B. Gold, “Theory and Application of Digital Signal Processing” (Prentice-Hall, Englewood Cliffs, N.J., 1975).
(6) A. Peled, B. Liu, “Digital Signal Processing: Theory, Design, and Implementation” (Wiley, New York, 1976).
(7) P. B. Denes, E. N. Pinson, “The Speech Chain” (Anchor Press, Garden City, N.Y., 1973).
(8) A. M. Liberman, F. S. Cooper, D. P. Shankweiler, M. Studdert-Kennedy, “Perception of the Speech Code,” Psych. Rev., vol. 74, pp. 431–461 (1967); also in E. E. David Jr., P. B. Denes (Eds.), “Human Communication: A Unified View” (McGraw-Hill, New York, 1972), pp. 13–50.
(9) N. Lindgren, “Machine Recognition of Human Language, Part II,” IEEE Spectrum, vol. 2, no. 4, pp. 44–59 (1965).
(10) R. Jakobson, C. G. M. Fant, M. Halle, “Preliminaries to Speech Analysis” (MIT Press, Cambridge, Mass., 1963).
(11) G. Fant, “Speech Sounds and Features” (MIT Press, Cambridge, Mass., 1973).
(12) N. Chomsky, M. Halle, “The Sound Pattern of English” (Harper and Row, New York, 1968).
(13) D. R. Reddy, “Segmentation of Speech Sounds,” J. Acoust. Soc. Am., vol. 40, pp. 307–312 (1966).
(14) R. J. Niederjohn, “A Mathematical Formulation and Comparison of Zero-Crossing Analysis Techniques which have been Applied to Automatic Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 373–380 (1975).
(15) P. Vicens, “Aspects of Speech Recognition by Computer,” Ph.D. dissertation, Stanford Univ., Stanford, Calif. (1969).
(16) L. R. Rabiner, M. R. Sambur, “An Algorithm for Determining the Endpoints of Isolated Utterances,” Bell System Tech. J., vol. 54, pp. 297–315 (1975).
(17) CMU Computer Science Speech Group, “Speech Understanding Systems: Summary of Results of the Five-Year Research Effort at Carnegie-Mellon University,” Dept. of Comp. Sci., Carnegie-Mellon Univ., Pittsburgh, Pa., Technical Report (1977).
(18) J. M. Baker, “A New Time-Domain Analysis of Human Speech and Other Complex Waveforms,” Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, Pa. (1975).
(19) L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, C. A. McGonegal, “A Comparative Performance Study of Several Pitch Detection Algorithms,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 399–418 (1976); also in (2).
(20) M. M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 262–266 (1968); also in (2).
(21) J. D. Markel, “The SIFT Algorithm for Fundamental Frequency Extraction,” IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 367–377 (1972); also in (2).
(22) R. Gillmann, “A Fast Frequency Domain Pitch Algorithm,” J. Acoust. Soc. Am., vol. 58, suppl. 1, p. S62 (abstract) (1975).
(23) J. J. Dubnowski, R. W. Schafer, L. R. Rabiner, “Real-Time Digital Hardware Pitch Detector,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 2–8 (1976).
(24) M. J. Ross, H. L. Schaffer, A. Cohen, R. Freudberg, H. J. Manley, “Average Magnitude Difference Function Pitch Extractor,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-22, pp. 353–362 (1974); also in (2).
(25) N. J. Miller, “Pitch Detection by Data Reduction,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 72–79 (1975).
(26) B. Gold, L. R. Rabiner, “Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain,” J. Acoust. Soc. Am., vol. 46, pp. 442–448 (1969); also in (2).
(27) J. Martony, “Studies of the Voice Source,” Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, QPSR 1/65, pp. 4–9 (1965).
(28) C. G. Bell, H. Fujisaki, J. M. Heinz, K. N. Stevens, A. S. House, “Reduction of Speech Spectra by Analysis-by-Synthesis Techniques,” J. Acoust. Soc. Am., vol. 33, pp. 1725–1736 (1961); also in (2).
(29) L. C. W. Pols, “Real-Time Recognition of Spoken Words,” IEEE Trans. Comput., vol. C-20, pp. 972–978 (1971).
(30) D. H. Klatt, “A Digital Filter Bank for Spectral Matching,” Record of the 1976 IEEE Int. Conf. on Acoust., Speech, Signal Proc., Philadelphia, Pa., pp. 537–540 (1976).
(31) E. Zwicker, E. Terhardt, E. Paulus, “Automatic Speech Recognition using Psychoacoustic Models,” J. Acoust. Soc. Am., vol. 65, pp. 487–498 (1979).
(32) C. L. Searle, J. Z. Jacobson, S. G. Rayment, “Stop Consonant Discrimination Based on Human Audition,” J. Acoust. Soc. Am., vol. 65, pp. 799–811 (1979).
(33) H. F. Silverman, “An Introduction to Programming the Winograd Fourier Transform Algorithm (WFTA),” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-25, pp. 152–165 (1977); vol. ASSP-26, pp. 268–269 (1978); vol. ASSP-26, p. 483 (1978).
(34) A. V. Oppenheim, R. W. Schafer, “Homomorphic Analysis of Speech,” IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 221–226 (1968); also in (2).
(35) R. W. Schafer, L. R. Rabiner, “System for Automatic Formant Analysis of Voiced Speech,” J. Acoust. Soc. Am., vol. 47, pp. 634–648 (1970); also in (2).
(36) M. V. Mathews, J. E. Miller, E. E. David Jr., “Pitch Synchronous Analysis of Voiced Sounds,” J. Acoust. Soc. Am., vol. 33, pp. 179–186 (1961); also in (2).
(37) W. J. Hess, “A Pitch-Synchronous Digital Feature Extraction System for Phonemic Recognition of Speech,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 14–26 (1976).
(38) W. Woods, M. Bates, G. Brown, B. Bruce, C. Cook, J. Klovstad, J. Makhoul, B. Nash-Webber, R. Schwartz, J. Wolf, V. Zue, “Speech Understanding Systems: Final Technical Progress Report,” Bolt Beranek and Newman Inc., Cambridge, Mass., Report 3438, vol. II (1976).
(39) B. S. Atal, S. L. Hanauer, “Speech Analysis and Synthesis by Linear Prediction of the Speech Wave,” J. Acoust. Soc. Am., vol. 50, pp. 637–655 (1971); also in (2).
(40) J. Makhoul, “Linear Prediction: A Tutorial Review,” Proc. IEEE, vol. 63, pp. 561–580 (1975); also in (2).
(41) J. D. Markel, A. H. Gray Jr., “Linear Prediction of Speech” (Springer-Verlag, Berlin, Heidelberg, New York, 1976).
(42) G. Fant, “Acoustic Theory of Speech Production,” 2nd ed. (Mouton, The Hague, 1970).
(43) B. S. Atal, M. R. Schroeder, “Linear Prediction Analysis of Speech Based on a Pole-Zero Representation,” J. Acoust. Soc. Am., vol. 64, pp. 1310–1318 (1978).
(44) J. Makhoul, J. Wolf, “Linear Prediction and the Spectral Analysis of Speech,” Bolt Beranek and Newman Inc., Cambridge, Mass., Report 2304 (1972).
(45) J. Makhoul, “Spectral Linear Prediction: Properties and Applications,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 283–296 (1975).
(46) J. Makhoul, J. Wolf, “The Use of a Two-Pole Linear Prediction Model in Speech Recognition,” Bolt Beranek and Newman Inc., Cambridge, Mass., Report 2357 (1973).
(47) M. R. Sambur, L. R. Rabiner, “A Speaker Independent Digit Recognition System,” Bell System Tech. J., vol. 54, pp. 81–102 (1975).
(48) F. Itakura, “Minimum Prediction Residual Principle Applied to Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 67–72 (1975).
(49) A. H. Gray Jr., J. D. Markel, “Distance Measures for Speech Processing,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 380–391 (1976).
(50) J. L. Flanagan, “Automatic Extraction of Formant Frequencies from Continuous Speech,” J. Acoust. Soc. Am., vol. 28, pp. 110–118 (1956); also in (2).
(51) J. D. Markel, “Application of a Digital Inverse Filter for Automatic Formant and F0 Analysis,” IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 149–153 (1973).
(52) S. S. McCandless, “An Algorithm for Automatic Formant Extraction using Linear Prediction Spectra,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-22, pp. 135–141 (1974); also in (2).
(53) S. Seneff, “Modifications to Formant Tracking Algorithm,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 192–193 (1976); also in (2).
(54) R. DeMori, P. Laface, E. Piccolo, “Automatic Detection and Description of Syllabic Features in Continuous Speech,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 365–379 (1976).
(55) H. Wakita, “Direct Estimation of the Vocal Tract Shape by Inverse Filtering,” IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 417–427 (1973); also in (2).
(56) H. Wakita, “Estimation of Vocal-Tract Shapes from Acoustical Analysis of the Speech Wave: The State of the Art,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-27, pp. 281–285 (1979).
(57) S. Seneff, “Real-Time Harmonic Pitch Detector,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-26, pp. 358–365 (1978).
(58) M. R. Schroeder, “Period Histogram and Product Spectrum: New Methods for Fundamental Frequency Measurements,” J. Acoust. Soc. Am., vol. 43, pp. 829–834 (1968).
(59) A. M. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am., vol. 41, pp. 293–309 (1967); also in (2).
(60) B. S. Atal, L. R. Rabiner, “A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 201–212 (1976).
(61) L. R. Rabiner, M. R. Sambur, C. E. Schmidt, “Applications of a Nonlinear Smoothing Algorithm to Speech Processing,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 552–557 (1975).
(62) L. J. Gerstman, “Classification of Self-Normalized Vowels,” IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 73–77 (1968).
(63) I. Kameny, “Automatic Acoustic-Phonetic Analysis of Vowels and Sonorants,” Conference Record, 1976 IEEE Int. Conf. Acoust., Speech, Signal Proc., Philadelphia, Pa., pp. 166–169 (1976).
(64) H. Wakita, “Normalization of Vowels by Vocal Tract Length and its Application to Vowel Identification,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-25, pp. 183–192 (1977).

Author information

Authors and Affiliations

Bolt Beranek and Newman Inc., 50 Moulton Street, Cambridge, Mass., 02138, USA
Jared J. Wolf

Editor information

Editors and Affiliations

Institut de Programmation, Université Pierre et Marie Curie, Paris VI, France
J. C. Simon

About this paper

Cite this paper

Wolf, J.J. (1980). Speech Signal Processing and Feature Extraction. In: Simon, J.C. (eds) Spoken Language Generation and Understanding. NATO Advanced Study Institutes Series, vol 59. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-9091-3_6

DOI: https://doi.org/10.1007/978-94-009-9091-3_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-009-9093-7
Online ISBN: 978-94-009-9091-3
eBook Packages: Springer Book Archive
