Speech Signal Processing and Feature Extraction

  • Conference paper
Spoken Language Generation and Understanding

Part of the book series: NATO Advanced Study Institutes Series (ASIC, volume 59)

Abstract

Speech signal processing and feature extraction is the initial stage of any speech recognition system; it is through this component that the system views the speech signal itself. This chapter introduces general approaches to signal processing and feature extraction and surveys the techniques currently available in these areas.
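As a concrete illustration of the frame-based analysis surveyed here, the sketch below computes three classic per-frame features:

- short-time energy and zero-crossing rate, time-domain measures of the kind used in references 13, 14 and 16;
- linear-prediction coefficients, obtained by the autocorrelation method with the Levinson-Durbin recursion (see references 39 to 41).

It is a minimal, dependency-free sketch under stated assumptions, not the chapter's own implementation. The following are all illustrative choices rather than anything specified in the source:

- the function names (`split_frames`, `frame_features`, etc.);
- the Hamming window applied before LPC analysis;
- the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p;
- the 8 kHz sampling rate, frame length, hop and predictor order in the demo.

```python
import math
from typing import Dict, List, Tuple


def split_frames(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def hamming(n: int) -> List[float]:
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Unnormalized autocorrelation r[0..max_lag] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err): a[0] = 1 and A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p is the
    prediction-error filter, so the predictor is x_hat[n] = -sum_k a[k] x[n-k];
    err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection (PARCOR) coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a, err


def frame_features(signal: List[float], frame_len: int, hop: int,
                   order: int) -> List[Dict[str, object]]:
    """Energy, zero-crossing rate, and LPC coefficients for every frame."""
    window = hamming(frame_len)
    features = []
    for frame in split_frames(signal, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        r = autocorrelation(windowed, order)
        # Guard against all-zero (silent) frames, which would divide by zero.
        lpc, err = (levinson_durbin(r, order) if r[0] > 0
                    else ([1.0] + [0.0] * order, 0.0))
        features.append({"energy": short_time_energy(frame),
                         "zcr": zero_crossing_rate(frame),
                         "lpc": lpc, "residual": err})
    return features


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(2000)]
    for f in frame_features(tone, frame_len=200, hop=100, order=4)[:3]:
        print(round(f["energy"], 2), round(f["zcr"], 3),
              [round(c, 3) for c in f["lpc"]])
```

For a sinusoid of frequency f sampled at fs, the zero-crossing rate should come out close to 2f/fs (about 0.11 for the 440 Hz demo tone). This is one reason the measure served as a cheap spectral indicator in early recognizers.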

References

  1. J. L. Flanagan, “Speech Analysis, Synthesis and Perception,” 2nd ed. (Springer-Verlag, Berlin, Heidelberg, New York, 1972).

  2. R. W. Schafer, J. D. Markel (Eds.), “Speech Analysis” (IEEE Press, New York, 1979).

  3. L. R. Rabiner, R. W. Schafer, “Digital Processing of Speech Signals” (Prentice-Hall, Englewood Cliffs, N.J., 1978).

  4. A. V. Oppenheim, R. W. Schafer, “Digital Signal Processing” (Prentice-Hall, Englewood Cliffs, N.J., 1975).

  5. L. R. Rabiner, B. Gold, “Theory and Application of Digital Signal Processing” (Prentice-Hall, Englewood Cliffs, N.J., 1975).

  6. A. Peled and B. Liu, “Digital Signal Processing, Theory, Design, and Implementation” (Wiley, New York, 1976).

  7. P. B. Denes, E. N. Pinson, “The Speech Chain” (Anchor Press, Garden City, N.Y., 1973).

  8. A. M. Liberman, F. S. Cooper, D. P. Shankweiler, M. Studdert-Kennedy, “Perception of the Speech Code,” Psych. Rev., vol. 74, pp. 431–461 (1967); also in E. E. David Jr., P. B. Denes (Eds.), “Human Communication: A Unified View” (McGraw-Hill, New York, 1972), pp. 13–50.

  9. N. Lindgren, “Machine Recognition of Human Language — Part II,” IEEE Spectrum, vol. 2, No. 4, pp. 44–59 (1965).

  10. R. Jakobson, C. G. M. Fant, M. Halle, “Preliminaries to Speech Analysis” (MIT Press, Cambridge, Mass., 1963).

  11. G. Fant, “Speech Sounds and Features” (MIT Press, Cambridge, Mass., 1973).

  12. N. Chomsky, M. Halle, “The Sound Pattern of English” (Harper and Row, New York, 1968).

  13. D. R. Reddy, “Segmentation of Speech Sounds,” J. Acoust. Soc. Am., vol. 40, pp. 307–312 (1966).

  14. R. J. Niederjohn, “A Mathematical Formulation and Comparison of Zero-Crossing Analysis Techniques which have been Applied to Automatic Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 373–380 (1975).

  15. P. Vicens, “Aspects of Speech Recognition by Computer,” Ph.D. dissertation, Stanford Univ., Stanford, Calif. (1969).

  16. L. R. Rabiner, M. R. Sambur, “An Algorithm for Determining the Endpoints of Isolated Utterances,” Bell System Tech. J., vol. 54, pp. 297–315 (1975).

  17. CMU Computer Science Speech Group, “Speech Understanding Systems: Summary of Results of the Five-Year Research Effort at Carnegie-Mellon University,” Dept. of Comp. Sci., Carnegie-Mellon Univ., Pittsburgh, Pa., Technical Report (1977).

  18. J. M. Baker, “A New Time-Domain Analysis of Human Speech and Other Complex Waveforms,” Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, Pa. (1975).

  19. L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, C. A. McGonegal, “A Comparative Performance Study of Several Pitch Detection Algorithms,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 399–418 (1976); also in (2).

  20. M. M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 262–266 (1968); also in (2).

  21. J. D. Markel, “The SIFT Algorithm for Fundamental Frequency Extraction,” IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 367–377 (1972); also in (2).

  22. R. Gillmann, “A Fast Frequency Domain Pitch Algorithm,” J. Acoust. Soc. Am., vol. 58, suppl. 1, p. S62 (abstract) (1975).

  23. J. J. Dubnowski, R. W. Schafer, L. R. Rabiner, “Real-Time Digital Hardware Pitch Detector,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 2–8 (1976).

  24. M. J. Ross, H. L. Schaffer, A. Cohen, R. Freudberg, H. J. Manley, “Average Magnitude Difference Function Pitch Extractor,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-22, pp. 353–362 (1974); also in (2).

  25. N. J. Miller, “Pitch Detection by Data Reduction,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 72–79 (1975).

  26. B. Gold, L. R. Rabiner, “Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain,” J. Acoust. Soc. Am., vol. 46, pp. 442–448 (1969); also in (2).

  27. J. Martony, “Studies of the Voice Source,” Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, QPSR 1/65, pp. 4–9 (1965).

  28. C. G. Bell, H. Fujisaki, J. M. Heinz, K. N. Stevens, A. S. House, “Reduction of Speech Spectra by Analysis-by-Synthesis Techniques,” J. Acoust. Soc. Am., vol. 33, pp. 1725–1736 (1961); also in (2).

  29. L. C. W. Pols, “Real-Time Recognition of Spoken Words,” IEEE Trans. Comput., vol. C-20, pp. 972–978 (1971).

  30. D. H. Klatt, “A Digital Filter Bank for Spectral Matching,” Record of the 1976 IEEE Int. Conf. on Acoust., Speech, Signal Proc., Philadelphia, Pa., pp. 537–540 (1976).

  31. E. Zwicker, E. Terhardt, E. Paulus, “Automatic Speech Recognition using Psychoacoustic Models,” J. Acoust. Soc. Am., vol. 65, pp. 487–498 (1979).

  32. C. L. Searle, J. Z. Jacobson, S. G. Rayment, “Stop Consonant Discrimination Based on Human Audition,” J. Acoust. Soc. Am., vol. 65, pp. 799–811 (1979).

  33. H. F. Silverman, “An Introduction to Programming the Winograd Fourier Transform Algorithm (WFTA),” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-25, pp. 152–165 (1977); vol. ASSP-26, pp. 268–269 (1978); vol. ASSP-26, p. 483 (1978).

  34. A. V. Oppenheim, R. W. Schafer, “Homomorphic Analysis of Speech,” IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 221–226 (1968); also in (2).

  35. R. W. Schafer, L. R. Rabiner, “System for Automatic Formant Analysis of Voiced Speech,” J. Acoust. Soc. Am., vol. 47, pp. 634–648 (1970); also in (2).

  36. M. V. Mathews, J. E. Miller, E. E. David Jr., “Pitch Synchronous Analysis of Voiced Sounds,” J. Acoust. Soc. Am., vol. 33, pp. 179–186 (1961); also in (2).

  37. W. J. Hess, “A Pitch-Synchronous Digital Feature Extraction System for Phonemic Recognition of Speech,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 14–26 (1976).

  38. W. Woods, M. Bates, G. Brown, B. Bruce, C. Cook, J. Klovstad, J. Makhoul, B. Nash-Webber, R. Schwartz, J. Wolf, V. Zue, “Speech Understanding Systems: Final Technical Progress Report,” Bolt Beranek and Newman Inc., Cambridge, Mass., Report 3438, vol. II (1976).

  39. B. S. Atal, S. L. Hanauer, “Speech Analysis and Synthesis by Linear Prediction of the Speech Wave,” J. Acoust. Soc. Am., vol. 50, pp. 637–655 (1971); also in (2).

  40. J. Makhoul, “Linear Prediction: A Tutorial Review,” Proc. IEEE, vol. 63, pp. 561–580 (1975); also in (2).

  41. J. D. Markel, A. H. Gray Jr., “Linear Prediction of Speech” (Springer-Verlag, Berlin, Heidelberg, New York, 1976).

  42. G. Fant, “Acoustic Theory of Speech Production,” 2nd ed. (Mouton, The Hague, 1970).

  43. B. S. Atal, M. R. Schroeder, “Linear Prediction Analysis of Speech Based on a Pole-Zero Representation,” J. Acoust. Soc. Am., vol. 64, pp. 1310–1318 (1978).

  44. J. Makhoul, J. Wolf, “Linear Prediction and the Spectral Analysis of Speech,” Bolt Beranek and Newman Inc., Cambridge, Mass., Report 2304 (1972).

  45. J. Makhoul, “Spectral Linear Prediction: Properties and Applications,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 283–296 (1975).

  46. J. Makhoul, J. Wolf, “The Use of a Two-Pole Linear Prediction Model in Speech Recognition,” Bolt Beranek and Newman Inc., Cambridge, Mass., Report 2357 (1973).

  47. M. R. Sambur, L. R. Rabiner, “A Speaker Independent Digit Recognition System,” Bell System Tech. J., vol. 54, pp. 81–102 (1975).

  48. F. Itakura, “Minimum Prediction Residual Principle Applied to Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 67–72 (1975).

  49. A. H. Gray Jr., J. D. Markel, “Distance Measures for Speech Processing,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 380–391 (1976).

  50. J. L. Flanagan, “Automatic Extraction of Formant Frequencies from Continuous Speech,” J. Acoust. Soc. Am., vol. 28, pp. 110–118 (1956); also in (2).

  51. J. D. Markel, “Application of a Digital Inverse Filter for Automatic Formant and F0 Analysis,” IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 149–153 (1973).

  52. S. S. McCandless, “An Algorithm for Automatic Formant Extraction using Linear Prediction Spectra,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-22, pp. 135–141 (1974); also in (2).

  53. S. Seneff, “Modifications to Formant Tracking Algorithm,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 192–193 (1976); also in (2).

  54. R. DeMori, P. Laface, E. Piccolo, “Automatic Detection and Description of Syllabic Features in Continuous Speech,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 365–379 (1976).

  55. H. Wakita, “Direct Estimation of the Vocal Tract Shape by Inverse Filtering,” IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 417–427 (1973); also in (2).

  56. H. Wakita, “Estimation of Vocal-Tract Shapes from Acoustical Analysis of the Speech Wave: The State of the Art,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-27, pp. 281–285 (1979).

  57. S. Seneff, “Real-Time Harmonic Pitch Detector,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-26, pp. 358–365 (1978).

  58. M. R. Schroeder, “Period Histogram and Product Spectrum: New Methods for Fundamental Frequency Measurements,” J. Acoust. Soc. Am., vol. 43, pp. 829–834 (1968).

  59. A. M. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am., vol. 41, pp. 293–309 (1967); also in (2).

  60. B. S. Atal, L. R. Rabiner, “A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 201–212 (1976).

  61. L. R. Rabiner, M. R. Sambur, C. E. Schmidt, “Applications of a Nonlinear Smoothing Algorithm to Speech Processing,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-23, pp. 552–557 (1975).

  62. L. J. Gerstman, “Classification of Self-Normalized Vowels,” IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 73–77 (1968).

  63. I. Kameny, “Automatic Acoustic-Phonetic Analysis of Vowels and Sonorants,” Conference Record, 1976 IEEE Int. Conf. Acoust., Speech, Signal Proc., Philadelphia, Pa., pp. 166–169 (1976).

  64. H. Wakita, “Normalization of Vowels by Vocal Tract Length and its Application to Vowel Identification,” IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-25, pp. 183–192 (1977).

Copyright information

© 1980 D. Reidel Publishing Company

About this paper

Cite this paper

Wolf, J.J. (1980). Speech Signal Processing and Feature Extraction. In: Simon, J.C. (eds) Spoken Language Generation and Understanding. NATO Advanced Study Institutes Series, vol 59. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-9091-3_6

  • DOI: https://doi.org/10.1007/978-94-009-9091-3_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-009-9093-7

  • Online ISBN: 978-94-009-9091-3

  • eBook Packages: Springer Book Archive
