Analysis in Automatic Recognition of Speech

  • Conference paper

Abstract

The chapter describes the purpose and origins of several methods used for speech analysis. The first part traces the historical development of early speech analysis techniques. The second part focuses on auditory modeling in speech analysis, a currently active area of research.

Copyright information

© 1999 Springer-Verlag London Limited

About this paper

Cite this paper

Hermansky, H. (1999). Analysis in Automatic Recognition of Speech. In: Chollet, G., Di Benedetto, M.G., Esposito, A., Marinaro, M. (eds) Speech Processing, Recognition and Artificial Neural Networks. Springer, London. https://doi.org/10.1007/978-1-4471-0845-0_5

  • DOI: https://doi.org/10.1007/978-1-4471-0845-0_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-094-1

  • Online ISBN: 978-1-4471-0845-0

  • eBook Packages: Springer Book Archive