On the use of auditory models in speech technology

  • Part II: The Quest of Perceptual Primitives
  • Chapter in: Intelligent Perceptual Systems
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 745)

Abstract

A joint synchrony/mean-rate model of Auditory Speech Processing (ASP) is described, and its application to speech technology is considered. For automatic segmentation and recognition, several examples are presented that illustrate the advantages of the ASP scheme over other methods, particularly for speech in adverse conditions.
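The abstract only names the two outputs of the joint model. As a rough illustration, and not the chapter's implementation, the sketch below shows the idea behind a synchrony measure in the spirit of Seneff's Generalized Synchrony Detector: for one filter-bank channel with characteristic frequency `cf`, it compares the summed magnitude of x[n] + x[n-k] with that of x[n] - x[n-k], where k is one period of `cf` in samples, so the ratio is large when the channel output is periodic at `cf`. The function names, the `atan` soft limiter and its constant `a`, and the half-wave-rectified average used as a mean-rate proxy are assumptions made for this example; the critical-band filter bank and hair-cell stages of a real auditory model are omitted.

```python
import math


def synchrony_ratio(x, fs, cf, eps=1e-9):
    """Ratio of |x[n] + x[n-k]| to |x[n] - x[n-k]|, summed over the window.

    k is one period of the channel's characteristic frequency cf, in samples.
    A signal periodic at cf makes the difference term small and the ratio large.
    (Illustrative sketch; not the chapter's exact detector.)
    """
    k = max(1, round(fs / cf))  # one CF period, in samples
    if len(x) <= k:
        raise ValueError("window shorter than one CF period")
    s = sum(abs(x[n] + x[n - k]) for n in range(k, len(x)))
    d = sum(abs(x[n] - x[n - k]) for n in range(k, len(x)))
    return s / (d + eps)  # eps avoids division by zero


def saturate(r, a=10.0):
    """Hypothetical soft limiter: roughly linear for small r, bounded by a*pi/2."""
    return a * math.atan(r / a)


def mean_rate(x):
    """Crude mean-rate proxy (assumption): average of the half-wave rectified signal."""
    return sum(max(0.0, v) for v in x) / len(x)


if __name__ == "__main__":
    fs, cf = 16000, 500.0  # 500 Hz channel: k = 32 samples
    n_samples = 1600       # 50 periods of the 500 Hz tone
    tuned = [math.sin(2 * math.pi * 500.0 * n / fs) for n in range(n_samples)]
    detuned = [math.sin(2 * math.pi * 700.0 * n / fs) for n in range(n_samples)]
    print(saturate(synchrony_ratio(tuned, fs, cf)))    # near the a*pi/2 ceiling
    print(saturate(synchrony_ratio(detuned, fs, cf)))  # much smaller
    print(mean_rate(tuned))                            # about 1/pi
```

Summing magnitudes over the whole window is the simplest stand-in for the envelope detection a full model would apply; it keeps the example dependency-free while still showing why a tone at the channel's own frequency drives the synchrony output toward its ceiling and a detuned tone does not.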

Editor information

Vito Roberto

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cosi, P. (1993). On the use of auditory models in speech technology. In: Roberto, V. (eds) Intelligent Perceptual Systems. Lecture Notes in Computer Science, vol 745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57379-8_5

  • DOI: https://doi.org/10.1007/3-540-57379-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57379-1

  • Online ISBN: 978-3-540-48103-4

  • eBook Packages: Springer Book Archive
