On the use of auditory models in speech technology

Cosi, Piero

doi:10.1007/3-540-57379-8_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 745)

Abstract

A joint synchrony/mean-rate model of Auditory Speech Processing (ASP) is described, and its application in speech technology is considered. For automatic segmentation and recognition, a few examples are presented that illustrate the superiority of the ASP scheme over other methods, especially for speech in adverse conditions.
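The abstract names the two cues of the model without spelling them out. As a rough, hypothetical illustration (not the chapter's implementation), the sketch below contrasts a mean-rate cue with a synchrony cue in the spirit of the generalized synchrony detector of Seneff's joint synchrony/mean-rate model. It assumes a single band-limited channel signal is given directly (no filter bank or hair-cell stage), that the delay is one period of the channel's centre frequency rounded to whole samples, and that `atan` is used as a soft saturation; the names `mean_rate` and `gsd` are illustrative only.

```python
import math

def mean_rate(x):
    """Mean-rate cue: average of the half-wave-rectified channel signal.
    Grows with signal level (this sketch has no compression stage)."""
    return sum(max(v, 0.0) for v in x) / len(x)

def gsd(x, fs, cf, eps=1e-6):
    """Synchrony cue in the spirit of a generalized synchrony detector.

    Compares |x[n] + x[n-T]| with |x[n] - x[n-T]|, where T is one period of
    the channel centre frequency cf, rounded to whole samples. A signal that
    repeats with period T makes the difference term vanish, so the ratio is
    large; atan is an assumed soft saturation that caps the output at pi/2.
    """
    T = round(fs / cf)  # delay in samples (assumption: integer delay)
    num = sum(abs(x[n] + x[n - T]) for n in range(T, len(x)))
    den = sum(abs(x[n] - x[n - T]) for n in range(T, len(x)))
    return math.atan(num / (den + eps))  # eps avoids division by zero

if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
    for cf in (1000, 1500):
        print(cf, round(gsd(tone, fs, cf), 3))
    print("mean rate:", round(mean_rate(tone), 3))
```

For a 1000 Hz tone, the channel tuned to 1000 Hz gives a value near pi/2 and the 1500 Hz channel a clearly smaller one. Doubling the input amplitude doubles the mean-rate cue but leaves the ratio-based synchrony cue essentially unchanged, which offers one intuition (an illustration, not the chapter's argument) for why synchrony-based representations are often considered more robust.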

Author information

Authors and Affiliations

Centro di Studio per le Ricerche di Fonetica, C.N.R., Piazza G. Salvemini 13, 35131, Padova, Italy
Piero Cosi

Editor information

Vito Roberto

About this chapter

Cite this chapter

Cosi, P. (1993). On the use of auditory models in speech technology. In: Roberto, V. (ed.) Intelligent Perceptual Systems. Lecture Notes in Computer Science, vol 745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57379-8_5

DOI: https://doi.org/10.1007/3-540-57379-8_5
Published: 30 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57379-1
Online ISBN: 978-3-540-48103-4
eBook Packages: Springer Book Archive