Abstract
A joint Synchrony/Mean-Rate model of Auditory Speech Processing (ASP) is described, and its application in speech technology is considered. For automatic segmentation and recognition, a few examples are presented that highlight the superiority of the ASP scheme over other methods, especially for speech in adverse conditions.
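To make the distinction between the two kinds of output concrete, the following is a toy sketch. It is not the chapter's implementation. A "mean-rate" value is taken here as the average of a half-wave rectified channel output. The "synchrony" value is loosely inspired by the generalized synchrony detector idea in Seneff's joint synchrony/mean-rate model: the channel output is compared with a copy of itself delayed by one period of the channel's characteristic frequency. The function names (`rectify`, `mean_rate`, `synchrony`, `tone`) are hypothetical. The sketch also assumes pure tones in place of real filter-bank outputs, and it models no cochlear filtering, adaptation, or saturation.

```python
import math

def rectify(x):
    """Half-wave rectification: a crude stand-in for hair-cell transduction."""
    return [max(0.0, v) for v in x]

def mean_rate(x):
    """Toy 'mean-rate' value: average of the half-wave rectified channel output."""
    r = rectify(x)
    return sum(r) / len(r)

def synchrony(x, fs, cf, eps=1e-9):
    """Toy synchrony measure, loosely in the spirit of a generalized synchrony detector.

    The rectified output is compared with a copy of itself delayed by one
    period of the characteristic frequency cf. If the output is periodic at cf,
    the two copies nearly coincide, so the sum is large relative to the
    difference and the ratio grows; otherwise the ratio stays close to 1.
    """
    r = rectify(x)
    d = max(1, round(fs / cf))  # delay in samples, about one period of cf
    total = sum(abs(r[n] + r[n - d]) for n in range(d, len(r)))
    diff = sum(abs(r[n] - r[n - d]) for n in range(d, len(r)))
    return total / (diff + eps)  # eps avoids division by zero for perfect periodicity

def tone(freq, fs, n):
    """Pure tone standing in for a channel output (no filter bank is modelled)."""
    return [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]

fs, cf = 8000, 500           # hypothetical sampling rate and channel CF (Hz)
on_cf = tone(500, fs, 800)   # input periodic at the channel's CF
off_cf = tone(700, fs, 800)  # input at a different frequency

print(round(mean_rate(on_cf), 3), round(mean_rate(off_cf), 3))
print(synchrony(on_cf, fs, cf) > 100 * synchrony(off_cf, fs, cf))
```

In this toy setting the two inputs give nearly the same mean-rate value, whereas the synchrony measure separates them sharply. This illustrates why a synchrony-based representation can carry frequency information that average firing rate alone does not.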
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cosi, P. (1993). On the use of auditory models in speech technology. In: Roberto, V. (ed.) Intelligent Perceptual Systems. Lecture Notes in Computer Science, vol 745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57379-8_5
DOI: https://doi.org/10.1007/3-540-57379-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57379-1
Online ISBN: 978-3-540-48103-4
eBook Packages: Springer Book Archive