Analysis in Automatic Recognition of Speech

Hermansky, Hynek

doi:10.1007/978-1-4471-0845-0_5

Conference paper

Abstract

This chapter describes the purpose and origins of some of the methods used for speech analysis. The first part reviews the history of several early speech analysis techniques. The second part focuses on auditory modeling in speech analysis, a currently active area of research.

Author information

Authors and Affiliations

Oregon Graduate Institute, Portland, Oregon, USA
Hynek Hermansky

Editor information

Editors and Affiliations

ENST-CNR URA 820, 46 rue Barrault, 75634, Paris Cedex 13, France
Gerard Chollet PhD
INFOCOM Department, Rome University “La Sapienza”, via Eudossiana 18, I00184, Rome, Italy
Maria Gabriella Di Benedetto PhD
IIASS, via G Pellegrino 19, I-84019, Vietri sul Mare (SA), Italy
Anna Esposito PhD & Maria Marinaro PhD

About this paper

Cite this paper

Hermansky, H. (1999). Analysis in Automatic Recognition of Speech. In: Chollet, G., Di Benedetto, M.G., Esposito, A., Marinaro, M. (eds) Speech Processing, Recognition and Artificial Neural Networks. Springer, London. https://doi.org/10.1007/978-1-4471-0845-0_5

DOI: https://doi.org/10.1007/978-1-4471-0845-0_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-094-1
Online ISBN: 978-1-4471-0845-0
eBook Packages: Springer Book Archive
