Abstract
This chapter describes the purpose and origins of several methods used in speech analysis. The first part traces the history of some of the early speech analysis techniques. The second part focuses on a currently active area of research: auditory modeling in speech analysis.
Copyright information
© 1999 Springer-Verlag London Limited
About this paper
Cite this paper
Hermansky, H. (1999). Analysis in Automatic Recognition of Speech. In: Chollet, G., Di Benedetto, M.G., Esposito, A., Marinaro, M. (eds) Speech Processing, Recognition and Artificial Neural Networks. Springer, London. https://doi.org/10.1007/978-1-4471-0845-0_5
DOI: https://doi.org/10.1007/978-1-4471-0845-0_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-094-1
Online ISBN: 978-1-4471-0845-0
eBook Packages: Springer Book Archive