Abstract
This paper gives an overview of robust architectures and modeling techniques for automatic speech recognition and understanding. Topics include robust acoustic and language modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architectures for spoken dialogue systems, multi-modal speech recognition, and speech understanding. The paper also discusses the most important research problems that must be solved to achieve truly robust speech recognition and understanding systems.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Furui, S. (2003). Toward Robust Speech Recognition and Understanding. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_2
DOI: https://doi.org/10.1007/978-3-540-39398-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive