Toward Robust Speech Recognition and Understanding

Furui, Sadaoki

doi:10.1007/978-3-540-39398-6_2

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

This paper gives an overview of robust architectures and modeling techniques for automatic speech recognition and understanding. The topics include robust acoustic and language modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architectures for spoken dialogue systems, multi-modal speech recognition, and speech understanding. The paper also discusses the most important research problems that remain to be solved in order to achieve the ultimate goal of robust speech recognition and understanding systems.

Author information

Authors and Affiliations

Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552, Japan
Sadaoki Furui
Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek & Pavel Mautner

About this paper

Cite this paper

Furui, S. (2003). Toward Robust Speech Recognition and Understanding. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_2

DOI: https://doi.org/10.1007/978-3-540-39398-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive