Toward Robust Speech Recognition and Understanding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

This paper gives an overview of robust architectures and modeling techniques for automatic speech recognition and understanding. The topics include robust acoustic and language modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architectures for spoken dialogue systems, multi-modal speech recognition, and speech understanding. The paper also discusses the most important research problems that must be solved to achieve truly robust speech recognition and understanding systems.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Furui, S. (2003). Toward Robust Speech Recognition and Understanding. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_2

  • DOI: https://doi.org/10.1007/978-3-540-39398-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20024-6

  • Online ISBN: 978-3-540-39398-6

  • eBook Packages: Springer Book Archive
