
Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances

  • Kazunori Komatani
  • Kyoko Matsuyama
  • Ryu Takeda
  • Tetsuya Ogata
  • Hiroshi G. Okuno
Conference paper

Abstract

In real environments, where automatic speech recognition (ASR) performance is not always high, a strategy that does not rely solely on ASR results is required to achieve robust speech interaction. We construct a spoken dialogue system that can enter an enumeration subdialogue, in which utterance timing as well as the ASR result is used to interpret user utterances. Since utterance timing can be obtained more reliably than ASR results, the subdialogue leads to more robust interaction even in situations with low ASR performance. We conducted an experiment with 31 participants. The results showed that, when ASR accuracy was not high, our system achieved a higher task completion rate than a baseline system that uses only ASR results.
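To make the idea concrete, the sketch below (in Python) illustrates how an enumeration subdialogue of this kind might combine the two cues. It is a minimal illustration under assumed names and thresholds (UserResponse, SELECTION_WINDOW_SEC, etc.), not the authors' implementation: a user utterance that begins shortly after the system reads out a candidate item is interpreted as selecting that item, since the utterance onset can be detected reliably by voice activity detection, while the ASR hypothesis is consulted only when its confidence is high.

    from dataclasses import dataclass

    @dataclass
    class UserResponse:
        text: str          # ASR hypothesis (may be wrong in noisy conditions)
        confidence: float  # ASR confidence score in [0, 1]
        start_time: float  # utterance onset, from voice activity detection

    # Hypothetical thresholds; the paper's actual values are not given here.
    ASR_CONFIDENCE_THRESHOLD = 0.8
    SELECTION_WINDOW_SEC = 1.5

    def interpret(response: UserResponse, item: str, item_presented_at: float):
        """Interpret a user utterance within the enumeration subdialogue.

        Timing is treated as the primary cue because it can be measured
        reliably even when ASR accuracy is low.
        """
        latency = response.start_time - item_presented_at
        if latency <= SELECTION_WINDOW_SEC:
            # An utterance right after an item was read out is taken as
            # selecting that item, regardless of what ASR recognized.
            return ("select", item)
        if response.confidence >= ASR_CONFIDENCE_THRESHOLD:
            # A later utterance with a reliable ASR result: use its content.
            return ("asr_result", response.text)
        # Neither cue is reliable, so ask the user to confirm.
        return ("confirm", item)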

Keywords

Humanoid Robot · Automatic Speech Recognition · Voice Activity Detection · Query Condition · Utterance Timing


Acknowledgements

We are grateful to Prof. Shinsuke Mori and Mr. Tetsuro Sasada of Kyoto University for constructing the statistical language model for the system. This work was supported in part by KAKENHI (grant-in-aid for scientific research from the Ministry of Education, Culture, Sports, Science and Technology of Japan) and the JST PRESTO program.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Kazunori Komatani (1)
  • Kyoko Matsuyama (2)
  • Ryu Takeda (2)
  • Tetsuya Ogata (2)
  • Hiroshi G. Okuno (2)
  1. Graduate School of Engineering, Nagoya University, Nagoya, Japan
  2. Graduate School of Informatics, Kyoto University, Kyoto, Japan
