Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances
In real environments, where automatic speech recognition (ASR) performance is not always high, a strategy that does not rely solely on ASR results is required to achieve robust speech interaction. We construct a spoken dialogue system that can enter an enumeration subdialogue, in which utterance timing as well as ASR results is used to interpret user utterances. Since utterance timing can be obtained more reliably than ASR results, the subdialogue leads to a more robust interaction even in situations with low ASR performance. We conducted an experiment with 31 participants. The results showed that, when ASR accuracy was not high, our system achieved a higher task completion rate than a baseline system that uses only ASR results.
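As a rough illustration of the idea, the sketch below shows one way an enumeration subdialogue could map a user's utterance onset time to the candidate most recently read aloud, without depending on what ASR recognized. The item names, timings, and the `reaction_delay` threshold are hypothetical, not taken from the paper:

```python
# Hypothetical sketch: the system enumerates candidates one at a time,
# and a user utterance is interpreted as referring to the item most
# recently presented, based purely on its onset time.

from dataclasses import dataclass

@dataclass
class PresentedItem:
    name: str
    onset: float  # time (s) at which the system began saying this item

def interpret_by_timing(items, user_onset, reaction_delay=0.3):
    """Return the enumerated item the user most likely responded to.

    Assumes the user reacts shortly after hearing an item, so we pick
    the last item presented at least `reaction_delay` seconds before
    the user's utterance onset (an illustrative heuristic).
    """
    candidates = [it for it in items if it.onset + reaction_delay <= user_onset]
    if not candidates:
        return None  # user spoke before any item could plausibly be reacted to
    return max(candidates, key=lambda it: it.onset)

presented = [
    PresentedItem("restaurant A", 0.0),
    PresentedItem("restaurant B", 2.5),
    PresentedItem("restaurant C", 5.0),
]

# A user utterance starting 3.4 s in is taken as a response to
# restaurant B, even if its recognized words are unreliable.
choice = interpret_by_timing(presented, user_onset=3.4)
print(choice.name)  # restaurant B
```

Because voice activity detection supplies the onset time even when recognition fails, this kind of timing cue remains usable under low ASR accuracy; the ASR result, when trustworthy, can still override or refine the timing-based guess.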
Keywords: Humanoid Robot · Automatic Speech Recognition · Voice Activity Detection · Query Condition · Utterance Timing
We are grateful to Prof. Shinsuke Mori and Mr. Tetsuro Sasada of Kyoto University for constructing the statistical language model for the system. This work was supported in part by KAKENHI (grant-in-aid for scientific research from the Ministry of Education, Culture, Sports, Science and Technology of Japan) and the JST PRESTO program.