Abstract
This paper describes a spoken dialogue system based on keyword spotting. A dialogue system must understand users' requests accurately, but the performance of large vocabulary continuous speech recognition (LVCSR) systems is far from perfect, especially on spontaneous speech. This work therefore adopts an improved keyword spotting scheme instead. A fuzzy search algorithm is proposed to extract keyword hypotheses from syllable confusion networks (CNs). CNs are linear and thus naturally suited to indexing. To accelerate the search, the CNs are pruned to a feasible size. Furthermore, the discriminability of the confidence measure is enhanced by applying entropy information to the posterior probability of word hypotheses. On Mandarin conversational telephone speech (CTS), the proposed algorithms obtain a 4.7% relative reduction in equal error rate (EER).
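The pipeline outlined in the abstract (prune the confusion network, search it for keyword syllables, then score each hit with an entropy-aware confidence) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not give the data structures, the pruning rule, the matching criterion, or the confidence formula. Here a CN is a list of slots mapping syllables to posteriors, pruning keeps the top few arcs above a threshold, the search is "fuzzy" only in that a keyword syllable may match any surviving arc in a slot instead of just the top one, and the confidence is the geometric mean of the matched posteriors discounted by the mean slot entropy. The names `prune`, `slot_entropy`, `find_keyword`, and the parameters `min_post`, `max_arcs`, `alpha` are all hypothetical.

```python
import math

# Hypothetical representation: a confusion network (CN) is a list of slots,
# and each slot maps a syllable to its posterior probability.

def prune(cn, min_post=0.05, max_arcs=3):
    """Keep at most max_arcs arcs per slot and drop arcs below min_post."""
    pruned = []
    for slot in cn:
        best = sorted(slot.items(), key=lambda kv: kv[1], reverse=True)[:max_arcs]
        pruned.append({syl: p for syl, p in best if p >= min_post})
    return pruned

def slot_entropy(slot):
    """Shannon entropy (in nats) of the renormalised posteriors in one slot."""
    total = sum(slot.values())
    if total <= 0:
        return 0.0
    return -sum((p / total) * math.log(p / total)
                for p in slot.values() if p > 0)

def find_keyword(cn, syllables, alpha=1.0):
    """Return (start_slot, confidence) for every run of consecutive slots
    that contains the keyword's syllables in order.

    Assumed confidence: geometric mean of the matched posteriors, scaled by
    exp(-alpha * mean slot entropy), so ambiguous slots lower the score.
    """
    hits = []
    n = len(syllables)
    for start in range(len(cn) - n + 1):
        window = cn[start:start + n]
        if all(syl in slot for syl, slot in zip(syllables, window)):
            mean_log_post = sum(math.log(slot[syl])
                                for syl, slot in zip(syllables, window)) / n
            mean_entropy = sum(slot_entropy(slot) for slot in window) / n
            confidence = math.exp(mean_log_post - alpha * mean_entropy)
            hits.append((start, confidence))
    return hits

# Toy CN with made-up posteriors, for illustration only.
cn = [
    {"bei": 0.7, "pei": 0.2, "mei": 0.1},
    {"jing": 0.9, "qing": 0.1},
    {"shi": 0.5, "si": 0.48, "zhi": 0.02},
]
pruned = prune(cn)
hits = find_keyword(pruned, ["bei", "jing"])
```

Because each slot is visited once per candidate start position, the search cost grows with the number of slots and the keyword length, and pruning bounds the number of arcs per slot. A system following the paper more closely would likely also tolerate syllable substitutions, which this sketch does not attempt.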
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, P., Zhao, Q., Yan, Y. (2007). A Spoken Dialogue System Based on Keyword Spotting Technology. In: Jacko, J.A. (eds) Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments. HCI 2007. Lecture Notes in Computer Science, vol 4552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73110-8_27
DOI: https://doi.org/10.1007/978-3-540-73110-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73108-5
Online ISBN: 978-3-540-73110-8
eBook Packages: Computer Science, Computer Science (R0)