A Spoken Dialogue System Based on Keyword Spotting Technology

Zhang, Pengyuan; Zhao, Qingwei; Yan, Yonghong

doi:10.1007/978-3-540-73110-8_27

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4552)

Abstract

This paper describes a keyword spotting based dialogue system. Understanding the user's requests accurately is critical in a dialogue system, but the performance of large vocabulary continuous speech recognition (LVCSR) systems is far from perfect, especially on spontaneous speech. In this work, an improved keyword spotting scheme is adopted instead. A fuzzy search algorithm is proposed to extract keyword hypotheses from syllable confusion networks (CNs). CNs are linear and therefore naturally suited to indexing. To accelerate the search, CNs are pruned to feasible sizes. Furthermore, we enhance the discriminability of the confidence measure by applying entropy information to the posterior probability of word hypotheses. On Mandarin conversational telephone speech (CTS), the proposed algorithms obtained a 4.7% relative reduction in equal error rate (EER).
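
The pipeline outlined in the abstract (prune the CN, search it for keyword syllable sequences, then temper the posterior with entropy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the CN layout (a list of slots, each mapping a syllable to its posterior), the function names `prune_cn`, `slot_entropy` and `spot_keyword`, the pruning thresholds, and the weight `alpha`. The search is a simplified stand-in for the fuzzy search: it accepts a keyword syllable anywhere in a slot, not only as the top hypothesis. The entropy weighting is one plausible form, not the authors' formula.

```python
import math

# Assumed CN layout: a list of slots, each a dict {syllable: posterior}.

def prune_cn(cn, min_posterior=0.05, max_arcs=3):
    """Keep at most max_arcs arcs per slot, dropping arcs below min_posterior."""
    pruned = []
    for slot in cn:
        ranked = sorted(slot.items(), key=lambda kv: kv[1], reverse=True)
        kept = {syl: p for syl, p in ranked[:max_arcs] if p >= min_posterior}
        if not kept:  # never leave a slot empty: keep its best arc
            best = max(slot, key=slot.get)
            kept = {best: slot[best]}
        pruned.append(kept)
    return pruned

def slot_entropy(slot):
    """Shannon entropy (in nats) of the renormalized slot distribution."""
    total = sum(slot.values())
    return -sum((p / total) * math.log(p / total)
                for p in slot.values() if p > 0)

def spot_keyword(cn, keyword, alpha=0.5):
    """Return (start_slot, confidence) for each run of consecutive slots
    that contains the keyword's syllables in order.

    The confidence is the geometric mean of the syllable posteriors,
    damped by exp(-alpha * mean slot entropy). This is a hypothetical
    weighting: high-entropy (ambiguous) slots lower the confidence.
    """
    hits = []
    n = len(keyword)
    for start in range(len(cn) - n + 1):
        window = cn[start:start + n]
        if not all(syl in slot for syl, slot in zip(keyword, window)):
            continue
        log_post = sum(math.log(slot[syl]) for syl, slot in zip(keyword, window))
        posterior = math.exp(log_post / n)
        mean_entropy = sum(slot_entropy(slot) for slot in window) / n
        hits.append((start, posterior * math.exp(-alpha * mean_entropy)))
    return hits

# Toy CN with made-up posteriors.
cn = [
    {"bei": 0.6, "pei": 0.3, "mei": 0.1},
    {"jing": 0.5, "qing": 0.45, "xing": 0.05},
    {"de": 1.0},
]
pruned = prune_cn(cn, min_posterior=0.1, max_arcs=2)
print(spot_keyword(pruned, ["pei", "jing"]))
```

In this toy network the keyword "pei jing" is still found after pruning even though "pei" is not the best hypothesis in its slot, whereas "mei jing" is no longer found because "mei" was pruned away. That trade-off between search speed and recall is what the pruning step has to balance.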

Author information

Authors and Affiliations

ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080, P.R. China
Pengyuan Zhang, Qingwei Zhao & Yonghong Yan

Editor information

Julie A. Jacko

About this paper

Cite this paper

Zhang, P., Zhao, Q., Yan, Y. (2007). A Spoken Dialogue System Based on Keyword Spotting Technology. In: Jacko, J.A. (eds) Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments. HCI 2007. Lecture Notes in Computer Science, vol 4552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73110-8_27

DOI: https://doi.org/10.1007/978-3-540-73110-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73108-5
Online ISBN: 978-3-540-73110-8
eBook Packages: Computer Science, Computer Science (R0)