
Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5967))

Abstract

Human-computer conversation has attracted a great deal of interest, especially in virtual worlds. Research in this area has given rise to spoken dialogue systems that build on advances in speech recognition, language understanding, and speech synthesis. This work surveys the state of the art in spoken dialogue systems. Current dialogue system technologies and approaches are first introduced, with emphasis on the differences between them; speech recognition, speech synthesis, and language understanding are then presented as complementary and necessary modules. Furthermore, as the development of spoken dialogue systems becomes more complex, processes must be defined to evaluate their performance. Wizard-of-Oz techniques play an important role in this task: they yield the dialogue corpora needed to achieve good performance. This technique is described here, together with perspectives on multimodal dialogue systems in virtual worlds.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chollet, G., Amehraye, A., Razik, J., Zouari, L., Khemiri, H., Mokbel, C. (2010). Spoken Dialogue in Virtual Worlds. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_36


  • DOI: https://doi.org/10.1007/978-3-642-12397-9_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12396-2

  • Online ISBN: 978-3-642-12397-9

