
Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5967))

Abstract

Human-computer conversation has attracted a great deal of interest, especially in virtual worlds. Research in this area has given rise to spoken dialogue systems that build on advances in speech recognition, language understanding, and speech synthesis. This work surveys the state of the art in spoken dialogue systems. Current dialogue system technologies and approaches are first introduced, with emphasis on the differences between them; speech recognition, speech synthesis, and language understanding are then presented as complementary and necessary modules. Furthermore, as the development of spoken dialogue systems becomes more complex, processes must be defined to evaluate their performance. Wizard-of-Oz techniques play an important role in this task: they yield the dialogue corpora needed to achieve good performance. This technique is described here, together with perspectives on multimodal dialogue systems in virtual worlds.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chollet, G., Amehraye, A., Razik, J., Zouari, L., Khemiri, H., Mokbel, C. (2010). Spoken Dialogue in Virtual Worlds. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_36


  • DOI: https://doi.org/10.1007/978-3-642-12397-9_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12396-2

  • Online ISBN: 978-3-642-12397-9

