Dialogue Systems Go Multimodal: The SmartKom Experience

Chapter in: SmartKom: Foundations of Multimodal Dialogue Systems

Part of the book series: Cognitive Technologies (COGTECH)

Summary

Multimodal dialogue systems exploit one of the major characteristics of human-human interaction: the coordinated use of different modalities. Allowing all of the modalities to refer to and depend upon each other is key to the richness of multimodal communication. We introduce the notion of symmetric multimodality for dialogue systems, in which all input modes (e.g., speech, gesture, facial expression) are also available for output, and vice versa. A dialogue system with symmetric multimodality must understand and represent not only the user's multimodal input but also its own multimodal output. We present an overview of the SmartKom system, which provides full symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent. SmartKom represents a new generation of multimodal dialogue systems that go beyond simple modality integration and synchronization to cover the full spectrum of dialogue phenomena associated with symmetric multimodality (including crossmodal references, one-anaphora, and backchannelling). We show that SmartKom's plug-and-play architecture supports multiple recognizers for a single modality; for example, the user's speech signal can be processed by three unimodal recognizers in parallel (speech recognition, emotional prosody, and boundary prosody). We detail SmartKom's three-tiered representation of multimodal discourse, consisting of a domain layer, a discourse layer, and a modality layer. We discuss the limitations of SmartKom and how they are overcome in the follow-up project SmartWeb. In addition, we present a research roadmap for multimodality that addresses the key open research questions in this young field. To conclude, we discuss the economic and scientific impact of the SmartKom project, which has led to more than 50 patents and 29 spin-off products.
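The three-tiered discourse representation (domain layer, discourse layer, modality layer) can be illustrated with a small sketch. The Python below is not SmartKom's implementation: all class and method names (`DomainObject`, `DiscourseObject`, `ModalityObject`, `DiscourseMemory`, `resolve_gesture`) are hypothetical, as are the screen-region ids, and the resolution rule (link a pointing gesture to the most recent discourse object whose graphical mention occupies the pointed-at region) is a simplifying assumption. It shows how recording the system's own graphical output in the modality layer, as symmetric multimodality requires, allows a crossmodal reference such as "this one" plus a pointing gesture to be resolved to a domain object.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainObject:
    # Domain layer: an application entity, independent of how it was mentioned.
    kind: str
    attributes: dict


@dataclass
class DiscourseObject:
    # Discourse layer: one discourse referent, linked to a single domain object
    # and to every modality object that has mentioned it so far.
    domain: DomainObject
    mentions: List["ModalityObject"] = field(default_factory=list, repr=False)


@dataclass
class ModalityObject:
    # Modality layer: one concrete realization in a single modality.
    modality: str  # hypothetical labels: "speech", "gesture", "graphics"
    surface: str   # a word sequence or a hypothetical screen-region id
    referent: Optional[DiscourseObject] = field(default=None, repr=False, compare=False)


class DiscourseMemory:
    """Hypothetical discourse memory holding discourse objects in order of introduction."""

    def __init__(self) -> None:
        self.objects: List[DiscourseObject] = []

    def link(self, obj: DiscourseObject, mention: ModalityObject) -> None:
        # Connect a modality-layer mention to its discourse-layer referent.
        mention.referent = obj
        obj.mentions.append(mention)

    def introduce(self, domain: DomainObject, mention: ModalityObject) -> DiscourseObject:
        # Create a new discourse referent for a domain object and record its first mention.
        obj = DiscourseObject(domain)
        self.link(obj, mention)
        self.objects.append(obj)
        return obj

    def resolve_gesture(self, gesture: ModalityObject) -> Optional[DiscourseObject]:
        # Assumed rule: search from the most recent referent backwards for a
        # graphical mention displayed in the region the user pointed at.
        for obj in reversed(self.objects):
            for m in obj.mentions:
                if m.modality == "graphics" and m.surface == gesture.surface:
                    self.link(obj, gesture)
                    return obj  # return immediately, so appending during iteration is safe
        return None


memory = DiscourseMemory()
# The system's own graphical output is recorded, not just the user's input.
memory.introduce(DomainObject("movie", {"title": "Film A"}), ModalityObject("graphics", "slot-1"))
memory.introduce(DomainObject("movie", {"title": "Film B"}), ModalityObject("graphics", "slot-2"))

# User says "this one" while pointing at screen region slot-2.
speech = ModalityObject("speech", "this one")
gesture = ModalityObject("gesture", "slot-2")
target = memory.resolve_gesture(gesture)
if target is not None:
    memory.link(target, speech)  # the deictic phrase shares the gesture's referent
    print(target.domain.attributes["title"])  # prints: Film B
```

Keeping modality objects separate from discourse objects lets a single discourse referent accumulate mentions from several modalities (here graphics, gesture, and speech) while the domain layer stays modality-independent.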




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wahlster, W. (2006). Dialogue Systems Go Multimodal: The SmartKom Experience. In: Wahlster, W. (ed.) SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_1


  • DOI: https://doi.org/10.1007/3-540-36678-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23732-7

  • Online ISBN: 978-3-540-36678-2

  • eBook Packages: Computer Science, Computer Science (R0)
