Summary
Multimodal dialogue systems exploit one of the major characteristics of human-human interaction: the coordinated use of different modalities. Allowing all of the modalities to refer to and depend upon each other is key to the richness of multimodal communication. We introduce the notion of symmetric multimodality for dialogue systems in which all input modes (e.g., speech, gesture, facial expression) are also available for output, and vice versa. A dialogue system with symmetric multimodality must understand and represent not only the user’s multimodal input, but also its own multimodal output. We present an overview of the SmartKom system, which provides full symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent. SmartKom represents a new generation of multimodal dialogue systems that deal not only with simple modality integration and synchronization but also cover the full spectrum of dialogue phenomena associated with symmetric multimodality (including crossmodal references, one-anaphora, and backchannelling). We show that SmartKom’s plug-and-play architecture supports multiple recognizers for a single modality; e.g., the user’s speech signal can be processed by three unimodal recognizers in parallel (speech recognition, emotional prosody, boundary prosody). We detail SmartKom’s three-tiered representation of multimodal discourse, consisting of a domain layer, a discourse layer, and a modality layer. We discuss the limitations of SmartKom and how they are overcome in the follow-up project SmartWeb. In addition, we present the research roadmap for multimodality, addressing the key open research questions in this young field. To conclude, we discuss the economic and scientific impact of the SmartKom project, which has led to more than 50 patents and 29 spin-off products.
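Two ideas from the summary can be made concrete with a small sketch: the definition of symmetric multimodality (the input modes and the output modes coincide) and the processing of one speech signal by several unimodal recognizers in parallel. The code below is an illustration under stated assumptions, not SmartKom's implementation: the names `is_symmetric`, `Hypothesis`, `analyze_speech`, and the three placeholder recognizer functions are hypothetical, and the recognizers return dummy strings rather than real analyses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Example modes taken from the summary; the concrete sets are assumptions.
INPUT_MODES = frozenset({"speech", "gesture", "facial_expression"})
OUTPUT_MODES = frozenset({"speech", "gesture", "facial_expression"})


def is_symmetric(input_modes, output_modes):
    """True if every input mode is also an output mode, and vice versa."""
    return set(input_modes) == set(output_modes)


@dataclass(frozen=True)
class Hypothesis:
    """Result of one unimodal recognizer (hypothetical structure)."""
    recognizer: str
    modality: str
    content: str


# Placeholder recognizers: each one reads the same speech signal.
def speech_recognition(signal):
    return Hypothesis("speech_recognition", "speech", f"words({signal})")


def emotional_prosody(signal):
    return Hypothesis("emotional_prosody", "speech", f"emotion({signal})")


def boundary_prosody(signal):
    return Hypothesis("boundary_prosody", "speech", f"boundaries({signal})")


def analyze_speech(signal):
    """Run all three recognizers on one signal in parallel.

    Results are returned in the order the recognizers are listed,
    regardless of which one finishes first.
    """
    recognizers = (speech_recognition, emotional_prosody, boundary_prosody)
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(recognize, signal) for recognize in recognizers]
        return [future.result() for future in futures]
```

Calling `analyze_speech("utterance-1")` yields three hypotheses that all carry the modality `"speech"`, which is the sense in which a single modality is served by multiple recognizers. How SmartKom actually fuses such hypotheses is not covered by this sketch.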
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wahlster, W. (2006). Dialogue Systems Go Multimodal: The SmartKom Experience. In: Wahlster, W. (eds) SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_1
DOI: https://doi.org/10.1007/3-540-36678-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23732-7
Online ISBN: 978-3-540-36678-2
eBook Packages: Computer Science, Computer Science (R0)