Abstract
More effective, efficient and natural human-computer or computer-mediated human-human interaction will require both automated understanding and generation of multimedia. Fluent conversational interaction demands explicit models of the user, discourse, task and context. It will also require a richer understanding of media (i.e., text, audio, video), both in the interface, where media support interaction with the user, and in the content the user accesses during a session. Multimedia dialogue prototypes have been developed in several application domains, including CUBRICON (mission planning) (Neal and Shapiro, 1991), XTRA (tax-form preparation) (Wahlster, 1991), AIMI (air mission planning) (Burger and Marshall, 1993), and AlFresco (art history information exploration) (Stock et al., 1993). These systems typically parse mixed (usually asynchronous) multimedia input and generate coordinated multimedia output. They also attempt to maintain coherency, cohesion, and consistency across both multimedia input and output. For example, they often support integrated language and deixis for both input and output. They extend research in discourse and user modeling (Kobsa and Wahlster, 1989) by incorporating representations of media to enable media (cross) reference and reuse over the course of a session with a user. These enhanced representations support the exploitation of user perceptual abilities and media preferences as well as the resolution of multimedia references (e.g., “Send this plane there” articulated with synchronous gestures on a map).
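The multimedia reference example at the end of the abstract can be illustrated with a minimal sketch. It assumes that spoken words and pointing gestures arrive with timestamps and binds each deictic word to the nearest unused gesture in time. The names `Word`, `Gesture`, `resolve_deixis`, the `max_gap` threshold, and the object identifiers are hypothetical, introduced only for illustration; this is not the method used by CUBRICON, XTRA, AIMI, or AlFresco.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    time: float  # seconds from the start of the utterance (assumed available)


@dataclass
class Gesture:
    target: str  # hypothetical identifier of the map object or location pointed at
    time: float


# Small illustrative set of deictic words; a real system would need more.
DEICTIC = {"this", "that", "here", "there"}


def resolve_deixis(words, gestures, max_gap=1.0):
    """Bind each deictic word to the nearest unused gesture within max_gap seconds."""
    unused = list(gestures)
    bindings = []
    for word in words:
        if word.text.lower() not in DEICTIC:
            continue
        if not unused:
            bindings.append((word.text, None))  # no gesture left to bind
            continue
        nearest = min(unused, key=lambda g: abs(g.time - word.time))
        if abs(nearest.time - word.time) <= max_gap:
            unused.remove(nearest)  # each gesture resolves at most one word
            bindings.append((word.text, nearest.target))
        else:
            bindings.append((word.text, None))  # gesture too far away in time
    return bindings


words = [Word("Send", 0.0), Word("this", 0.3), Word("plane", 0.6), Word("there", 1.2)]
gestures = [Gesture("plane-7", 0.35), Gesture("airbase-north", 1.25)]
print(resolve_deixis(words, gestures))
# [('this', 'plane-7'), ('there', 'airbase-north')]
```

Nearest-in-time matching is the simplest possible alignment policy. It says nothing about the discourse, user, and media models the abstract argues a conversational system also needs.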
References
Aberdeen, J., Burger, J., Day, D., Hirschman, L., Robinson, P. and Vilain, M. 1995. Description of the Alembic System Used for MUC-6, Proceedings of the Sixth Message Understanding Conference. Advanced Research Projects Agency Information Technology Office, Columbia, MD.
Brill, E. 1995. Transformation-based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging. Computational Linguistics, 21 (4).
Brown, M.G., Foote, J.T., Jones, G.J.F., Sparck-Jones, K. and Young, S.J. 1995. Automatic Content-Based Retrieval of Broadcast News, Proceedings of ACM Multimedia. San Francisco, CA, pp. 35–44.
Dubner, B. 1996. Automatic Scene Detector and Videotape logging system, User Guide, Dubner International, Inc., Copyright 1995.
Grosz, B. J. and Sidner, C. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12 (3): 175–204.
Hearst, M. A. 1994. Multi-Paragraph Segmentation of Expository Text, ACL-94, Las Cruces, New Mexico.
Kobsa, A. and Wahlster, W. (eds.) 1989. User Models in Dialog Systems. Berlin: Springer-Verlag.
Mani, I. 1995. Very Large Scale Text Summarization, Technical Note, MITRE Corporation.
Mani, I., House, D., Maybury, M. and Green, M. 1997. Towards Content-based Browsing of Broadcast News Video. In Maybury, M. (ed.) Intelligent Multimedia Information Retrieval, AAAI/MIT Press, 241–258.
Maybury, M. T. (ed.) 1993. Intelligent Multimedia Interfaces. Menlo Park: AAAI/MIT Press. (http://www.aaai.org/Publications/Press/Catalog/maybury.html)
Maybury, M. T. 1995. Generating Summaries from Event Data. International Journal of Information Processing and Management: Special Issue on Text Summarization 31 (5): 735–751.
Maybury, M. T. (ed.) 1997. Intelligent Multimedia Information Retrieval. Menlo Park: AAAI/MIT Press. (http://www.aaai.org:80/Press/Books/Maybury-2/)
Maybury, M., Merlino, A. and Morey, D. 1997. Broadcast News Navigation using Story Segments, Proceedings of the ACM International Multimedia Conference, Seattle, WA, November 8–14, 381–391.
Michell, R. 1996. Forager for Information on the Super Highway (FISH). Unpublished Manuscript.
Proceedings of the Sixth Message Understanding Conference. Advanced Research Projects Agency Information Technology Office, Columbia, MD, 6–8 November, 1995.
Pelachaud, C. 1992. Functional Decomposition of Facial Expressions for an Animation System. In Catarci, T., Costabile, M. F. and Levialdi, S. (eds). Advanced Visual Interfaces: Proceedings of the International Workshop, Singapore: World Scientific Series in Computer Science, Vol 36: 26–49.
Reiter, E., Mellish, C. and Levine, J. 1995. Automatic Generation of Technical Documentation. Applied Artificial Intelligence 9 (3): 259–287.
Shahraray, B. and Gibbon, D. 1995. Automated Authoring of Hypermedia Documents of Video Programs. Proceedings of ACM Multimedia. San Francisco, CA, pp. 401–409.
Smotroff, I., Hirschman, L. and Bayer, S. 1995. Integrating Natural Language with Large Dataspace Visualization, to appear in Adam, N. and Bhargava, B. (eds), Advances in Digital Libraries, Lecture Notes in Computer Science, Springer-Verlag.
Stevens et al., 1994. Informedia–Improving Access to Digital Video, Interactions, October, pp. 67–71.
Stock, O. and the ALFRESCO Project Team. 1993. ALFRESCO: Enjoying the Combination of Natural Language Processing and Hypermedia for Information Exploration. In Intelligent Multimedia Interfaces, ed. M. Maybury, 197–224. Menlo Park: AAAI/MIT Press.
Wahlster, W. 1991. User and Discourse Models for Multimodal Communication. In Sullivan, J. W. and Tyler, S. W. (eds). Intelligent User Interfaces. Frontier Series. New York: ACM Press, 45–67.
Copyright information
© 1999 Springer Science+Business Media New York
About this chapter
Cite this chapter
Maybury, M.T. (1999). Conversational Multimedia Interaction. In: Wilks, Y. (ed.) Machine Conversations. The Springer International Series in Engineering and Computer Science, vol 511. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-5687-6_5
DOI: https://doi.org/10.1007/978-1-4757-5687-6_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5092-5
Online ISBN: 978-1-4757-5687-6
eBook Packages: Springer Book Archive