Conversational Multimedia Interaction

  • M. T. Maybury
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 511)


More effective, efficient, and natural human-computer or computer-mediated human-human interaction will require both automated understanding and automated generation of multimedia. Fluent conversational interaction demands explicit models of the user, discourse, task, and context. It will also require a richer understanding of media (i.e., text, audio, video), both as they are used in the interface to support interaction with the user and as they are used when the user accesses content during a session. Multimedia dialogue prototypes have been developed in several application domains, including CUBRICON (mission planning) (Neal and Shapiro, 1991), XTRA (tax-form preparation) (Wahlster, 1991), AIMI (air mission planning) (Burger and Marshall, 1993), and AlFresco (art history information exploration) (Stock et al., 1993). These systems typically parse mixed (usually asynchronous) multimedia input and generate coordinated multimedia output. They also attempt to maintain coherence, cohesion, and consistency across both multimedia input and output. For example, they often support integrated language and deixis for both input and output. They extend research in discourse and user modeling (Kobsa and Wahlster, 1989) by incorporating representations of media, which enables media reference, cross-reference, and reuse over the course of a session with a user. These enhanced representations support the exploitation of user perceptual abilities and media preferences as well as the resolution of multimedia references (e.g., “Send this plane there” articulated with synchronous gestures on a map).
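The multimedia reference example above can be illustrated with a minimal sketch. It assumes that speech and pointing gestures each arrive with timestamps and that a deictic word is bound to the nearest unused gesture within a fixed time window. The names `Word`, `Gesture`, `resolve_deixis`, and `max_gap`, and the target labels, are hypothetical and chosen for illustration; this is not the method used by CUBRICON, XTRA, AIMI, or AlFresco.

```python
from dataclasses import dataclass

# Hypothetical set of deictic words; a real system would use a richer lexicon.
DEICTIC_WORDS = {"this", "that", "here", "there"}


@dataclass
class Word:
    text: str
    time: float  # seconds from the start of the utterance


@dataclass
class Gesture:
    target: str  # map object or location the user pointed at
    time: float  # seconds from the start of the utterance


def resolve_deixis(words, gestures, max_gap=1.0):
    """Bind each deictic word to the nearest unused gesture in time.

    A word stays unresolved (None) when no unused gesture lies within
    max_gap seconds of it. max_gap is an assumed tuning parameter.
    """
    unused = list(gestures)
    bindings = []
    for word in words:
        if word.text.lower() not in DEICTIC_WORDS:
            continue
        if not unused:
            bindings.append((word.text, None))
            continue
        nearest = min(unused, key=lambda g: abs(g.time - word.time))
        if abs(nearest.time - word.time) <= max_gap:
            unused.remove(nearest)
            bindings.append((word.text, nearest.target))
        else:
            bindings.append((word.text, None))
    return bindings


# "Send this plane there" with two pointing gestures on a map.
words = [Word("Send", 0.0), Word("this", 0.3), Word("plane", 0.6), Word("there", 1.2)]
gestures = [Gesture("aircraft-7", 0.4), Gesture("airbase-north", 1.3)]
bindings = resolve_deixis(words, gestures)
print(bindings)
```

Greedy nearest-in-time matching is only one possible policy. A system that also models discourse and task context, as the abstract describes, could use that context to rank candidate referents when timing alone is ambiguous.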


Keywords: Broadcast News, Natural Language Generation, News Program, Multimedia Interface, Closed Caption
These keywords were added by machine and not by the authors.




  1. Aberdeen, J., Burger, J., Day, D., Hirschman, L., Robinson, P. and Vilain, M. 1995. Description of the Alembic System Used for MUC-6, Proceedings of the Sixth Message Understanding Conference. Advanced Research Projects Agency Information Technology Office, Columbia, MD.
  2. Brill, E. 1995. Transformation-based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging. Computational Linguistics, 21 (4).
  3. Brown, M.G., Foote, J.T., Jones, G.J.F., Sparck-Jones, K. and Young, S.J. 1995. Automatic Content-Based Retrieval of Broadcast News, Proceedings of ACM Multimedia. San Francisco, CA, p. 35–44.
  4. Dubner, B. 1996. Automatic Scene Detector and Videotape logging system, User Guide, Dubner International, Inc., Copyright 1995.
  5. Grosz, B. J. and Sidner, C. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12 (3): 175–204.
  6. Hearst, M. A. 1994. Multi-Paragraph Segmentation of Expository Text, ACL-94, Las Cruces, New Mexico.
  7. Kobsa, A. and Wahlster, W. (eds.) 1989. User Models in Dialog Systems. Berlin: Springer-Verlag.
  8. Mani, I. 1995. Very Large Scale Text Summarization, Technical Note, MITRE Corporation.
  9. Mani, I., House, D., Maybury, M. and Green, M. 1997. Towards Content-based Browsing of Broadcast News Video. In Maybury, M. (ed.) Intelligent Multimedia Information Retrieval, AAAI/MIT Press, 241–258.
  10. Maybury, M. T. (ed.) 1993. Intelligent Multimedia Interfaces. Menlo Park: AAAI/MIT Press.
  11. Maybury, M. T. 1995. Generating Summaries from Event Data. International Journal of Information Processing and Management: Special Issue on Text Summarization 31 (5): 735–751.
  12. Maybury, M. T. (ed.) 1997. Intelligent Multimedia Information Retrieval. Menlo Park: AAAI/MIT Press. (http://www.aaai.org:80/Press/Books/Maybury-2/)
  13. Maybury, M., Merlino, A. and Morey, D. 1997. Broadcast News Navigation using Story Segments, Proceedings of the ACM International Multimedia Conference, Seattle, WA, November 8–14, 381–391.
  14. Michell, R. 1996. Forager for Information on the Super Highway (FISH). Unpublished Manuscript.
  15. [MUC-6] Proceedings of the Sixth Message Understanding Conference. Advanced Research Projects Agency Information Technology Office, Columbia, MD, 6–8 November, 1995.
  16. Pelachaud, C. 1992. Functional Decomposition of Facial Expressions for an Animation System. In Catarci, T., Costabile, M. F. and Levialdi, S. (eds). Advanced Visual Interfaces: Proceedings of the International Workshop, Singapore: World Scientific Series in Computer Science, Vol 36: 26–49.
  17. Reiter, E., Mellish, C. and Levine, J. 1995. Automatic Generation of Technical Documentation. Applied Artificial Intelligence 9 (3): 259–287.
  18. Shahraray, B. and Gibbon, D. 1995. Automated Authoring of Hypermedia Documents of Video Programs. Proceedings of ACM Multimedia. San Francisco, CA, p. 401–409.
  19. Smotroff, I., Hirschman, L. and Bayer, S. 1995. Integrating Natural Language with Large Dataspace Visualization, to appear in Adam, N and Bhargava, B. (eds), Advances in Digital Libraries, Lecture Notes in Computer Science, Springer Verlag.
  20. Stevens et al., 1994. Informedia–Improving Access to Digital Video, Interactions, October, pp. 67–71.
  21. Stock, O. and the ALFRESCO Project Team. 1993. ALFRESCO: Enjoying the Combination of Natural Language Processing and Hypermedia for Information Exploration. In Intelligent Multimedia Interfaces, ed. M. Maybury, 197–224. Menlo Park: AAAI/MIT Press.
  22. Wahlster, W. 1991. User and Discourse Models for Multimodal Communication. In Sullivan, J. W. and Tyler, S. W. (eds). Intelligent User Interfaces. Frontier Series. New York: ACM Press, 45–67.

Copyright information

© Springer Science+Business Media New York 1999
