Abstract
Multimodal dialog systems can be defined as computer systems that process two or more user input modes and combine them with multimedia system output. This paper focuses on the multimodal input, providing a proposal to process and fuse the multiple input modalities in the dialog manager of the system, so that a single combined input is used to select the next system action. We describe an application of our technique to build multimodal systems that process the user's spoken utterances, tactile and keyboard inputs, and information related to the context of the interaction. In our proposal, this contextual information is divided into external and internal context, the latter represented by the detection of the user's intention during the dialog and their emotional state.
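The fusion scheme the abstract describes (merging spoken, tactile, and keyboard inputs together with external and internal context into a single combined input for action selection) can be sketched as a simple confidence-weighted late-fusion step. The sketch below is illustrative only and is not the authors' implementation; all names (`ModalityInput`, `fuse`, `select_next_action`) and the toy conflict-resolution and policy rules are assumptions introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class ModalityInput:
    modality: str     # e.g. "speech", "touch", "keyboard" (hypothetical labels)
    slots: dict       # semantic slots extracted from this modality
    confidence: float # recognizer confidence in [0, 1]

@dataclass
class Context:
    external: dict    # e.g. device, location
    internal: dict    # e.g. detected intention, emotional state

def fuse(inputs, context):
    """Late fusion: merge slot values from all modalities into one
    combined frame, resolving slot conflicts by recognizer confidence."""
    best = {}  # slot name -> (value, confidence)
    for inp in inputs:
        for slot, value in inp.slots.items():
            if slot not in best or inp.confidence > best[slot][1]:
                best[slot] = (value, inp.confidence)
    frame = {slot: value for slot, (value, _) in best.items()}
    # Attach the interaction context so the dialog manager can
    # condition its next-action decision on it.
    frame["_context"] = {"external": context.external,
                         "internal": context.internal}
    return frame

def select_next_action(frame):
    """Toy dialog-manager policy over the fused frame (placeholder rules)."""
    internal = frame["_context"]["internal"]
    if internal.get("emotion") == "frustrated":
        return "transfer_to_agent"
    if "destination" in frame:
        return "confirm_destination"
    return "ask_destination"
```

For example, if a speech input hypothesizes `destination = "Madrid"` with confidence 0.6 while a touch input on an on-screen map selects `"Barcelona"` with confidence 0.9, the fused frame keeps the touch value, and the manager selects a confirmation action over that single combined input.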
© 2014 Springer International Publishing Switzerland
Cite this paper
Griol, D., Molina, J.M., García-Herrero, J. (2014). A Proposal for Processing and Fusioning Multiple Information Sources in Multimodal Dialog Systems. In: Corchado, J.M., et al. Highlights of Practical Applications of Heterogeneous Multi-Agent Systems. The PAAMS Collection. PAAMS 2014. Communications in Computer and Information Science, vol 430. Springer, Cham. https://doi.org/10.1007/978-3-319-07767-3_16
Print ISBN: 978-3-319-07766-6
Online ISBN: 978-3-319-07767-3