Toward Adaptive Information Fusion in Multimodal Systems

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)


In recent years, a new generation of multimodal systems has emerged as a major direction within the HCI community. Multimodal interfaces and architectures are time-critical and data-intensive to develop, which poses new research challenges. The goal of the present work is to model and adapt to users' multimodal integration patterns, so that faster and more robust systems can be developed that adapt on-line to each individual's multimodal temporal thresholds. In this paper, we summarize past user-modeling results on speech and pen multimodal integration patterns, which indicate that there are two dominant types of integration pattern among users that can be detected very early and remain highly consistent. The empirical results also indicate that, when interacting with a multimodal system, users intermix unimodal with multimodal commands. Based on these results, we present new machine-learning results comparing three models, based on Bayesian Belief Networks, for on-line system adaptation to users' integration patterns. This work utilized data from ten adults who provided approximately 1,000 commands while interacting with a map-based multimodal system. Initial experimental results with our learning models indicated that 85% of users' natural mixed input could be correctly classified as either unimodal or multimodal, and 82% of users' multimodal input could be correctly classified as either sequentially or simultaneously integrated. The long-term goal of this research is to develop new strategies for combining empirical user modeling with machine-learning techniques to bootstrap accelerated, generalized, and more reliable information fusion in new types of multimodal systems.
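To illustrate the kind of on-line adaptation the abstract describes, the sketch below classifies each multimodal command as sequentially or simultaneously integrated from the temporal overlap of its speech and pen signals, and maintains a posterior over the user's dominant integration pattern. This is a deliberately simplified stand-in: the paper's models are Bayesian Belief Networks, whereas this sketch uses a Beta-Bernoulli update, and the `MultimodalCommand` fields and overlap rule are hypothetical assumptions, not the authors' feature set.

```python
from dataclasses import dataclass


@dataclass
class MultimodalCommand:
    """Timestamps (in seconds) for one speech-and-pen command.
    Field names are hypothetical, chosen for this illustration."""
    speech_start: float
    speech_end: float
    pen_start: float
    pen_end: float


def integration_pattern(cmd: MultimodalCommand) -> str:
    # Simultaneous if the speech and pen intervals overlap in time,
    # sequential if there is a lag between them (the binary
    # distinction discussed in the abstract).
    overlap = min(cmd.speech_end, cmd.pen_end) - max(cmd.speech_start, cmd.pen_start)
    return "simultaneous" if overlap > 0 else "sequential"


class OnlinePatternModel:
    """Beta-Bernoulli stand-in for a Bayesian Belief Network: tracks a
    posterior over P(user integrates simultaneously) and updates it
    after each observed command, so the dominant pattern can be
    detected after only a few commands."""

    def __init__(self, prior_sim: float = 1.0, prior_seq: float = 1.0):
        self.sim = prior_sim  # pseudo-count for simultaneous commands
        self.seq = prior_seq  # pseudo-count for sequential commands

    def observe(self, cmd: MultimodalCommand) -> None:
        if integration_pattern(cmd) == "simultaneous":
            self.sim += 1
        else:
            self.seq += 1

    def p_simultaneous(self) -> float:
        return self.sim / (self.sim + self.seq)

    def dominant_pattern(self) -> str:
        return "simultaneous" if self.p_simultaneous() >= 0.5 else "sequential"


if __name__ == "__main__":
    model = OnlinePatternModel()
    model.observe(MultimodalCommand(0.0, 1.2, 0.5, 1.0))  # overlapping signals
    model.observe(MultimodalCommand(0.0, 1.0, 1.4, 2.0))  # pen lags speech
    model.observe(MultimodalCommand(0.2, 1.1, 0.3, 0.9))  # overlapping signals
    print(model.dominant_pattern())  # -> simultaneous
```

In a deployed fusion architecture, the posterior would drive the temporal threshold used to decide whether to wait for a second modality before interpreting a command.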


Keywords: User modeling · Machine learning · Speech recognition · Integration patterns · Multimodal interfaces





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

Center for Human-Computer Communication, Computer Science Department, Oregon Health and Science University, Beaverton, USA
