A Multimodal Reference Resolution Approach in Virtual Environment

  • Xiaowu Chen
  • Nan Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4270)


This paper presents a multimodal reference resolution approach for virtual environments, called RRVE. Based on the relationship between cognitive status and reference, RRVE divides objects into four status hierarchies (pointing, in focus, activated, and extinct) and resolves multimodal references step by step according to the current status hierarchy. It also defines a match function that computes the match probability between a referring expression and a potential referent, and it describes the associated semantic signification and temporal constraints. Finally, sense shapes are used to resolve pointing ambiguity, helping the user interact precisely in an immersive virtual environment.
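The abstract's hierarchy-ordered, greedy resolution strategy can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the hierarchy names come from the abstract, but the candidate fields, the linear weighting of semantic and temporal scores, and the acceptance threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Status hierarchies from the abstract, ordered from most to least salient.
HIERARCHIES = ["pointing", "in_focus", "activated", "extinct"]

@dataclass
class Candidate:
    name: str
    status: str      # one of HIERARCHIES
    semantic: float  # fit between referring expression and object, in [0, 1] (assumed)
    temporal: float  # alignment of gesture/speech timing, in [0, 1] (assumed)

def match(c: Candidate, w_sem: float = 0.6) -> float:
    """Combine semantic and temporal scores into one match probability.
    A weighted sum is an assumption; the paper defines its own match function."""
    return w_sem * c.semantic + (1.0 - w_sem) * c.temporal

def resolve(candidates: list[Candidate],
            threshold: float = 0.5) -> Optional[Candidate]:
    """Walk the status hierarchies in salience order; within the first
    hierarchy containing a sufficiently good match, greedily pick the best."""
    for status in HIERARCHIES:
        tier = [c for c in candidates if c.status == status]
        if not tier:
            continue
        best = max(tier, key=match)
        if match(best) >= threshold:
            return best
    return None
```

Because the pointing tier is examined first, a pointed-at object wins over a merely activated one even when the activated object scores higher overall, which mirrors the status-hierarchy ordering described in the abstract.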


Keywords: Virtual Environment, Greedy Algorithm, Augmented Reality, Match Probability, Match Function




  1. Oviatt, S.: Ten myths of multimodal interaction. Communications of the ACM 42(11), 74–81 (1999)
  2. Oviatt, S.: Multimodal interactive maps: Designing for human performance. Human-Computer Interaction 12, 93–129 (1997)
  3. Cohen, P.R., Johnston, M., McGee, D., Oviatt, S.L., Pittman, J., Smith, I., Chen, L., Clow, J.: QuickSet: Multimodal interaction for distributed applications. In: 5th ACM International Multimedia Conference, pp. 31–40 (1997)
  4. Chai, J.: Semantics-based representation for multimodal interpretation in conversational systems. In: 19th International Conference on Computational Linguistics (COLING 2002), pp. 141–147 (2002)
  5. Olwal, A., Benko, H., Feiner, S.: SenseShapes: Using statistical geometry for object selection in a multimodal augmented reality. In: Second IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 300–301 (2003)
  6. Chen, X., Xu, N., Li, Y.: A virtual environment for collaborative assembly. In: Second International Conference on Embedded Software and Systems (ICESS 2005), Xi’an, China, pp. 414–421. IEEE CS Press, Los Alamitos (2005)
  7. Bolt, R.A.: “Put-that-there”: Voice and gesture at the graphics interface. Computer Graphics 14(3), 262–270 (1980)
  8. Koons, D.B., Sparrell, C.J., Thorisson, K.R.: Integrating simultaneous input from speech, gaze, and hand gestures. American Association for Artificial Intelligence, pp. 257–276 (1993)
  9. Pineda, L., Garza, G.: A model for multimodal reference resolution. Computational Linguistics 26(2), 139–193 (2000)
  10. Johnston, M., Bangalore, S.: Finite-state multimodal parsing and understanding. In: Proceedings of the 18th Conference on Computational Linguistics, pp. 369–375 (2000)
  11. Chai, J.Y., Hong, P., Zhou, M.X.: A probabilistic approach to reference resolution in multimodal user interfaces. In: Proceedings of the 2004 International Conference on Intelligent User Interfaces (IUI 2004), Madeira, Portugal, pp. 70–77. ACM, New York (2004)
  12. Pfeiffer, T., Latoschik, M.E.: Resolving object references in multimodal dialogues for immersive virtual environments. In: Proceedings of IEEE Virtual Reality 2004 (VR 2004), Chicago, USA, pp. 35–42 (2004)
  13. Latoschik, M.E.: A user interface framework for multimodal VR interactions. In: Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI 2005), Trento, Italy, pp. 76–83 (2005)
  14. Kaiser, E., Olwal, A., McGee, D., Benko, H., Corradini, A., Li, X., Cohen, P., Feiner, S.: Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality. In: Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI 2003), pp. 12–19 (2003)
  15. Gundel, J.K., Hedberg, N., Zacharski, R.: Cognitive status and the form of referring expressions in discourse. Language 69(2), 274–307 (1993)
  16. Grice, H.P.: Logic and conversation. In: Syntax and Semantics, vol. 3, pp. 41–58. Academic Press, New York (1975)
  17. Kehler, A.: Cognitive status and form of reference in multimodal human-computer interaction. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pp. 685–690 (2000)
  18. Chai, J.Y., Prasov, Z., Blaim, J., Jin, R.: Linguistic theories in efficient multimodal reference resolution: An empirical investigation. In: Proceedings of the 10th International Conference on Intelligent User Interfaces, California, USA, pp. 43–50 (2005)
  19. Pu, J., Dong, S.: A task-oriented and hierarchical multimodal integration model and its corresponding algorithm. Journal of Computer Research and Development (in Chinese) 38(8), 966–971 (2001)
  20. Chai, J.Y., Prasov, Z., Blaim, J., Jin, R.: The reality of virtual reality. In: Proceedings of the Seventh International Conference on Virtual Systems and Multimedia (VSMM 2001), pp. 43–50 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xiaowu Chen (1, 2)
  • Nan Xu (1, 2)

  1. The Key Laboratory of Virtual Reality Technology, Ministry of Education, China
  2. School of Computer Science and Engineering, Beihang University, Beijing, P.R. China
