Abstract
Although large displays could allow several users to work together and move freely in a room, their interfaces are typically limited to contact devices that must be shared. This paper describes a novel interface called SHIVA (Several-Humans Interface with Vision and Audio) that allows several users to interact remotely with a very large display using both speech and gesture. The head and both hands of two users are tracked in real time by a stereo-vision-based system. From the positions of these body parts, the direction pointed by each user is computed, and selection gestures made with the second hand are recognized. The pointing gesture is fused with the n-best results from speech recognition, taking the application context into account. The system is tested on a chess game with two users playing on a very large display.
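The two steps the abstract describes (estimating the pointed direction from tracked body parts, then fusing it with the speech recognizer's n-best list) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the pointing direction is modeled as a ray from the head through the pointing hand, and uses a simple additive fusion rule; the function names and the chess-move string format are hypothetical.

```python
import numpy as np

def pointed_target(head, hand, plane_point, plane_normal):
    """Intersect the head-to-hand pointing ray with the display plane.

    Returns the 3-D point on the plane, or None if the ray is parallel
    to the display or points away from it.
    """
    head = np.asarray(head, dtype=float)
    direction = np.asarray(hand, dtype=float) - head   # pointing ray
    plane_normal = np.asarray(plane_normal, dtype=float)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                                    # ray parallel to display
    t = np.dot(plane_normal, np.asarray(plane_point, dtype=float) - head) / denom
    if t < 0:
        return None                                    # pointing away from display
    return head + t * direction

def fuse_speech_and_pointing(nbest, pointed_square):
    """Rerank speech n-best hypotheses by compatibility with the pointed
    chess square (illustrative fusion rule: a fixed bonus for agreement).

    nbest: list of (hypothesis_string, recognizer_score) pairs.
    """
    def compatibility(hyp):
        text, score = hyp
        return score + (1.0 if pointed_square in text else 0.0)
    return max(nbest, key=compatibility)[0]
```

For a display lying in the plane z = 0, a user whose head is at (0.5, 1.6, 2.0) pointing with a hand at (0.6, 1.4, 1.5) yields a target on the screen; that target square can then promote the matching speech hypothesis even when the recognizer ranked it second.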
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Carbini, S., Viallet, JE., Bernier, O., Bascle, B. (2005). Tracking Body Parts of Multiple People for Multi-person Multimodal Interface. In: Sebe, N., Lew, M., Huang, T.S. (eds) Computer Vision in Human-Computer Interaction. HCI 2005. Lecture Notes in Computer Science, vol 3766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573425_2
Print ISBN: 978-3-540-29620-1
Online ISBN: 978-3-540-32129-3