Abstract
Although large displays could allow several users to work together and move freely in a room, their interfaces are typically limited to contact devices that must be shared. This paper describes a novel interface called SHIVA (Several-Humans Interface with Vision and Audio) that allows several users to interact remotely with a very large display using both speech and gesture. The head and both hands of two users are tracked in real time by a stereo-vision-based system. From the positions of these body parts, the direction pointed by each user is computed, and selection gestures made with the second hand are recognized. The pointing gesture is fused with the n-best results from speech recognition, taking the application context into account. The system is tested on a chess game with two users playing on a very large display.
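The two steps the abstract describes (estimating the pointed direction from tracked body parts, then fusing it with the speech recognizer's n-best list) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the pointing direction is modeled as a ray from the head through the pointing hand, and uses a simple additive fusion rule; the function names and the chess-move string format are hypothetical.

```python
import numpy as np

def pointed_target(head, hand, plane_point, plane_normal):
    """Intersect the head-to-hand pointing ray with the display plane.

    Returns the 3-D point on the plane, or None if the ray is parallel
    to the display or points away from it.
    """
    head = np.asarray(head, dtype=float)
    direction = np.asarray(hand, dtype=float) - head   # pointing ray
    plane_normal = np.asarray(plane_normal, dtype=float)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                                    # ray parallel to display
    t = np.dot(plane_normal, np.asarray(plane_point, dtype=float) - head) / denom
    if t < 0:
        return None                                    # pointing away from display
    return head + t * direction

def fuse_speech_and_pointing(nbest, pointed_square):
    """Rerank speech n-best hypotheses by compatibility with the pointed
    chess square (illustrative fusion rule: a fixed bonus for agreement).

    nbest: list of (hypothesis_string, recognizer_score) pairs.
    """
    def compatibility(hyp):
        text, score = hyp
        return score + (1.0 if pointed_square in text else 0.0)
    return max(nbest, key=compatibility)[0]
```

For a display lying in the plane z = 0, a user whose head is at (0.5, 1.6, 2.0) pointing with a hand at (0.6, 1.4, 1.5) yields a target on the screen; that target square can then promote the matching speech hypothesis even when the recognizer ranked it second.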
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Carbini, S., Viallet, JE., Bernier, O., Bascle, B. (2005). Tracking Body Parts of Multiple People for Multi-person Multimodal Interface. In: Sebe, N., Lew, M., Huang, T.S. (eds) Computer Vision in Human-Computer Interaction. HCI 2005. Lecture Notes in Computer Science, vol 3766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573425_2
Print ISBN: 978-3-540-29620-1
Online ISBN: 978-3-540-32129-3