Abstract
The automatic detection, tracking, and identification of multiple people in intelligent environments is an important building block on which smart interaction systems can be designed. Such systems include, for example, gesture recognizers, head pose estimators, far-field speech recognizers, and dialog systems.
In this paper, we present a system that is capable of tracking multiple people in a smartroom environment while inferring their identities in a completely automatic and unobtrusive way. It relies on a set of fixed and active cameras to track the users and get close-ups of their faces for identification, and on several microphone arrays to determine active speakers and steer the attention of the system. Information arriving asynchronously from several sources, such as position updates from audio or visual trackers and identification events from identification modules, is fused at a higher level to gradually refine the room's situation model. The system has been trained on a small set of users and showed good performance at acquiring and keeping their identities in a smartroom environment.
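The fusion step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`PositionUpdate`, `IdentificationEvent`, `Track`, `SituationModel`), the additive confidence scoring, and the assumption that every event already carries a track ID are all hypothetical choices made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PositionUpdate:
    """Hypothetical event from an audio or visual tracker."""
    timestamp: float
    track_id: int
    position: Tuple[float, float]


@dataclass
class IdentificationEvent:
    """Hypothetical event from an identification module."""
    timestamp: float
    track_id: int
    identity: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class Track:
    position: Optional[Tuple[float, float]] = None
    last_seen: float = float("-inf")
    # Accumulated evidence per candidate identity (assumed additive scoring).
    id_scores: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def best_identity(self) -> Optional[str]:
        """Return the identity with the most accumulated evidence, if any."""
        if not self.id_scores:
            return None
        return max(self.id_scores, key=self.id_scores.get)


class SituationModel:
    """Toy room model refined one asynchronous event at a time."""

    def __init__(self) -> None:
        self.tracks: Dict[int, Track] = {}

    def update(self, event) -> None:
        track = self.tracks.setdefault(event.track_id, Track())
        if isinstance(event, PositionUpdate):
            # Ignore position updates older than the one already applied.
            if event.timestamp >= track.last_seen:
                track.position = event.position
                track.last_seen = event.timestamp
        elif isinstance(event, IdentificationEvent):
            # Each identification event adds evidence for one identity.
            track.id_scores[event.identity] += event.confidence


# Events may arrive out of order; the last one here is a stale position.
model = SituationModel()
for ev in [
    PositionUpdate(0.0, 1, (1.0, 2.0)),
    IdentificationEvent(0.5, 1, "alice", 0.6),
    IdentificationEvent(0.9, 1, "bob", 0.7),
    PositionUpdate(1.0, 1, (1.5, 2.0)),
    IdentificationEvent(1.4, 1, "alice", 0.8),
    PositionUpdate(0.2, 1, (9.0, 9.0)),
]:
    model.update(ev)

print(model.tracks[1].best_identity(), model.tracks[1].position)
```

In this sketch a single identification event can be outvoted by later ones, which mirrors the idea of gradually refining identities rather than committing to the first observation. A real system would also need to associate events with tracks, which is assumed away here.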
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Bernardin, K., Ekenel, H.K., Stiefelhagen, R. (2006). Multimodal Identity Tracking in a Smartroom. In: Maglogiannis, I., Karpouzis, K., Bramer, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2006. IFIP International Federation for Information Processing, vol 204. Springer, Boston, MA. https://doi.org/10.1007/0-387-34224-9_37
DOI: https://doi.org/10.1007/0-387-34224-9_37
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34223-8
Online ISBN: 978-0-387-34224-5
eBook Packages: Computer Science (R0)