Abstract
This paper proposes an automatic video-conferencing system that combines acoustic source localization, video face tracking, and pose estimation. The audio portion of the system provides the initial localization of the talkers, and the video component tracks them using source motion, contour geometry, color data, and simple facial features. The choice of which camera to use is based on an estimate of the head's orientation. This head pose estimate is obtained with a very general head model that employs hairline features and a learned network classification procedure based on Support Vector Machines. The procedure can accurately evaluate head orientations over a complete 360-degree interval. Because it relies on a facial criterion that is easily extracted from video images acquired across a range of lighting and zoom conditions, the estimator is designed to be effective in practical situations such as video conferencing and surveillance.
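The camera decision described above can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so the rule used here (pick the camera whose bearing is closest to the estimated head yaw) is an assumption, as are the function names `angular_difference` and `select_camera` and the convention that both yaw and camera bearings are given in degrees in a shared reference frame.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def select_camera(head_yaw_deg, camera_bearings_deg):
    """Return the index of the camera the head is most nearly facing.

    Hypothetical rule, not taken from the paper: choose the camera whose
    bearing has the smallest angular distance to the estimated head yaw.
    """
    if not camera_bearings_deg:
        raise ValueError("at least one camera bearing is required")
    return min(
        range(len(camera_bearings_deg)),
        key=lambda i: angular_difference(head_yaw_deg, camera_bearings_deg[i]),
    )


if __name__ == "__main__":
    # Three hypothetical cameras spaced evenly around the talker.
    cameras = [0.0, 120.0, 240.0]
    print(select_camera(350.0, cameras))
```

Handling the wrap-around at 0/360 degrees matters here because the estimator covers the full 360-degree interval: a head yaw of 350 degrees should be treated as 10 degrees away from a camera at 0 degrees, not 350.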
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, C., Brandstein, M. (2000). Head Pose Estimation for Video-Conferencing with Multiple Cameras and Microphones. In: Tan, T., Shi, Y., Gao, W. (eds) Advances in Multimodal Interfaces — ICMI 2000. ICMI 2000. Lecture Notes in Computer Science, vol 1948. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40063-X_15
DOI: https://doi.org/10.1007/3-540-40063-X_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41180-2
Online ISBN: 978-3-540-40063-9