Neural Network-Based Head Pose Estimation and Multi-view Fusion
In this paper, we present two systems that were used for head pose estimation during the CLEAR06 Evaluation. We participated in two tasks: (1) estimating both pan and tilt orientation on synthetic, high-resolution head captures, and (2) estimating horizontal head orientation only on real seminar recordings that were captured with multiple cameras from different viewing angles. In both systems, we used a neural network to estimate the person's head orientation. For the seminar recordings, a Bayes filter framework is additionally used as a statistical fusion scheme that integrates every camera view into one joint hypothesis. In the monocular, high-resolution task, we achieved a mean error of 12.3° for horizontal head orientation and 12.77° for vertical orientation. On the multi-view seminar recordings, our system correctly identified the head orientation class (one of eight) in 34.9% of the frames. When neighbouring classes were also counted as correct, 72.9% of the frames were correctly classified.
Keywords: Camera View, Head Orientation, Joint Hypothesis, Tilt Orientation, Head Pose Estimation
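The multi-view fusion and the eight-class scoring mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual models, so everything specific here is an assumption made for illustration: the eight classes are taken to be 45° sectors, the motion model simply keeps most of the probability mass in place (`stay=0.8` is an arbitrary value), each camera's network output is treated as a likelihood over classes already expressed in a common room frame, and the cameras are treated as conditionally independent. All function names are hypothetical.

```python
from typing import List, Sequence

NUM_CLASSES = 8  # assumption: eight 45-degree sectors of horizontal orientation


def angle_to_class(pan_deg: float) -> int:
    """Map a pan angle in degrees to a sector; sector 0 is centred on 0 degrees."""
    return int(((pan_deg % 360.0) + 22.5) // 45.0) % NUM_CLASSES


def class_distance(a: int, b: int) -> int:
    """Circular distance between two classes (0 = same, 1 = neighbours)."""
    d = abs(a - b) % NUM_CLASSES
    return min(d, NUM_CLASSES - d)


def predict(belief: Sequence[float], stay: float = 0.8) -> List[float]:
    """Prediction step: mass stays put or leaks equally to both neighbours."""
    move = (1.0 - stay) / 2.0
    return [
        stay * belief[k]
        + move * belief[(k - 1) % NUM_CLASSES]
        + move * belief[(k + 1) % NUM_CLASSES]
        for k in range(NUM_CLASSES)
    ]


def update(belief: Sequence[float],
           camera_likelihoods: Sequence[Sequence[float]]) -> List[float]:
    """Update step: multiply in each camera's likelihood, then normalise."""
    posterior = list(belief)
    for likelihood in camera_likelihoods:
        posterior = [p * l for p, l in zip(posterior, likelihood)]
    total = sum(posterior)
    if total == 0.0:  # all cameras contradict each other: fall back to uniform
        return [1.0 / NUM_CLASSES] * NUM_CLASSES
    return [p / total for p in posterior]


def accuracy(predicted: Sequence[int], truth: Sequence[int],
             tolerance: int = 0) -> float:
    """Fraction of frames whose class is within `tolerance` sectors of the truth."""
    hits = sum(class_distance(p, t) <= tolerance for p, t in zip(predicted, truth))
    return hits / len(truth)


# One filter step with two made-up camera outputs (illustrative numbers only).
belief = [1.0 / NUM_CLASSES] * NUM_CLASSES
cam_a = [0.05, 0.10, 0.50, 0.20, 0.05, 0.04, 0.03, 0.03]
cam_b = [0.05, 0.30, 0.40, 0.10, 0.05, 0.04, 0.03, 0.03]
belief = update(predict(belief), [cam_a, cam_b])
fused_class = max(range(NUM_CLASSES), key=belief.__getitem__)
```

Calling `accuracy(..., tolerance=0)` corresponds to the exact-class score, while `tolerance=1` corresponds to the relaxed score in which neighbouring classes also count as correct.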