Abstract
This paper presents our data collection and first evaluations on visual focus of attention in dynamic meeting scenes. We included moving focus targets and unforeseen interruptions in each meeting by guiding it along a predefined script of events that three participating actors were instructed to follow. The remaining meeting attendees were not briefed on the upcoming actions or the general purpose of the meeting, so we were able to capture their natural focus changes within this predefined dynamic scenario using an extensive setup of both visual and acoustic sensors throughout our smart room. We present an adaptive approach to estimating visual focus of attention from head orientation under these unforeseen conditions and show that our system achieves an overall recognition rate of 59%, an improvement of 9% over choosing the best-matching focus target directly from the observed head orientation angles.
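To make the contrast in the abstract concrete, the following sketch illustrates the two strategies it compares: a baseline that assigns the focus target geometrically closest to the observed head orientation, and an adaptive variant that updates each target's expected head-angle online. All target names, angles, and the learning rate are hypothetical illustrations, not values from the paper; the adaptation rule shown is a simple online mean update standing in for the paper's approach, motivated by the observation that people often underturn the head and complete a gaze shift with the eyes.

```python
import numpy as np

# Hypothetical focus targets and their ideal head (pan, tilt) angles in
# degrees, as seen from one attendee's seat. Illustrative values only.
TARGETS = {
    "speaker":    np.array([-30.0,   0.0]),
    "whiteboard": np.array([ 20.0,   5.0]),
    "table":      np.array([  0.0, -25.0]),
}

def baseline_focus(head_angles):
    """Baseline: pick the target whose direction is closest to the
    observed head orientation (pan, tilt)."""
    return min(TARGETS, key=lambda t: np.linalg.norm(head_angles - TARGETS[t]))

class AdaptiveFocus:
    """Sketch of an adaptive estimator: keep a per-target mean head
    orientation and nudge it toward each new observation, so systematic
    head/gaze offsets are absorbed over time. The learning rate and
    update rule are assumptions for illustration."""

    def __init__(self, lr=0.05):
        self.means = {t: a.copy() for t, a in TARGETS.items()}
        self.lr = lr

    def classify(self, head_angles):
        best = min(self.means,
                   key=lambda t: np.linalg.norm(head_angles - self.means[t]))
        # Online adaptation: move the winning target's mean toward the
        # observation (a soft, EM-like update).
        self.means[best] += self.lr * (head_angles - self.means[best])
        return best
```

For example, a head orientation of (-28°, 2°) falls nearest the hypothetical "speaker" target under both strategies; the adaptive version additionally shifts that target's mean slightly toward the observation, so repeated systematic offsets are gradually compensated.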
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Voit, M., Stiefelhagen, R. (2008). Visual Focus of Attention in Dynamic Meeting Scenarios. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_1
Print ISBN: 978-3-540-85852-2
Online ISBN: 978-3-540-85853-9