Recognizing Conversational Interaction Based on 3D Human Pose

  • Jingjing Deng
  • Xianghua Xie
  • Ben Daubney
  • Hui Fang
  • Phil W. Grant
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8192)


In this paper, we take a bag-of-visual-words approach to investigate whether it is possible to distinguish conversational scenarios by observing human motion alone, in particular gestures in 3D. The conversational interactions considered in this work differ only subtly from one another. Unlike typical action or event recognition, each interaction in our case contains many instances of primitive motions and actions, many of which are shared among different conversational scenarios. Hence, extracting and learning temporal dynamics is essential. We use Kinect sensors to extract low-level temporal features. These features are generalized to form a visual vocabulary, which is further generalized to a set of topics from the temporal distributions of the vocabulary. A subject-specific supervised learning approach based on both generative and discriminative classifiers is employed to classify test sequences into seven different conversational scenarios. We believe this is among the first works devoted to conversational interaction classification using 3D pose features, and it shows that this task is indeed possible.
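The pipeline described above (cluster low-level pose features into a visual vocabulary, quantize each sequence into a bag-of-words histogram, generalize histograms into topics, then classify) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the feature dimensionality, vocabulary size, topic count, and the choice of k-means, LDA, and a linear SVM as concrete stand-ins for the paper's components are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for low-level temporal pose features: each sequence
# is a variable-length set of short motion descriptors (here 6-D).
n_sequences, n_classes = 70, 7          # 7 conversational scenarios
vocab_size, n_topics = 20, 5            # illustrative sizes, not from the paper
labels = np.repeat(np.arange(n_classes), n_sequences // n_classes)
sequences = [
    rng.normal(loc=labels[i], scale=1.0, size=(rng.integers(30, 60), 6))
    for i in range(n_sequences)
]

# 1) Build the visual vocabulary by clustering all descriptors.
kmeans = KMeans(n_clusters=vocab_size, n_init=5, random_state=0)
kmeans.fit(np.vstack(sequences))

# 2) Quantize each sequence into a bag-of-visual-words histogram.
def bow(seq):
    words = kmeans.predict(seq)
    return np.bincount(words, minlength=vocab_size)

hists = np.array([bow(s) for s in sequences])

# 3) Generalize histograms to a topic representation (LDA on word counts).
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
topics = lda.fit_transform(hists)       # one topic mixture per sequence

# 4) Supervised classification of the topic mixtures into the 7 scenarios
#    (a discriminative classifier; the paper also uses a generative one).
clf = SVC(kernel="linear").fit(topics, labels)
acc = clf.score(topics, labels)
```

A subject-specific setup, as in the paper, would fit this pipeline per subject and evaluate on held-out sequences of the same subject rather than on the training set.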


Keywords: 3D human pose · conversational interaction classification · interaction analysis · Kinect sensor





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jingjing Deng (1)
  • Xianghua Xie (1)
  • Ben Daubney (1)
  • Hui Fang (1)
  • Phil W. Grant (1)

  1. Department of Computer Science, Swansea University, Swansea, United Kingdom
