Recognizing Conversational Interaction Based on 3D Human Pose

  • Conference paper
Advanced Concepts for Intelligent Vision Systems (ACIVS 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8192)

Abstract

In this paper, we take a bag-of-visual-words approach to investigate whether it is possible to distinguish conversational scenarios by observing human motion alone, in particular gestures in 3D. The conversational interactions considered in this work differ from one another only subtly. Unlike typical action or event recognition, each interaction in our case contains many instances of primitive motions and actions, many of which are shared among different conversational scenarios. Hence, extracting and learning temporal dynamics is essential. We use Kinect sensors to extract low-level temporal features. These features are then generalized to form a visual vocabulary, which can be further generalized to a set of topics from the temporal distributions of the visual vocabulary. A subject-specific supervised learning approach based on both generative and discriminative classifiers is employed to classify the testing sequences into seven different conversational scenarios. We believe this is among the first works devoted to conversational interaction classification using 3D pose features, and that it shows this task is indeed possible.
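The vocabulary-building step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual features or the clustering method, so everything here is an assumption made for illustration: frame-to-frame joint displacement stands in for the low-level temporal feature, plain k-means stands in for vocabulary construction, and the function names (`temporal_features`, `kmeans`, `word_histogram`) are hypothetical. The topic model and the classifiers are left out.

```python
import random

def temporal_features(poses):
    """Frame-to-frame displacement of each pose coordinate (assumed feature)."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(poses, poses[1:])]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(centers, x):
    """Index of the center closest to x (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], x))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; the returned centers act as the visual vocabulary."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def word_histogram(poses, centers):
    """Normalized count of visual words over one pose sequence."""
    counts = [0] * len(centers)
    for f in temporal_features(poses):
        counts[nearest(centers, f)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy data: each pose is a flat list of coordinates (two values here).
still = [[0.0, 0.0] for _ in range(6)]            # no motion
moving = [[float(t), 0.0] for t in range(6)]      # steady motion along x

# Build the vocabulary from the features of all training sequences.
training_feats = temporal_features(still) + temporal_features(moving)
vocab = kmeans(training_feats, k=2)

h_still = word_histogram(still, vocab)
h_moving = word_histogram(moving, vocab)
print(h_still, h_moving)
```

In this toy case each sequence places all of its mass on a single word, and the two sequences use different words. With real skeleton data the histograms would be spread over many words, and they (or topic proportions derived from them) would be the input to a classifier.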

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02895-8_64

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deng, J., Xie, X., Daubney, B., Fang, H., Grant, P.W. (2013). Recognizing Conversational Interaction Based on 3D Human Pose. In: Blanc-Talon, J., Kasinski, A., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2013. Lecture Notes in Computer Science, vol 8192. Springer, Cham. https://doi.org/10.1007/978-3-319-02895-8_13

  • DOI: https://doi.org/10.1007/978-3-319-02895-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02894-1

  • Online ISBN: 978-3-319-02895-8

  • eBook Packages: Computer Science, Computer Science (R0)
