Combining Skeletal Pose with Local Motion for Human Activity Recognition

  • Ran Xu
  • Priyanshu Agarwal
  • Suren Kumar
  • Venkat N. Krovi
  • Jason J. Corso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7378)


Recent work in human activity recognition has focused on bottom-up approaches that rely on spatiotemporal features, both dense and sparse. In contrast, articulated motion, which naturally incorporates explicit human action information, has not been heavily studied, likely because of the inherent challenge of modeling and inferring articulated human motion from video. However, recent developments in data-driven human pose estimation have made such modeling plausible. In this paper, we extend these developments with a new middle-level representation called dynamic pose, which couples local motion information directly and independently with human skeletal pose, and we present an appropriate distance function on dynamic poses. We demonstrate the representative power of dynamic pose over raw skeletal pose in an activity recognition setting, using simple codebook matching and support vector machines as the classifier. Our results conclusively demonstrate that dynamic pose is a more powerful representation of human action than skeletal pose.
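
The codebook-matching step mentioned above can be illustrated with a minimal sketch. The abstract does not define the distance function, so the weighted sum of pose and motion distances used here is an assumption for illustration only, as are the names `dynamic_pose_distance`, `codebook_histogram`, and the weight `alpha`. Each dynamic pose is modeled as a hypothetical (pose vector, motion descriptor) pair, and the toy values are made up.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dynamic_pose_distance(a, b, alpha=0.5):
    """Assumed distance: weighted sum of pose and motion distances.

    a, b  -- (pose_vector, motion_descriptor) pairs
    alpha -- hypothetical weight on the skeletal-pose term
    """
    pose_a, motion_a = a
    pose_b, motion_b = b
    return (alpha * euclidean(pose_a, pose_b)
            + (1.0 - alpha) * euclidean(motion_a, motion_b))


def codebook_histogram(dynamic_poses, codebook, alpha=0.5):
    """Assign each dynamic pose to its nearest codeword and return
    the normalized histogram of assignments for one video."""
    counts = [0] * len(codebook)
    for dp in dynamic_poses:
        dists = [dynamic_pose_distance(dp, cw, alpha) for cw in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data: two codewords and three dynamic poses from one video.
codebook = [([0.0, 0.0], [0.0]), ([10.0, 10.0], [5.0])]
video = [([0.1, 0.0], [0.2]), ([9.5, 10.0], [4.8]), ([0.0, 0.3], [0.0])]
hist = codebook_histogram(video, codebook)
```

In a pipeline of this kind, the per-video histogram would then serve as the feature vector passed to a support vector machine; that step is omitted here.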


Keywords: Human Pose, Activity Recognition, Dynamic Pose





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ran Xu
    • 1
  • Priyanshu Agarwal
    • 2
  • Suren Kumar
    • 2
  • Venkat N. Krovi
    • 2
  • Jason J. Corso
    • 1
  1. Computer Science and Engineering, State University of New York at Buffalo, USA
  2. Mechanical and Aerospace Engineering, State University of New York at Buffalo, USA
