Multimedia Tools and Applications, Volume 78, Issue 2, pp 1971–1998

Action recognition by fusing depth video and skeletal data information

  • Ioannis Kapsouras
  • Nikos Nikolaidis
Article

Abstract

Two action recognition approaches that utilize depth videos and skeletal information are proposed in this paper. Depth video data are represented by dense trajectories, while skeletal data are represented by vectors of skeleton joint positions and their forward differences at multiple temporal scales. The extracted features are encoded using either the Bag of Words (BoW) or the Vector of Locally Aggregated Descriptors (VLAD) approach, and a Support Vector Machine (SVM) is used for classification. Experiments were performed on three datasets, namely MSR Action3D, MSR Action Pairs and Florence3D, to measure the performance of the methods. The proposed approaches outperform all state-of-the-art action recognition methods that operate on depth video/skeletal data in the most challenging and fair experimental setup of the MSR Action3D dataset. Moreover, they achieve 100% correct recognition on the MSR Action Pairs dataset and the highest classification rate among all compared methods on the Florence3D dataset.
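The skeletal pipeline described above — per-frame joint positions augmented with forward differences at several temporal scales, then quantized into a BoW histogram — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of scales, and the normalisation are assumptions.

```python
import numpy as np

def skeletal_features(joints, scales=(1, 2, 4)):
    """Per-frame descriptor: joint positions concatenated with their
    forward differences x[t+s] - x[t] at several temporal scales.
    joints: array of shape (T, J, 3) -- T frames, J joints, 3D positions.
    (Scales and zero-padding at the sequence end are illustrative choices.)"""
    T = joints.shape[0]
    flat = joints.reshape(T, -1)              # (T, 3J)
    parts = [flat]
    for s in scales:
        diff = np.zeros_like(flat)
        diff[:-s] = flat[s:] - flat[:-s]      # forward difference at scale s
        parts.append(diff)
    return np.concatenate(parts, axis=1)      # (T, 3J * (1 + len(scales)))

def bow_encode(features, codebook):
    """Bag of Words: assign each frame descriptor to its nearest codeword
    (squared Euclidean distance) and return the normalised histogram."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In the paper the codebook would be learned (e.g. by clustering training descriptors) and the resulting fixed-length histograms fed to the SVM; VLAD replaces the hard-assignment histogram with aggregated residuals to the codewords.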

Keywords

Kinect, Bag of Words, Vector of Locally Aggregated Descriptors, Action recognition, Fusion, Depth video, Motion capture data, MSR Action3D

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece