Continuous Gesture Recognition from Articulated Poses

  • Georgios D. Evangelidis
  • Gurkirt Singh
  • Radu Horaud
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)


This paper addresses the problem of continuous gesture recognition from articulated poses. Unlike the common isolated recognition scenario, the gesture boundaries are unknown here, so two problems must be solved: segmentation and recognition. We cast this into a labeling framework in which every site (frame) must be assigned a label (gesture ID). The inherent constraint of a piecewise constant labeling is satisfied by solving a global optimization problem with a smoothness term. For efficiency, we suggest a dynamic programming (DP) solver that seeks the optimal path recursively. To quantify the consistency between the labels and the observations, we build on a recent method that encodes sequences of articulated poses into Fisher vectors using short skeletal descriptors. A sliding window allows such Fisher vectors to be built frame by frame; they are then classified by a multi-class SVM, so that each label can be assigned to each frame at some cost. The evaluation in the ChalearnLAP-2014 challenge shows that the method outperforms the other participants that rely only on skeleton data. We also show that the proposed method competes with the top-ranking methods when colour and skeleton features are jointly used.
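The labeling step can be illustrated with a minimal sketch. The abstract does not give the exact energy, so the sketch assumes a Potts-style smoothness term: a constant penalty `switch_cost` (a hypothetical parameter) is paid whenever two consecutive frames take different labels. The per-frame costs in `unary` stand in for the costs derived from the classifier scores, and the function name `smooth_labeling` is illustrative only, not taken from the paper.

```python
def smooth_labeling(unary, switch_cost):
    """Minimise sum_t unary[t][l_t] + switch_cost * (number of label changes).

    unary: list of T rows, each with K per-label costs (lower is better).
    switch_cost: assumed non-negative penalty for changing label between frames.
    Returns (labels, energy).
    """
    num_labels = len(unary[0])
    cost = list(unary[0])   # best cost of any path ending in each label
    back = []               # back[t-1][k]: best predecessor of label k at frame t

    for t in range(1, len(unary)):
        # Cheapest label at the previous frame; switching from it is the
        # best possible switch because the penalty is constant.
        best_prev = min(range(num_labels), key=cost.__getitem__)
        switch = cost[best_prev] + switch_cost
        new_cost, pointers = [], []
        for k in range(num_labels):
            if switch < cost[k]:
                new_cost.append(unary[t][k] + switch)
                pointers.append(best_prev)
            else:
                new_cost.append(unary[t][k] + cost[k])
                pointers.append(k)
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest final label to recover the optimal path.
    last = min(range(num_labels), key=cost.__getitem__)
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels, cost[last]


# Five frames, two labels; frame 2 is a noisy frame that prefers label 1.
frame_costs = [[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]]
smoothed, smoothed_energy = smooth_labeling(frame_costs, switch_cost=2)
unsmoothed, unsmoothed_energy = smooth_labeling(frame_costs, switch_cost=0)
```

With a zero penalty the result is the frame-wise minimum, so the noisy frame flips the label; with a penalty of 2 the isolated flip is more expensive than absorbing the noisy frame, and the labeling stays piecewise constant. Under this assumed penalty each frame costs O(K) for K labels, because only the cheapest previous label needs to be considered as a switching source.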


Keywords: Gaussian Mixture Model; Action Recognition; Gesture Recognition; Convolutional Neural Network; Jaccard Index



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Georgios D. Evangelidis (1, corresponding author)
  • Gurkirt Singh (2)
  • Radu Horaud (1)

  1. INRIA Grenoble Rhône-Alpes, Grenoble, France
  2. Siemens RTC-ICV, Bangalore, India
