Activity Gesture Recognition on Kinect Sensor Using Convolutional Neural Networks and FastDTW for the MSRC-12 Dataset

  • Miguel Pfitscher
  • Daniel Welfer
  • Marco Antonio de Souza Leite Cuadros
  • Daniel Fernando Tello Gamarra
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)


In this paper, we use data from the Microsoft Kinect sensor, which processes the captured image of a person and reduces the data in each frame to a set of skeleton joints. We then propose building a single image from all the frames of a movement, which facilitates training a convolutional neural network (CNN). Finally, we train a CNN on the MSRC-12 dataset in two ways: combined training and individual training. The trained network reaches an accuracy of 86.67% in combined training and 90.78% in individual training, which compares favorably with related work. This suggests that convolutional networks can be effective for recognizing human actions from joint data.
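To make the frames-to-image idea concrete, the following is a minimal sketch and not the authors' exact encoding, which the abstract does not specify. It assumes a movement is stored as an array of shape (T, J, 3), with T frames, J joints, and x, y, z coordinates per joint, and it maps the three coordinates to the three color channels after min-max scaling. The function name `skeleton_to_image` and the row/column layout are hypothetical choices made for illustration.

```python
import numpy as np

def skeleton_to_image(frames):
    """Encode a joint sequence of shape (T, J, 3) as a (J, T, 3) uint8 image.

    Assumed layout (not taken from the paper): rows are joints, columns are
    frames, and the x, y, z coordinates become the three color channels.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[2] != 3:
        raise ValueError("expected an array of shape (T, J, 3)")

    # Min-max scale each coordinate axis over the whole movement.
    lo = frames.min(axis=(0, 1), keepdims=True)
    hi = frames.max(axis=(0, 1), keepdims=True)
    span = np.where(hi - lo > 0, hi - lo, 1.0)  # avoid division by zero
    scaled = (frames - lo) / span

    # Quantize to 8-bit pixel values and put joints on the row axis.
    image = np.rint(scaled * 255).astype(np.uint8)
    return image.transpose(1, 0, 2)

# Synthetic stand-in for one movement: 60 frames, 20 joints, 3 coordinates.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(60, 20, 3))
image = skeleton_to_image(sequence)
```

Because movements differ in length, images built this way differ in width; a fixed-size CNN input would presumably require resizing or padding them to a common width, which this sketch leaves out.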


Human physical activity recognition · Deep learning · Convolutional neural networks · Microsoft Kinect · MSRC-12 dataset



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Miguel Pfitscher (1)
  • Daniel Welfer (1)
  • Marco Antonio de Souza Leite Cuadros (2)
  • Daniel Fernando Tello Gamarra (1)

  1. Universidade Federal de Santa Maria, Santa Maria, Brazil
  2. Instituto Federal do Espirito Santo, Serra, Brazil
