Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping

  • Pascal Schneider
  • Raphael Memmesheimer
  • Ivanna Kramer
  • Dietrich Paulus
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11531)


Gesture recognition opens up new ways for humans to intuitively interact with machines. Especially for service robots, gestures can be a valuable addition to the means of communication, for example to draw the robot’s attention to someone or something. Extracting a gesture from video data and classifying it is a challenging task, and a variety of approaches have been proposed over the years. This paper presents a method for gesture recognition in RGB videos using OpenPose to extract the pose of a person and Dynamic Time Warping (DTW) in conjunction with One-Nearest-Neighbor (1NN) for time-series classification. The main features of this approach are its independence of any specific hardware and its high flexibility, because a new gesture can be added to the classifier by adding only a few examples of it. We utilize the robustness of the Deep Learning-based OpenPose framework while avoiding the data-intensive task of training a neural network ourselves. We demonstrate the classification performance of our method using a public dataset.
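
The combination of DTW and 1NN named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_distance` and `classify_1nn`, the Euclidean frame distance, the unconstrained warping path, and the toy single-keypoint sequences are all assumptions made for illustration. In practice each frame would hold the keypoint coordinates extracted by a pose estimator such as OpenPose.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    Assumption: frames are compared with the Euclidean distance and the
    warping path is unconstrained (no window or slope restriction).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify_1nn(query, templates):
    """Return the label of the template with the smallest DTW distance."""
    best_label, _ = min(
        ((label, dtw_distance(query, seq)) for label, seq in templates),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical toy data: each frame is the (x, y) position of one keypoint.
templates = [
    ("raise", [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]),
    ("swipe", [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]),
]
# The query is a slower upward motion with more frames than the template.
query = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.5), (0.0, 0.9), (0.0, 1.0)]

print(classify_1nn(query, templates))
```

In this sketch the slower upward query is assigned to the "raise" template even though the two sequences differ in length, which is the property DTW contributes. Adding a new gesture amounts to appending labeled example sequences to `templates`, with no retraining step.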



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Pascal Schneider (1)
  • Raphael Memmesheimer (1)
  • Ivanna Kramer (1)
  • Dietrich Paulus (1)

  1. Active Vision Group, Institute for Computational Visualistics, University of Koblenz-Landau, Koblenz, Germany
