3D Hand Joints Position Estimation with Graph Convolutional Networks: A GraphHands Baseline

  • John-Alejandro Castro-Vargas (email author)
  • Alberto Garcia-Garcia
  • Sergiu Oprea
  • Pablo Martinez-Gonzalez
  • Jose Garcia-Rodriguez
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1093)


State-of-the-art deep learning models for hand-related tasks, e.g. 3D hand joint estimation, need vast amounts of annotated data to perform well, and the lack of such data is a major obstacle. Consequently, training deep learning models on synthetic datasets has become a trend and represents a promising avenue for improving existing approaches. Nevertheless, existing synthetic datasets lack accurate and complete annotations, realism, and rich hand-object interactions. To address this, we present a synthetic dataset featuring rich hand-object interactions in photorealistic scenarios. The dataset is applicable to a wide range of hand-related challenges. To validate our data, we propose an initial approach to 3D hand joint estimation using a graph convolutional network fed with point cloud data. A further advantage of our dataset is that the interactions are performed with realistic objects taken from the YCB dataset. This could allow systems trained on our synthetic dataset to be tested on images or videos of the same objects being manipulated in real life.
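To make the idea of a graph convolutional network fed with point cloud data more concrete, the following is a minimal NumPy sketch and not the authors' architecture, which the abstract does not specify. It assumes a k-nearest-neighbour graph over the points, a single graph convolution with the normalisation proposed by Kipf and Welling, global max pooling, and a linear read-out to 21 joints. The function names, the value of k, the layer sizes, the joint count, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_adjacency(points, k):
    """Symmetric k-nearest-neighbour adjacency matrix for an (N, 3) point cloud."""
    n = points.shape[0]
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ features @ weights, 0.0)


def predict_joints(points, w1, w2, num_joints, k=8):
    """Map an (N, 3) point cloud to (num_joints, 3) joint positions."""
    a_norm = normalize_adjacency(knn_adjacency(points, k))
    hidden = gcn_layer(a_norm, points, w1)  # per-point features, shape (N, H)
    pooled = hidden.max(axis=0)             # global max pooling, shape (H,)
    return (pooled @ w2).reshape(num_joints, 3)


# Hypothetical usage with random data and untrained weights.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(256, 3))               # stand-in for a hand point cloud
w1 = rng.normal(scale=0.1, size=(3, 32))        # hidden size 32 is an assumption
w2 = rng.normal(scale=0.1, size=(32, 21 * 3))   # 21 joints is an assumption
joints = predict_joints(cloud, w1, w2, num_joints=21)
print(joints.shape)  # (21, 3)
```

In practice the weights would be learned by regressing the predicted joints against the dataset's ground-truth annotations. The dense N x N adjacency matrix used here is only practical for small point clouds.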


Keywords: Synthetic dataset · Photorealism · Hand-object interaction · 3D hand joint estimation



This work has been funded by the Spanish Government grant TIN2016-76515-R for the COMBAHO project, supported with FEDER funds. This work has also been supported by three Spanish national grants for PhD studies (FPU15/04516, FPU17/00166, and ACIF/2018/197), by the University of Alicante project GRE16-19, and by the Valencian Government project GV/2018/022. Experiments were made possible by a generous hardware donation from NVIDIA.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • John-Alejandro Castro-Vargas (1), email author
  • Alberto Garcia-Garcia (1)
  • Sergiu Oprea (1)
  • Pablo Martinez-Gonzalez (1)
  • Jose Garcia-Rodriguez (1)
  1. 3D Perception Lab, University of Alicante, Alicante, Spain
