FPVRGame: Deep Learning for Hand Pose Recognition in Real-Time Using Low-End HMD

  • Eder de Oliveira
  • Esteban Walter Gonzalez Clua
  • Cristina Nader Vasconcelos
  • Bruno Augusto Dorta Marques
  • Daniela Gorski Trevisan
  • Luciana Cardoso de Castro Salgado
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11863)


Head-mounted displays (HMDs) have become popular devices, drastically increasing the use of Virtual, Mixed, and Augmented Reality. While these systems' visual resources are accurate and immersive, precise interfaces require depth cameras or special joysticks, which either rely on complex devices or do not follow natural body expression. This work presents an approach for using bare hands to control an immersive game from an egocentric perspective, built following a proposed case study methodology. We use a DenseNet Convolutional Neural Network (CNN) architecture to perform recognition in real time, in both indoor and outdoor environments, without requiring any image segmentation step. Our research also produced a hand pose vocabulary that takes users' preferences into account, seeking a set of natural and comfortable hand poses, and evaluated users' satisfaction and performance in an entertainment setup. Our recognition model achieved an accuracy of 97.89%. The user studies show that our method outperforms classical controllers with regard to natural interaction. We demonstrate our results using commercial low-end HMDs and compare our solution with state-of-the-art methods.


Hand poses recognition; Convolutional neural network; Deep learning; Virtual reality; User interfaces



Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. Fluminense Federal University, Niterói, Brazil
