First-Person Palm Pose Tracking and Gesture Recognition in Augmented Reality

  • Daniel Thalmann
  • Hui Liang
  • Junsong Yuan
Part of the Communications in Computer and Information Science book series (CCIS, volume 598)


We present an augmented reality solution that lets users freely manipulate and inspect 3D virtual objects with their bare hands on wearable devices. To this end, we use a head-mounted depth camera to capture RGB-D hand images from an egocentric view, and propose a unified framework that jointly recovers the 6D palm pose and recognizes the hand gesture from the depth images. A random forest regresses the palm pose and classifies the hand gesture simultaneously via a spatial-voting framework. Trained on a real-world annotated dataset, the proposed method is shown to predict the palm pose and gesture accurately. The output of the forest is used to render the 3D virtual objects, which are overlaid onto the hand region in the input RGB images using the camera calibration parameters to provide seamless synthesis of the virtual and real scenes.
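The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: fusing per-pixel votes into one palm position and one gesture label, and projecting a 3D point into the RGB image with calibration parameters. It assumes the forest has already produced, for every hand pixel, a 3D offset to the palm centre and a gesture label. The function names, the median fusion rule, the pinhole camera model and all numeric values are illustrative assumptions, not the authors' method. Palm orientation, the other half of the 6D pose, is left out for brevity.

```python
from collections import Counter
from statistics import median


def aggregate_votes(pixels, offsets, gesture_votes):
    """Fuse per-pixel votes into one palm position and one gesture label.

    pixels        : list of (x, y, z) 3D locations of hand pixels
    offsets       : list of (dx, dy, dz) predicted offsets to the palm centre
    gesture_votes : list of per-pixel gesture labels
    (All three inputs are hypothetical stand-ins for forest outputs.)
    """
    # Each pixel votes for the palm centre at its own position plus its offset.
    palm_votes = [
        tuple(p + o for p, o in zip(pixel, offset))
        for pixel, offset in zip(pixels, offsets)
    ]
    # Assumption: per-axis median as a simple outlier-tolerant fusion rule.
    palm_position = tuple(median(axis) for axis in zip(*palm_votes))
    # Gesture label by majority vote over all pixels.
    gesture = Counter(gesture_votes).most_common(1)[0][0]
    return palm_position, gesture


def project_to_image(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point;
    lens distortion and the depth-to-RGB extrinsics are ignored here.
    """
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # Toy data: three consistent pixels and one outlier.
    pixels = [(0.0, 0.0, 0.5), (0.02, 0.0, 0.5), (0.0, 0.02, 0.5), (0.3, 0.3, 0.9)]
    offsets = [(0.01, 0.01, 0.0), (-0.01, 0.01, 0.0), (0.01, -0.01, 0.0), (0.0, 0.0, 0.0)]
    gesture_votes = ["open", "open", "fist", "open"]

    palm, gesture = aggregate_votes(pixels, offsets, gesture_votes)
    print("palm position:", palm, "gesture:", gesture)
    print("image location:", project_to_image(palm, 500.0, 500.0, 320.0, 240.0))
```

The median is used here only because it tolerates the single outlier vote in the toy data. A real system would more likely weight votes by leaf confidence or seek the mode of the vote distribution.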


Keywords: Random Forest, Augmented Reality, Depth Image, Gesture Recognition, Hand Gesture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute for Media Innovation, Nanyang Technological University, Singapore, Singapore
  2. School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, Singapore
