Real Time Hand Pose Estimation Using Depth Sensors

  • Cem Keskin
  • Furkan Kıraç
  • Yunus Emre Kara
  • Lale Akarun
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


Real-time hand posture capture has been a difficult goal in computer vision. Extracting hand skeleton parameters would be an important milestone for sign language recognition, since it would make classification of hand shapes and gestures possible. The recent introduction of the Kinect depth sensor has accelerated research in human body pose capture. This chapter describes a real-time hand pose estimation method that employs an object-recognition-by-parts approach, and the use of this method for hand shape classification. First, a realistic 3D hand model is used to represent the hand with 21 different parts. Then, a random decision forest (RDF) is trained on synthetic depth images generated by animating the hand model; the trained forest performs per-pixel classification, assigning each pixel to a hand part. The classification results are fed into a local mode finding algorithm to estimate the joint locations for the hand skeleton. The system processes depth images retrieved from Kinect in real time and does not rely on temporal information. As a simple application of the system, we also describe a support vector machine (SVM)-based recognition module for the ten digits of American Sign Language (ASL), which attains a recognition rate of 99.9% on live depth images in real time.
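
The step from per-pixel part labels to joint locations can be illustrated with a small sketch. It is not the chapter's implementation: the abstract only says "a local mode finding algorithm", and weighted mean shift with a Gaussian kernel is assumed here as one common choice. The function names (mean_shift_mode, estimate_joints), the bandwidth value, and the toy data are hypothetical, and the per-pixel part probabilities stand in for the output of a trained RDF.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth=1.0, iters=50, tol=1e-6):
    """Weighted mean shift with a Gaussian kernel (assumed mode finder).

    Starting from `start`, repeatedly move to the kernel-weighted mean of
    the points until the shift is smaller than `tol`.
    """
    x = list(start)
    for _ in range(iters):
        num = [0.0] * len(x)
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += k
            for i, pi in enumerate(p):
                num[i] += k * pi
        if den == 0.0:
            break  # no point has any influence; keep the current estimate
        new_x = [n / den for n in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


def estimate_joints(pixels, part_probs, num_parts, bandwidth=1.0):
    """Estimate one 3D location per hand part.

    pixels:     list of (x, y, z) points, one per depth pixel
    part_probs: per-pixel list of `num_parts` probabilities
                (a stand-in for the classifier output)
    Returns a dict mapping part index to an [x, y, z] mode.
    """
    joints = {}
    for part in range(num_parts):
        weights = [probs[part] for probs in part_probs]
        total = sum(weights)
        if total <= 0.0:
            continue  # no pixel supports this part; treat it as not visible
        # Start the search at the probability-weighted centroid of the part.
        start = [
            sum(w * p[i] for p, w in zip(pixels, weights)) / total
            for i in range(3)
        ]
        joints[part] = mean_shift_mode(pixels, weights, start, bandwidth)
    return joints


# Toy data: two tight clusters of pixels, each mostly assigned to one part.
pixels = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.0),
    (5.0, 5.0, 1.0), (5.1, 5.0, 1.0), (5.0, 5.1, 1.0),
]
part_probs = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3
joints = estimate_joints(pixels, part_probs, num_parts=2)
```

With the toy data, each part's estimate settles on the cluster of pixels that carries most of its probability mass, even though the starting centroid is pulled slightly toward the other cluster. This is the reason to prefer a mode finder over a plain weighted average when some pixels are misclassified.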


Keywords: Depth image, Synthetic dataset, Hand gesture, American Sign Language, Split node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Cem Keskin (1)
  • Furkan Kıraç (1)
  • Yunus Emre Kara (1)
  • Lale Akarun (1)
  1. Computer Engineering Department, Boğaziçi University, Istanbul, Turkey
