Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

  • Dimitrios Tzionas
  • Abhilash Srikantha
  • Pablo Aponte
  • Juergen Gall
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8753)


Hand motion capture has been an active research topic, following the success of full-body pose tracking. Despite similarities, hand tracking proves to be more challenging, characterized by a higher dimensionality, severe occlusions and self-similarity between fingers. For this reason, most approaches rely on strong assumptions, like hands in isolation or expensive multi-camera systems, that limit practical use. In this work, we propose a framework for hand tracking that can capture the motion of two interacting hands using only a single, inexpensive RGB-D camera. Our approach combines a generative model with collision detection and discriminatively learned salient points. We quantitatively evaluate our approach on 14 new sequences with challenging interactions.



The authors acknowledge the help of Javier Romero and Jessica Purmort of MPI-IS regarding the acquisition of the personalized hand model, the assistance of Philipp Rybalov with annotation and the public software release of the FORTH tracker by the CVRL lab of FORTH-ICS, enabling comparison to [27]. Financial support was provided by the DFG Emmy Noether program (GA 1927/1-1).


  1. Albrecht, I., Haber, J., Seidel, H.P.: Construction and animation of anatomically based human hand models. In: SCA, pp. 98–109 (2003)
  2. Athitsos, V., Sclaroff, S.: Estimating 3d hand pose from a cluttered image. In: CVPR, pp. 432–439 (2003)
  3. Ballan, L., Taneja, A., Gall, J., Van Gool, L., Pollefeys, M.: Motion capture of hands in action using discriminative salient points. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VI. LNCS, vol. 7577, pp. 640–653. Springer, Heidelberg (2012)
  4. Baran, I., Popović, J.: Automatic rigging and animation of 3d characters. TOG 26(3), 72 (2007)
  5. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI 24(4), 509–522 (2002)
  6. Bregler, C., Malik, J., Pullen, K.: Twist based acquisition and tracking of animal and human kinematics. IJCV 56(3), 179–194 (2004)
  7. de Campos, T., Murray, D.: Regression-based hand pose estimation from multiple cameras. In: CVPR, pp. 782–789 (2006)
  8. Canny, J.: A computational approach to edge detection. PAMI 8(6), 679–698 (1986)
  9. Chen, Y., Medioni, G.: Object modeling by registration of multiple range images. In: ICRA, pp. 2724–2729 (1991)
  10. Ekvall, S., Kragic, D.: Grasp recognition for programming by demonstration. In: ICRA, pp. 748–753 (2005)
  11. Erol, A., Bebis, G., Nicolescu, M., Boyle, R.D., Twombly, X.: Vision-based hand pose estimation: a review. CVIU 108(1–2), 52–73 (2007)
  12. Felzenszwalb, P.F., Huttenlocher, D.P.: Distance transforms of sampled functions. Technical report. Cornell Computing and Information Science (2004)
  13. Gall, J., Fossati, A., Van Gool, L.: Functional categorization of objects using real-time markerless motion capture. In: CVPR, pp. 1969–1976 (2011)
  14. Gall, J., Yao, A., Razavi, N., Van Gool, L., Lempitsky, V.: Hough forests for object detection, tracking, and action recognition. PAMI 33(11), 2188–2202 (2011)
  15. Hamer, H., Schindler, K., Koller-Meier, E., Van Gool, L.: Tracking a hand manipulating an object. In: ICCV, pp. 1475–1482 (2009)
  16. Hamer, H., Gall, J., Weise, T., Van Gool, L.: An object-dependent hand pose prior from sparse training data. In: CVPR, pp. 671–678 (2010)
  17. Heap, T., Hogg, D.: Towards 3d hand tracking using a deformable model. In: FG, pp. 140–145 (1996)
  18. Holzer, S., Rusu, R., Dixon, M., Gedikli, S., Navab, N.: Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images. In: IROS, pp. 2684–2689 (2012)
  19. Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. IJCV 46(1), 81–96 (2002)
  20. Keskin, C., Kıraç, F., Kara, Y.E., Akarun, L.: Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VI. LNCS, vol. 7577, pp. 852–863. Springer, Heidelberg (2012)
  21. Kim, D., Hilliges, O., Izadi, S., Butler, A.D., Chen, J., Oikonomidis, I., Olivier, P.: Digits: freehand 3d interactions anywhere using a wrist-worn gloveless sensor. In: UIST, pp. 167–176 (2012)
  22. Kyriazis, N., Argyros, A.: Physically plausible 3d scene tracking: the single actor hypothesis. In: CVPR, pp. 9–16 (2013)
  23. Lewis, J.P., Cordner, M., Fong, N.: Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In: SIGGRAPH, pp. 165–172 (2000)
  24. MacCormick, J., Isard, M.: Partitioned sampling, articulated objects, and interface-quality hand tracking. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 3–19. Springer, Heidelberg (2000)
  25. Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. CVIU 104(2), 90–126 (2006)
  26. Murray, R.M., Sastry, S.S., Zexiang, L.: A Mathematical Introduction to Robotic Manipulation (1994)
  27. Oikonomidis, I., Kyriazis, N., Argyros, A.: Efficient model-based 3d tracking of hand articulations using kinect. In: BMVC, pp. 101.1–101.11 (2011)
  28. Oikonomidis, I., Kyriazis, N., Argyros, A.: Full dof tracking of a hand interacting with an object by modeling occlusions and physical constraints. In: ICCV, pp. 2088–2095 (2011)
  29. Oikonomidis, I., Kyriazis, N., Argyros, A.A.: Tracking the articulated motion of two strongly interacting hands. In: CVPR, pp. 1862–1869 (2012)
  30. Paris, S., Durand, F.: A fast approximation of the bilateral filter using a signal processing approach. IJCV 81(1), 24–52 (2009)
  31. Pons-Moll, G., Rosenhahn, B.: Model-Based Pose Estimation, pp. 139–170 (2011)
  32. Rehg, J.M., Kanade, T.: Visual tracking of high dof articulated structures: an application to human hand tracking. In: Eklundh, J.-O. (ed.) ECCV 1994. LNCS, vol. 801, pp. 35–46. Springer, Heidelberg (1994)
  33. Rehg, J., Kanade, T.: Model-based tracking of self-occluding articulated objects. In: ICCV, pp. 612–617 (1995)
  34. Romero, J., Kjellström, H., Kragic, D.: Monocular real-time 3d articulated hand pose estimation. In: HUMANOIDS, pp. 87–92 (2009)
  35. Romero, J., Kjellström, H., Kragic, D.: Hands in action: real-time 3d reconstruction of hands in interaction with objects. In: ICRA, pp. 458–463 (2010)
  36. Rosales, R., Athitsos, V., Sigal, L., Sclaroff, S.: 3d hand pose reconstruction using specialized mappings. In: ICCV, pp. 378–387 (2001)
  37. Rosenhahn, B., Brox, T., Weickert, J.: Three-dimensional shape knowledge for joint image segmentation and pose tracking. IJCV 73(3), 243–262 (2007)
  38. Rusinkiewicz, S., Levoy, M.: Efficient variants of the icp algorithm. In: 3DIM, pp. 145–152 (2001)
  39. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3d model acquisition. TOG 21(3), 438–446 (2002)
  40. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR, pp. 1297–1304 (2011)
  41. Sridhar, S., Oulasvirta, A., Theobalt, C.: Interactive markerless articulated hand motion tracking using rgb and depth data. In: ICCV, pp. 2456–2463 (2013)
  42. Stenger, B., Mendonca, P., Cipolla, R.: Model-based 3D tracking of an articulated hand. In: CVPR, pp. 310–315 (2001)
  43. Stolfi, J.: Oriented Proj. Geometry: A Framework for Geom. Computation (1991)
  44. Teschner, M., Kimmerle, S., Heidelberger, B., Zachmann, G., Raghupathi, L., Fuhrmann, A., Cani, M.P., Faure, F., Magnenat-Thalmann, N., Strasser, W.: Collision detection for deformable objects. In: Eurographics, pp. 119–139 (2004)
  45. Thayananthan, A., Stenger, B., Torr, P.H.S., Cipolla, R.: Shape context and chamfer matching in cluttered scenes. In: CVPR, pp. 127–133 (2003)
  46. Tzionas, D., Gall, J.: A comparison of directional distances for hand pose estimation. In: Weickert, J., Hein, M., Schiele, B. (eds.) GCPR 2013. LNCS, vol. 8142, pp. 131–141. Springer, Heidelberg (2013)
  47. Vaezi, M., Nekouie, M.A.: 3d human hand posture reconstruction using a single 2d image. IJHCI 1(4), 83–94 (2011)
  48. Wang, R.Y., Popović, J.: Real-time hand-tracking with a color glove. TOG 28(3), 68:1–68:8 (2009)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dimitrios Tzionas (1, 2), email author
  • Abhilash Srikantha (1, 2)
  • Pablo Aponte (2)
  • Juergen Gall (2)
  1. Perceiving Systems Department, MPI for Intelligent Systems, Stuttgart, Germany
  2. Computer Vision Group, University of Bonn, Bonn, Germany
