Estimation of the Distance Between Fingertips Using Silhouette and Texture Information of Dorsal of Hand

  • Takuma Shimizume
  • Takeshi Umezawa
  • Noritaka Osawa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11844)


A three-dimensional virtual object can be manipulated through hand and finger movements using an optical hand-tracking device that recognizes the posture of the hand. Many conventional hand posture recognition methods rely on the three-dimensional coordinates of the fingertips and a skeletal model of the hand. These methods have difficulty estimating the posture of the hand when a fingertip is hidden from the optical camera, and self-occlusion often hides fingertips. Our study therefore proposes estimating the posture of a hand from a hand dorsal image, which can be captured even when the hand occludes its own fingertips. Manipulating a virtual object requires recognizing movements such as pinching, and many such movements can be recognized from the distance between the tips of the thumb and the forefinger. We therefore use a regression model, built with a convolutional neural network (CNN), to estimate the distance between the tips of the thumb and forefinger from hand dorsal images. Our study proposes a Silhouette method and a Texture method for estimating the distance between fingertips from hand dorsal images, and aggregates them into two methods: the Clipping method and the Aggregation method. For hand dorsal images that do not contain any fingertips, the root mean squared error (RMSE) of the distance estimated by the Aggregation method was 1.98 mm or less, which is smaller than the RMSE of the other methods. This result suggests that the proposed Aggregation method can be an effective method that is robust to self-occlusion.
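To make the evaluation metric concrete, the following is a minimal sketch of how an RMSE in millimetres could be computed from estimated and measured fingertip distances, and how an estimated distance might then be turned into a pinch decision. The function names, the sample values, and the 10 mm pinch threshold are illustrative assumptions and do not come from the paper; the CNN regression model itself is not reproduced here.

```python
import math


def rmse(predicted_mm, actual_mm):
    """Root mean squared error between two equal-length sequences (in mm)."""
    if len(predicted_mm) != len(actual_mm) or not predicted_mm:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted_mm, actual_mm)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def is_pinch(distance_mm, threshold_mm=10.0):
    """Hypothetical pinch test: thumb and forefinger tips closer than a threshold.

    The 10 mm default is an assumption chosen for illustration only.
    """
    return distance_mm < threshold_mm


if __name__ == "__main__":
    # Made-up distances standing in for model outputs and ground truth.
    estimated = [10.0, 20.0]
    measured = [12.0, 18.0]
    print(rmse(estimated, measured))  # 2.0
    print(is_pinch(estimated[0]))     # False, since 10.0 is not below 10.0
```

A lower RMSE means the estimated distances track the measured ones more closely, which is the sense in which the abstract compares the Aggregation method with the other methods.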


Convolutional neural network, Self-occlusion, 3D user interface



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Takuma Shimizume (1)
  • Takeshi Umezawa (1)
  • Noritaka Osawa (1)
  1. Chiba University, Chiba, Japan