Single Camera Hand Pose Estimation from Bottom-Up and Top-Down Processes

  • Davide Periquito
  • Jacinto C. Nascimento
  • Alexandre Bernardino
  • João Sequeira
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 458)


In this paper we present a methodology for hand pose estimation from a single image, combining bottom-up and top-down processes. A fast bottom-up algorithm generates, from coarse visual cues, hypotheses about the possible locations and postures of hands in the images. The best ranked hypotheses are then analysed by a precise, but slower, top-down process. The complementary nature of bottom-up and top-down processes in terms of computational speed and precision permits the design of pose estimation algorithms with desirable characteristics, taking into account constraints in the available computational resources. We analyse the trade-off between precision and speed in a series of simulations and qualitatively illustrate the performance of the method with real imagery.
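The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `coarse_to_fine`, the two scoring callables, and the toy one-dimensional "pose" (an angle in degrees) are all hypothetical stand-ins. The sketch assumes only that a cheap score can rank every hypothesis and that an expensive, more precise score is affordable for a shortlist of size `k`.

```python
from typing import Callable, Iterable, TypeVar

H = TypeVar("H")


def coarse_to_fine(
    hypotheses: Iterable[H],
    fast_score: Callable[[H], float],
    slow_score: Callable[[H], float],
    k: int,
) -> H:
    """Rank all hypotheses with a cheap score, then re-evaluate only the
    k best with an expensive score and return the winner.

    Both scores are "higher is better". k is the computational budget:
    a larger k costs more slow evaluations but is less likely to discard
    the true pose in the coarse stage.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Bottom-up stage (assumed cheap): score every hypothesis.
    ranked = sorted(hypotheses, key=fast_score, reverse=True)
    # Top-down stage (assumed costly): score the shortlist only.
    return max(ranked[:k], key=slow_score)


# Toy example: the "pose" is a single angle and the true value is 133.
TRUE_ANGLE = 133
candidates = list(range(0, 360, 10))


def coarse(angle: int) -> float:
    # Hypothetical coarse cue: only resolves 45-degree bins.
    return -abs(angle // 45 - TRUE_ANGLE // 45)


def precise(angle: int) -> float:
    # Hypothetical precise cue: exact angular error.
    return -abs(angle - TRUE_ANGLE)


small_budget = coarse_to_fine(candidates, coarse, precise, k=1)
large_budget = coarse_to_fine(candidates, coarse, precise, k=5)
```

In this toy setting the coarse cue cannot distinguish candidates that fall in the same bin, so a budget of one slow evaluation returns whichever of them happens to come first, while a budget of five recovers the candidate closest to the true angle. This mirrors, in a very simplified way, the speed versus precision trade-off the abstract refers to.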


Keywords: Pose estimation, Geometric moments, Hammoude metric, Simulation



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Davide Periquito (1, corresponding author)
  • Jacinto C. Nascimento (1)
  • Alexandre Bernardino (1)
  • João Sequeira (1)
  1. Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisboa, Portugal
