Decision Forests for Computer Vision and Medical Image Analysis

Part of the series Advances in Computer Vision and Pattern Recognition, pp 175-192

Efficient Human Pose Estimation from Single Depth Images

  • J. Shotton (Microsoft Research Ltd.)
  • R. Girshick (University of California)
  • A. Fitzgibbon (Microsoft Research Ltd.)
  • T. Sharp (Microsoft Research Ltd.)
  • M. Cook (Microsoft Research Ltd.)
  • M. Finocchio (Microsoft Corporation)
  • R. Moore (ST-Ericsson)
  • P. Kohli (Microsoft Research Ltd.)
  • A. Criminisi (Microsoft Research Ltd.)
  • A. Kipman (Microsoft Corporation)
  • A. Blake (Microsoft Research Ltd.)


We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image, without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, and field-of-view cropping. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features, and parallelizable decision forests, both approaches can run super-realtime on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Parts of this chapter are reprinted, with permission, from Shotton et al., Proc IEEE Conf. Computer Vision and Pattern Recognition (CVPR) (2011), © 2011 IEEE.
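To make the "simple depth pixel comparison features" mentioned in the abstract concrete, here is a minimal sketch of one common form of such a feature: the difference between two depth probes whose offsets are scaled by the inverse depth at the reference pixel. The abstract does not give the exact definition, so the function names, the offset parameters `u` and `v`, the rounding of offsets, and the large background constant are all illustrative assumptions rather than the chapter's implementation.

```python
import numpy as np

# Assumed value returned for probes that fall outside the image (hypothetical choice).
BACKGROUND_DEPTH = 1e6


def probe(depth, row, col):
    """Return the depth at (row, col), or BACKGROUND_DEPTH if outside the image."""
    height, width = depth.shape
    if 0 <= row < height and 0 <= col < width:
        return float(depth[row, col])
    return BACKGROUND_DEPTH


def depth_comparison_feature(depth, pixel, u, v):
    """Difference of two depth probes around `pixel`.

    `u` and `v` are (row, col) offsets. Each offset is divided by the depth at
    `pixel`, so the probes cover a smaller image region for points farther
    from the camera. Assumes the depth at `pixel` is strictly positive.
    """
    row, col = pixel
    d = float(depth[row, col])
    row_u = row + int(round(u[0] / d))
    col_u = col + int(round(u[1] / d))
    row_v = row + int(round(v[0] / d))
    col_v = col + int(round(v[1] / d))
    return probe(depth, row_u, col_u) - probe(depth, row_v, col_v)


# Toy 5x5 depth map: a flat surface at depth 2.0 with one farther pixel.
depth = np.full((5, 5), 2.0)
depth[2, 4] = 3.0
feature = depth_comparison_feature(depth, (2, 2), u=(0, 4), v=(0, 0))
```

In this toy example the first probe lands on the farther pixel and the second on the reference pixel itself, so the feature is the depth difference between them. In a decision forest, each split node would typically compare such a feature value against a learned threshold; because every pixel is evaluated independently, the computation parallelizes naturally.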