Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery

  • Min Sun
  • Gary Bradski
  • Bing-Xin Xu
  • Silvio Savarese
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)


Detecting objects, estimating their pose, and recovering their 3D shape are critical problems in many vision and robotics applications. This paper addresses these needs by proposing a new detection scheme called DEHV (Depth-Encoded Hough Voting). Inspired by the Hough voting scheme introduced in [13], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and its distance (depth) from the corresponding physical patch attached to the 3D object. DEHV jointly detects objects, infers their categories, estimates their pose, and infers/decodes object depth maps from either a single image (when no depth map is available at test time) or a single image augmented with a depth map (when one is available at test time). Extensive quantitative and qualitative experimental analysis on existing datasets [6,9,22] and a newly proposed 3D table-top object category dataset shows that DEHV obtains competitive detection and pose estimation results as well as convincing 3D shape reconstruction from a single uncalibrated image. Finally, we demonstrate that our technique can be successfully employed as a key building block in two application scenarios: highly accurate 6 degrees of freedom (6 DOF) pose estimation and 3D object modeling.
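
The link between patch scale and depth, and its use in Hough voting, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simple pinhole camera with a hypothetical focal length `focal_px` (in pixels), and it assumes each patch carries a hypothetical stored physical offset `(dx, dy)` to the object centre, which is converted to a pixel offset by dividing by the patch depth. All function and parameter names are illustrative.

```python
from collections import Counter


def patch_scale(focal_px, physical_size, depth):
    """Pinhole relation (assumed model): image size in pixels of a patch
    with the given physical size seen at the given depth."""
    return focal_px * physical_size / depth


def cast_votes(patches, focal_px, bin_size=8):
    """Accumulate Hough votes for the object centre in image space.

    patches: iterable of (x, y, depth, (dx, dy)), where (x, y) is the patch
    position in pixels and (dx, dy) is a hypothetical stored physical offset
    from the patch to the object centre. Depth rescales the offset to pixels.
    """
    accumulator = Counter()
    for x, y, depth, (dx, dy) in patches:
        cx = x + focal_px * dx / depth
        cy = y + focal_px * dy / depth
        accumulator[(int(cx // bin_size), int(cy // bin_size))] += 1
    return accumulator


def best_hypothesis(accumulator, bin_size=8):
    """Return (centre_x, centre_y, votes) for the strongest accumulator bin."""
    (bx, by), votes = accumulator.most_common(1)[0]
    return ((bx + 0.5) * bin_size, (by + 0.5) * bin_size, votes)


# Three patches on an object at depth 2.0 agree on one centre; one outlier does not.
patches = [
    (75.0, 100.0, 2.0, (0.1, 0.0)),
    (100.0, 125.0, 2.0, (0.0, -0.1)),
    (125.0, 100.0, 2.0, (-0.1, 0.0)),
    (300.0, 40.0, 5.0, (0.2, 0.2)),
]
hypothesis = best_hypothesis(cast_votes(patches, focal_px=500.0))
```

In this toy setup the three consistent patches vote for the same bin, while the outlier votes elsewhere. When depth is unknown at test time it would have to be inferred rather than read off, which this sketch does not attempt.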


Keywords (machine-generated, not supplied by the authors): Image Patch, Object Instance, Pascal VOC07, Object Hypothesis, Object Depth



Supplementary material

978-3-642-15555-0_48_MOESM1_ESM.wmv (9.8 MB)
Electronic Supplementary Material (10,085 KB)


References

  1. Arie-Nachimson, M., Basri, R.: Constructing implicit 3D shape models for pose estimation. In: ICCV (2009)
  2. Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. In: Pattern Recognition (1981)
  3. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. PAMI 14(2), 239–256 (1992)
  4. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  5. Deselaers, T., Criminisi, A., Winn, J., Agarwal, A.: Incorporating on-demand stereo for real time recognition. In: CVPR (2007)
  6. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge, VOC 2007 Results (2007)
  7. Farhadi, A., Tabrizi, M.K., Endres, I., Forsyth, D.: A latent model of discriminative aspect. In: ICCV (2009)
  8. Fergus, R., Perona, P., Zisserman, A.: A sparse object category model for efficient learning and exhaustive recognition. In: CVPR (2005)
  9. Ferrari, V., Fevrier, L., Jurie, F., Schmid, C.: Groups of adjacent contour segments for object detection. IEEE Trans. PAMI 30(1), 36–51 (2008)
  10. Gall, J., Lempitsky, V.: Class-specific Hough forests for object detection. In: CVPR (2009)
  11. Hoiem, D., Rother, C., Winn, J.: 3D LayoutCRF for multi-view object class recognition and segmentation. In: CVPR (2007)
  12. Huttenlocher, D.P., Ullman, S.: Recognizing solid objects by alignment with an image. IJCV 5(2), 195–212 (1990)
  13. Leibe, B., Leonardis, A., Schiele, B.: Combined object categorization and segmentation with an implicit shape model. In: ECCV Workshop on Statistical Learning in Computer Vision (2004)
  14. Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3D feature maps. In: CVPR (2008)
  15. Lowe, D.G.: Local feature view clustering for 3D object recognition. In: CVPR (2001)
  16. Maji, S., Malik, J.: Object detection using a max-margin Hough transform. In: CVPR (2009)
  17. Ommer, B., Malik, J.: Multi-scale object detection by clustering lines. In: ICCV (2009)
  18. Romea, A.C., Berenson, D., Srinivasa, S., Ferguson, D.: Object recognition and full pose registration from a single image for robotic manipulation. In: ICRA (2009)
  19. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In: CVPR (2003)
  20. Rusu, R.B., Blodow, N., Marton, Z.C., Beetz, M.: Close-range scene segmentation and reconstruction of 3D point cloud maps for mobile manipulation in human environments. In: IROS (2009)
  21. Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: ICCV (2007)
  22. Savarese, S., Fei-Fei, L.: View synthesis for recognizing unseen poses of object classes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 602–615. Springer, Heidelberg (2008)
  23. Schneiderman, H., Kanade, T.: A statistical approach to 3D object detection applied to faces and cars. In: CVPR (2000)
  24. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. In: SIGGRAPH (2006)
  25. Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: ICCV (2009)
  26. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Van Gool, L.: Using multi-view recognition and meta-data annotation to guide a robot’s attention. Int. J. Rob. Res. (2009)
  27. Yan, P., Khan, D., Shah, M.: 3D model based object class detection in an arbitrary view. In: ICCV (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Min Sun (1)
  • Gary Bradski (2)
  • Bing-Xin Xu (1)
  • Silvio Savarese (1)
  1. Electrical and Computer Engineering, University of Michigan, Ann Arbor, USA
  2. Willow Garage, Menlo Park, USA
