Backprojection Revisited: Scalable Multi-view Object Detection and Similarity Metrics for Detections

  • Nima Razavi
  • Juergen Gall
  • Luc Van Gool
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


Hough transform based object detectors learn a mapping from the image domain to a Hough voting space. Within this space, object hypotheses are formed by local maxima, and the votes contributing to a hypothesis are called its support. In this work, we investigate the use of the support and its backprojection to the image domain for multi-view object detection. To this end, we create a shared codebook whose training and matching complexities are independent of the number of quantized views. We show that, since the backprojection encodes enough information about the viewpoint, all views can be handled together. In our experiments, we demonstrate that treating views jointly achieves superior accuracy and efficiency compared to the popular one-vs-the-rest detectors, especially with few training examples and no view annotations. Furthermore, we go beyond the detection case and, based on the support, introduce a part-based similarity measure between two arbitrary detections that naturally takes spatial relationships of parts into account and is insensitive to partial occlusions. We also show that backprojection can be used to efficiently measure the similarity of a detection to all training examples. Finally, we demonstrate how these metrics can be used to estimate continuous object parameters such as human pose and object viewpoint. In our experiments, we achieve state-of-the-art performance for view classification on the PASCAL VOC’06 dataset.
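The terms used in the abstract (votes, hypothesis, support, backprojection) can be illustrated with a small sketch. This is a toy example under stated assumptions, not the authors' method: the codebook, the feature tuples, and the function names `hough_vote`, `support_of`, and `backproject` are all hypothetical, and the hypothesis is simply the global maximum of the accumulator rather than a set of local maxima.

```python
import numpy as np

def hough_vote(features, codebook, shape):
    """Cast votes into an accumulator and remember which feature cast each vote.

    features: list of (y, x, entry_id) image-domain detections (hypothetical format).
    codebook: dict mapping entry_id -> list of (dy, dx, weight) offsets to the object center.
    """
    acc = np.zeros(shape)
    votes = []  # each vote: (feature_index, vote_y, vote_x, weight)
    for i, (y, x, entry) in enumerate(features):
        for dy, dx, w in codebook[entry]:
            vy, vx = y + dy, x + dx
            if 0 <= vy < shape[0] and 0 <= vx < shape[1]:
                acc[vy, vx] += w
                votes.append((i, vy, vx, w))
    return acc, votes

def support_of(peak, votes, radius=1):
    """Support of a hypothesis: all votes that landed within `radius` of the peak."""
    py, px = peak
    return [v for v in votes if abs(v[1] - py) <= radius and abs(v[2] - px) <= radius]

def backproject(support, features, shape):
    """Backprojection: map the supporting votes back to their image locations."""
    mask = np.zeros(shape)
    for i, _, _, w in support:
        y, x, _ = features[i]
        mask[y, x] += w
    return mask

# Toy data (assumed for illustration): three features agree on center (5, 5),
# while the fourth votes elsewhere.
codebook = {0: [(2, 2, 1.0)], 1: [(2, -2, 1.0)], 2: [(-3, 0, 0.5)]}
features = [(3, 3, 0), (3, 7, 1), (8, 5, 2), (0, 0, 0)]
shape = (10, 10)

acc, votes = hough_vote(features, codebook, shape)
peak = np.unravel_index(np.argmax(acc), acc.shape)  # strongest hypothesis
support = support_of(peak, votes)
mask = backproject(support, features, shape)
supporting_features = sorted({v[0] for v in support})
```

In this sketch the backprojection mask is nonzero only at the image positions of the features that contributed to the hypothesis, which is the information the abstract proposes to exploit for viewpoint handling and for comparing detections.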


Training Image, Object Detection, Image Domain, Object Hypothesis, Vote Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supplementary material

Electronic Supplementary Material: 978-3-642-15549-9_45_MOESM1_ESM.avi (14.9 MB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Nima Razavi (1)
  • Juergen Gall (1)
  • Luc Van Gool (1, 2)
  1. Computer Vision Laboratory, ETH Zurich
  2. ESAT-PSI/IBBT, KU Leuven
