3D2PM – 3D Deformable Part Models

  • Bojan Pepik
  • Peter Gehler
  • Michael Stark
  • Bernt Schiele
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7577)


Because objects are inherently three-dimensional, they were modeled in 3D in the early days of computer vision. Owing to the ambiguities that arise when mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown competitive bounding box (BB) detection performance, they are clearly limited in their capability for fine-grained 3D reasoning and continuous viewpoint estimation, as required for advanced tasks such as 3D scene understanding. This work extends the deformable part model [1] to a 3D object model consisting of multiple parts modeled in 3D and a continuous appearance model. As a result, the model generalizes beyond BB-oriented object detection and can be jointly optimized in a discriminative fashion for object detection and viewpoint estimation. Our 3D Deformable Part Model (3D2PM) leverages CAD data of the object class as a 3D geometry proxy.
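
To illustrate why parts modeled in 3D allow reasoning over a continuous range of viewpoints, the following is a minimal sketch and not the authors' implementation. It assumes hypothetical part centers given in object coordinates, a viewpoint parameterized by azimuth and elevation, and an orthographic camera; the function name `project_parts` and all numeric values are illustrative assumptions.

```python
import math

def project_parts(parts_3d, azimuth_deg, elevation_deg=0.0):
    """Orthographically project 3D part centers for a given viewpoint.

    parts_3d: list of (x, y, z) tuples in object coordinates
              (a hypothetical layout, not taken from the paper).
    Returns a list of (u, v) positions in the image plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    projected = []
    for x, y, z in parts_3d:
        # Rotate about the vertical (y) axis by the azimuth angle.
        xr = math.cos(az) * x + math.sin(az) * z
        zr = -math.sin(az) * x + math.cos(az) * z
        # Tilt about the horizontal (x) axis by the elevation angle.
        yr = math.cos(el) * y - math.sin(el) * zr
        # Orthographic projection: drop the depth coordinate.
        projected.append((xr, yr))
    return projected

# Hypothetical part centers for a box-like object.
parts = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.5, 0.0)]

# The same 3D layout yields 2D part positions for any viewpoint angle,
# not only for a fixed set of discrete view bins.
for azimuth in (0.0, 37.5, 90.0):
    print(azimuth, project_parts(parts, azimuth))
```

Because the 2D layout is a function of a real-valued angle, a single 3D part arrangement covers views that a purely 2D, per-view model would have to represent separately.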


Keywords: Object Detection, Average Precision, Object Class, Deformable Part Model, Pairwise Term


  1. Felzenszwalb, P.F., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. PAMI (2010)
  2. Marr, D., Nishihara, H.: Representation and recognition of the spatial organization of three-dimensional shapes. Proc. Roy. Soc. London B 200, 269–294 (1978)
  3. Brooks, R.: Symbolic reasoning among 3-d models and 2-d images. Artificial Intelligence 17, 285–348 (1981)
  4. Pentland, A.: Perceptual organization and the representation of natural form. Artificial Intelligence 28 (1986)
  5. Lowe, D.: Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence (1987)
  6. Stark, L., Hoover, A., Goldgof, D., Bowyer, K.: Function-based recognition from incomplete knowledge of shape. In: WQV 1993 (1993)
  7. Green, K., Eggert, D., Stark, L., Bowyer, K.: Generic recognition of articulated objects through reasoning about potential function. CVIU (1995)
  8. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: CVPR (2003)
  9. Leibe, B., Leonardis, A., Schiele, B.: Robust object detection with interleaved categorization and segmentation. IJCV 77, 259–289 (2008)
  10. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  11. Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. IJCV (2008)
  12. Ess, A., Leibe, B., Schindler, K., Van Gool, L.: Robust multi-person tracking from a mobile platform. PAMI (2009)
  13. Wojek, C., Roth, S., Schindler, K., Schiele, B.: Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 467–481. Springer, Heidelberg (2010)
  14. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Van Gool, L.: Towards multi-view object class detection. In: CVPR (2006)
  15. Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: ICCV (2007)
  16. Yan, P., Khan, S., Shah, M.: 3D model based object class detection in an arbitrary view. In: ICCV (2007)
  17. Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: ICCV (2009)
  18. Liebelt, J., Schmid, C.: Multi-view object class detection with a 3D geometric model. In: CVPR (2010)
  19. Stark, M., Goesele, M., Schiele, B.: Back to the future: Learning shape models from 3D CAD data. In: BMVC (2010)
  20. Zia, Z., Stark, M., Schindler, K., Schiele, B.: Revisiting 3D geometric models for accurate object shape and pose. In: 3dRR 2011 (2011)
  21. Payet, N., Todorovic, S.: From contours to 3D object detection and pose estimation. In: ICCV (2011)
  22. Glasner, D., Galun, M., Alpert, S., Basri, R., Shakhnarovich, G.: Viewpoint-aware object detection and pose estimation. In: ICCV (2011)
  23. Lopez-Sastre, R.J., Tuytelaars, T., Savarese, S.: DPM revisited: A performance evaluation for object category pose estimation. In: ICCV-WS CORP (2011)
  24. Bao, S.Y., Savarese, S.: Semantic structure from motion. In: CVPR (2011)
  25. Ozuysal, M., Lepetit, V., Fua, P.: Pose estimation for category specific multiview object localization. In: CVPR (2009)
  26. Gu, C., Ren, X.: Discriminative Mixture-of-Templates for Viewpoint Classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 408–421. Springer, Heidelberg (2010)
  27. Pepik, B., Stark, M., Gehler, P., Schiele, B.: Teaching 3D geometry to deformable part models. In: CVPR (2012)
  28. Arie-Nachimson, M., Basri, R.: Constructing implicit 3D shape models for pose estimation. In: ICCV (2009)
  29. Sun, M., Bradski, G., Xu, B.-X., Savarese, S.: Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 658–671. Springer, Heidelberg (2010)
  30. Yu, C.N.J., Joachims, T.: Learning structural SVMs with latent variables. In: International Conference on Machine Learning, ICML (2009)
  31. Blaschko, M.B., Lampert, C.H.: Learning to Localize Objects with Structured Output Regression. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 2–15. Springer, Heidelberg (2008)
  32. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL VOC 2007 Results (2007)
  33. Lehmann, A., Gehler, P., Van Gool, L.: Branch&rank: Non-linear object detection. In: BMVC (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bojan Pepik (1)
  • Peter Gehler (3)
  • Michael Stark (1, 2)
  • Bernt Schiele (1)

  1. Max Planck Institute for Informatics, Germany
  2. Stanford University, USA
  3. Max Planck Institute for Intelligent Systems, Germany
