Accelerating Deformable Part Models with Branch-and-Bound

  • Iasonas Kokkinos
Conference paper
Part of the Mathematics and Visualization book series (MATHVISUAL)


Abstract

Deformable Part Models (DPMs) play a prominent role in current object recognition research, as they rigorously model the shape variability of an object category by breaking an object into parts and modelling the relative locations of the parts. Inference with such models, however, requires solving a combinatorial optimization task. In this chapter, we will see how Branch-and-Bound can be used to perform inference with such models efficiently. Instead of evaluating the classifier score exhaustively for all part locations and scales, such techniques allow us to quickly focus on promising image locations. The core problem that we will address is how to compute bounds that accommodate part deformations; this is what allows us to apply Branch-and-Bound to our problem. Compared with a baseline DPM implementation, we obtain exactly the same results but perform the part combination substantially faster, yielding up to tenfold speedups for single-object detection and even higher speedups for multiple objects.
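To make the idea of bounds that accommodate part deformations concrete, here is a minimal one-dimensional sketch. It is not the chapter's algorithm: the single-part model, the quadratic deformation cost, the weight `w`, and all function names are assumptions chosen for illustration. The score of a root location x is taken to be root[x] + max over x' of (part[x'] - w (x - x')^2). For an interval of root locations, an upper bound is obtained by taking the best root score in the interval and charging each part location only its smallest possible deformation to that interval.

```python
import heapq


def dist_to_interval(xp, lo, hi):
    """Smallest |x - xp| over integer x in [lo, hi]."""
    if xp < lo:
        return lo - xp
    if xp > hi:
        return xp - hi
    return 0


def exact_score(root, part, x, w):
    """Toy DPM score at root location x: root term plus best deformed part."""
    return root[x] + max(p - w * (x - xp) ** 2 for xp, p in enumerate(part))


def upper_bound(root, part, lo, hi, w):
    """Upper bound on exact_score for every x in [lo, hi].

    Each part location is charged only its minimum deformation cost to the
    interval, so the bound can never fall below the true score.
    """
    best_root = max(root[lo:hi + 1])
    best_part = max(p - w * dist_to_interval(xp, lo, hi) ** 2
                    for xp, p in enumerate(part))
    return best_root + best_part


def branch_and_bound(root, part, w=1.0):
    """Best-first search over intervals; returns (best_x, best_score)."""
    n = len(root)
    # heapq is a min-heap, so bounds are stored negated.
    heap = [(-upper_bound(root, part, 0, n - 1, w), 0, n - 1)]
    while heap:
        neg_bound, lo, hi = heapq.heappop(heap)
        if lo == hi:
            # For a single location the bound equals the exact score, and it
            # dominates every bound still in the queue, so it is optimal.
            return lo, -neg_bound
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid + 1, hi)):
            heapq.heappush(heap, (-upper_bound(root, part, a, b, w), a, b))
    raise ValueError("root must be non-empty")


if __name__ == "__main__":
    root = [0.1, 0.5, 0.2, 0.9, 0.3, 0.0, 0.4, 0.8]
    part = [0.0, 2.0, 0.1, 0.0, 0.0, 0.3, 0.0, 0.1]
    print(branch_and_bound(root, part, w=0.5))
```

Because the bound is tight for single locations and never underestimates, the first single location popped from the queue has the same score as an exhaustive scan. In this toy the bound itself is computed by brute force, so it shows only the correctness argument, not the speedup; a practical method needs bounds that are cheap to evaluate.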



Acknowledgments

I thank the two anonymous reviewers and Stefan Kinauer for feedback that helped improve this manuscript. I am grateful to the authors of [7, 10, 22, 29, 30] for making their code available. This work was funded by grants ANR-10-JCJC-0205 and FP7-Reconfig.


References

  1. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. PAMI 24(4), 509–522 (2002)
  2. Boussaid, H., Kokkinos, I.: Fast and exact: ADMM-based discriminative shape segmentation with loopy part models. In: CVPR, Columbus (2014)
  3. Boussaid, H., Kokkinos, I., Paragios, N.: Rapid mode estimation for 3D brain MRI tumor segmentation. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, Lund (2013)
  4. Chen, Y., Zhu, L., Lin, C., Yuille, A.L., Zhang, H.: Rapid inference on a novel AND/OR graph for object detection, segmentation and parsing. In: Proceedings of NIPS, Vancouver (2007)
  5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of CVPR, San Diego (2005)
  6. Dean, T., Ruzon, M., Segal, M., Shlens, J., Vijayanarasimhan, S., Yagnik, J.: Fast, accurate detection of 100,000 object classes on a single machine. In: Proceedings of CVPR, Portland (2013)
  7. Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors. In: ECCV (3), Florence (2012)
  8. Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)
  9. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: Proceedings of CVPR, Anchorage (2008)
  10. Felzenszwalb, P.F., Girshick, R.B., McAllester, D.A.: Cascade object detection with deformable part models. In: Proceedings of CVPR, San Francisco (2010)
  11. Felzenszwalb, P.F., Huttenlocher, D.P.: Distance transforms of sampled functions. Technical report, Cornell CS (2004)
  12. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of CVPR, Madison (2003)
  13. Ferrari, V., Marin-Jimenez, M.J., Zisserman, A.: Progressive search space reduction for human pose estimation. In: Proceedings of CVPR, Anchorage (2008)
  14. Fleuret, F., Geman, D.: Coarse-to-fine face detection. Int. J. Comput. Vis. 41(1/2), 85–107 (2001)
  15. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of CVPR, Columbus (2014)
  16. Girshick, R., Iandola, F., Darrell, T., Malik, J.: Deformable part models are convolutional neural networks. arXiv preprint arXiv:1409.5403 (2014)
  17. Girshick, R.B., Felzenszwalb, P.F., McAllester, D.: Discriminatively trained deformable part models, release 5.
  18. Gray, A.G., Moore, A.W.: Nonparametric density estimation: toward computational tractability. In: SIAM International Conference on Data Mining, San Francisco (2003)
  19. Grimson, W.E.L.: Object Recognition by Computer: The Role of Geometric Constraints. MIT Press, Cambridge, MA (1990). ISBN:0-262-07130-4
  20. Huttenlocher, D., Klanderman, G., Rucklidge, W.: Comparing images using the Hausdorff distance. IEEE Trans. PAMI 15(9), 850–863 (1993)
  21. Ihler, A., Sudderth, E., Freeman, W., Willsky, A.: Efficient multiscale sampling from products of Gaussian mixtures. In: Proceedings of NIPS, Vancouver (2003)
  22. Ihler, A., Sudderth, E., Freeman, W., Willsky, A.: Efficient sampling of Gaussian distributions. In: Proceedings of NIPS, Vancouver (2004)
  23. Jordan, M.: Graphical models. Stat. Sci. 19, 140–155 (2004)
  24. Kokkinos, I.: Rapid deformable object detection using dual-tree branch-and-bound. In: Proceedings of NIPS, Granada (2011)
  25. Kokkinos, I.: Bounding part scores for rapid detection with deformable part models. In: 2nd Parts and Attributes Workshop, in Conjunction with ECCV 2012, Florence (2012)
  26. Kokkinos, I.: Shufflets: shared mid-level parts for fast multi-category detection. In: ICCV – International Conference on Computer Vision, Sydney (2013)
  27. Kokkinos, I., Yuille, A.: Inference and learning with hierarchical shape models. Int. J. Comput. Vis. 93, 201–225 (2011)
  28. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS, Lake Tahoe (2012)
  29. Lampert, C., Blaschko, M., Hofmann, T.: Beyond sliding windows: object localization by efficient subwindow search. In: Proceedings of CVPR, Anchorage (2008)
  30. Lampert, C.H.: An efficient divide-and-conquer cascade for nonlinear object detection. In: Proceedings of CVPR, San Francisco (2010)
  31. Lehmann, A., Leibe, B., Gool, L.V.: Fast PRISM: branch and bound hough transform for object class detection. Int. J. Comput. Vis. 94(2), 175–197 (2011)
  32. Lempitsky, V., Blake, A., Rother, C.: Image segmentation by branch-and-mincut. In: Proceedings of ECCV, Marseille (2008)
  33. Lowe, D.: Perceptual Organization and Visual Recognition. Kluwer, Boston (1985)
  34. Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of ICCV, Kerkyra (1999)
  35. Moreels, P., Maire, M., Perona, P.: Recognition by probabilistic hypothesis construction. In: Proceedings of ECCV, Prague, p. 55 (2004)
  36. Mundy, J.L., Zisserman, A. (eds.): Geometric invariance in computer vision. MIT Press, Cambridge (1992)
  37. Savalle, P.-A., Tsogkas, S., Papandreou, G., Kokkinos, I.: Deformable part models with CNN features. In: 3rd Parts and Attributes Workshop, ECCV, Zurich (2014)
  38. Papandreou, G., Kokkinos, I., Savalle, P.A.: Untangling local and global deformations in deep convolutional networks for image classification and sliding window detection. arXiv (2014)
  39. Pedersoli, M., Vedaldi, A., Gonzàlez, J.: A coarse-to-fine approach for fast deformable object detection. In: Proceedings of CVPR, Colorado Springs (2011)
  40. Pirsiavash, H., Ramanan, D.: Steerable part models. In: CVPR, Providence (2012)
  41. Sadeghi, M.A., Forsyth, D.A.: Fast template evaluation with vector quantization. In: NIPS, Lake Tahoe (2013)
  42. Sadeghi, M.A., Forsyth, D.A.: 30 hz object detection with DPM V5. In: ECCV, Zurich (2014)
  43. Sapp, B., Toshev, A., Taskar, B.: Cascaded models for articulated pose estimation. In: Proceedings of ECCV, Heraklion (2010)
  44. Song, H.O., Zickler, S., Althoff, T., Girshick, R.B., Fritz, M., Geyer, C., Felzenszwalb, P.F., Darrell, T.: Sparselet models for efficient multiclass object detection. In: Proceedings of ECCV, Florence (2012)
  45. Trulls, E., Tsogkas, S., Kokkinos, I., Sanfeliu, A., Moreno-Noguer, F.: Segmentation-aware deformable part models. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, 23–28 June, pp. 168–175 (2014)
  46. Vedaldi, A., Zisserman, A.: Sparse kernel approximations for efficient classification and detection. In: Proceedings of CVPR, Providence (2012)
  47. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Kauai (2001)
  48. Wan, L., Eigen, D., Fergus, R.: End-to-end integration of a convolutional network, deformable parts model and non-maximum suppression. arXiv (2014)
  49. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Proceedings of ECCV, Dublin (2000)
  50. Yang, Y., Ramanan, D.: Articulated human detection with flexible mixtures of parts. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2878–2890 (2013)
  51. Zhu, S.C., Mumford, D.: Quest for a stochastic grammar of images. Found. Trends Comput. Graph. Vis. 2(4), 259–362 (2007)
  52. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, 16–21 June, pp. 2879–2886 (2012)
  53. Zisserman, A., Forsyth, D.A., Mundy, J.L., Rothwell, C.A., Liu, J., Pillow, N.: 3D object recognition using invariance. Artif. Intell. 78, 239–288 (1995)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Center for Visual Computing, Centrale-Supélec and INRIA-Saclay, Grande Voie des Vignes, Chatenay-Malabry, France
