Abstract
In this work, we are concerned with the detection of multiple objects in an image. We demonstrate that typically applied objectives have the structure of a random field model, but that the energies resulting from non-maximal suppression terms lead to the maximization of a submodular function. This is in general a difficult problem to solve, which is made worse by the very large size of the output space. We make use of an optimal approximation result for this form of problem by employing a greedy algorithm that finds one detection at a time. We show that we can adopt a branch-and-bound strategy that efficiently explores the space of all subwindows to optimally detect single objects while incorporating pairwise energies resulting from previous detections. This leads to a series of inter-related branch-and-bound optimizations, which we characterize by several new theoretical results. We then show empirically that optimal branch-and-bound efficiency gains can be achieved by a simple strategy of reusing priority queues from previous detections, resulting in speedups of up to a factor of three on the PASCAL VOC data set as compared with serial application of branch-and-bound.
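The greedy strategy described above, which finds one detection at a time while penalizing overlap with earlier detections, can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a small explicit list of candidate boxes, an objective of the form "sum of detection scores minus a penalty on pairwise overlap", and intersection-over-union as the overlap measure. The names `overlap`, `greedy_detect`, `penalty`, and `max_detections` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), an assumed layout


def overlap(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (assumed pairwise term)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_detect(
    candidates: List[Box],
    scores: List[float],
    penalty: float = 1.0,
    max_detections: int = 10,
) -> List[int]:
    """Pick detections one at a time by largest marginal gain.

    The marginal gain of a candidate is its score minus the penalized
    overlap with every detection already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def gain(i: int) -> float:
        pairwise = sum(overlap(candidates[i], candidates[j]) for j in selected)
        return scores[i] - penalty * pairwise

    while remaining and len(selected) < max_detections:
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # no remaining candidate improves the objective
        selected.append(best)
        remaining.remove(best)
    return selected


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(greedy_detect(boxes, [1.0, 0.9, 0.5], penalty=2.0))
```

In this sketch the inner `max` enumerates every candidate. The approach in the abstract instead finds each detection with a branch-and-bound search over the space of all subwindows and reuses priority queues from previous detections; neither of those is reproduced here.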
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blaschko, M.B. (2011). Branch and Bound Strategies for Non-maximal Suppression in Object Detection. In: Boykov, Y., Kahl, F., Lempitsky, V., Schmidt, F.R. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2011. Lecture Notes in Computer Science, vol 6819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23094-3_28
DOI: https://doi.org/10.1007/978-3-642-23094-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23093-6
Online ISBN: 978-3-642-23094-3
eBook Packages: Computer Science, Computer Science (R0)