Abstract
Standard sliding window based object detection requires dense classifier evaluation on densely sampled locations in scale space in order to achieve an accurate localization. To avoid such dense evaluation, selective search based algorithms only evaluate the classifier on a small subset of object proposals. Notwithstanding the demonstrated success, object proposals do not guarantee perfect overlap with the object, leading to a suboptimal detection accuracy. To address this issue, we propose to first relax the dense sampling of the scale space with coarse object proposals generated from bottom-up segmentations. Based on detection results on these proposals, we then conduct a top-down search to more precisely localize the object using supervised descent. This two-stage detection strategy, dubbed location relaxation, is able to localize the object in the continuous parameter space. Furthermore, there is a conflict between accurate object detection and robust object detection. That is because the achievement of the later requires the accommodation of inaccurate and perturbed object locations in the training phase. To address this conflict, we leverage the rich spatial information learned from the Regionlets detection framework to determine where the object is precisely localized. Our proposed approaches are extensively validated on the PASCAL VOC 2007 dataset and a self-collected large scale car dataset. Our method boosts the mean average precision of the current state-of-the-art (41.7 %) to 44.1 % on PASCAL VOC 2007 dataset. To our best knowledge, it is the best performance reported without using outside data (Convolutional neural network based approaches are commonly pre-trained on a large scale outside dataset and fine-tuned on the VOC dataset.).
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
We empirically found that using the four coordinates for our localization model produces better performance than using \((x,y,s,a)\). Thus we choose \((l,t, r,b)\) in our Regionlets Re-localization approach.
References
Chen, G., Ding, Y., Xiao, J., Han, T.X.: Detection evolution with multi-order contextual co-occurrence. In: CVPR (2013)
Zhu, L., Chen, Y., Yuille, A., Freeman, W.: Latent hierarchical structural learning for object detection. In: CVPR (2010)
Wang, X., Han, T.X., Yan, S.: An HOG-LBP human detector with partial occlusion handling. In: ICCV (2009)
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: CVPR (2008)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Viola, P., Jones, M.: Robust real-time object detection. IJCV (2001)
Lampert, C.H., Blaschko, M.B., Hofmann, T.: Beyond sliding windows: object localization by efficient subwindow search. In: CVPR (2008)
Lampert, C.H.: An efficient divide-and-conquer cascade for nonlinear object detection. In: CVPR (2010)
Dollár, P., Appel, R., Kienzle, W.: Crosstalk cascades for frame-rate pedestrian detection. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 645–659. Springer, Heidelberg (2012)
Van de Sande, K.E.A., Uijlings, J.R.R., Gevers, T., Smeulders, A.W.M.: Segmentation as selective search for object recognition. In: ICCV (2011)
Cinbis, R.G., Verbeek, J., Schmid, C.: Segmentation driven object detection with fisher vectors. In: ICCV (2013)
Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE T-PAMI 34, 2189–2202 (2012)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: CVPR (2013)
Chang, K.Y., Liu, T.L., Chen, H.T., Lai, S.H.: Fusing generic objectness and visual saliency for salient object detection. In: ICCV (2011)
Wang, X., Yang, M., Zhu, S., Lin, Y.: Regionlets for generic object detection. In: ICCV (2013)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. IJCV 59, 167–181 (2004)
Fan, R., Chang, K., Hsieh, C., Wang, X., Jin, C.: Liblinear: a library for large linear classification. JMLR 9, 1871–1874 (2008)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: bject detection with discriminatively trained part-based models. IEEE T-PAMI 32, 1627–1645 (2010)
Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for multi-class object layout. In: ICCV (2009)
Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image classification. In: ICCV (2009)
Song, Z., Chen, Q., Huang, Z., Hua, Y., Yan, S.: Contextualizing object detection and classification. In: CVPR (2011)
Li, C., Parikh, D., Chen, T.: Extracting adaptive contextual cues from unlabeled regions. In: ICCV (2011)
Cinbis, R.G., Sclaroff, S.: Contextual object detection using set-based classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VI. LNCS, vol. 7577, pp. 43–57. Springer, Heidelberg (2012)
Van den Bergh, M., Boix, X., Roig, G., de Capitani, B., Van Gool, L.: SEEDS: superpixels extracted via energy-driven sampling. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 13–26. Springer, Heidelberg (2012)
Acknowledgements
The main part of the work was carried out when the first author was a summer intern at NEC Laboratories America in Cupertino, CA. Research reported in this publication was also partly supported by the National Institute Of Nursing Research of the National Institutes of Health under Award Number R01NR015371. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work is also partly supported by US National Science Foundation Grant IIS 1350763, China National Natural Science Foundation Grant 61228303, GH’s start-up funds form Stevens Institute of Technology, a Google Research Faculty Award, a gift grant from Microsoft Research, and a gift grant from NEC Labs America.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Long, C., Wang, X., Hua, G., Yang, M., Lin, Y. (2015). Accurate Object Detection with Location Relaxation and Regionlets Re-localization. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_17
Download citation
DOI: https://doi.org/10.1007/978-3-319-16865-4_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer ScienceComputer Science (R0)