Weakly Supervised Object Localization Using Size Estimates

  • Miaojing Shi
  • Vittorio Ferrari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9909)


We present a technique for weakly supervised object localization (WSOL), building on the observation that WSOL algorithms usually work better on images with bigger objects. Instead of training the object detector on the entire training set at once, we propose a curriculum learning strategy that feeds training images into the WSOL learning loop in order, from images containing bigger objects down to those containing smaller ones. To determine the order automatically, we train a regressor to estimate the size of the object given the whole image as input. We also use these size estimates to improve the re-localization step of WSOL, assigning weights to object proposals according to how closely their size matches the estimated object size. We demonstrate the effectiveness of size order and size weighting on the challenging PASCAL VOC 2007 dataset, where we achieve a significant improvement over existing state-of-the-art WSOL techniques.
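The two ideas in the abstract, size order and size weighting, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the equal-sized batch split, the Gaussian weight on the log-area ratio, and the `sigma` parameter are all assumptions made for illustration; the abstract only states that images are fed from bigger to smaller objects and that proposals are weighted by how closely their size matches the estimate.

```python
import math


def curriculum_batches(size_estimates, num_batches):
    """Split image ids into batches, largest estimated object size first.

    size_estimates: dict mapping image id -> estimated object size
    (hypothetical output of a size regressor).
    """
    ordered = sorted(size_estimates, key=size_estimates.get, reverse=True)
    if not ordered:
        return []
    batch_len = math.ceil(len(ordered) / num_batches)
    return [ordered[i:i + batch_len] for i in range(0, len(ordered), batch_len)]


def size_weight(proposal_area, estimated_area, sigma=0.5):
    """Weight in (0, 1] that peaks when the proposal area equals the estimate.

    Assumed form: a Gaussian on the log-area ratio. The abstract does not
    specify the weighting function.
    """
    log_ratio = math.log(proposal_area / estimated_area)
    return math.exp(-0.5 * (log_ratio / sigma) ** 2)


def relocalize(proposals, estimated_area):
    """Return the box maximizing appearance score times size weight.

    proposals: list of (box, appearance_score), box = (x1, y1, x2, y2).
    """
    def area(box):
        x1, y1, x2, y2 = box
        return (x2 - x1) * (y2 - y1)

    best = max(proposals, key=lambda p: p[1] * size_weight(area(p[0]), estimated_area))
    return best[0]
```

In a training loop one might start with the first batch only and add the remaining batches over successive iterations, so that the detector sees images with bigger objects first; that schedule is likewise an assumption rather than something the abstract specifies.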


Keywords: Training image, Object class, Object size, Appearance model, Size order



Work supported by the ERC Starting Grant VisCul.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. University of Edinburgh, Edinburgh, Scotland, UK
