Hierarchical Object Representations for Visual Recognition via Weakly Supervised Learning

  • Tianzhu Zhang
  • Rui Cai
  • Zhiwei Li
  • Lei Zhang
  • Hanqing Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7724)


In this paper, we propose a weakly supervised approach to learning hierarchical object representations for visual recognition. Learning proceeds bottom-up to discover latent visual patterns at multiple scales. To reduce the interference of complex backgrounds in natural images, bounding boxes of foreground objects are used as weak supervision in the learning stage to promote the visual patterns most related to the target objects. The difference between foreground-object patterns and background patterns is relatively vague at low levels but becomes more distinct as features are transformed to higher levels. In the test stage, an input image is matched against the learnt patterns level by level, and the responses at each level form a hierarchy of representations indicating how likely the target object is to occur at various scales. Experiments on two PASCAL datasets showed encouraging results for visual recognition.
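
As a rough illustration of the level-by-level test stage described above, the following Python sketch matches an image's features against a stack of per-level pattern sets and collects one response vector per level. It is not the authors' implementation: the cosine-similarity matching, the 2x2 max pooling between levels, the randomly generated pattern sets, and all function and variable names are assumptions made only for illustration.

```python
import numpy as np

def level_responses(features, patterns):
    """Cosine similarity between every grid cell and every pattern.

    features: (H, W, D) grid of feature vectors (assumed layout)
    patterns: (K, D) pattern set for this level (assumed layout)
    returns:  (H, W, K) response maps
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = patterns / (np.linalg.norm(patterns, axis=-1, keepdims=True) + 1e-8)
    return f @ p.T

def pool2x2(responses):
    """Max-pool non-overlapping 2x2 neighbourhoods, halving the grid size."""
    h, w, k = responses.shape
    h2, w2 = h // 2, w // 2
    blocks = responses[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2, k)
    return blocks.max(axis=(1, 3))

def build_hierarchy(descriptors, pattern_sets):
    """Return one image-level response vector per level, bottom-up."""
    hierarchy = []
    features = descriptors
    for patterns in pattern_sets:
        responses = level_responses(features, patterns)
        # Strongest response of each pattern anywhere in the image.
        hierarchy.append(responses.max(axis=(0, 1)))
        # Pooled responses become the features of the next, coarser level.
        features = pool2x2(responses)
    return hierarchy

# Hypothetical data: an 8x8 grid of 16-dimensional local descriptors and
# three random pattern sets standing in for learnt patterns.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(8, 8, 16))
pattern_sets = [
    rng.normal(size=(32, 16)),  # level 1: 32 patterns over raw descriptors
    rng.normal(size=(24, 32)),  # level 2: 24 patterns over level-1 responses
    rng.normal(size=(12, 24)),  # level 3: 12 patterns over level-2 responses
]
hierarchy = build_hierarchy(descriptors, pattern_sets)
print([v.shape for v in hierarchy])
```

Each pooling step doubles the image area covered by one grid cell, so in this sketch higher levels respond to larger spatial extents. The per-level vectors could then be concatenated and passed to any classifier; that step is likewise an assumption, not something the abstract specifies.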





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tianzhu Zhang (1)
  • Rui Cai (2)
  • Zhiwei Li (2)
  • Lei Zhang (2)
  • Hanqing Lu (1)
  1. Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
  2. Microsoft Research Asia, Beijing, P.R. China
