Objects as Attributes for Scene Classification

  • Li-Jia Li
  • Hao Su
  • Yongwhan Lim
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6553)


Robust low-level image features have proven to be effective representations for a variety of high-level visual recognition tasks, such as object recognition and scene classification. But as visual recognition tasks become more challenging, the semantic gap between low-level feature representations and the meaning of the scenes increases. In this paper, we propose to use objects as attributes of scenes for scene classification. We represent images by collecting their responses to a large number of object detectors, or “object filters”. Such a representation carries high-level semantic information rather than low-level image feature information, making it more suitable for high-level visual recognition tasks. Using very simple, off-the-shelf classifiers such as SVMs, we show that this object-level image representation can be used effectively for high-level visual tasks such as scene classification. Our results are superior to reported state-of-the-art performance on a number of standard datasets.
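The idea of representing an image by its responses to many object detectors can be illustrated with a minimal sketch. This is not the authors' implementation: the toy detectors, the max-pooling step, the grid sizes, and every function name below are assumptions made for illustration only. Each hypothetical detector maps an image to a response map, and the pooled responses are concatenated into a single feature vector that could then be passed to an off-the-shelf classifier.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]                 # grayscale image as rows of pixel values
Detector = Callable[[Image], Image]       # hypothetical: image -> response map


def max_pool_grid(response: Image, grid: int) -> List[float]:
    """Return the maximum response in each cell of a grid x grid partition.

    Assumes the response map has at least `grid` rows and columns.
    """
    h, w = len(response), len(response[0])
    pooled = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            pooled.append(
                max(response[y][x] for y in range(y0, y1) for x in range(x0, x1))
            )
    return pooled


def object_response_features(
    image: Image, detectors: Sequence[Detector], grids: Sequence[int] = (1, 2)
) -> List[float]:
    """Concatenate pooled detector responses into one feature vector."""
    features: List[float] = []
    for detect in detectors:
        response = detect(image)
        for g in grids:
            features.extend(max_pool_grid(response, g))
    return features


# Toy stand-ins for real object detectors (illustrative assumptions only).
def bright_detector(img: Image) -> Image:
    return [list(row) for row in img]


def dark_detector(img: Image) -> Image:
    return [[1.0 - p for p in row] for row in img]


image = [
    [0.0, 0.1, 0.9, 1.0],
    [0.0, 0.2, 0.8, 0.7],
    [0.3, 0.3, 0.5, 0.5],
    [0.4, 0.1, 0.6, 0.2],
]
features = object_response_features(image, [bright_detector, dark_detector])
print(len(features))  # 2 detectors x (1 + 4) grid cells
```

With two detectors and grids of 1x1 and 2x2, the vector has ten entries. In a realistic setting the number of detectors would be far larger, and the vectors for a labelled training set would be used to fit a standard classifier such as a linear SVM.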


Image Representation, Spatial Pyramid, British National Corpus, Scene Dataset, Scene Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Li-Jia Li (1)
  • Hao Su (1)
  • Yongwhan Lim (1)
  • Li Fei-Fei (1)
  1. Computer Science Department, Stanford University, USA
