Scene Discovery by Matrix Factorization

  • Nicolas Loeff
  • Ali Farhadi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5305)


What constitutes a scene? Defining a meaningful vocabulary for scene discovery is a challenging problem that has important consequences for object recognition. We consider that scenes depict correlated objects and exhibit visual similarity. We introduce a max-margin factorization model that finds a low-dimensional subspace with high discriminative power for correlated annotations. We postulate that this space should allow us to discover a large number of scenes in unsupervised data, and we show scene discrimination results on par with supervised approaches. The model also produces state-of-the-art word prediction results, including good annotation completion.
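The abstract describes the model only at a high level. The following is a minimal NumPy sketch of one common max-margin factorization for multi-label annotation, written under stated assumptions and not the authors' exact formulation. It assumes linear word scores S = X U Vᵀ, a hinge loss on each (image, word) entry, a Frobenius penalty on the factors U and V (a standard surrogate for a trace-norm penalty on W = U Vᵀ), and plain subgradient descent. It also assumes that "scene discovery" can be illustrated by running k-means on the projected images X U. The function names `hinge_factorization` and `kmeans`, all parameter values, and the synthetic data are hypothetical.

```python
import numpy as np


def hinge_factorization(X, Y, rank=2, lam=0.1, lr=0.05, epochs=300, seed=0):
    """Fit scores S = X @ U @ V.T by subgradient descent (illustrative sketch).

    X: (n, d) image features; Y: (n, k) word labels in {-1, +1}.
    Objective (an assumption, not the paper's exact loss): mean over images of
    the summed hinge loss max(0, 1 - Y * S), plus
    (lam / 2) * (||U||_F^2 + ||V||_F^2), a surrogate trace-norm penalty on U @ V.T.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    U = 0.1 * rng.standard_normal((d, rank))  # feature -> subspace map
    V = 0.1 * rng.standard_normal((k, rank))  # one subspace vector per word
    history = []
    for _ in range(epochs):
        Z = X @ U                # (n, rank): images projected into the subspace
        S = Z @ V.T              # (n, k): per-word scores
        margins = 1.0 - Y * S
        active = margins > 0     # entries that violate the unit margin
        loss = margins[active].sum() / n + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))
        history.append(loss)
        G = -(Y * active) / n    # subgradient of the hinge term w.r.t. S
        grad_U = X.T @ (G @ V) + lam * U
        grad_V = G.T @ Z + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, history


def kmeans(Z, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means, used here (hypothetically) to group images into scenes."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]  # copy of random rows
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = Z[labels == c].mean(axis=0)
    return labels, centers


# Synthetic toy data (hypothetical): 200 images, 5 features, 4 correlated words.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
a = rng.standard_normal(5)            # hidden direction shared by all words
b = np.array([1.0, 1.0, -1.0, 1.0])   # words that co-occur (+1) or exclude (-1)
Y = np.sign(np.outer(X @ a, b))

U, V, history = hinge_factorization(X, Y)
Z = X @ U                             # low-dimensional image representation
labels, centers = kmeans(Z, n_clusters=3)
```

Because the toy labels are driven by one shared direction, the factorization should recover it, and the training loss should fall well below its starting value. The rank, regularization weight, learning rate, and number of clusters are arbitrary choices for illustration, not values from the paper.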





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Nicolas Loeff (1)
  • Ali Farhadi (1)
  1. University of Illinois at Urbana-Champaign, Urbana
