Science China Information Sciences, Volume 58, Issue 3, pp 1–13

Probabilistic modeling of scenes using object frames

Research Paper

Abstract

In this paper, we propose a probabilistic scene model using object frames, each of which is a group of co-occurring objects with fixed spatial relations. In contrast to standard co-occurrence models, which mostly explore the pairwise co-existence of objects, the proposed model captures the spatial relationships among groups of objects. Such information is closely tied to the semantics of the underlying scenes, which allows us to perform object detection and scene recognition in a unified framework. The proposed probabilistic model has two major components. The first models the dependencies between object frames and objects by adopting the Latent Dirichlet Allocation model from text analysis. The second characterizes the dependencies between object frames and scenes by establishing a mapping between global image features and object frame distributions. Experimental results show that the induced object frames are both semantically meaningful and spatially consistent. In addition, our model significantly improves the performance of object recognition and scene retrieval.
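The two components described above can be illustrated with a minimal generative sketch. It is not the authors' implementation: the abstract does not give the parameterization, so the linear-softmax mapping from global features to frame proportions, the function names, and the toy numbers are all assumptions made for illustration, and the spatial relations inside each frame are left out entirely. The sketch only shows the LDA-style structure: a per-image distribution over object frames, and a per-frame distribution over object labels.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_distribution(features, weights):
    """Map global image features to a distribution over object frames.

    ASSUMPTION: a linear map followed by softmax stands in for the
    feature-to-frame mapping, whose actual form the abstract does not state.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def sample_scene(features, weights, frame_object_probs, vocabulary, n_objects, rng):
    """Sample object labels for one image, LDA style.

    For each object: draw a frame z from the image's frame distribution,
    then draw an object label from that frame's label distribution.
    """
    theta = frame_distribution(features, weights)
    frames = list(range(len(theta)))
    objects = []
    for _ in range(n_objects):
        z = rng.choices(frames, weights=theta, k=1)[0]
        label = rng.choices(vocabulary, weights=frame_object_probs[z], k=1)[0]
        objects.append((z, label))
    return theta, objects


# Hypothetical toy setup: two frames ("bedroom-like" and "street-like").
VOCAB = ["bed", "lamp", "car", "road"]
WEIGHTS = [[2.0, 0.0], [0.0, 2.0]]          # one row of weights per frame
FRAME_OBJECT_PROBS = [
    [0.5, 0.5, 0.0, 0.0],                   # frame 0 emits bed / lamp
    [0.0, 0.0, 0.5, 0.5],                   # frame 1 emits car / road
]

theta, objects = sample_scene(
    [1.0, 0.0], WEIGHTS, FRAME_OBJECT_PROBS, VOCAB, n_objects=5, rng=random.Random(0)
)
print(theta)
print(objects)
```

In this toy setup the feature vector favors frame 0, so most sampled objects come from the bed/lamp group. In a full model the frame proportions and per-frame label distributions would be learned from data rather than fixed by hand.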

Keywords

Bayes network, scene understanding, object frame, probabilistic model
032107 

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. School of Mathematics, Beihang University, Beijing, China
  2. Department of Computer Science, Stanford University, Stanford, USA
  3. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
