Abstract
In this paper, we propose a probabilistic scene model built on object frames, each of which is a group of co-occurring objects with fixed spatial relations. In contrast to standard co-occurrence models, which mostly explore the pairwise co-existence of objects, the proposed model captures the spatial relationships within groups of objects. Such information is closely tied to the semantics of the underlying scenes, which allows us to perform object detection and scene recognition in a unified framework. The proposed probabilistic model has two major components. The first models the dependencies between object frames and objects by adopting the Latent Dirichlet Allocation (LDA) model from text analysis. The second characterizes the dependencies between object frames and scenes by establishing a mapping between global image features and object frame distributions. Experimental results show that the induced object frames are both semantically meaningful and spatially consistent. In addition, our model significantly improves the performance of object recognition and scene retrieval.
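To make the two components concrete, the following is a minimal generative sketch and not the authors' implementation. It assumes that each object frame behaves like an LDA topic over "words" that pair an object label with a coarse position, and that global image features set the Dirichlet prior over frames through a log-linear link chosen purely for illustration. The vocabulary, the weights, and the names `frame_prior` and `generate_scene` are all hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def frame_prior(features, weights):
    """Assumed log-linear map from global features to Dirichlet parameters."""
    return [math.exp(sum(w * x for w, x in zip(row, features))) for row in weights]


def generate_scene(features, weights, frames, n_objects, rng):
    """Sample frame proportions, then (frame index, object word) pairs."""
    theta = sample_dirichlet(frame_prior(features, weights), rng)
    scene = []
    for _ in range(n_objects):
        # Pick a frame for this object according to the image's frame mixture.
        k = rng.choices(range(len(frames)), weights=theta)[0]
        # Pick an object word from that frame's distribution.
        words, probs = zip(*frames[k].items())
        scene.append((k, rng.choices(words, weights=probs)[0]))
    return theta, scene


# Made-up vocabulary: each word is an object label plus a coarse position.
FRAMES = [
    {"monitor@center": 0.4, "keyboard@bottom": 0.4, "mouse@bottom-right": 0.2},
    {"sky@top": 0.5, "building@center": 0.3, "road@bottom": 0.2},
]
# Made-up weights: one row of feature weights per frame.
WEIGHTS = [[2.0, -2.0], [-2.0, 2.0]]

rng = random.Random(0)
theta, scene = generate_scene([1.0, 0.0], WEIGHTS, FRAMES, 5, rng)
```

With the feature vector `[1.0, 0.0]`, the assumed prior puts more mass on the first frame, so most sampled objects tend to come from the desk-like group. Learning the frames and the mapping from data would require an inference procedure that this sketch does not cover.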
Cite this article
Su, H., Yu, A.W. Probabilistic modeling of scenes using object frames. Sci. China Inf. Sci. 58, 1–13 (2015). https://doi.org/10.1007/s11432-014-5151-3
DOI: https://doi.org/10.1007/s11432-014-5151-3