Abstract
In this paper, we present a statistical model called statistical local spatial relations (SLSR), a novel learning model that combines spatial and statistical information for semantic image classification. The model is inspired by probabilistic Latent Semantic Analysis (PLSA), a text-mining technique. In text analysis, PLSA discovers topics in a corpus using the bag-of-words document representation. In SLSR, we treat image categories as topics, so an image containing instances of multiple categories can be modeled as a mixture of topics. More significantly, SLSR introduces spatial relations as a factor that is absent from PLSA. SLSR is invariant to rotation, scale, translation and affine transformations, and it can handle partial occlusion. Using the Dirichlet process and a variational Expectation-Maximization (EM) learning algorithm, we implement SLSR as an image classification algorithm. SLSR is learned in an unsupervised manner and captures spatial relations and statistical information simultaneously. Experiments on several standard data sets show that SLSR is a promising model for semantic image classification.
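The key modelling step above, treating each image as a mixture of topics, is inherited from PLSA, where the word distribution of a document d factorises as P(w | d) = sum over z of P(w | z) P(z | d). As background only, the following is a minimal Python sketch of plain PLSA fitted with the standard EM updates on a toy matrix of visual-word counts. It is not the SLSR model: it omits the spatial-relation factor, the Dirichlet process and the variational EM described above, and the function name `plsa` and the toy data are hypothetical.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain PLSA by EM on a document-word count matrix.

    Illustrative sketch only: this is standard PLSA, not the SLSR model;
    it has no spatial-relation factor, Dirichlet process or variational EM.

    counts[d][w] is how often (hypothetical) visual word w occurs in image d.
    Returns (p_z_d, p_w_z, history) with p_z_d[d][z] = P(z | d),
    p_w_z[z][w] = P(w | z) and history the log-likelihood per iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random strictly positive start, normalised into distributions.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    history = []
    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        loglik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) P(z | d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)  # mixture P(w | d) = sum_z P(w | z) P(z | d)
                loglik += n * math.log(p_w_d)
                for z in range(n_topics):
                    resp = n * joint[z] / p_w_d  # expected count n(d, w) P(z | d, w)
                    acc_w_z[z][w] += resp
                    acc_z_d[d][z] += resp
        # Log-likelihood of the parameters used in this E-step.
        history.append(loglik)
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z, history


if __name__ == "__main__":
    # Toy counts: images 0-1 use words 0-1, images 2-3 use words 2-3,
    # image 4 mixes both groups.
    toy = [
        [5, 4, 0, 0],
        [4, 6, 0, 0],
        [0, 0, 5, 5],
        [0, 1, 4, 6],
        [3, 2, 3, 2],
    ]
    p_z_d, p_w_z, history = plsa(toy, n_topics=2, n_iter=30)
    for d, row in enumerate(p_z_d):
        print(d, [round(p, 3) for p in row])
```

Each row of `p_z_d` gives an image's mixture over topics (categories). The printed values depend on the random initialisation; EM only guarantees a non-decreasing log-likelihood, not a global optimum.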
Notes
In this paper, the terms “topic” and “category” are used interchangeably.
Additional information
This research is supported by the National Natural Science Foundation of China (grant No. 60573182), the Doctor Foundation of China (grant No. 20060183042) and the Jilin Science Foundation (grant No. 20060527).
About this article
Cite this article
Han, D., Li, W. & Li, Z. Semantic image classification using statistical local spatial relations model. Multimed Tools Appl 39, 169–188 (2008). https://doi.org/10.1007/s11042-008-0203-6
DOI: https://doi.org/10.1007/s11042-008-0203-6