Semantic image classification using statistical local spatial relations model

Multimedia Tools and Applications

Abstract

In this paper, a statistical model called statistical local spatial relations (SLSR) is presented: a novel learning model that combines spatial and statistical information for semantic image classification. The model is inspired by probabilistic Latent Semantic Analysis (PLSA) from text mining. In text analysis, PLSA discovers topics in a corpus using a bag-of-words document representation. In SLSR, we treat image categories as topics; therefore, an image containing instances of multiple categories can be modeled as a mixture of topics. More significantly, SLSR introduces spatial relation information as a factor that is absent from PLSA. SLSR is invariant to rotation, scale, translation and affine transformations, and it can cope with partial occlusion. Using a Dirichlet process and a variational Expectation-Maximization (EM) learning algorithm, SLSR is implemented as an image classification algorithm. SLSR is learned in an unsupervised manner and captures spatial relations and statistical information simultaneously. Experiments on several standard data sets show that SLSR is a promising model for semantic image classification.
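
For reference, plain PLSA models each document (here, an image represented as a histogram of quantized local descriptors, or "visual words") as P(w | d) = Σ_z P(z | d) P(w | z), and fits both factors with EM. The following is a minimal sketch of that baseline only, assuming a small dense count matrix. It is not the authors' SLSR: it has no spatial-relation factor, no Dirichlet process and no variational EM. The function names `plsa` and `log_likelihood` and the toy counts are hypothetical.

```python
import math
import random


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit plain PLSA (no spatial term, not SLSR) to a count matrix with EM.

    counts[d][w] is how often visual word w occurs in image d.
    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z | d) and p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:
            return [1.0 / len(row)] * len(row)  # fall back to uniform
        return [x / total for x in row]

    # Random strictly positive initialisation of both conditional distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts n(d, w) * P(z | d, w), accumulated per topic.
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    share = n * joint[z] / total
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        # M-step: renormalise the expected counts into probabilities.
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log P(w | d) under the fitted model."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


# Hypothetical toy data: images 0-1 mostly use words 0-1, images 2-3 words 2-3.
counts = [
    [5, 4, 0, 0],
    [6, 5, 0, 1],
    [0, 0, 5, 6],
    [1, 0, 4, 5],
]
p_z_d, p_w_z = plsa(counts, n_topics=2, n_iter=200)
# Index of the most probable topic for each image.
dominant = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
print(dominant)
```

On these toy counts, images 0 and 1 should share a dominant topic that differs from the one shared by images 2 and 3; which index each group receives depends on the random initialisation. Because EM only reaches a local optimum, practical use usually involves several random restarts.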

Figs. 1–7

Notes

  1. In this paper, the terms “topic” and “category” are used interchangeably.

  2. www.robots.ox.ac.uk/~vgg/research/affine

  3. http://www.pascal-network.org/challenges/VOC/databases.html

References

  1. Agarwal S, Roth D (2002) Learning a sparse representation for object detection. In: Proceedings of the 7th European conference on computer vision-part IV, Copenhagen, Denmark. Springer, London, pp 113–130

  2. Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, Berlin Heidelberg New York

  3. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  4. Bosch A, Muñoz X, Martí R (2007) Which is the best way to organize/classify images by content? Image Vis Comput 25(6):778–791

  5. Burl MC, Weber M, Perona P (1998) A probabilistic approach to object recognition using local photometry and global geometry. In: Proceedings of the 5th European conference on computer vision-volume II, Freiburg, Germany. Springer, London, pp 628–641

  6. Carneiro G, Vasconcelos N (2005) Formulating semantic image annotation as a supervised learning problem. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), vol 2. San Diego, California, USA. IEEE Computer Society, Washington, DC, pp 163–168

  7. Crandall D, Felzenszwalb P, Huttenlocher D (2005) Spatial priors for part-based recognition using statistical models. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), vol 1. San Diego, California, USA. IEEE Computer Society, Washington, DC, pp 10–17

  8. Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Proceedings of the ECCV international workshop on statistical learning in computer vision, Prague, Czech Republic. Springer, Berlin Heidelberg New York, pp 59–74

  9. Fan J, Gao Y, Luo H, Xu G (2005) Statistical modeling and conceptualization of natural images. Pattern Recogn 38(6):865–885

  10. Fan J, Luo H, Gao Y (2005) Learning the semantics of images by using unlabeled samples. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), vol 2. San Diego, California, USA. IEEE Computer Society, Washington, DC, pp 704–710

  11. Felzenszwalb PF, Huttenlocher DP (2005) Pictorial structures for object recognition. Int J Comput Vis 61(1):55–79

  12. Fergus R, Fei-Fei L, Perona P, Zisserman A (2005) Learning object categories from Google’s image search. In: Proceedings of international conference on computer vision (ICCV 2005), vol 2. Beijing, China. IEEE Computer Society, Washington, DC, pp 1816–1823

  13. Fergus R, Perona P, Zisserman A (2003) Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the 2003 IEEE computer society conference on computer vision and pattern recognition (CVPR 2003), Madison, Wisconsin, USA. IEEE Computer Society, Washington, DC, pp 264–271

  14. Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. Tech. Rep. UCB/CSD-04-1366, California Institute of Technology

  15. Guo GD, Jain AK, Ma WY, Zhang HJ (2002) Learning similarity measure for natural image retrieval with relevance feedback. IEEE Trans Neural Netw 13(4):811–820

  16. Heidemann G (2006) The principal components of natural images revisited. IEEE Trans Pattern Anal Mach Intell 28(5):822–826

  17. Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1–2):177–196

  18. Ioffe S, Forsyth DA (2001) Probabilistic methods for finding people. Int J Comput Vis 43(1):45–68

  19. Frey BJ, Jojic N (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Trans Pattern Anal Mach Intell 27(9):1392–1416

  20. Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), vol 2. San Diego, California, USA. IEEE Computer Society, Washington, DC, pp 524–531

  21. Lim JH, Jin JS (2005) Combining intra-image and inter-class semantics for consumer image retrieval. Pattern Recogn 38(6):847–864

  22. Mikolajczyk K, Schmid C (2004) Scale & affine invariant interest point detectors. Int J Comput Vis 60(1):63–86

  23. Mikolajczyk K, Schmid C (2005) A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell 27(10):1615–1630

  24. Papageorgiou C, Poggio T (2000) A trainable system for object detection. Int J Comput Vis 38(1):15–33

  25. Saha SK, Das AK, Chanda B (2007) Image retrieval based on indexing and relevance feedback. Pattern Recogn Lett 28(3):357–366 (special issue of pattern recognition letters on advances in visual information processing)

  26. Siagian C, Itti L (2007) Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Trans Pattern Anal Mach Intell 29(2):300–312

  27. Sivic J, Russell BC, Efros AA, Zisserman A, Freeman WT (2005) Discovering objects and their location in images. In: Proceedings of international conference on computer vision (ICCV 2005), vol 1. Beijing, China. IEEE Computer Society, Washington, DC, pp 370–377

  28. Verbeek J (2006) Learning nonlinear image manifolds by global alignment of local linear models. IEEE Trans Pattern Anal Mach Intell 28(8):1236–1250

  29. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition (CVPR 2001), vol 1. Kauai Marriott, Hawaii, USA. IEEE Computer Society, Washington, DC, pp 511–518

  30. Vogel J, Schiele B (2004) A semantic typicality measure for natural scene categorization. In: Proceedings of pattern recognition symposium DAGM’04, Tübingen, September 2004

  31. Wainwright MJ, Jordan MI (2003) Graphical models, exponential families, and variational inference. Technical report, Department of Statistics, University of California, Berkeley

  32. Wainwright MJ, Jordan MI (2004) Variational inference in graphical models: the view from the marginal polytope. Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, September 2004

  33. Weber M (2000) Unsupervised learning of models for object recognition. Ph.D. thesis, Caltech

  34. Winn J, Criminisi A, Minka T (2005) Object categorization by learned universal visual dictionary. In: Proceedings of international conference on computer vision (ICCV 2005), vol 2. Beijing, China. IEEE Computer Society, Washington, DC, pp 1800–1807

  35. Zhang R, Zhang ZM, Li M, Ma WY, Zhang HJ (2005) A probabilistic semantic model for image annotation and multi-modal image retrieval. In: Proceedings of the tenth IEEE international conference on computer vision (ICCV 2005), vol 1. Beijing, China. IEEE Computer Society, Washington, DC, pp 846–851

Author information

Corresponding author

Correspondence to Wenhui Li.

Additional information

This research is supported by the National Natural Science Foundation of China (grant No. 60573182), the Doctor Foundation of China (grant No. 20060183042) and the Jilin Science Foundation (grant No. 20060527).

About this article

Cite this article

Han, D., Li, W. & Li, Z. Semantic image classification using statistical local spatial relations model. Multimed Tools Appl 39, 169–188 (2008). https://doi.org/10.1007/s11042-008-0203-6

  • DOI: https://doi.org/10.1007/s11042-008-0203-6
