
A semi-supervised probabilistic model for clustering large databases of complex images

Multimedia Tools and Applications

Abstract

Image content clustering is an effective way to organize large databases, making content-based image retrieval much easier. However, clustering images with varied backgrounds and foregrounds is challenging. In this paper, we propose a novel image content clustering paradigm suitable for large and diverse image databases. In our approach, each image is represented in a continuous domain by a probabilistic Gaussian Mixture Model (GMM), that is, as a mixture of Gaussian distributions in the selected feature space. The distance between Gaussian distributions is measured by the Kullback–Leibler (KL) divergence. Clustering is performed in a semi-supervised learning framework, in which labeled data in the form of cluster templates are used to classify the unlabeled data. The clusters are formed around initially chosen seeds and are updated over time based on user input. User interaction is structured so as to obtain as much input as possible from the user in a limited time. We propose two methods for carrying out this structured interaction, through which the cluster templates are updated to improve the quality of the resulting clusters. The proposed method is evaluated experimentally on benchmark datasets chosen specifically to contain widely varying images around a common theme. Such data are typical of applications like photo summarization and pose a major semantic-gap challenge to conventional clustering approaches. The experimental results demonstrate the effectiveness of the proposed approach.
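The representation and seed-based labeling described in the abstract can be illustrated with a short Python sketch. This is a minimal, illustrative example under stated assumptions, not the authors' implementation: it assumes diagonal-covariance GMM components, estimates the KL divergence between two GMMs by Monte Carlo sampling (the KL divergence between two mixtures has no closed form in general), symmetrizes it, and assigns each unlabeled image to the cluster owning its nearest template. The function names `sym_kl`, `assign` and `confirm`, the toy feature values, and the rule that a user-confirmed image is simply appended as an extra template are all hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

# Assumption: each GMM component is (weight, mean vector, variance vector),
# i.e. a diagonal covariance. The paper's models may use full covariances.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def log_density(gmm: GMM, x: List[float]) -> float:
    """log p(x) under a diagonal-covariance GMM, computed with log-sum-exp."""
    logs = []
    for w, mu, var in gmm:
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            lp -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(lp)
    top = max(logs)
    return top + math.log(sum(math.exp(a - top) for a in logs))


def sample(gmm: GMM, rng: random.Random) -> List[float]:
    """Pick a component by weight, then sample each dimension independently."""
    _, mu, var = rng.choices(gmm, weights=[c[0] for c in gmm], k=1)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]


def kl_mc(f: GMM, g: GMM, n: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of KL(f || g) = E_f[log f(x) - log g(x)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample(f, rng)
        total += log_density(f, x) - log_density(g, x)
    return total / n


def sym_kl(f: GMM, g: GMM) -> float:
    """Symmetrized KL, so the dissimilarity does not depend on argument order."""
    return 0.5 * (kl_mc(f, g) + kl_mc(g, f))


def assign(images: Dict[str, GMM], templates: Dict[str, List[GMM]]) -> Dict[str, str]:
    """Label each image with the cluster that owns its nearest template."""
    labels = {}
    for name, gmm in images.items():
        labels[name] = min(
            templates, key=lambda c: min(sym_kl(gmm, t) for t in templates[c])
        )
    return labels


def confirm(templates: Dict[str, List[GMM]], cluster: str, gmm: GMM) -> None:
    """Hypothetical feedback step: a user-confirmed image becomes an extra template."""
    templates.setdefault(cluster, []).append(gmm)


# Toy 2-D example: one seed template per cluster, two unlabeled images.
beach_seed: GMM = [(0.5, [0.0, 0.0], [0.1, 0.1]), (0.5, [1.0, 0.0], [0.1, 0.1])]
forest_seed: GMM = [(1.0, [5.0, 5.0], [0.2, 0.2])]
templates = {"beach": [beach_seed], "forest": [forest_seed]}
images = {
    "img1": [(1.0, [0.4, 0.1], [0.2, 0.2])],
    "img2": [(0.7, [4.8, 5.1], [0.2, 0.2]), (0.3, [5.5, 4.5], [0.3, 0.3])],
}
print(assign(images, templates))  # {'img1': 'beach', 'img2': 'forest'}
```

Sampling keeps the sketch simple and unbiased but costs many density evaluations per pair; cheaper deterministic approximations of the GMM KL divergence exist, and the two seed-update algorithms in the appendices are not reproduced here, so the `confirm` rule should be read only as a placeholder for "templates grow with user input".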


Figs. 1–12 (not reproduced in this preview)



Author information

Correspondence to S. Nisha Chandran.

Appendices

Appendix A

A.1 Seed update algorithm 1

(Algorithm figure not reproduced in this preview.)

Appendix B

B.1 Seed update algorithm 2

(Algorithm figure not reproduced in this preview.)


About this article


Cite this article

Nisha Chandran, S., Gangodkar, D. & Mittal, A. A semi-supervised probabilistic model for clustering large databases of complex images. Multimed Tools Appl 76, 21937–21959 (2017). https://doi.org/10.1007/s11042-017-4664-3


  • DOI: https://doi.org/10.1007/s11042-017-4664-3
