Abstract
Image content clustering is an effective way to organize large databases, and it makes content-based image retrieval much easier. However, clustering images with varied backgrounds and foregrounds is challenging. In this paper, we propose a novel image content clustering paradigm suited to large and diverse image databases. In our approach, images are represented in a continuous domain using a probabilistic Gaussian Mixture Model (GMM): each image is modeled as a mixture of Gaussian distributions in the selected feature space. The distance measure between Gaussian distributions is defined in terms of the Kullback–Leibler (KL) divergence. Clustering is performed in a semi-supervised learning framework in which labeled data, in the form of cluster templates, is used to classify the unlabeled data. The clusters are formed around initially chosen seeds and are updated over time based on user input. User interaction is structured so as to obtain the maximum input from the user in a limited time. We propose two methods for carrying out this structured interaction, through which the cluster templates are updated to improve the quality of the clusters formed. The proposed method is evaluated experimentally on benchmark datasets specifically chosen to include a wide variation of images around a common theme, as is typically encountered in applications such as photo summarization, and which poses a major semantic-gap challenge to conventional clustering approaches. The experimental results demonstrate the effectiveness of the proposed approach.
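The template-based assignment described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each image and each cluster template is summarized by a single Gaussian (mean, covariance) rather than a full mixture, uses the standard closed-form KL divergence between two multivariate Gaussians, and symmetrizes it by summing both directions. The function names `kl_gaussian`, `symmetric_kl`, and `assign_to_templates` are hypothetical.

```python
import numpy as np


def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mu0.shape[0]
    diff = mu1 - mu0
    cov1_inv = np.linalg.inv(cov1)
    # slogdet returns (sign, log|det|); it is more stable than log(det(.))
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + logdet1 - logdet0)


def symmetric_kl(g0, g1):
    """Assumed symmetrization: KL(g0 || g1) + KL(g1 || g0)."""
    return kl_gaussian(*g0, *g1) + kl_gaussian(*g1, *g0)


def assign_to_templates(images, templates):
    """Label each image with the index of its nearest cluster template.

    Both arguments are lists of (mean, covariance) pairs. Using a single
    Gaussian per image is a simplification of the mixture representation.
    """
    return [
        int(np.argmin([symmetric_kl(img, t) for t in templates]))
        for img in images
    ]


if __name__ == "__main__":
    eye = np.eye(2)
    templates = [(np.zeros(2), eye), (np.array([5.0, 5.0]), eye)]
    images = [(np.array([0.2, 0.1]), eye), (np.array([4.8, 5.1]), eye)]
    print(assign_to_templates(images, templates))
```

In a seed-based scheme of the kind the abstract outlines, the templates would be re-estimated as user feedback arrives; that update step is not shown here because the abstract does not specify it.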
Appendices
Appendix A
1.1 Seed updation algorithm 1
Appendix B
1.1 Seed updation algorithm 2
Cite this article
Nisha Chandran, S., Gangodkar, D. & Mittal, A. A semi-supervised probabilistic model for clustering large databases of complex images. Multimed Tools Appl 76, 21937–21959 (2017). https://doi.org/10.1007/s11042-017-4664-3
DOI: https://doi.org/10.1007/s11042-017-4664-3