GMM-ClusterForest: A Novel Indexing Approach for Multi-features Based Similarity Search in High-Dimensional Spaces
This paper proposes a novel clustering-based indexing approach, GMM-ClusterForest, to support multi-feature similarity search in high-dimensional spaces. We fit a Gaussian Mixture Model (GMM) to the data, using the Expectation-Maximization (EM) algorithm to estimate the GMM parameters and the Minimum Description Length (MDL) criterion to select the GMM structure. Each Gaussian component of the GMM is taken as a cluster center, and each data point is assigned to a cluster according to the Bayesian decision rule. Applying this clustering method hierarchically yields an index tree, with a corresponding similarity search method, for one type of feature. Multi-feature similarity search is then performed by fusing the index trees of all the feature types considered. We evaluated the proposed approach on example-based image retrieval, with experiments on the Corel 1000 dataset and a self-collected large dataset. The experimental results show that our approach is effective and promising.
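The cluster-assignment step described above can be illustrated with a minimal sketch. It assumes the GMM parameters are already available (the EM fitting and MDL model selection are omitted) and that each component has a diagonal covariance, which is a simplification not stated in the abstract. The function names `log_gaussian_diag`, `assign_cluster`, and `partition` are hypothetical and chosen only for this illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density at x of a Gaussian with diagonal covariance (assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def assign_cluster(x, weights, means, variances):
    """Bayesian decision rule: pick the component with the largest posterior.

    The posterior is proportional to weight * likelihood, so comparing
    log(weight) + log-likelihood across components is sufficient.
    """
    scores = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def partition(points, weights, means, variances):
    """Split points into one list per Gaussian component."""
    clusters = [[] for _ in weights]
    for p in points:
        clusters[assign_cluster(p, weights, means, variances)].append(p)
    return clusters


# Toy two-component mixture in 2-D (illustrative values, not from the paper).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
points = [[0.2, -0.1], [4.8, 5.3], [0.9, 1.1], [6.0, 4.0]]

clusters = partition(points, weights, means, variances)
```

Applying `partition` again to each resulting cluster, with a mixture fitted to that cluster alone, would give one way to build the hierarchy the abstract describes; how the paper itself does this is not specified here.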
Keywords: High-dimensional data indexing; Similarity search; Clustering; Gaussian Mixture Models (GMM); Content-Based Image Retrieval (CBIR)