GMM-ClusterForest: A Novel Indexing Approach for Multi-features Based Similarity Search in High-Dimensional Spaces

  • Yuchai Wan
  • Xiabi Liu
  • Kunqi Tong
  • Xue Wei
  • Yi Wu
  • Fei Guan
  • Kunpeng Pang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7664)

Abstract

This paper proposes GMM-ClusterForest, a novel clustering-based indexing approach that supports multi-feature similarity search in high-dimensional spaces. We fit a Gaussian Mixture Model (GMM) to the data, using the Expectation-Maximization (EM) algorithm to estimate the GMM parameters and the Minimum Description Length (MDL) criterion to select the GMM structure. Each Gaussian component of the GMM is taken as a cluster center, and each data point is assigned to a cluster according to the Bayesian decision rule. Applying this clustering method hierarchically yields an index tree for one type of feature, for which a corresponding similarity search method is developed. Multi-feature similarity search is then performed by fusing the index trees of all the feature types considered. We evaluated the approach by applying it to example-based image retrieval, with experiments on the Corel 1000 dataset and a large self-collected dataset. The experimental results show that the approach is effective and promising.
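The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on one-dimensional data using only the Python standard library, it assumes the common two-part MDL form (negative log-likelihood plus half the number of free parameters times log n), and every name in it (`fit_gmm`, `select_model`, `assign`, `build_tree`, `leaf_size`, `k_max`) is hypothetical. The similarity search and the fusion of trees across feature types are not covered, since the abstract does not specify them.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_density(x, params):
    weights, means, variances = params
    return sum(w * gauss(x, m, v) for w, m, v in zip(weights, means, variances))

def fit_gmm(xs, k, iters=50):
    """EM for a k-component 1-D GMM. Returns ((weights, means, variances), log-likelihood)."""
    n = len(xs)
    srt = sorted(xs)
    # Assumption: deterministic initialisation with means at evenly spaced quantiles.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # variance floor (an assumption)
    params = (weights, means, variances)
    loglik = sum(math.log(max(mixture_density(x, params), 1e-300)) for x in xs)
    return params, loglik

def description_length(loglik, k, n):
    """Assumed two-part MDL; a 1-D GMM with k components has 3k - 1 free parameters."""
    return -loglik + 0.5 * (3 * k - 1) * math.log(n)

def select_model(xs, k_max=3):
    """Pick the number of components that minimises the description length."""
    best = None
    for k in range(1, k_max + 1):
        params, loglik = fit_gmm(xs, k)
        dl = description_length(loglik, k, len(xs))
        if best is None or dl < best[0]:
            best = (dl, k, params)
    return best[1], best[2]

def assign(x, params):
    """Bayesian decision rule: choose the component with the largest weight * density."""
    weights, means, variances = params
    scores = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))

def build_tree(xs, leaf_size=8, k_max=3):
    """Cluster recursively; a node becomes a leaf when it is small or does not split."""
    if len(xs) <= leaf_size:
        return {"points": xs, "children": []}
    k, params = select_model(xs, k_max)
    groups = [[] for _ in range(k)]
    for x in xs:
        groups[assign(x, params)].append(x)
    groups = [g for g in groups if g]
    if len(groups) < 2:
        return {"points": xs, "children": []}
    children = [build_tree(g, leaf_size, k_max) for g in groups]
    return {"points": xs, "params": params, "children": children}

# Toy data: two well-separated groups of eight points each.
data = [c + d / 10 for c in (0.0, 10.0) for d in range(-3, 5)]
tree = build_tree(data)
print(len(tree["children"]), [len(c["points"]) for c in tree["children"]])
```

For real high-dimensional features, the scalar Gaussian would be replaced by a multivariate one, and the parameter count in the MDL penalty would grow accordingly.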

Keywords

High-dimensional data indexing, Similarity search, Clustering, Gaussian Mixture Models (GMM), Content-Based Image Retrieval (CBIR)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yuchai Wan
    • 1
  • Xiabi Liu
    • 1
  • Kunqi Tong
    • 1
  • Xue Wei
    • 1
  • Yi Wu
    • 1
  • Fei Guan
    • 1
  • Kunpeng Pang
    • 1
  1. Beijing Lab of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
