Outlier Robust Geodesic K-means Algorithm for High Dimensional Data
This paper proposes an outlier-robust geodesic K-means algorithm for high-dimensional data. The proposed algorithm makes three novel contributions. First, it employs a shared nearest neighbour (SNN) based distance metric to construct the nearest neighbour data model. Second, it combines the notion of geodesic distance with the well-known local outlier factor (LOF) model to distinguish outliers from inliers. Third, it introduces a new ad-hoc strategy to integrate outlier scores into geodesic distances. Numerical experiments with synthetic and real-world remote sensing spectral data show the efficiency of the proposed algorithm in clustering high-dimensional data, measured by overall clustering accuracy and average precision.
Keywords: Clustering, K-means, High-dimensional data, Geodesic distance, Shared nearest neighbour, Local outlier factor
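The three contributions described in the abstract can be illustrated with a small pipeline. This is a minimal, hypothetical sketch and not the authors' method. The shared-neighbour count |N_k(i) ∩ N_k(j)| and the LOF formulas (k-distance, reachability distance, local reachability density) are the standard definitions. Everything else is an assumption made only for illustration: the SNN-based edge weight `E * (2 - snn / k)`, the rule that scales d(i, j) by max(LOF_i, 1) · max(LOF_j, 1), the `lof_threshold` parameter, the handling of disconnected graph components, and the K-medoids-style centre update. That update is swapped in as a stand-in because a geodesic distance matrix has no coordinate mean, and the abstract does not describe the paper's actual update. All function names are illustrative.

```python
import numpy as np


def geodesic_distances(X, k):
    """Geodesic distances on an SNN-weighted k-nearest-neighbour graph (sketch)."""
    n = X.shape[0]
    E = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(E, np.inf)                       # a point is not its own neighbour
    nbrs = np.argsort(E, axis=1)[:, :k]               # Euclidean k-NN lists N_k(i)
    M = np.zeros((n, n), dtype=bool)
    M[np.arange(n)[:, None], nbrs] = True
    snn = M.astype(int) @ M.T.astype(int)             # snn[i, j] = |N_k(i) & N_k(j)|
    adj = M | M.T                                     # symmetrised k-NN graph
    # Hypothetical SNN-based edge weight: Euclidean length stretched by up to 2x
    # when the two endpoints share few neighbours.
    W = np.where(adj, E * (2.0 - snn / k), np.inf)
    np.fill_diagonal(W, 0.0)
    D = W.copy()
    for m in range(n):                                # Floyd-Warshall all-pairs shortest paths
        D = np.minimum(D, D[:, m:m + 1] + D[m:m + 1, :])
    finite = np.isfinite(D)
    D[~finite] = 2.0 * D[finite].max()                # assumption: disconnected pairs get a large finite distance
    return D


def geodesic_lof(D, k):
    """Local outlier factor computed from a precomputed (geodesic) distance matrix."""
    n = D.shape[0]
    Dn = D.copy()
    np.fill_diagonal(Dn, np.inf)
    nbrs = np.argsort(Dn, axis=1)[:, :k]              # k nearest points by geodesic distance
    k_dist = Dn[np.arange(n), nbrs[:, -1]]            # k-distance of every point
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[nbrs], Dn[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)          # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd               # typically ~1 inside clusters, > 1 for outliers


def robust_geodesic_kmedoids(X, n_clusters, k=8, lof_threshold=1.5, max_iter=50):
    """Hypothetical outlier-penalised clustering on geodesic distances."""
    D = geodesic_distances(X, k)
    lof = geodesic_lof(D, k)
    s = np.maximum(lof, 1.0)                          # penalty factor, 1 for LOF <= 1
    Dp = D * np.outer(s, s)                           # assumption: scale d(i, j) by s_i * s_j
    inlier = lof <= lof_threshold                     # only these may serve as cluster centres
    if not inlier.any():
        inlier[:] = True
    cand = np.flatnonzero(inlier)
    # Deterministic farthest-first initialisation among inlier candidates.
    medoids = [cand[np.argmin(Dp[cand].sum(axis=1))]]
    while len(medoids) < n_clusters:
        far = Dp[np.ix_(cand, medoids)].min(axis=1)
        medoids.append(cand[np.argmax(far)])
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(Dp[:, medoids], axis=1)    # assign each point to its nearest centre
        new = medoids.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            pool = members[inlier[members]]
            if pool.size:                             # centre = inlier with least total distance
                new[c] = pool[np.argmin(Dp[np.ix_(pool, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(Dp[:, medoids], axis=1)
    return labels, lof
```

The all-pairs shortest paths are computed with Floyd-Warshall, which is O(n^3) and only practical for small examples. For larger data, a sparse Dijkstra solver such as `scipy.sparse.csgraph.shortest_path` would be the usual choice. Scaling distances by the LOF-based factor, rather than simply discarding high-LOF points, keeps every point assigned to a cluster while making outliers unattractive as cluster centres.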