Outlier Robust Geodesic K-means Algorithm for High Dimensional Data

  • Aidin Hassanzadeh
  • Arto Kaarna
  • Tuomo Kauranne
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10029)

Abstract

This paper proposes an outlier-robust geodesic K-means algorithm for high-dimensional data. The proposed algorithm makes three novel contributions. First, it employs a shared nearest neighbour (SNN) based distance metric to construct the nearest neighbour data model. Second, it combines the notion of geodesic distance with the well-known local outlier factor (LOF) model to distinguish outliers from inliers. Third, it introduces a new ad hoc strategy to integrate outlier scores into geodesic distances. Numerical experiments with synthetic and real-world remote sensing spectral data show the efficiency of the proposed algorithm in clustering high-dimensional data, in terms of overall clustering accuracy and average precision.
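
The abstract does not give the formulas, so the following is only a minimal sketch of the first ingredient under stated assumptions. It assumes the SNN distance between two points is one minus the fraction of their k nearest neighbours that they share (a common definition, not necessarily the authors'), and it approximates geodesic distance as the shortest-path length over the symmetrised k-NN graph. The function names `knn_lists`, `snn_distance` and `geodesic_distances` are hypothetical, and the LOF-based outlier scoring is not covered.

```python
import heapq
import math


def knn_lists(points, k):
    """For each point, return the index set of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def snn_distance(neighbours, i, j, k):
    """Assumed SNN distance: 1 - (number of shared neighbours / k)."""
    return 1.0 - len(neighbours[i] & neighbours[j]) / k


def geodesic_distances(points, k, source):
    """Shortest-path lengths from `source` over the symmetrised k-NN graph.

    Edge weights are the assumed SNN distances; unreachable points get math.inf.
    """
    n = len(points)
    neighbours = knn_lists(points, k)

    # Build an undirected graph: an edge exists if either point lists the other.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        for j in neighbours[i]:
            w = snn_distance(neighbours, i, j, k)
            graph[i][j] = w
            graph[j][i] = w

    # Dijkstra's algorithm with a binary heap.
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

With two well-separated groups of points and a small k, the k-NN graph splits into disconnected components, so points in the other group are reported at infinite geodesic distance from the source. A real implementation would need a policy for such disconnected components.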

Keywords

Clustering, K-means, High-dimensional data, Geodesic distance, Shared nearest neighbour, Local outlier factor

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Aidin Hassanzadeh (1)
  • Arto Kaarna (1)
  • Tuomo Kauranne (2)

  1. Machine Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta University of Technology, Lappeenranta, Finland
  2. Mathematics Laboratory, School of Engineering Science, Lappeenranta University of Technology, Lappeenranta, Finland