Soft Computing, Volume 22, Issue 24, pp 8243–8258

Enhanced shared nearest neighbor clustering approach using fuzzy for teleconnection analysis

  • Rika Sharma
  • Kesari Verma
Methodologies and Application

Abstract

Massive amounts of Earth science data open an unprecedented opportunity to discover potentially valuable information. Earth science data are complex, nonlinear and high-dimensional, and the sparsity of data in high-dimensional space poses a major challenge for clustering. The shared nearest neighbor (SNN) clustering algorithm is one of the well-known and efficient methods for handling high-dimensional spatiotemporal data. However, SNN does not cluster all the data, because its boundary selection is rigid. This paper reports the fuzzy shared nearest neighbor (FSNN) algorithm, an enhancement of SNN that uses a fuzzy concept to handle the data lying in the boundary regions. Each cluster obtained can be characterized by its centroid, which summarizes the behavior of the ocean points in the cluster. A statistical measure, correlation, is used to find significant relations between the cluster centroids and existing climate indices, and thereby to identify significant patterns such as teleconnections or dipoles. The experiments are performed over the Indian subcontinent region, in the latitude range \(7.5^{\circ }{-}37.5^{\circ }\hbox {N}\) and longitude range \(67.5^{\circ }{-}97.5^{\circ }\hbox {E}\). Extensive experiments compare the proposed approach with existing clustering methods such as K-means, fuzzy C-means and SNN. The proposed FSNN algorithm not only handles the data lying in the overlapping region but also finds more compact and well-separated clusters. FSNN also shows better results in finding significant correlations between cluster centroids and existing climate indices, and the results are validated against a ground truth dataset.
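
As a rough illustration of two building blocks mentioned in the abstract, the following Python sketch computes a shared nearest neighbor similarity matrix and correlates cluster centroids with a climate index. It is not the authors' FSNN implementation, and the fuzzy handling of boundary points is not reproduced here. The function names `snn_similarity` and `centroid_index_correlation`, the use of Euclidean distance, the rule that only mutual neighbors receive a nonzero similarity, and the choice of Pearson correlation are all assumptions made for this example.

```python
import numpy as np


def snn_similarity(X, k):
    """Shared nearest neighbor similarity (illustrative sketch).

    Assumption: two points get a nonzero similarity only if each is in the
    other's k-nearest-neighbor list; the similarity is then the number of
    neighbors the two lists have in common.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances (an assumed choice of base distance).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(dist, axis=1)[:, :k]

    # member[i, j] == 1 when j is among the k nearest neighbors of i.
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1

    shared = member @ member.T   # number of neighbors i and j have in common
    mutual = member * member.T   # 1 only when i and j list each other
    return shared * mutual


def centroid_index_correlation(series, labels, index):
    """Pearson correlation between each cluster centroid and a climate index.

    series: (n_points, n_time) array of time series, one row per grid point.
    labels: (n_points,) cluster label per grid point.
    index:  (n_time,) climate index time series.
    """
    result = {}
    for c in np.unique(labels):
        centroid = series[labels == c].mean(axis=0)  # mean series of the cluster
        result[int(c)] = float(np.corrcoef(centroid, index)[0, 1])
    return result
```

With two well-separated groups of points and `k=2`, pairs inside a group share neighbors while pairs across groups get similarity zero. A centroid whose time series rises and falls with the index yields a correlation near 1, and an anti-correlated centroid, as in a dipole, yields a value near -1.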

Keywords

Clustering · Shared nearest neighbor · Fuzzy shared nearest neighbor · Earth science data · Climate indices

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Computer Applications, National Institute of Technology Raipur, Raipur, India
