
Discovery of Intrinsic Clustering in Spatial Data

  • Yee Leung
Chapter
Part of the Advances in Spatial Science book series (ADVSPATIAL)

Abstract

A fundamental task in knowledge discovery is the unraveling of clusters intrinsically formed in spatial databases. These clusters can be natural groups of variables, data points or objects that are similar to each other with respect to some notion of similarity. They provide a general, high-level view of the databases that can serve as an end in itself or as a means to further data mining activities. Typical purposes of clustering, undertaken with little or no prior knowledge about the data, include segmenting spatial data into homogeneous or interconnected groups, identifying regions with varying levels of information granularity, detecting spatial group structures with specific characteristics, and visualizing spatial phenomena under natural groupings. Clustering is often employed as an initial exploration of data that might form natural structures or relationships, and it usually sets the stage for further analysis or for the mining of structures and processes. Clustering has long been a main concern in statistical investigations and other data-intensive research (Duda and Hart 1974; Jain and Dubes 1988; Everitt 1993). It is essentially unsupervised learning, a term used in pattern recognition and artificial intelligence for the discovery from data of a class structure, or classes, unknown a priori. Clustering has found applications in fields such as pattern recognition, image processing, microarray data analysis, data storage, data transmission, machine learning, computer vision, remote sensing, geographical information science, and geographical research, and novel algorithms have arisen from these applications. The growth of data mining applications and their associated data sets has, however, posed new challenges to clustering, which in turn has intensified interest in clustering research.
To cater for very large databases, particularly spatial databases, new methods have also been developed over the years (Murray and Estivill-Castro 1998; Miller and Han 2001; Li et al. 2006). To facilitate the discussion, a brief review of clustering methods is first given in this section.
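As one concrete illustration of a method aimed at spatial databases, the density-based approach of Ester et al. (1996), listed in the references, groups points that lie in dense neighbourhoods and flags isolated points as noise, without fixing the number of clusters in advance. The sketch below is a minimal, brute-force Python rendering of that idea under stated assumptions, not the method developed in this chapter; the function name `dbscan`, the parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size), and the toy coordinates are all hypothetical choices for illustration.

```python
from collections import deque
from math import dist  # Euclidean distance (Python 3.8+)

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Illustrative sketch only: every neighbourhood query scans all points,
    so the cost is O(n^2) distance evaluations.
    """
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            # Not a core point; may later be relabelled as a border point.
            labels[i] = NOISE
            continue
        # i is a core point: open a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
       (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
       (10.0, 0.0)]
print(dbscan(pts, eps=0.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters emerges from the data rather than being supplied, which matches the exploratory, low-prior-knowledge use of clustering described above. Because `region_query` scans every point, implementations intended for large spatial databases typically answer neighbourhood queries with a spatial index instead.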

Keywords

Cluster Algorithm, Convex Hull, Cluster Center, Scale Space, Dissimilarity Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Acton ST, Mukherjee DP (2000) Scale space classification using area morphology. IEEE Trans Image Process 9(4):623–635
  2. Amorese D, Lagarde JL, Laville E (1999) A point pattern analysis of the distribution of earthquakes in Normandy (in France). Bull Seismol Soc Am 89(3):742–749
  3. Anderberg MR (1973) Cluster analysis for applications. Academic, New York
  4. Ankerst M, Breunig M, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM-SIGMOD international conference on management of data (SIGMOD’99), pp. 49–60
  5. Arbia G (1989) Spatial data configuration in statistical analysis of regional economic and related problems. Kluwer, Dordrecht
  6. Asano T, Katoh N (1996) Variants for the Hough transform for line detection. Comput Geom 6:231–252
  7. Atallah MJ (1992) Parallel techniques for computational geometry. Proc IEEE 80(9):1435–1448
  8. Babaud J, Witkin AP, Baudin M, Duda RO (1986) Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans Pattern Anal Mach Intell 8:26–33
  9. Ball G, Hall D (1976) A clustering technique for summarizing multivariate data. Behav Sci 12:153–155
  10. Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
  11. Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, New York
  12. Basak J, Mahata D (2000) A connectionist model for corner detection in binary and gray images. IEEE Trans Neural Network 11:1124–1132
  13. Bentley JL, Clarkson KL, Levine DB (1993) Fast linear expected-time algorithms for computing maxima and convex hulls. Algorithmica 9:168–183
  14. Bern MW, Karloff HJ, Schieber B (1992) Fast geometric approximation techniques and geometric embedding problems. Theor Comput Sci 106:265–281
  15. Bezdek JC (1980) A convergence theorem for the fuzzy ISODATA clustering algorithms. IEEE Trans Pattern Anal Mach Intell 2:1–8
  16. Bezdek JC, Coray C, Gunderson R, Watson J (1981) Detection and characterization of cluster substructure. I. Linear structure: fuzzy c-lines. SIAM J Appl Math 40(2):339–357
  17. Bezdek JC, Hathaway RJ, Windham MP (1991) Numerical comparison of the RFCM and AP algorithms for clustering relational data. Pattern Recogn 24(8):783–791
  18. Bezdek JC, Keller JM, Krishnapuram R, Pal NR (1999) Fuzzy models and algorithms for pattern recognition and image processing. Kluwer, Boston
  19. Blatt M, Wiseman S, Domany E (1997) Data clustering using a model granular magnet. Neural Comput 9:1805–1847
  20. Bruzzone L, Prieto DF, Serpico SB (1999) A neural-statistical approach to multitemporal and multisource remote-sensing image classification. IEEE Trans Geosci Rem Sens 37:1350–1359
  21. Cao Z, Fu Z (1999) Clustering of long- and medium-term seismicity on China mainland (in Chinese with English abstract). Earthquake 19(4):338–344
  22. Carpenter GA, Grossberg S (1987) ART2: self-organization of stable category recognition codes for analog input patterns. In: Proceedings of the IEEE international conference on neural networks, San Diego, CA, pp. 727–736
  23. Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14:315–332
  24. Chakravarthy SV, Ghosh J (1996) Scale-based clustering using the radial basis function network. IEEE Trans Neural Network 7(5):1250–1261
  25. Coren S, Ward L, Enns J (1994) Sensation and perception. Harcourt Brace College Publishers, Fort Worth, TX
  26. Dave RN (1991) Characterization and detection of noise in clustering. Pattern Recogn Lett 12:657–664
  27. Dave RN, Krishnapuram R (1997a) Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 5:270–293
  28. Derin H (1987) Estimating components of univariate Gaussian mixtures using Prony’s methods. IEEE Trans Pattern Anal Mach Intell 9:142–148
  29. Di K, Li DL, Li DY (1998) A mathematical morphology based algorithm for discovering clusters in spatial databases. J Image Graph 3(3):173–178
  30. Dubes RO, Jain AK (1976) Clustering techniques: the user’s dilemma. Pattern Recogn 8:247–260
  31. Duda RO, Hart PE (1974) Pattern classification and scene analysis. Wiley, New York
  32. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining, Portland, Oregon, pp. 324–331
  33. Everitt BS (1993) Cluster analysis. Halsted Press, New York
  34. Frigui H, Krishnapuram R (1999) A robust competitive clustering algorithm with applications in computer vision. IEEE Trans Pattern Anal Mach Intell 21(5):450–465
  35. Fu Z (1997) Research on the earthquake activity mechanics in China’s mainland. Earthquake Press, Beijing, pp. 124–128
  36. Fu Z, Jiang L (1994) Strong earthquake clustering in the Fenwei and North China Plain seismic zones (in Chinese with English abstract). Earthquake Res China 10(2):160–167
  37. Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, New York
  38. Gorden RL (1977) Unidimensional scaling of social variables: concepts and procedures. The Free Press, New York
  39. Gowda KC, Diday E (1992) Symbolic clustering using a new similarity measure. IEEE Trans Syst Man Cybern 22:368–378
  40. Graham RL (1972) An efficient algorithm for determining the convex hull of a finite planar set. Inform Process Lett 1:132–133
  41. Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Proceedings of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), pp. 73–84
  42. Guibas L, Salesin D, Stolfi J (1993) Constructing strongly convex approximate hulls with inaccurate primitives. Algorithmica 9:534–560
  43. Guttman L (1968) A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika 33:469–506
  44. Hathaway RJ, Bezdek JC (1994) NERF c-means: non-Euclidean relational fuzzy clustering. Pattern Recogn 27(3):429–437
  45. Hathaway RJ, Bezdek JC, Davenport JW (1994) On relational data versions of c-means algorithms. Pattern Recogn Lett 17:607–612
  46. Hathaway RJ, Davenport JW, Bezdek JC (1989) Relational duals of c-means clustering algorithms. Pattern Recogn 22(2):205–212
  47. Hawkins D (1980) Identification of outliers. Chapman and Hall, Boca Raton, FL
  48. Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the 1998 international conference on knowledge discovery and data mining (KDD’98), pp. 58–65
  49. Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
  50. Honda K, Togo N, Fujii T, Ichihashi H (2002) Linear fuzzy clustering based on least absolute deviations. In: Proceedings of the 2002 IEEE international conference on fuzzy systems, pp. 1444–1449
  51. Hubert L (1974) Approximate evaluation technique for the single-link and complete-link hierarchical clustering procedure. J Am Stat Assoc 69:968
  52. Hummel R, Moniot R (1989) Reconstructions from zero crossings in scale space. IEEE Trans Acoust Speech Signal Process 37(12):245–295
  53. Hwang YK, Ahuja N (1993) Gross motion planning: a survey. ACM Comput Surv 24(3):219–291
  54. Jain AK, Dubes RO (1988) Algorithms for clustering data. Prentice Hall, Englewood Cliffs, NJ
  55. Jiang M, Ma Z (1985) A comparison between the third and fourth seismic periods in North China. Earthquake 6:5–11
  56. Johnson RA, Wichern DW (1992) Applied multivariate statistical analysis, 3rd edn. Prentice-Hall, Englewood Cliffs, NJ
  57. Johnson SC (1967) Hierarchical clustering schemes. Psychometrika 32:241
  58. Kagan YY (1999) Is earthquake seismology a hard, quantitative science? Pure Appl Geophys 155:233–258
  59. Kagan YY, Jackson DD (1991) Long-term earthquake clustering. Geophys J Int 104:117–133
  60. Karypis G, Han EH, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. Computer 32:68–75
  61. Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
  62. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220:671–680
  63. Koenderink JJ (1984) The structure of images. Biol Cybern 50:363–370
  64. Kohonen T (1982) Clustering, taxonomy, and topological maps of patterns. In: Proceedings of the 6th international conference on pattern recognition, Munich, Germany, pp. 114–128
  65. Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1:98–110
  66. Lau K, Leung PL, Tse KA (1998) Nonlinear programming approach to metric unidimensional scaling. J Classif 15:2–14
  67. Lawson AB (2001) Statistical methods in spatial epidemiology. Wiley, Chichester
  68. Lepage R, Rouhana RG, St-Onge B, Noumeir R, Desjardins R (2000) Cellular neural network for automated detection of geological lineaments on RADARSAT images. IEEE Trans Geosci Rem Sens 38:1224–1233
  69. Leung Y (1984) Towards a flexible framework for regionalization. Environ Plann A 16:1613–1632
  70. Leung Y (2001) Neural and evolutionary computation methods for spatial classification and knowledge acquisition. In: Fischer MM, Leung Y (eds) GeoComputational modelling: techniques and applications. Springer, Berlin, pp. 71–108
  71. Leung Y, Gao Y, Xu ZB (1997a) Degree of population diversity: a perspective on premature convergence in genetic algorithms and its Markov chain analysis. IEEE Trans Neural Network 8:1165–1176
  72. Leung Y, Ge Y, Ma JH (2004e) Clustering of remote sensing data by unidimensional scaling (unpublished paper)
  73. Leung Y, Ma JH, Zhang WX (2001b) A new method for mining regression classes in large data sets. IEEE Trans Pattern Anal Mach Intell 23(1):5–21
  74. Leung Y, Mei CL, Zhang WX (2000a) Statistical tests for spatial non-stationarity based on the geographically weighted regression model. Environ Plann A 32:9–32
  75. Leung Y, Mei CL, Zhang WX (2000b) Testing for spatial autocorrelation among the residuals of the geographically weighted regression. Environ Plann A 32:871–890
  76. Leung Y, Mei CL, Zhang WX (2003d) Statistical test for local patterns of spatial association. Environ Plann A 35:725–744
  77. Leung Y, Wu WZ, Zhang WX (2006a) Knowledge acquisition in incomplete information systems: a rough set approach. Eur J Oper Res 168:164–180
  78. Leung Y, Zhang JS, Xu ZB (2000c) Clustering by scale-space filtering. IEEE Trans Pattern Anal Mach Intell 22(12):1396–1410
  79. Leung Y, Zhang JS, Xu ZB (2000d) Clustering by scale-space filtering. IEEE Trans Pattern Anal Mach Intell 22:1396–1410
  80. Li DR, Wang SL, Li DY (2006) Spatial data mining theories and applications. Science Publisher, Beijing
  81. Ma Z, Jiang M (1987) Strong earthquake periods and episodes in China (in Chinese with English abstract). Earthquake Res China 3(1):47–51
  82. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, vol 1, pp. 281–297
  83. Man Y, Gath I (1994) Detection and separation of ring-shaped clusters using fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 16:855–861
  84. Maragos P (1989) Pattern spectrum and multiscale shape representation. IEEE Trans Pattern Anal Mach Intell 11(7):701–716
  85. McIver JP, Carmines EG (1981) Unidimensional scaling. Sage Publications, Beverly Hills, CA
  86. McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering. Marcel Dekker, New York
  87. McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley, London
  88. Miller D, Rose K (1996) Hierarchical, unsupervised learning with growing via phase transitions. Neural Comput 8:425–450
  89. Miller HJ, Han J (2001) Geographic data mining and knowledge discovery. Taylor and Francis, London
  90. Ng R, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 1994 international conference on very large data bases (VLDB’94), pp. 144–155
  91. Ohashi Y (1984) Fuzzy clustering and robust estimation. In: 9th meeting of the SAS Users Group International, Hollywood Beach, FL
  92. Park KR, Lee C (1996) Scale-space using mathematical morphology. IEEE Trans Pattern Anal Mach Intell 18(11):1121–1126
  93. Pliner V (1984) A class of metric scaling models. Autom Remote Control 47:560–567
  94. Pliner V (1996) Metric unidimensional scaling and global optimization. J Classif 13:3–18
  95. Postaire JG, Zhang RD, Botte CL (1993) Cluster analysis by binary morphology. IEEE Trans Pattern Anal Mach Intell 15(2):170–180
  96. Preparata FP (1979) An optimal real-time algorithm for planar convex hull. Commun ACM 22:402–405
  97. Preparata FP, Hong SJ (1977) Convex hulls of finite sets of points in two and three dimensions. Commun ACM 20:87–93
  98. Preparata FP, Shamos MI (1985) Computational geometry: an introduction. Springer, Berlin
  99. Qin CZ, Leung Y, Zhang JS (2006) Identification of seismic activities through visualization and scale-space filtering. In: Ruan D, D’hondt P, Fantoni PF, De Cock M, Nachtegael M, Kerre EE (eds) Applied artificial intelligence: proceedings of the 7th international FLINS conference. World Scientific, New Jersey, pp. 643–650
  100. Roberts SJ (1997) Parametric and non-parametric unsupervised cluster analysis. Pattern Recogn 30(2):261–272
  101. Rose K, Gurewitz E, Fox G (1990) A deterministic annealing approach to clustering. Pattern Recogn Lett 11:589–594
  102. Sen S, Dave RN (1998) Clustering of relational data containing noise and outliers. In: Proceedings of the 1998 IEEE international conference on fuzzy systems 2, pp. 1411–1416
  103. Sheikholeslami G, Chatterjee S, Zhang A (1998) WaveCluster: a multi-resolution clustering approach for very large spatial databases. In: Proceedings of the 1998 international conference on very large data bases (VLDB’98), pp. 428–439
  104. Simantiraki E (1996) Unidimensional scaling: a linear programming approach minimizing absolute deviations. J Classif 13:19–25
  105. Tadjudin S, Landgrebe DA (2000) Robust parameter estimation for mixture model. IEEE Trans Geosci Rem Sens 38(1):439–445
  106. Tavan P, Grubmüller H, Kühnel H (1990) Self-organization of associative memory and pattern classification: recurrent signal processing on topological feature maps. Biol Cybern 64:95–105
  107. Waldemark J (1997) An automated procedure for cluster analysis of multivariate satellite data. Int J Neural Syst 8(1):3–15
  108. Wang M, Leung Y, Zhou CH, Pei T, Luo JC (2006) A mathematical morphology based scale space method for the mining of linear features in geographic data. Data Min Knowl Discov 12:97–118
  109. Wang N, Mei CL, Yan XD (2005) Analysis of spatial relationship between mean and extreme temperatures in China with geographically weighted regression technique (unpublished paper)
  110. Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 1997 international conference on very large data bases (VLDB’97), pp. 186–195
  111. Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244
  112. Wennmyr E (1989) A convex hull algorithm for neural networks. IEEE Trans Circuits Syst 36:1478–1484
  113. Wilson R, Spann M (1990) A new approach to clustering. Pattern Recogn 23(12):1413–1425
  114. Witkin AP (1983) Scale space filtering. In: Proceedings of the international joint conference on artificial intelligence, Karlsruhe, pp. 1019–1022
  115. Wong HS, Guan L (2001) A neural learning approach for adaptive image restoration using a fuzzy model-based network architecture. IEEE Trans Neural Network 12:516–531
  116. Wong YF (1993) Clustering data by melting. Neural Comput 5(1):89–104
  117. Zhang D, Lutz T (1989) Structural control of igneous complexes and kimberlites: a new statistical method. Tectonophysics 159:137–148
  118. Zhang JS, Leung Y (2001) A method for robust fuzzy relational clustering (unpublished paper)
  119. Zhang TS, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the 1996 ACM-SIGMOD international conference on management of data (SIGMOD’96), pp. 103–114
  120. Zhuang X, Huang Y, Zhao Y (1996) Gaussian mixture density modeling, decomposition, and applications. IEEE Trans Image Process 5(9):1293–1301
  121. Zhuang X, Wang T, Zhang P (1992) A highly robust estimator through partially likelihood function modeling and its application in computer vision. IEEE Trans Pattern Anal Mach Intell 14:19–35

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Department of Geography and Resource Management, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR
