Minimum Cluster Size Estimation and Cluster Refinement for the Randomized Gravitational Clustering Algorithm

  • Jonatan Gomez
  • Elizabeth León
  • Olfa Nasraoui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7637)

Abstract

Although clustering is an unsupervised learning approach, most clustering algorithms require several parameters (such as the number of clusters, minimum density, or distance threshold) to be set in advance in order to work properly. In this paper, we eliminate the need to set the minimum cluster size parameter of the Randomized Gravitational Clustering algorithm proposed by Gomez et al. The minimum cluster size is estimated using a heuristic that takes into consideration the functional relation between the number of clusters and the clusters with at least a given number of points. Then, a data point’s region of action (the region of the space assigned to the point) is defined, and a cluster refinement process is proposed to merge overlapping clusters. Our experimental results show that the proposed algorithm is able to deal with noise while finding an appropriate number of clusters, without requiring manual setting of the minimum cluster size.
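The abstract does not spell out the heuristic, so the following is only a minimal sketch of the kind of relation it refers to: for each threshold m, count how many clusters have at least m points, then look for a threshold at which that count stabilizes. The function names and the "longest plateau" selection rule are assumptions made for illustration and are not taken from the paper.

```python
def count_clusters_by_min_size(sizes):
    """Map each threshold m (1..largest size) to the number of clusters
    that contain at least m points."""
    if not sizes:
        raise ValueError("sizes must contain at least one cluster size")
    largest = max(sizes)
    return {m: sum(1 for s in sizes if s >= m) for m in range(1, largest + 1)}


def estimate_min_cluster_size(sizes):
    """Hypothetical rule (an assumption, not the paper's heuristic):
    return the threshold that starts the longest run of consecutive
    thresholds over which the cluster count does not change."""
    counts = count_clusters_by_min_size(sizes)
    thresholds = sorted(counts)
    best_start, best_len = thresholds[0], 0
    run_start, run_len = thresholds[0], 1
    for prev, cur in zip(thresholds, thresholds[1:]):
        if counts[cur] == counts[prev]:
            run_len += 1  # still on the same plateau
        else:
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start, run_len = cur, 1  # a new plateau begins here
    if run_len > best_len:
        best_start = run_start
    return best_start


# Six tiny (likely noise) clusters and three large ones.
sizes = [1, 1, 1, 2, 2, 3, 40, 55, 60]
min_size = estimate_min_cluster_size(sizes)
kept = [s for s in sizes if s >= min_size]
print(min_size, kept)
```

On this toy input the count of clusters drops quickly for small thresholds and then stays flat over a wide range, which is where the sketch places the minimum cluster size; real data would likely need a more robust rule.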

Keywords

data mining; data clustering; gravitational clustering; cluster refinement; cluster size estimation

References

  1. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press (1981)
  2. Cormen, T., Leiserson, C., Rivest, R.: Introduction to Algorithms. McGraw-Hill (1990)
  3. Ester, M., Kriegel, H., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: 2nd Intl. Conf. on Knowledge Discovery and Data Mining (KDD 1996), pp. 226–231. AAAI (1996)
  4. Gomez, J., Dasgupta, D., Nasraoui, O.: A New Gravitational Clustering Algorithm. In: 3rd SIAM Intl. Conf. on Data Mining (SDM 2003), vol. 3, pp. 83–94. Society for Industrial and Applied Mathematics (2003)
  5. Gomez, J., Nasraoui, O., Leon, E.: RAIN – Data Clustering Using Randomized Interactions between Data Points. In: 3rd Intl. Conf. on Machine Learning and Applications (ICMLA 2004), pp. 250–255 (2004)
  6. Han, J., Kamber, M.: Data Mining – Concepts and Techniques. Morgan Kaufmann (2000)
  7. Jain, A.K.: Data Clustering – 50 Years Beyond K-Means. Pattern Recognition Letters 31(8), 651–666 (2010)
  8. Karypis, G., Han, E., Kumar, V.: CHAMELEON – A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer 32(8), 68–75 (1999)
  9. Kundu, S.: Gravitational Clustering – A New Approach Based on the Spatial Distribution of the Points. Pattern Recognition 32(7), 1149–1160 (1999)
  10. Leon, E., Nasraoui, O., Gomez, J.: A Scalable Evolutionary Clustering Algorithm with Self-Adaptive Genetic Operators. In: 2010 IEEE Congress on Evolutionary Computation (CEC 2010), pp. 4010–4017. IEEE (2010)
  11. MacQueen, J.: Some Methods for Classification and Analysis of Multivariate Observations. In: 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. University of California (1967)
  12. Nasraoui, O., Krishnapuram, R.: A Novel Approach to Unsupervised Robust Clustering Using Genetic Niching. In: 9th IEEE Intl. Conf. on Fuzzy Systems (FUZZ IEEE 2000), vol. 1, pp. 170–175 (2000)
  13. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. John Wiley & Sons (1987)
  14. Wright, W.E.: Gravitational Clustering. Pattern Recognition 9(3), 151–166 (1977)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jonatan Gomez (1)
  • Elizabeth León (1)
  • Olfa Nasraoui (2)

  1. Alife & Midas Research Groups, Computer Systems Engineering, Universidad Nacional de Colombia, Colombia
  2. Knowledge Discovery & Web Mining Lab, Dept. of Computer Engineering & Computer Science, University of Louisville, USA
