Journal of Central South University, Volume 18, Issue 3, pp 823–829

A new clustering algorithm for large datasets

  • Qing-feng Li (李清峰)
  • Wen-feng Peng (彭文峰)

Abstract

The Circle algorithm is proposed for clustering large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from all other vertices. The algorithm exploits the connection between clustering aggregation and the correlation clustering problem. The best deterministic approximation algorithm is provided for a variation of the correlation clustering problem, and it is shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation demonstrates the usefulness of the problem and of the solutions. The results show that the method reduces the running time by more than 50% without sacrificing the quality of the clustering.
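The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it describes: grow a cluster around a vertex when nearby vertices are, on average, close to it, and use a random sample to keep the cost down on large inputs. It is a generic ball-style greedy heuristic for correlation clustering, not the authors' Circle algorithm. The function names and the parameters `radius`, `alpha`, `sample_size`, and `seed` are assumptions made for illustration, and distances are assumed to lie in [0, 1].

```python
import random


def ball_clustering(dist, radius=0.5, alpha=0.25):
    """Greedy ball-style clustering on a symmetric distance matrix.

    `radius` and `alpha` are illustrative thresholds, not values from the paper.
    """
    n = len(dist)
    unassigned = set(range(n))
    clusters = []
    # Visit vertices in a fixed order so the result is deterministic.
    for u in range(n):
        if u not in unassigned:
            continue
        # Candidate ball: unassigned vertices within `radius` of u.
        ball = [v for v in unassigned if v != u and dist[u][v] <= radius]
        # Keep the ball only if its members are, on average, close to u.
        if ball and sum(dist[u][v] for v in ball) / len(ball) <= alpha:
            cluster = sorted([u] + ball)
        else:
            cluster = [u]
        clusters.append(cluster)
        unassigned.difference_update(cluster)
    return clusters


def cluster_with_sampling(dist, sample_size, seed=0, radius=0.5, alpha=0.25):
    """Cluster a random sample, then attach every other vertex to its closest cluster."""
    n = len(dist)
    rng = random.Random(seed)
    sample = sorted(rng.sample(range(n), min(max(sample_size, 1), n)))
    # Cluster the sample only, using its sub-matrix of pairwise distances.
    sub = [[dist[a][b] for b in sample] for a in sample]
    clusters = [[sample[i] for i in c] for c in ball_clustering(sub, radius, alpha)]
    in_sample = set(sample)
    for v in range(n):
        if v in in_sample:
            continue
        # Average distance from v to each existing cluster.
        avg = [sum(dist[v][w] for w in c) / len(c) for c in clusters]
        best = min(range(len(clusters)), key=avg.__getitem__)
        if avg[best] <= radius:
            clusters[best].append(v)
        else:
            # No cluster is close enough, so v starts a new one.
            clusters.append([v])
    return [sorted(c) for c in clusters]
```

In this sketch only the sample is clustered with the quadratic ball step; each remaining vertex is compared against the clusters found so far. Whether this matches the sampling scheme evaluated in the paper cannot be determined from the abstract alone.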

Key words

data mining; Circle algorithm; clustering categorical data; clustering aggregation

Copyright information

© Central South University Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Qing-feng Li (李清峰) (1, 2)
  • Wen-feng Peng (彭文峰) (1)
  1. Department of Computer and Electronic Engineering, Hunan University of Commerce, Changsha, China
  2. School of Information Science and Engineering, Central South University, Changsha, China