Effective Data Clustering Algorithms

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 742)

Abstract

Clustering is a key step in data mining for organizing data into meaningful patterns. It plays a crucial role in the entire knowledge discovery in databases (KDD) process, since categorizing data is one of the most fundamental steps in knowledge discovery. Clustering creates partitions, or clusters, of similar objects. It is an unsupervised learning task used in exploratory data analysis to find hidden patterns that are present in the data but cannot be categorized clearly. Data can be grouped by common characteristics into clusters, and the steps of cluster analysis depend essentially on one primary goal: keeping objects within a cluster closer to each other than to objects belonging to other clusters. Different types of clustering algorithms exist, depending on the data and the expected cluster characteristics. Recently, many new algorithms have emerged that aim to bridge the different approaches to clustering and to merge different clustering algorithms, driven by the need to handle sequential, high-dimensional data with multiple relationships in applications across a broad spectrum. This paper surveys, studies, and analyzes a few clustering algorithms and provides a comprehensive comparison of their efficiency on common grounds. The study also correlates some very important characteristics of an efficient clustering algorithm.

Author information

Corresponding author

Correspondence to Kamalpreet Bindra.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bindra, K., Mishra, A., Suryakant (2019). Effective Data Clustering Algorithms. In: Ray, K., Sharma, T., Rawat, S., Saini, R., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 742. Springer, Singapore. https://doi.org/10.1007/978-981-13-0589-4_39
