Effective Data Clustering Algorithms

Bindra, Kamalpreet; Mishra, Anuranjan; Suryakant

doi:10.1007/978-981-13-0589-4_39

Conference paper
First Online: 31 August 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 742)

Abstract

Clustering is a key step in data mining toward organizing data into meaningful patterns. It plays a crucial role in the entire KDD (knowledge discovery in databases) process, since categorizing data is one of the most basic steps in knowledge discovery. Clustering creates partitions, or clusters, of similar objects. It is an unsupervised learning task used in exploratory data analysis to find hidden patterns that are present in the data but cannot be categorized clearly. Sets of data can be grouped together based on common characteristics and termed clusters, and the implementation steps of cluster analysis essentially depend on the primary task of keeping objects within a cluster closer to one another than to objects belonging to other clusters. Depending on the data and the expected cluster characteristics, there are different types of clustering algorithms. Recently, many new algorithms have emerged that aim to bridge the different approaches to clustering and to merge different clustering algorithms, given the need to handle sequential, high-dimensional data with multiple relationships in many applications across a broad spectrum. This paper surveys, studies, and analyzes a few clustering algorithms and provides a comprehensive comparison of their efficiency on some common grounds. The study also contributes by correlating some very important characteristics of an efficient clustering algorithm.
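The criterion stated in the abstract, that objects within a cluster should lie closer to one another than to objects in other clusters, can be illustrated with a minimal sketch. The function name `mean_pairwise_distances`, the toy points, and the choice of Euclidean distance are assumptions made for illustration only; they are not taken from the paper, which may use other measures.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_pairwise_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Hypothetical helper: every pair of points is classified as "intra"
    when both points share a label and as "inter" otherwise.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

intra, inter = mean_pairwise_distances(points, labels)
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
```

For a labeling that respects the criterion, the intra-cluster mean should come out well below the inter-cluster mean; swapping a few labels between the two groups narrows the gap.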

Author information

Authors and Affiliations

Noida International University, Plot 1, Sector-17 A, Yamuna Expressway, Gautam Budh Nagar, Noida, 203201, Uttar Pradesh, India
Kamalpreet Bindra, Anuranjan Mishra & Suryakant

Corresponding author

Correspondence to Kamalpreet Bindra.

Editor information

Editors and Affiliations

Department of Physics, Amity School of Applied Sciences, Amity University Rajasthan, Jaipur, Rajasthan, India
Kanad Ray
Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Rajasthan, Jaipur, Rajasthan, India
Tarun K. Sharma
Department of Electronics and Communication Engineering, SEEC, Manipal University Jaipur, Jaipur, Rajasthan, India
Sanyog Rawat
Institute of Basic Science, Bundelkhand University, Jhansi, Uttar Pradesh, India
R. K. Saini
Advanced Key Technologies Division, Nano Characterization Unit, Surface Characterization Group, National Institute for Materials Science, Tsukuba, Ibaraki, Japan
Anirban Bandyopadhyay

About this paper

Cite this paper

Bindra, K., Mishra, A., Suryakant (2019). Effective Data Clustering Algorithms. In: Ray, K., Sharma, T., Rawat, S., Saini, R., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 742. Springer, Singapore. https://doi.org/10.1007/978-981-13-0589-4_39

DOI: https://doi.org/10.1007/978-981-13-0589-4_39
Published: 31 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0588-7
Online ISBN: 978-981-13-0589-4
eBook Packages: Intelligent Technologies and Robotics (R0)
