Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, accuracy, and time

Hassan, Syed Imtiyaz; Samad, Afreen; Ahmad, Omair; Alam, Afshar

doi:10.1007/s41870-019-00406-7

Original Research
Published: 29 November 2019

Volume 12, pages 1377–1384, (2020)
International Journal of Information Technology

Abstract

Clustering is an unsupervised data mining technique in which exploration is done with little knowledge of the data classes. Its aim is to uncover hidden information in the data for effective decision-making. Although many clustering algorithms have been implemented to date, clustering remains an active research topic in data mining. Researchers attempt to explore, compare, evaluate, and improve the available clustering algorithms for specialized situations and contexts. The purpose of all these efforts is to refine algorithms and propose improved versions after statistical evaluation with different metrics. The present research empirically analyzes partitioning-based and hierarchical clustering algorithms by conducting extensive experiments. The effectiveness of both algorithms has been measured through external and internal validity indices and the Pearson correlation distance function in detailed experiments. The evaluation parameters considered are, for internal indices: the Silhouette index, the Davies-Bouldin validity index, and the Calinski-Harabasz index; for external indices: the Jaccard index, the Rand index, entropy, and normalized mutual information. The other evaluation parameters are accuracy and execution time. Based on the experiments, it may be concluded that the K-means algorithm produces more promising results than the hierarchical algorithm, except in accuracy.
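To make two of the external indices and the distance function named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard pair-counting definitions of the Rand and Jaccard indices and takes the Pearson correlation distance to be one minus the correlation coefficient. The function names and the toy label vectors are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Count point pairs by agreement between two labelings.

    a: same class and same cluster
    b: same class, different cluster
    c: different class, same cluster
    d: different class and different cluster
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1
        elif same_truth:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(truth, pred):
    """Fraction of pairs on which the two labelings agree."""
    a, b, c, d = pair_counts(truth, pred)
    return (a + d) / (a + b + c + d)


def jaccard_index(truth, pred):
    """Like the Rand index, but ignores pairs separated in both labelings."""
    a, b, c, _ = pair_counts(truth, pred)
    return a / (a + b + c) if (a + b + c) else 1.0


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed definition); undefined for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sqrt(sum((p - mx) ** 2 for p in x))
    sy = sqrt(sum((q - my) ** 2 for q in y))
    return 1.0 - cov / (sx * sy)


# Hypothetical ground-truth classes and cluster assignments for six points.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print("Rand index:", rand_index(truth, pred))
print("Jaccard index:", jaccard_index(truth, pred))
```

Both indices depend only on which points are grouped together, so relabeling the clusters (for example swapping 0 and 1 in `pred`) leaves the scores unchanged. The pair enumeration is quadratic in the number of points, which is acceptable for a sketch but would need a contingency-table formulation for large datasets.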

Author information

Authors and Affiliations

Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard (Deemed to be University), New Delhi, India
Syed Imtiyaz Hassan, Afreen Samad, Omair Ahmad & Afshar Alam

Corresponding author

Correspondence to Syed Imtiyaz Hassan.

About this article

Cite this article

Hassan, S.I., Samad, A., Ahmad, O. et al. Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, accuracy, and time. Int. j. inf. tecnol. 12, 1377–1384 (2020). https://doi.org/10.1007/s41870-019-00406-7

Received: 20 February 2019
Accepted: 20 November 2019
Published: 29 November 2019
Issue Date: December 2020
DOI: https://doi.org/10.1007/s41870-019-00406-7
