Abstract
Average-link (AL) is a distance-based hierarchical clustering method that is not sensitive to noisy patterns. However, like all hierarchical clustering methods, AL needs to scan the dataset many times, and it has time and space complexity of O(n^2), where n is the size of the dataset. These costs prohibit the use of AL for large datasets. In this paper, we propose a distance-based hierarchical clustering method, termed l-AL, which speeds up the classical AL method in any metric (vector or non-vector) space. In this scheme, the leaders clustering method is first applied to the dataset to derive a set of leaders, and AL clustering is then applied to the leaders. To speed up the leaders clustering method, a way of reducing the number of distance computations is also proposed. Experimental results confirm that the l-AL method is considerably faster than the classical AL method while keeping clustering results on par with it.
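The two-stage scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the distance threshold `tau`, the target cluster count `k`, the function names, and the choice of Euclidean distance are all assumptions made for illustration. The average-link stage is a naive version that treats each leader as a single unweighted point, and the reduction in distance computations mentioned in the abstract is omitted.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+); any metric could be passed instead


def leaders(points, tau, d=dist):
    """Single-scan leaders clustering (tau is an assumed distance threshold).

    Each point joins the first leader within distance tau; otherwise it
    becomes a new leader. Returns the leaders and, for every point, the
    index of the leader it follows.
    """
    leader_list, assignment = [], []
    for p in points:
        for i, leader in enumerate(leader_list):
            if d(p, leader) <= tau:
                assignment.append(i)
                break
        else:  # no leader was close enough
            leader_list.append(p)
            assignment.append(len(leader_list) - 1)
    return leader_list, assignment


def average_link(items, k, d=dist):
    """Naive agglomerative average-link; merges until k clusters remain (k >= 1)."""
    clusters = [[i] for i in range(len(items))]

    def avg(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(d(items[i], items[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


def l_al_sketch(points, tau, k, d=dist):
    """Leaders first, then average-link on the leaders only (hypothetical helper)."""
    leader_list, assignment = leaders(points, tau, d)
    clusters = average_link(leader_list, k, d)
    leader_label = {i: c for c, members in enumerate(clusters) for i in members}
    # Every point inherits the cluster label of its leader.
    return [leader_label[a] for a in assignment]
```

Calling `l_al_sketch(points, tau, k)` on a list of coordinate tuples returns one cluster label per point. Because the quadratic average-link stage runs only on the leaders, its cost depends on the number of leaders rather than on n, which is the source of the speed-up the abstract claims; a larger `tau` yields fewer leaders and a coarser summary of the data.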
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Patra, B.K., Hubballi, N., Biswas, S., Nandi, S. (2010). Distance Based Fast Hierarchical Clustering Method for Large Datasets. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds) Rough Sets and Current Trends in Computing. RSCTC 2010. Lecture Notes in Computer Science, vol 6086. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13529-3_7
DOI: https://doi.org/10.1007/978-3-642-13529-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13528-6
Online ISBN: 978-3-642-13529-3
eBook Packages: Computer Science, Computer Science (R0)