Speeding-Up Hierarchical Agglomerative Clustering in Presence of Expensive Metrics

Nanni, Mirco

doi:10.1007/11430919_45

Mirco Nanni

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Abstract

In several contexts and domains, hierarchical agglomerative clustering (HAC) offers the best-quality results, but at the price of high complexity, which limits the size of the datasets that can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In this paper we propose a pruning heuristic aimed at improving performance in these cases; it is well integrated into all phases of the HAC process and can be applied to two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.
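
The abstract does not spell out the heuristic, so the following is only a generic illustration of how distance computations can be pruned when the metric is expensive. It is not the paper's method. The sketch assumes the distance satisfies the triangle inequality and uses a few pivot objects to derive lower bounds, skipping exact computations while searching for the closest pair (the first merge in single-linkage HAC). The names `CountingMetric`, `closest_pair_pruned`, and `n_pivots` are hypothetical.

```python
import itertools
import math


class CountingMetric:
    """Wraps a distance function and counts how often it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def closest_pair_pruned(points, dist, n_pivots=2):
    """Return (distance, i, j) for the closest pair of points.

    Assumption: `dist` is a metric, so for any pivot p the value
    |d(i, p) - d(j, p)| is a lower bound on d(i, j).
    """
    pivots = points[:n_pivots]
    # Exact distances from every point to each pivot (computed once).
    to_pivot = [[dist(p, q) for q in pivots] for p in points]
    best = (math.inf, None, None)
    for i, j in itertools.combinations(range(len(points)), 2):
        # Tightest lower bound on d(i, j) available from the pivots.
        lower = max(abs(a - b) for a, b in zip(to_pivot[i], to_pivot[j]))
        if lower >= best[0]:
            continue  # this pair cannot beat the current best: skip it
        d = dist(points[i], points[j])
        if d < best[0]:
            best = (d, i, j)
    return best
```

Wrapping the distance as `metric = CountingMetric(math.dist)` and reading `metric.calls` after `closest_pair_pruned(points, metric)` shows how many exact computations were actually performed. How much is saved depends on the data and on the choice of pivots.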

Author information

Authors and Affiliations

ISTI-CNR, Pisa, Italy
Mirco Nanni

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu Bao Ho
University of Hong Kong, Pokfulam Road, Hong Kong, China
David Cheung
Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu

About this paper

Cite this paper

Nanni, M. (2005). Speeding-Up Hierarchical Agglomerative Clustering in Presence of Expensive Metrics. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_45

DOI: https://doi.org/10.1007/11430919_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)
