Speeding-Up Hierarchical Agglomerative Clustering in Presence of Expensive Metrics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Abstract

In several contexts and domains, hierarchical agglomerative clustering (HAC) offers the best-quality results, but at the price of a high complexity that limits the size of the datasets it can handle. In some contexts, in particular, computing distances between objects is the most expensive task. In this paper we propose a pruning heuristic aimed at improving performance in these cases, which integrates into all phases of the HAC process and applies to two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.
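To make the cost model concrete, the sketch below (not the paper's heuristic) shows naive single-linkage HAC over an expensive metric, with a cache so the metric is evaluated at most once per pair. The call counter makes visible that the naive algorithm still evaluates the metric on every one of the n(n-1)/2 pairs; that pairwise cost is exactly what a pruning heuristic of the kind described here tries to reduce. All names are illustrative.

```python
import itertools


def hac_single_linkage(points, dist, n_clusters):
    """Naive single-linkage HAC (illustrative, not the paper's method).

    Caches pairwise distances so the expensive metric `dist` is
    evaluated at most once per pair; returns the clusters and the
    number of metric evaluations performed.
    """
    cache = {}
    calls = [0]  # mutable counter of actual metric evaluations

    def d(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            calls[0] += 1
            cache[key] = dist(points[key[0]], points[key[1]])
        return cache[key]

    # Start with every point in its own cluster.
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Single linkage: cluster distance = minimum pairwise distance.
        a, b = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                d(i, j) for i in clusters[ab[0]] for j in clusters[ab[1]]
            ),
        )
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters, calls[0]
```

For n = 4 points the cache still forces all 6 pairwise evaluations; a pruning heuristic aims to decide merges while skipping some of those evaluations entirely.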




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nanni, M. (2005). Speeding-Up Hierarchical Agglomerative Clustering in Presence of Expensive Metrics. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science (LNAI), vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_45

  • DOI: https://doi.org/10.1007/11430919_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26076-9

  • Online ISBN: 978-3-540-31935-1

  • eBook Packages: Computer Science (R0)
