DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation

  • Conference paper
Advances in Intelligent Data Analysis VII (IDA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

The Denclue algorithm employs a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function. Data points are assigned to clusters by hill climbing, i.e., points that converge to the same local maximum are put into the same cluster. A disadvantage of Denclue 1.0 is that its hill climbing may take unnecessarily small steps at the beginning and never converges exactly to the maximum; it only comes close.
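
For concreteness, the density model behind this cluster definition can be written as a standard Gaussian kernel density estimate with bandwidth h; this is a minimal sketch in our own notation, not quoted from the paper:

```latex
\hat{f}(x) \;=\; \frac{1}{n} \sum_{i=1}^{n}
\frac{1}{(2\pi h^2)^{d/2}}
\exp\!\left( -\frac{\lVert x - x_i \rVert^2}{2h^2} \right)
```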

We introduce a new hill climbing procedure for Gaussian kernels, which adjusts the step size automatically at no extra cost. We prove that the procedure converges exactly towards a local maximum by reducing it to a special case of the expectation maximization algorithm. We show experimentally that the new procedure needs far fewer iterations and can be accelerated by sampling-based methods while sacrificing only a small amount of accuracy.
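
As an illustration of such a step-size-free hill climbing for Gaussian kernels, the sketch below iterates a fixed-point update that moves a point to the kernel-weighted mean of the data, so no explicit step size is needed. This is a minimal sketch under our own assumptions, not the paper's code; the function name hill_climb and the parameters h, eps, and max_iter are illustrative.

```python
import numpy as np

def hill_climb(x0, data, h=1.0, eps=1e-6, max_iter=100):
    """Fixed-point hill climbing on a Gaussian kernel density estimate.

    Each iteration replaces x by the kernel-weighted mean of the data,
    which requires no step-size parameter (illustrative sketch only).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((data - x) ** 2, axis=1)                # squared distances to all data points
        w = np.exp(-d2 / (2.0 * h ** 2))                     # Gaussian kernel weights
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()    # weighted mean = next iterate
        if np.linalg.norm(x_new - x) < eps:                  # stop once the iterate has converged
            return x_new
        x = x_new
    return x

# Points whose climbs end near the same local maximum form one cluster.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
modes = np.array([hill_climb(p, data, h=0.5) for p in data])
```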



Author information

Authors

A. Hinneburg, H.-H. Gabriel

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hinneburg, A., Gabriel, H.-H. (2007). DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_7

  • DOI: https://doi.org/10.1007/978-3-540-74825-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0
