DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation

  • Conference paper
Advances in Intelligent Data Analysis VII (IDA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

The Denclue algorithm employs a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function. Data points are assigned to clusters by hill climbing, i.e., points that converge to the same local maximum are put into the same cluster. A disadvantage of Denclue 1.0 is that its hill climbing may take unnecessarily small steps at the beginning and never converges exactly to the maximum; it only comes close.
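
For concreteness, the density model behind this cluster definition can be written as a standard Gaussian kernel density estimate with bandwidth h; this is a minimal sketch in our own notation, not quoted from the paper:

```latex
\hat{f}(x) \;=\; \frac{1}{n} \sum_{i=1}^{n}
\frac{1}{(2\pi h^2)^{d/2}}
\exp\!\left( -\frac{\lVert x - x_i \rVert^2}{2h^2} \right)
```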

We introduce a new hill climbing procedure for Gaussian kernels, which adjusts the step size automatically at no extra cost. We prove that the procedure converges exactly towards a local maximum by reducing it to a special case of the expectation maximization algorithm. We show experimentally that the new procedure needs far fewer iterations and can be accelerated by sampling-based methods while sacrificing only a small amount of accuracy.
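
As an illustration of such a step-size-free hill climbing for Gaussian kernels, the sketch below iterates a fixed-point update that moves a point to the kernel-weighted mean of the data, so no explicit step size is needed. This is a minimal sketch under our own assumptions, not the paper's code; the function name hill_climb and the parameters h, eps, and max_iter are illustrative.

```python
import numpy as np

def hill_climb(x0, data, h=1.0, eps=1e-6, max_iter=100):
    """Fixed-point hill climbing on a Gaussian kernel density estimate.

    Each iteration replaces x by the kernel-weighted mean of the data,
    which requires no step-size parameter (illustrative sketch only).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((data - x) ** 2, axis=1)                # squared distances to all data points
        w = np.exp(-d2 / (2.0 * h ** 2))                     # Gaussian kernel weights
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()    # weighted mean = next iterate
        if np.linalg.norm(x_new - x) < eps:                  # stop once the iterate has converged
            return x_new
        x = x_new
    return x

# Points whose climbs end near the same local maximum form one cluster.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
modes = np.array([hill_climb(p, data, h=0.5) for p in data])
```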



Author information

Authors

A. Hinneburg, H.-H. Gabriel

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hinneburg, A., Gabriel, H.-H. (2007). DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_7

  • DOI: https://doi.org/10.1007/978-3-540-74825-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0
