Abstract
The Denclue algorithm employs a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function. Data points are assigned to clusters by hill climbing, i.e., points that climb to the same local maximum are put into the same cluster. A disadvantage of Denclue 1.0 is that its hill climbing may take unnecessarily small steps at the beginning and never converges exactly to the maximum; it only comes close.
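The fixed-step behaviour described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimate and a hill climber that moves a constant distance `delta` along the normalized gradient, stopping when the density no longer increases. The function names and the parameters `h`, `delta`, and `max_iter` are illustrative choices, not taken from the paper.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def density(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    d = len(x)
    norm = len(data) * (h ** d) * (2.0 * math.pi) ** (d / 2.0)
    return sum(math.exp(-sq_dist(x, p) / (2.0 * h * h)) for p in data) / norm

def density_gradient(x, data, h):
    """Gradient of the Gaussian kernel density estimate at x."""
    d = len(x)
    norm = len(data) * (h ** (d + 2)) * (2.0 * math.pi) ** (d / 2.0)
    grad = [0.0] * d
    for p in data:
        w = math.exp(-sq_dist(x, p) / (2.0 * h * h))
        for j in range(d):
            grad[j] += w * (p[j] - x[j])
    return [g / norm for g in grad]

def fixed_step_climb(start, data, h, delta, max_iter=1000):
    """Assumed fixed-step scheme: move delta along the unit gradient
    until the density stops increasing. Returns (end point, steps taken)."""
    x = list(start)
    f_x = density(x, data, h)
    steps = 0
    while steps < max_iter:
        g = density_gradient(x, data, h)
        length = math.sqrt(sum(v * v for v in g))
        if length == 0.0:
            break  # exactly stationary (rare in floating point)
        candidate = [a + delta * v / length for a, v in zip(x, g)]
        f_candidate = density(candidate, data, h)
        if f_candidate < f_x:
            break  # the step overshot the maximum; keep the previous point
        x, f_x = candidate, f_candidate
        steps += 1
    return x, steps
```

In this sketch the end point is only guaranteed to lie within roughly `delta` of a local maximum, and a small `delta` means many steps when the start point is far away, which is the trade-off the abstract refers to.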
We introduce a new hill climbing procedure for Gaussian kernels, which adjusts the step size automatically at no extra cost. We prove that the procedure converges exactly towards a local maximum by reducing it to a special case of the expectation maximization algorithm. We show experimentally that the new procedure needs far fewer iterations and can be accelerated by sampling-based methods while sacrificing only a small amount of accuracy.
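The abstract does not state the update rule, so the following is only a sketch under one assumption: that the step-size-free update for Gaussian kernels takes the familiar fixed-point form in which the next position is the kernel-weighted mean of the data (the mean-shift style update). The helper names, the tolerance `tol`, and the `merge_radius` used to decide when two end points count as the same maximum are all hypothetical.

```python
import math

def kernel_weighted_mean(x, data, h):
    """One update: the mean of the data, weighted by a Gaussian kernel centred at x."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * h * h))
        for p in data
    ]
    total = sum(weights)
    if total == 0.0:
        return list(x)  # all weights underflowed; stay where we are
    return [
        sum(w * p[j] for w, p in zip(weights, data)) / total
        for j in range(len(x))
    ]

def climb(start, data, h, tol=1e-10, max_iter=1000):
    """Iterate the update until the position moves by less than tol.
    Returns (end point, number of iterations)."""
    x = list(start)
    for steps in range(1, max_iter + 1):
        x_new = kernel_weighted_mean(x, data, h)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if shift < tol:
            return x, steps
    return x, max_iter

def assign_clusters(data, h, merge_radius):
    """Label each point by the maximum its climb ends at. End points closer
    than merge_radius (a hypothetical parameter) share a label."""
    modes, labels = [], []
    for p in data:
        end, _ = climb(p, data, h)
        for k, m in enumerate(modes):
            if math.sqrt(sum((a - b) ** 2 for a, b in zip(end, m))) <= merge_radius:
                labels.append(k)
                break
        else:
            modes.append(end)
            labels.append(len(modes) - 1)
    return labels, modes
```

No step-size parameter appears here: the displacement shrinks on its own as the iterate approaches a maximum, and each update reuses the kernel weights that a density evaluation would compute anyway.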
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hinneburg, A., Gabriel, H.-H. (2007). DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_7
DOI: https://doi.org/10.1007/978-3-540-74825-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74824-3
Online ISBN: 978-3-540-74825-0
eBook Packages: Computer Science, Computer Science (R0)