Dimensionally Distributed Density Estimation

Fränti, Pasi; Sieranoja, Sami

doi:10.1007/978-3-319-91262-2_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10842)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

Density estimation is needed in several clustering algorithms and other data analysis methods. A straightforward calculation takes O(N²) time because it requires all pairwise distances. This is the main bottleneck to making these algorithms scalable. We propose a faster O(N log N) time algorithm that calculates the density estimates in each dimension separately and then simply accumulates the individual estimates into the final density values.
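The abstract does not state which per-dimension estimator is used or exactly how the estimates are cumulated, so the Python sketch below is one plausible reading, not the authors' method. It assumes that the per-dimension estimate is the distance to the k-th nearest neighbour along that axis, found by sorting the coordinates, and that "cumulating" means summing these distances before inverting them into a density. All function names are hypothetical. For comparison, the sketch also includes a quadratic baseline that uses full Euclidean k-th-neighbour distances.

```python
import heapq
import math


def knn_density_bruteforce(points, k):
    """Baseline: density = 1 / (Euclidean distance to the k-th nearest neighbour).

    Every point is compared with every other point, so the cost is quadratic in N.
    """
    n = len(points)
    assert 0 < k < n, "need at least k + 1 points"
    densities = []
    for i in range(n):
        dists = (math.dist(points[i], points[j]) for j in range(n) if j != i)
        d_k = heapq.nsmallest(k, dists)[-1]
        densities.append(1.0 / d_k if d_k > 0 else math.inf)
    return densities


def knn_distance_1d(values, k):
    """Distance from each value to its k-th nearest neighbour along one axis.

    After sorting (O(N log N)), the k nearest neighbours of the value at sorted
    position pos are found by walking outwards from pos with two pointers (O(k)).
    """
    n = len(values)
    assert 0 < k < n, "need at least k + 1 values"
    order = sorted(range(n), key=lambda i: values[i])
    s = [values[i] for i in order]
    result = [0.0] * n
    for pos, idx in enumerate(order):
        left, right = pos - 1, pos + 1
        dist = 0.0
        for _ in range(k):
            # Step to whichever unvisited neighbour is closer on this axis.
            d_left = s[pos] - s[left] if left >= 0 else math.inf
            d_right = s[right] - s[pos] if right < n else math.inf
            if d_left <= d_right:
                dist, left = d_left, left - 1
            else:
                dist, right = d_right, right + 1
        result[idx] = dist  # k-th neighbour distance, stored in original order
    return result


def dimensional_density(points, k):
    """Per-dimension k-NN distances, summed (ASSUMED combination rule), then inverted."""
    n, dims = len(points), len(points[0])
    total = [0.0] * n
    for d in range(dims):
        column = [p[d] for p in points]
        for i, dist in enumerate(knn_distance_1d(column, k)):
            total[i] += dist
    return [1.0 / t if t > 0 else math.inf for t in total]


if __name__ == "__main__":
    # Five tightly packed points and one far-away outlier.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
    print(knn_density_bruteforce(pts, 2))
    print(dimensional_density(pts, 2))
```

On this toy data both functions give the outlier (5.0, 5.0) the lowest density. With D dimensions and a small fixed k, the sketch costs O(D·N log N) for sorting plus O(D·k·N) for the two-pointer scans, whereas the baseline performs N(N−1) distance computations.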

Author information

Authors and Affiliations

School of Computing, University of Eastern Finland, Joensuu, Finland
Pasi Fränti & Sami Sieranoja

Corresponding author

Correspondence to Pasi Fränti.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada

About this paper

Cite this paper

Fränti, P., Sieranoja, S. (2018). Dimensionally Distributed Density Estimation. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_31

DOI: https://doi.org/10.1007/978-3-319-91262-2_31
Published: 11 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91261-5
Online ISBN: 978-3-319-91262-2
eBook Packages: Computer Science, Computer Science (R0)