Skip to main content

Dimensionally Distributed Density Estimation

  • Conference paper
  • First Online:
Book cover Artificial Intelligence and Soft Computing (ICAISC 2018)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10842))

Included in the following conference series:

Abstract

Estimating density is needed in several clustering algorithms and other data analysis methods. Straightforward calculation takes O(N2) because of the calculation of all pairwise distances. This is the main bottleneck for making the algorithms scalable. We propose a faster O(N logN) time algorithm that calculates the density estimates in each dimension separately, and then simply cumulates the individual estimates into the final density values.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Astrahan, M.M.: Speech Analysis by Clustering, or the Hyperphome Method, Stanford Artificial Intelligence Project Memorandum AIM-124, Stanford University, Stanford, CA (1970)

    Google Scholar 

  2. Bai, L., Cheng, X., Liang, J., Shen, H., Guo, Y.: Fast density clustering strategies based on the k-means algorithm. Pattern Recognit. 71, 375–386 (2017)

    Article  Google Scholar 

  3. Cao, F., Liang, J., Bai, L.: A new initialization method for categorical data clustering. Expert Syst. App. 36(7), 10223–10228 (2009)

    Article  Google Scholar 

  4. Cao, F., Liang, J., Jiang, G.: An initialization method for the k-means algorithm using neighborhood model. Comput. Math. App. 58, 474–483 (2009)

    MathSciNet  MATH  Google Scholar 

  5. Denoeux, T., Kanhanatarakul, O., Sriboonchitta, S.: EK-NNclus: A clustering procedure based on the evidential K-nearest neighbor rule. Knowl.-Based Syst. 88, 57–69 (2015)

    Article  Google Scholar 

  6. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: International Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231 (1996)

    Google Scholar 

  7. Fränti, P., Rezaei, M., Zhao, Q.: Centroid index: cluster level similarity measure. Pattern Recognit. 47(9), 3034–3045 (2014)

    Article  Google Scholar 

  8. Fränti, P., Virmajoki, O.: Iterative shrinking method for clustering problems. Pattern Recognit. 39(5), 761–765 (2006)

    Article  Google Scholar 

  9. Fränti, P., Virmajoki, O., Hautamäki, V.: Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1875–1881 (2006)

    Article  Google Scholar 

  10. Gourgaris, P., Makris, C.: A density based k-means initialization scheme. In: EANN Workshops, Rhodes Island, Greece (2015)

    Google Scholar 

  11. Hautamäki, V., Kärkkäinen, I., Fränti, P.: Outlier detection using k-nearest neighbour graph. In: International Conference on Pattern Recognition (ICPR’2004), Cambridge, UK, pp. 430–433, August 2004

    Google Scholar 

  12. Hou, J., Pellilo, M.: A new density kernel in density peak based clustering. In: International Conference on Pattern Recognition, Cancun, Mexico, pp. 468–473, December 2014

    Google Scholar 

  13. Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice-Hall, Upper Saddle River (1988)

    MATH  Google Scholar 

  14. Katsavounidis, I., Kuo, C.C.J., Zhang, Z.: A new initialization technique for generalized Lloyd iteration. IEEE Sig. Process. Lett. 1(10), 144–146 (1994)

    Article  Google Scholar 

  15. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: International Conference on Very Large Data Bases, New York, USA, pp. 392–403 (1998)

    Google Scholar 

  16. Kärkkäinen, I., Fränti, P.: Dynamic local search algorithm for the clustering problem, Research Report A-2002-6

    Google Scholar 

  17. Lemke, O., Keller, B.: Common nearest neighbor clustering: why core sets matter. Algorithms (2018)

    Google Scholar 

  18. Lulli, A., Dell’Amico, M., Michiardi, P., Ricci, L.: NGDBSCAN: scalable density-based clustering for arbitrary data. VLDB Endow. 10(3), 157–168 (2016)

    Article  Google Scholar 

  19. Loftsgaarden, D.O., Quesenberry, C.P.: A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36(3), 1049–1051 (1965)

    Article  MathSciNet  Google Scholar 

  20. Mak, K.F., He, K., Shan, J., Heinz, T.F.: Nat. Nanotechnol. 7, 494–498 (2012)

    Article  Google Scholar 

  21. Melnykov, I., Melnykov, V.: On k-means algorithm with the use of Mahalanobis distances. Stat. Probab. Lett. 84, 88–95 (2014)

    Article  MathSciNet  Google Scholar 

  22. Mitra, P., Murthy, C.A., Pal, S.K.: Density-based multiscale data condensation. IEEE Trans. Pattern Anal. Mach. Intell. 24(6), 734–747 (2002)

    Article  Google Scholar 

  23. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Rec. 29(2), 427–438 (2000)

    Article  Google Scholar 

  24. Redmond, S.J., Heneghan, C.: A method for initialising the K-means clustering algorithm using kd-trees. Pattern Recognit. Lett. 28(8), 965–973 (2007)

    Article  Google Scholar 

  25. Rezaei, M., Fränti, P.: Set-matching methods for external cluster validity. IEEE Trans. Knowl. Data Eng. 28(8), 2173–2186 (2016)

    Article  Google Scholar 

  26. Rodriquez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)

    Article  Google Scholar 

  27. Sieranoja, S., Fränti, P.: High-dimensional kNN-graph construction using z-order curve. ACM J. Exp. Algorithmics (submitted)

    Google Scholar 

  28. Steinley, D.: Initializing k-means batch clustering: a critical evaluation of several techniques. J. Classif. 24, 99–121 (2007)

    Article  MathSciNet  Google Scholar 

  29. Steinwart, I.: Fully adaptive density-based clustering. Ann. Stat. 43(5), 2132–2167 (2015)

    Article  MathSciNet  Google Scholar 

  30. Wang, Q., Kulkarni, R., Verdu, S.: Divergence estimation for multidimensional densities via k–nearest-neighbor distances. IEEE Trans. Inf. Theory 55(5), 2392–2405 (2009)

    Article  MathSciNet  Google Scholar 

  31. Wang, J., Zhang, Y., Lan, X.: Automatic cluster number selection by finding density peaks. In: IEEE International Conference on Computers and Communications, Chengdu, China, October 2016

    Google Scholar 

  32. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: a new data clustering algorithm and its applications. Data Min. Knowl. Discov. 1(2), 141–182 (1997)

    Article  Google Scholar 

  33. Zhao, Q., Fränti, P.: WB-index: a sum-of-squares based index for cluster validity. Data Knowl. Eng. 92, 77–89 (2014)

    Article  Google Scholar 

  34. Zhao, Q., Shi, Y., Liu, Q., Fränti, P.: A grid-growing clustering algorithm for geo-spatial data. Pattern Recogn. Lett. 53(1), 77–84 (2015)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pasi Fränti .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Fränti, P., Sieranoja, S. (2018). Dimensionally Distributed Density Estimation. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science(), vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-91262-2_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91261-5

  • Online ISBN: 978-3-319-91262-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics