
AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

The Journal of Supercomputing

Abstract

Clustering is a data mining technique that partitions a dataset into subsets of similar objects according to a similarity metric. Density-based algorithms, in particular, can find clusters of different shapes and sizes while remaining robust to noise objects. DBSCAN, a representative density-based algorithm, defines its density criterion with two global parameters, the \( \varepsilon \)-distance and \( MinPts \). However, most density-based algorithms, including DBSCAN, cluster incorrectly when densities vary, because a density criterion fixed by global parameters is applied uniformly to clusters of different densities. Previous studies have sought optimal parameters or improved clustering performance through additional parameters and computations, but at the cost of significantly longer running times, particularly on large datasets. In this study, we focus on minimizing the additional computation required to determine the parameters by using an approximate adaptive \( \varepsilon \)-distance for each density, while still finding the clusters with varying densities that DBSCAN cannot find. Specifically, we propose a new tree structure based on a quadtree to define the density layer of a dataset. In addition, we propose approximate adaptive DBSCAN (AA-DBSCAN) and kAA-DBSCAN, which achieve clustering performance similar to that of existing algorithms for finding clusters with varying densities while significantly reducing the running time. We evaluate AA-DBSCAN and kAA-DBSCAN in extensive experiments against state-of-the-art algorithms. The experimental results demonstrate improved clustering performance and reduced running time for the proposed algorithms.
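The limitation of a single global \( \varepsilon \)-distance and \( MinPts \) can be illustrated with a minimal sketch of classic DBSCAN. This is not AA-DBSCAN or kAA-DBSCAN, whose tree structure is not specified in the abstract; it is plain DBSCAN with a brute-force neighbor search, and the toy dataset, function names, and parameter values are assumptions chosen only for illustration.

```python
import math

NOISE = -1
UNVISITED = None

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Classic DBSCAN with one global (eps, min_pts) pair; returns one label per point."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels

# Hypothetical toy data: two dense 3x3 grids (spacing 0.1) that lie 0.8 apart,
# and one sparse 3x3 grid (spacing 1.0) far away from both.
dense_a = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
dense_b = [(1.0 + 0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
sparse = [(10.0 + i, 10.0 + j) for i in range(3) for j in range(3)]
points = dense_a + dense_b + sparse

# A small eps separates the dense grids but marks the sparse grid as noise.
small_eps_labels = dbscan(points, eps=0.15, min_pts=4)
# A large eps recovers the sparse grid but merges the two dense grids.
large_eps_labels = dbscan(points, eps=1.5, min_pts=4)
```

In this toy setting neither global setting recovers all three groups: the small value loses the sparse group, and the large value merges the two dense groups. This is the kind of situation in which a per-density \( \varepsilon \)-distance is needed.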



Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1D1A3B03035729).

Author information

Corresponding author

Correspondence to Aziz Nasridinov.


About this article


Cite this article

Kim, JH., Choi, JH., Yoo, KH. et al. AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities. J Supercomput 75, 142–169 (2019). https://doi.org/10.1007/s11227-018-2380-z


  • DOI: https://doi.org/10.1007/s11227-018-2380-z
