A Novel Method to Find Appropriate ε for DBSCAN

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5990)

Abstract

Clustering is one of the most useful methods of data mining, in which a set of real or abstract objects is categorized into clusters. The DBSCAN clustering method, one of the best-known density-based clustering methods, groups points in dense areas into the same cluster. In DBSCAN a point is said to be dense if the ε-radius circular area around it contains at least MinPts points. To find such dense areas, region queries are issued. Two points are defined as density connected if the distance between them is less than ε and at least one of them is dense. Finally, the density-connected parts of the data set are extracted as clusters. The significant issue with such a method is that its parameters (ε and MinPts) are very hard for a user to guess. So it is better to remove them or to replace them with other parameters that are simpler to estimate. In this paper, we focus on the DBSCAN algorithm and replace ε with another parameter named ρ (the noise ratio of the data set). This method does not reduce the number of parameters, but ρ is usually much simpler to set than ε. In some applications the user even knows the noise ratio of the data set in advance. Being a relative (not absolute) measure is another advantage of ρ over ε. We also propose a novel visualization technique that may help users set the ε value interactively. Experimental results are presented to show that our algorithm produces results close to those of the original DBSCAN with ε set to an appropriate value.
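The abstract does not describe how ε is derived from ρ, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that ε can be taken as the smallest radius at which roughly a fraction ρ of the points fail the density test, using each point's distance to its MinPts-th nearest point. The function names, the toy data, the choice to count a point as its own neighbor, and the use of `<=` rather than `<` in the density test are all assumptions made for illustration. Non-dense points are used here as a proxy for noise; border points that DBSCAN attaches to a cluster would make the actual noise fraction smaller.

```python
import math

def kth_neighbor_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest point (itself included)."""
    result = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[min_pts - 1])  # dists[0] is 0.0, the point itself
    return result

def eps_from_noise_ratio(points, min_pts, rho):
    """Smallest eps at which about a fraction rho of the points are non-dense.

    Assumption: non-dense points stand in for noise.
    """
    k_dists = sorted(kth_neighbor_distances(points, min_pts))
    n_noise = round(rho * len(points))               # points allowed to stay non-dense
    n_dense = max(1, min(len(points), len(points) - n_noise))
    return k_dists[n_dense - 1]

def is_dense(points, p, eps, min_pts):
    """Density test from the abstract: at least min_pts points within radius eps."""
    return sum(1 for q in points if math.dist(p, q) <= eps) >= min_pts

# Hypothetical toy data: two tight groups of four points plus two isolated points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20), (20, 5)]

eps = eps_from_noise_ratio(points, min_pts=3, rho=0.2)
non_dense = [p for p in points if not is_dense(points, p, eps, 3)]
print(eps, non_dense)
```

With ρ = 0.2 on these ten points, the two isolated points are the ones left non-dense. The brute-force distance computation is quadratic in the number of points, which is acceptable for a sketch; a spatial index would be the usual choice for the region queries on larger data sets.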

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Esmaelnejad, J., Habibi, J., Yeganeh, S.H. (2010). A Novel Method to Find Appropriate ε for DBSCAN. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol 5990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12145-6_10

  • DOI: https://doi.org/10.1007/978-3-642-12145-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12144-9

  • Online ISBN: 978-3-642-12145-6

  • eBook Packages: Computer Science, Computer Science (R0)