A Novel Method to Find Appropriate ε for DBSCAN

Esmaelnejad, Jamshid; Habibi, Jafar; Yeganeh, Soheil Hassas

doi:10.1007/978-3-642-12145-6_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5990)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Clustering is one of the most useful methods of data mining, in which a set of real or abstract objects is categorized into clusters. The DBSCAN clustering method, one of the best-known density-based clustering methods, groups points in dense areas into the same cluster. In DBSCAN a point is said to be dense if the circular area of radius ε around it contains at least MinPts points. To find such dense areas, region queries are issued. Two points are defined as density connected if the distance between them is less than ε and at least one of them is dense. Finally, the density-connected parts of the data set are extracted as clusters. The significant issue with such a method is that its parameters (ε and MinPts) are very hard for a user to guess, so it is better to remove them or to replace them with other parameters that are simpler to estimate. In this paper, we focus on the DBSCAN algorithm and replace ε with another parameter named ρ (the noise ratio of the data set). This does not reduce the number of parameters, but ρ is usually much simpler to set than ε. In some applications the user even knows the noise ratio of the data set in advance. Being a relative (not absolute) measure is another advantage of ρ over ε. We also propose a novel visualization technique that may help users to set the ε value interactively. Experimental results are also presented to show that our algorithm produces results very close to those of the original DBSCAN with ε set to an appropriate value.
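
The definitions in the abstract can be made concrete with a small sketch. The code below is an illustration only and is not the procedure proposed in the paper, which the abstract does not describe. It assumes Euclidean distance, counts a point as its own neighbour, uses "distance ≤ ε" throughout (the usual DBSCAN convention), and treats a point as noise when it is neither dense nor within ε of a dense point. All function names, the toy data set, and the brute-force sweep over candidate ε values are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dense_flags(points, eps, min_pts):
    """True for each point whose eps-neighbourhood holds at least min_pts points."""
    return [len(region_query(points, i, eps)) >= min_pts
            for i in range(len(points))]

def noise_ratio(points, eps, min_pts):
    """Fraction of points that are neither dense nor within eps of a dense point."""
    dense = dense_flags(points, eps, min_pts)
    noise = 0
    for i in range(len(points)):
        if dense[i]:
            continue
        if not any(dense[j] for j in region_query(points, i, eps)):
            noise += 1
    return noise / len(points)

def eps_for_noise_ratio(points, rho, min_pts, candidates):
    """Naive sweep (an assumption, not the paper's method): return the smallest
    candidate eps whose noise ratio does not exceed rho, or None if none does."""
    for eps in sorted(candidates):
        if noise_ratio(points, eps, min_pts) <= rho:
            return eps
    return None

# Toy data: a unit square of four points plus one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(dense_flags(points, eps=1.5, min_pts=4))
print(noise_ratio(points, eps=1.5, min_pts=4))
print(eps_for_noise_ratio(points, rho=0.2, min_pts=4,
                          candidates=[0.5, 1.0, 1.5, 2.0]))
```

On this toy set the outlier is one point out of five, so asking for ρ = 0.2 selects the first candidate ε large enough to make the four square corners dense. This shows why ρ is a relative measure: rescaling all coordinates changes the suitable ε but not the noise ratio.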

Author information

Authors and Affiliations

Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Jamshid Esmaelnejad, Jafar Habibi & Soheil Hassas Yeganeh

Editor information

Editors and Affiliations

Institute of Informatics, Wroclaw University of Technology, Str. Wyb. Wyspianskiego 27, 50-370, Wroclaw, Poland
Ngoc Thanh Nguyen
Hue University, Str. Le Loi 3, Hue City, Vietnam
Manh Thanh Le
Faculty of Computer Science and Management, Wroclaw University of Technology, Str. Lukasiewicza 5, 50-370, Wroclaw, Poland
Jerzy Świątek

Cite this paper

Esmaelnejad, J., Habibi, J., Yeganeh, S.H. (2010). A Novel Method to Find Appropriate ε for DBSCAN. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol 5990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12145-6_10

DOI: https://doi.org/10.1007/978-3-642-12145-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12144-9
Online ISBN: 978-3-642-12145-6
eBook Packages: Computer Science, Computer Science (R0)