Abstract
Clustering is one of the most useful data mining methods: a set of real or abstract objects is grouped into clusters. DBSCAN, one of the best-known density-based clustering methods, places points in dense areas into the same clusters. In DBSCAN a point is said to be dense if the circular area of radius ε around it contains at least MinPts points. Region queries are issued to find such dense areas. Two points are defined as density connected if the distance between them is less than ε and at least one of them is dense. Finally, the density-connected parts of the data set are extracted as clusters. The main drawback of such a method is that its parameters (ε and MinPts) are very hard for a user to guess, so it is better to remove them or to replace them with parameters that are easier to estimate.

In this paper we focus on the DBSCAN algorithm and try to remove ε, replacing it with another parameter, ρ (the noise ratio of the data set). This does not reduce the number of parameters, but ρ is usually much simpler to set than ε; in some applications the user even knows the noise ratio of the data set in advance. Being a relative (not absolute) measure is another advantage of ρ over ε. We also propose a novel visualization technique that may help users set the ε value interactively. Experimental results show that our algorithm obtains results very similar to those of the original DBSCAN with ε set to an appropriate value.
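The abstract does not spell out how ρ is turned into ε, so the following Python sketch is only one plausible reading, not the authors' procedure. It assumes that a point is a noise candidate when it is not dense, computes each point's distance to its MinPts-th nearest point (counting the point itself), and picks the smallest ε that makes all but a fraction ρ of the points dense. The function names region_query, dbscan, k_distances and eps_from_noise_ratio are hypothetical, and dbscan is a plain O(n²) version of standard DBSCAN in which clusters grow only through dense points.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for points that end up in no cluster


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN: one cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:       # not dense: noise unless a cluster reaches it later
            labels[i] = NOISE
            continue
        labels[i] = cluster                 # a dense point seeds a new cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:          # non-dense point near a dense one: border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # only dense points extend the cluster further
                seeds.extend(j_neighbours)
        cluster += 1
    return labels  # every entry is an int by now


def k_distances(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest point, counting the point itself first."""
    result = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[k - 1])
    return result


def eps_from_noise_ratio(points: Sequence[Point], min_pts: int, rho: float) -> float:
    """Hypothetical rho -> eps mapping (an assumption, not the paper's method):
    the smallest eps that makes all but a fraction rho of the points dense."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    kd = sorted(k_distances(points, min_pts))
    n_noise = math.floor(rho * len(points) + 1e-9)  # small tolerance for float rounding
    n_dense = max(1, len(points) - n_noise)
    # A point is dense iff its min_pts-th distance <= eps, so this eps makes
    # at least n_dense points dense.
    return kd[n_dense - 1]
```

For example, with two unit squares of four points each (one at the origin, one at (10, 10)) plus an outlier at (50, 50), eps_from_noise_ratio(points, 3, 0.15) returns 1.0, and dbscan(points, 1.0, 3) labels the squares as clusters 0 and 1 and the outlier as -1. Because some non-dense points are absorbed as border points, the realized noise ratio can be lower than ρ. Sorting the output of k_distances gives the quantity the original DBSCAN paper plots in its sorted k-dist graph for choosing ε by eye.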
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esmaelnejad, J., Habibi, J., Yeganeh, S.H. (2010). A Novel Method to Find Appropriate ε for DBSCAN. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol 5990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12145-6_10
DOI: https://doi.org/10.1007/978-3-642-12145-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12144-9
Online ISBN: 978-3-642-12145-6
eBook Packages: Computer Science, Computer Science (R0)