Abstract
The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is challenging, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are particularly important because of their strong ability to detect arbitrarily shaped clusters; moreover, they often handle noise well. Existing methods are based on DBSCAN, which depends on two specified parameters (Eps and MinPts) that define a single density. Furthermore, most of these methods are unsupervised and therefore cannot use a small amount of prior knowledge to improve clustering quality. In this paper, we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then, we describe how pairwise constraints can guide the algorithm in expanding clusters based on the computed density parameters. Experimental results on both synthetic and real datasets confirm that the proposed algorithm gives better results than other semi-supervised and unsupervised clustering algorithms.
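The two ideas named in the abstract, per-level density parameters and pairwise constraints, can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given here. It assumes one common heuristic (splitting points where the sorted k-nearest-neighbour distance jumps sharply) and a simple cannot-link check. The function names, the `jump` threshold, and the choice of Eps as the largest k-distance in a level are all hypothetical.

```python
import math

def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def density_levels(points, k=3, jump=2.0):
    """Group point indices into density levels (illustrative heuristic).

    Points are ordered by k-distance; a new level starts whenever the
    k-distance grows by more than the factor `jump` (an assumed threshold).
    Returns a list of (eps, indices) pairs, where eps is the largest
    k-distance inside the level.
    """
    kd = k_distance(points, k)
    order = sorted(range(len(points)), key=kd.__getitem__)
    levels = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kd[prev] > 0 and kd[cur] / kd[prev] > jump:
            levels.append([])  # sharp jump in k-distance: start a sparser level
        levels[-1].append(cur)
    return [(max(kd[i] for i in lvl), lvl) for lvl in levels]

def violates_cannot_link(cluster, candidate, cannot_link):
    """True if adding `candidate` to `cluster` would break a cannot-link pair."""
    return any(frozenset((candidate, member)) in cannot_link for member in cluster)

# Toy data: one dense group (indices 0-3) and one sparse group (indices 4-7).
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (10, 10), (12, 10), (10, 12), (12, 12)]
for eps, members in density_levels(pts):
    print(f"eps={eps:.3f} members={sorted(members)}")
```

In a constrained expansion step, a check like `violates_cannot_link` would be consulted before a neighbouring point is absorbed into a growing cluster; must-link pairs could be handled analogously by forcing both points into the same cluster.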
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Atwa, W., Li, K. (2017). Constraint-Based Clustering Algorithm for Multi-density Data and Arbitrary Shapes. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_7
DOI: https://doi.org/10.1007/978-3-319-62701-4_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62700-7
Online ISBN: 978-3-319-62701-4
eBook Packages: Computer Science, Computer Science (R0)