Constraint-Based Clustering Algorithm for Multi-density Data and Arbitrary Shapes

Atwa, Walid; Li, Kan

doi:10.1007/978-3-319-62701-4_7

Conference paper
First Online: 01 July 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10357)

Abstract

The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is challenging, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are among the most important because they can detect arbitrarily shaped clusters, and they often handle noise well. Existing methods are based on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that define a single density. Moreover, most of these methods are unsupervised and therefore cannot use a small amount of prior knowledge to improve clustering quality. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can guide the algorithm in expanding clusters based on the computed density parameters. Experimental results on both synthetic and real datasets confirm that the proposed algorithm gives better results than other semi-supervised and unsupervised clustering algorithms.
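
The two-stage idea in the abstract (split the data into density levels, then expand clusters per level under pairwise constraints) can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details are not given on this page. The k-nearest-neighbor distance as a density proxy, the `jump` threshold for splitting levels, the best-effort handling of must-link pairs, and all function and parameter names are assumptions made for illustration only.

```python
import math

def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (assumed density proxy)."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out

def split_density_levels(kdist, jump=2.0):
    """Sort points by k-distance and start a new density level whenever the
    value grows by more than a factor of `jump` (hypothetical splitting rule)."""
    order = sorted(range(len(kdist)), key=lambda i: kdist[i])
    levels = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kdist[cur] > jump * kdist[prev]:
            levels.append([])
        levels[-1].append(cur)
    return levels

def constrained_clustering(points, k=3, must_link=(), cannot_link=(), jump=2.0):
    """DBSCAN-style expansion, run once per density level with its own radius.
    Cannot-link pairs are never placed in the same cluster; must-link partners
    are pulled in on a best-effort basis (only if still unassigned)."""
    n = len(points)
    ml = {i: set() for i in range(n)}
    cl = {i: set() for i in range(n)}
    for a, b in must_link:
        ml[a].add(b)
        ml[b].add(a)
    for a, b in cannot_link:
        cl[a].add(b)
        cl[b].add(a)

    kdist = kth_neighbor_distance(points, k)
    labels = [-1] * n          # -1 means "not yet assigned"
    next_label = 0
    for level in split_density_levels(kdist, jump):   # densest level first
        eps = max(kdist[i] for i in level)            # per-level radius
        level_set = set(level)
        for seed in level:
            if labels[seed] != -1:
                continue
            cluster, stack = set(), [seed]
            while stack:
                i = stack.pop()
                # Skip assigned points and points that conflict with the cluster.
                if labels[i] != -1 or cl[i] & cluster:
                    continue
                labels[i] = next_label
                cluster.add(i)
                neighbors = [j for j in level_set
                             if math.dist(points[i], points[j]) <= eps]
                if len(neighbors) > k:    # core point: at least k other neighbors
                    stack.extend(neighbors)
                stack.extend(ml[i])       # pull in must-link partners
            next_label += 1
    return labels
```

In this sketch a point that cannot join any existing cluster becomes the seed of its own cluster, so isolated points end up as singletons rather than being flagged as noise. A fuller treatment would add explicit noise handling and a principled way to choose the level boundaries.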

Author information

Authors and Affiliations

Faculty of Computer and Information, Menoufia University, Shebeen El-Kom, Egypt
Walid Atwa
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Kan Li

Corresponding author

Correspondence to Walid Atwa.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Sachsen, Germany
Petra Perner

About this paper

Cite this paper

Atwa, W., Li, K. (2017). Constraint-Based Clustering Algorithm for Multi-density Data and Arbitrary Shapes. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_7

DOI: https://doi.org/10.1007/978-3-319-62701-4_7
Published: 01 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62700-7
Online ISBN: 978-3-319-62701-4
eBook Packages: Computer Science, Computer Science (R0)
