Constraint-Based Clustering Algorithm for Multi-density Data and Arbitrary Shapes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10357)

Abstract

The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is challenging, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are particularly important because they can detect arbitrarily shaped clusters and often handle noise well. Existing methods are based on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that define a single density. Moreover, most of these methods are unsupervised and therefore cannot improve clustering quality by exploiting a small amount of prior knowledge. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can help the algorithm expand clusters based on the computed density parameters. Experimental results on both synthetic and real datasets confirm that the proposed algorithm gives better results than other semi-supervised and unsupervised clustering algorithms.
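
The two steps outlined in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm, whose details are not given on this page. It assumes one common heuristic: rank points by the distance to their k-th nearest neighbour and start a new density level wherever that distance jumps sharply, taking the largest k-distance in a level as its Eps. The function names, the `jump` threshold, and the cannot-link check are all hypothetical choices made for illustration.

```python
import math


def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def density_levels(points, k=3, jump=2.0):
    """Split points into density levels (illustrative heuristic only).

    Points are ordered by k-distance; a new level starts whenever the
    k-distance exceeds `jump` times that of the previous point.
    Returns a list of (eps, member_indices) pairs, where eps is the
    largest k-distance observed inside the level.
    """
    kd = k_distance(points, k)
    order = sorted(range(len(points)), key=lambda i: kd[i])
    levels = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kd[cur] > jump * kd[prev]:
            levels.append([])
        levels[-1].append(cur)
    return [(max(kd[i] for i in lvl), lvl) for lvl in levels]


def violates_cannot_link(cluster, candidate, cannot_link):
    """True if adding `candidate` to `cluster` would break a cannot-link pair."""
    return any(
        (candidate == a and b in cluster) or (candidate == b and a in cluster)
        for a, b in cannot_link
    )


# Toy data: a dense 3x3 grid (spacing 1) and a sparse 3x3 grid (spacing 10).
dense = [(float(x), float(y)) for x in range(3) for y in range(3)]
sparse = [(100.0 + 10 * x, 100.0 + 10 * y) for x in range(3) for y in range(3)]
points = dense + sparse

levels = density_levels(points, k=2)
for eps, members in levels:
    print(f"Eps={eps:.1f}, MinPts=2, points={len(members)}")
```

In a constraint-aware expansion step, a check such as `violates_cannot_link` would be consulted before a neighbouring point is absorbed into a growing cluster, while must-link pairs could be used to merge clusters that density alone would keep apart. How the paper actually combines these pieces is described only in the full text.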

Author information

Corresponding author

Correspondence to Walid Atwa.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Atwa, W., Li, K. (2017). Constraint-Based Clustering Algorithm for Multi-density Data and Arbitrary Shapes. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_7

  • DOI: https://doi.org/10.1007/978-3-319-62701-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62700-7

  • Online ISBN: 978-3-319-62701-4

  • eBook Packages: Computer Science, Computer Science (R0)
