
A Density-Penalized Distance Measure for Clustering

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9091))


Abstract

The measure of similarity between objects plays an important role in clustering. Most clustering methods use the Euclidean metric as a measure of distance. However, due to the limitations of partitioning clustering methods, another family of clustering algorithms, called density-based methods, has been developed. This paper introduces a new distance measure that equips the distance function with a density-aware component. This measure, called the Density-Penalized Distance (DPD), is a regularized Euclidean distance that adds a penalty term based on the difference between the densities around the two points. The intuition is that if the densities around two points differ, the points are less likely to belong to the same cluster. A new point density estimation method, an analysis of the computational complexity of the algorithm, and a theoretical analysis of the distance function's properties are also provided. Experiments were conducted with five different clustering algorithms, and the results obtained with DPD are compared with those obtained using three other standard distance measures on nine UCI datasets. The results show that the performance of DPD is significantly better than, or at least comparable to, that of the classical distance measures.
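The abstract describes DPD as a Euclidean distance regularized by a penalty on the density gap between the two points. A minimal sketch of that idea in Python, assuming a k-nearest-neighbour density estimate and a simple absolute-difference penalty with weight `lam` — the paper's exact estimator and penalty form may differ:

```python
import numpy as np

def knn_density(X, k=5):
    """Heuristic point density: inverse of the mean distance to the
    k nearest neighbours (an illustrative estimator, not necessarily
    the one proposed in the paper)."""
    # Pairwise Euclidean distances between all rows of X.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the self-distance (0), so take columns 1..k.
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def dpd(x, y, dens_x, dens_y, lam=1.0):
    """Density-Penalized Distance sketch: Euclidean distance plus a
    penalty proportional to the density difference around the points."""
    return np.linalg.norm(x - y) + lam * abs(dens_x - dens_y)
```

With `lam = 0` the measure reduces to the plain Euclidean distance; increasing `lam` pushes apart points that sit in regions of very different density, which is the property the abstract's intuition relies on.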



Corresponding author

Correspondence to Behrouz Haji Soleimani.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Soleimani, B.H., Matwin, S., De Souza, E.N. (2015). A Density-Penalized Distance Measure for Clustering. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science(), vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_21


  • DOI: https://doi.org/10.1007/978-3-319-18356-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18355-8

  • Online ISBN: 978-3-319-18356-5

  • eBook Packages: Computer Science, Computer Science (R0)
