Abstract
The measure of similarity between objects plays an important role in clustering. Most clustering methods use the Euclidean metric as a measure of distance. However, owing to the limitations of partitioning clustering methods, another family of clustering algorithms, density-based methods, has been developed. This paper introduces a new distance measure that equips the distance function with a density-aware component. This measure, called Density-Penalized Distance (DPD), is a regularized Euclidean distance that adds to the Euclidean distance a penalty term based on the difference between the densities around the two points. The intuition behind the idea is that if the densities around two points differ, the points are less likely to belong to the same cluster. A new point-density estimation method, an analysis of the computational complexity of the algorithm, and a theoretical analysis of the properties of the distance function are also provided in this work. Experiments were conducted with five different clustering algorithms, and the results obtained with DPD are compared with those obtained using three other standard distance measures. Nine different UCI datasets were used for evaluation. The results show that the performance of DPD is significantly better than, or at least comparable to, that of the classical distance measures.
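The abstract describes DPD only at a high level, so the following is a minimal sketch of the general idea, not the paper's actual formulation: a kNN-style density estimate (the paper proposes its own estimator, which may differ) and a penalized distance of the assumed form Euclidean distance plus `lam` times the absolute density difference, where the functions, names, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np

def knn_density(X, k=5):
    """Crude point-density estimate: the inverse of the mean distance to
    the k nearest neighbours (a common kNN-style heuristic; the paper's
    own density estimator may differ)."""
    # Pairwise Euclidean distance matrix (n x n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the zero distance to the point itself.
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def dpd(x_i, x_j, rho_i, rho_j, lam=1.0):
    """Hypothetical density-penalized distance: Euclidean distance plus a
    penalty proportional to the difference in estimated densities."""
    return np.linalg.norm(x_i - x_j) + lam * abs(rho_i - rho_j)
```

As sketched, the penalty vanishes when the two points sit in regions of equal density (the measure then reduces to plain Euclidean distance) and grows with the density gap, which is one simple way to realize the intuition that points from regions of very different density should be pulled apart.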
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Soleimani, B.H., Matwin, S., De Souza, E.N. (2015). A Density-Penalized Distance Measure for Clustering. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science (R0)