Abstract
The measure of similarity between objects plays an important role in clustering. Most clustering methods use the Euclidean metric as a measure of distance. However, owing to the limitations of partitioning clustering methods, another family of clustering algorithms, density-based methods, has been developed. This paper introduces a new distance measure that equips the distance function with a density-aware component. This measure, called Density-Penalized Distance (DPD), is a regularized Euclidean distance that adds to the Euclidean distance a penalty term based on the difference between the densities around the two points. The intuition behind the idea is that if the densities around two points differ, the points are less likely to belong to the same cluster. A new point-density estimation method, an analysis of the computational complexity of the algorithm, and a theoretical analysis of the properties of the distance function are also provided in this work. Experiments were conducted with five different clustering algorithms, and the results obtained with DPD are compared with those obtained using three other standard distance measures. Nine different UCI datasets were used for evaluation. The results show that the performance of DPD is significantly better than, or at least comparable to, that of the classical distance measures.
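The abstract describes DPD only at a high level, so the following is a minimal sketch of the general idea, not the paper's actual formulation: a kNN-style density estimate (the paper proposes its own estimator, which may differ) and a penalized distance of the assumed form Euclidean distance plus `lam` times the absolute density difference, where the functions, names, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np

def knn_density(X, k=5):
    """Crude point-density estimate: the inverse of the mean distance to
    the k nearest neighbours (a common kNN-style heuristic; the paper's
    own density estimator may differ)."""
    # Pairwise Euclidean distance matrix (n x n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the zero distance to the point itself.
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def dpd(x_i, x_j, rho_i, rho_j, lam=1.0):
    """Hypothetical density-penalized distance: Euclidean distance plus a
    penalty proportional to the difference in estimated densities."""
    return np.linalg.norm(x_i - x_j) + lam * abs(rho_i - rho_j)
```

As sketched, the penalty vanishes when the two points sit in regions of equal density (the measure then reduces to plain Euclidean distance) and grows with the density gap, which is one simple way to realize the intuition that points from regions of very different density should be pulled apart.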
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Soleimani, B.H., Matwin, S., De Souza, E.N. (2015). A Density-Penalized Distance Measure for Clustering. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science (R0)