Abstract
High-dimensional data analysis poses some interesting and counterintuitive problems. One of these problems is that some clustering algorithms do not work, or work only very poorly, if the dimensionality of the feature space is high. The reason for this is an effect called distance concentration. In this paper, we show that the effect can be countered for prototype-based clustering algorithms by using a clever alteration of the distance function. We show the success of this process by applying it to fuzzy c-means (FCM), although the approach is not restricted to FCM. A useful side effect is that our method can also be used to estimate the number of clusters in a data set.
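The distance concentration effect mentioned above can be illustrated with a small simulation. The following is only a minimal sketch and not the method proposed in this paper: it assumes points drawn uniformly from the unit hypercube and the Euclidean distance, and the function name `relative_contrast`, its parameters, and the chosen dimensions are hypothetical choices made for illustration. It computes the relative contrast (d_max - d_min) / d_min between the farthest and the nearest point as seen from a random query point.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from one random query point to n_points uniform random points
    in the unit hypercube [0, 1]^dim (an illustrative assumption)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast should shrink as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions, the printed contrast shrinks as the dimension grows, so the nearest and the farthest point become hard to tell apart. This is the kind of behaviour that can make distance-based cluster assignments unreliable in high-dimensional feature spaces.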
Acknowledgements
We would like to thank FRAPORT AG, represented by Steffen Wendeberg, Thilo Schneider and Andreas Figur, for providing the data for scientific analysis. We would also like to thank the engineers of DLR Braunschweig, namely Hans Kawohl and his staff, for setting up the database system.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Winkler, R., Klawonn, F., Kruse, R. (2013). A New Distance Function for Prototype Based Clustering Algorithms in High Dimensional Spaces. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_42
DOI: https://doi.org/10.1007/978-3-319-00032-9_42
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics (R0)