A New Distance Function for Prototype Based Clustering Algorithms in High Dimensional Spaces

Winkler, Roland; Klawonn, Frank; Kruse, Rudolf

doi:10.1007/978-3-319-00032-9_42

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

High dimensional data analysis poses some interesting and counterintuitive problems. One of these problems is that some clustering algorithms do not work, or work only very poorly, if the dimensionality of the feature space is high. The reason for this is an effect called distance concentration. In this paper, we show that the effect can be countered for prototype based clustering algorithms by altering the distance function. We demonstrate the success of this approach by applying it to FCM, although it is not restricted to FCM. A useful side effect is that our method can also be used to estimate the number of clusters in a data set.
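
The sketch below is not taken from the chapter and does not implement the authors' distance function, which the abstract does not specify. It only illustrates distance concentration itself, under the assumption of uniformly distributed random data and Euclidean distances: the relative contrast between the farthest and the nearest point shrinks as the dimensionality grows. The helper names `relative_contrast` and `fcm_memberships` are hypothetical; the second one uses the standard FCM membership formula to show what happens when all prototype distances are nearly equal.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


def fcm_memberships(distances, fuzzifier=2.0):
    """Standard FCM membership degrees of a single data point, given its
    distances to all prototypes (every distance is assumed to be > 0)."""
    exponent = 2.0 / (fuzzifier - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # The relative contrast drops as the number of dimensions increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
    # Nearly equal distances lead to nearly equal membership degrees.
    print(fcm_memberships([1.00, 1.02, 0.98]))
```

When the distances from a data point to all prototypes are almost the same, the membership degrees are almost uniform, which suggests one way in which concentrated distances can make prototypes hard to separate.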

Acknowledgements

We would like to thank FRAPORT AG, represented by Steffen Wendeberg, Thilo Schneider and Andreas Figur, for providing the data for scientific analysis. We would also like to thank the engineers of DLR Braunschweig, namely Hans Kawohl and his staff, for setting up the database system.

Author information

Authors and Affiliations

German Aerospace Center Braunschweig, Braunschweig, Germany
Roland Winkler
Ostfalia, University of Applied Sciences, Wolfenbüttel, Germany
Frank Klawonn
Otto-von-Guericke University Magdeburg, Magdeburg, Germany
Rudolf Kruse

Corresponding author

Correspondence to Roland Winkler.

Editor information

Editors and Affiliations

Department of Economics and Management, University of Pavia, Via San Felice 7, Pavia, 27100, Italy
Paolo Giudici
Department of Economics and Business, University of Catania, Corso Italia 55, Catania, 95129, Italy
Salvatore Ingrassia
Department of Statistics, University of Rome "La Sapienza", Piazzale Aldo Moro 5, Rome, 00185, Italy
Maurizio Vichi

Cite this paper

Winkler, R., Klawonn, F., Kruse, R. (2013). A New Distance Function for Prototype Based Clustering Algorithms in High Dimensional Spaces. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_42

DOI: https://doi.org/10.1007/978-3-319-00032-9_42
Published: 22 May 2013
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
