A New Distance Function for Prototype Based Clustering Algorithms in High Dimensional Spaces

  • Conference paper
Statistical Models for Data Analysis

Abstract

High dimensional data analysis poses some interesting and counterintuitive problems. One of these problems is that some clustering algorithms do not work, or work only very poorly, if the dimensionality of the feature space is high. The reason for this is an effect called distance concentration. In this paper, we show that the effect can be countered for prototype based clustering algorithms by using a clever alteration of the distance function. We demonstrate the success of this approach by applying it to fuzzy c-means (FCM), although it is not restricted to FCM. A useful side effect is that our method can also be used to estimate the number of clusters in a data set.
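
The distance concentration effect named in the abstract can be illustrated with a small numerical experiment. The sketch below is not the authors' method or their proposed distance function. It only assumes uniformly distributed random points and the Euclidean distance, and measures the relative contrast (farthest distance minus nearest distance, divided by the nearest distance) from a random query point. The function name `relative_contrast` and all parameters are hypothetical choices made for this illustration.

```python
import numpy as np

def relative_contrast(dim, n_points=1000, seed=0):
    """Relative contrast (d_max - d_min) / d_min from one random query
    point to n_points uniform random points in the unit cube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))   # assumed data: uniform in the unit cube
    query = rng.random(dim)              # assumed query: also uniform
    dists = np.linalg.norm(data - query, axis=1)  # Euclidean distances
    return (dists.max() - dists.min()) / dists.min()

# Compare a few dimensionalities; the values depend on the random seed.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, c in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={c:.3f}")
```

Under these assumptions the relative contrast shrinks as the dimensionality grows, so the nearest and the farthest point become harder to tell apart by distance alone. This is the kind of behaviour that makes distance based prototype updates unreliable in high dimensional feature spaces.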

Acknowledgements

We would like to thank FRAPORT AG, represented by Steffen Wendeberg, Thilo Schneider and Andreas Figur, for providing the data for scientific analysis. We would also like to thank the engineers of DLR Braunschweig, namely Hans Kawohl and his staff, for setting up the database system.

Author information

Corresponding author

Correspondence to Roland Winkler.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Winkler, R., Klawonn, F., Kruse, R. (2013). A New Distance Function for Prototype Based Clustering Algorithms in High Dimensional Spaces. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_42
