The Hubness Phenomenon: Fact or Artifact?

Low, Thomas; Borgelt, Christian; Stober, Sebastian; Nürnberger, Andreas

doi:10.1007/978-3-642-30278-7_21


Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 285)

Abstract

The hubness phenomenon, as recently described, is the observation that as the dimensionality of a data set increases, the distribution of the number of times a data point occurs among the k nearest neighbors of other data points becomes increasingly skewed to the right. As a consequence, so-called hubs emerge, that is, data points that appear in the k-nearest-neighbor lists of other data points much more often than others. In this paper we challenge the hypothesis that the hubness phenomenon is an effect of the dimensionality of the data set and provide evidence that it is rather a boundary effect or, more generally, an effect of a density gradient. As such, it may be seen as an artifact of the process used to generate the data with which the phenomenon is demonstrated. We report experiments showing that the hubness phenomenon need not occur in high-dimensional data and can be made to occur in low-dimensional data.
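
The quantity the abstract refers to, the number of times each point occurs among the k nearest neighbors of other points, can be computed directly. The following is a minimal sketch and not the authors' experimental code: the function names, the use of Euclidean distance, the standardized third moment as the skewness measure, and the uniform sampling from a unit hypercube are all assumptions made for illustration.

```python
import random


def k_occurrences(points, k):
    """For each point, count how often it appears among the k nearest
    neighbors (squared Euclidean distance) of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point, paired with the index.
        dists = []
        for j in range(n):
            if j != i:
                d = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                dists.append((d, j))
        dists.sort()
        # Each of the k closest points gains one occurrence.
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment (assumed skewness measure);
    positive values indicate a distribution skewed to the right."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def uniform_cube(n, dim, rng):
    """Hypothetical data generator: n points drawn uniformly from [0, 1)^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 20):
        pts = uniform_cube(300, dim, rng)
        nk = k_occurrences(pts, k=5)
        print(f"dim={dim}: skewness={skewness(nk):.3f}, max occurrences={max(nk)}")
```

A bounded cube is exactly the kind of generating process the abstract questions, since points near the boundary have neighbors on one side only. Swapping `uniform_cube` for a generator without a boundary or density gradient is the natural way to probe that argument with this sketch.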

Author information

Authors and Affiliations

Data and Knowledge Engineering Group, Otto-von-Guericke-University of Magdeburg, Universitätsplatz 2, D-39106, Magdeburg, Germany
Thomas Low, Sebastian Stober & Andreas Nürnberger
European Centre for Soft Computing, c/ Gonzalo Gutiérrez Quirós s/n, E-33600, Mieres, Asturias, Spain
Christian Borgelt

Corresponding author

Correspondence to Thomas Low.

Editor information

Editors and Affiliations

Intelligent Data Analysis & Graphical Models Research Unit, European Centre for Soft Computing, C/ Gonzalo Gutierre, Mieres, 33600, Asturias, Spain
Christian Borgelt
Departamento de Estadistica e I. O. y, Universidad de Oviedo, C/Calvo Sotelo, s/n, Oviedo, 33071, Spain
María Ángeles Gil
Instituto Superior Técnico, Department of Mechanical Engineering, Technical University Lisbon, Av. Rovisco Pais 1, Lisboa, 1049-001, Portugal
João M.C. Sousa
Labo. Microelectronique, Université Catholique de Louvain, place du Levant 3, Leuven, 1348, Belgium
Michel Verleysen

About this chapter

Cite this chapter

Low, T., Borgelt, C., Stober, S., Nürnberger, A. (2013). The Hubness Phenomenon: Fact or Artifact?. In: Borgelt, C., Gil, M., Sousa, J., Verleysen, M. (eds) Towards Advanced Data Analysis by Combining Soft Computing and Statistics. Studies in Fuzziness and Soft Computing, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30278-7_21

DOI: https://doi.org/10.1007/978-3-642-30278-7_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30277-0
Online ISBN: 978-3-642-30278-7
eBook Packages: Engineering, Engineering (R0)