Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size

Hagedoorn, Michiel

doi:10.1007/3-540-36285-1_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2572)

Included in the following conference series:

International Conference on Database Theory

Abstract

We consider the problem of nearest-neighbor search for a set of n data points in d-dimensional Euclidean space. We propose a simple, practical data structure, which is essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the n data points are chosen independently from a d-dimensional ball under the uniform distribution. In the average case, for fixed dimension d, we achieve a query time of O(log² n) using only O(n) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in d. This improves on previously known time-space tradeoffs, all of which have a super-exponential factor of at least d^Θ(d) in either the query time or the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension d, we achieve average-case bounds of O((log² n)/B + log n) query time and O(N) storage space, where B is the block-size parameter and N = n/B. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
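
To make the problem setting concrete, the following minimal Python sketch draws n points uniformly from a d-dimensional unit ball and answers a nearest-neighbor query by linear scan. It is not the data structure proposed in the paper, whose construction the abstract does not give; it is only the O(n·d) baseline that such a structure aims to beat. The function names (`sample_uniform_ball`, `nearest_neighbor`) and the pluggable `dist` parameter are illustrative assumptions, the latter mirroring the remark that the definition generalizes to other distance functions.

```python
import math
import random


def sample_uniform_ball(n, d, rng):
    """Draw n points independently and uniformly from the unit ball in R^d."""
    points = []
    for _ in range(n):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        # Scaling by U^(1/d) makes the point uniform over the ball's volume.
        r = rng.random() ** (1.0 / d)
        points.append([r * x / norm for x in g])
    return points


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbor(points, query, dist=euclidean):
    """Linear-scan baseline: return (index, distance) of the closest point.

    `dist` is a hypothetical hook for any distance function; it is not
    part of the paper's interface.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        current = dist(p, query)
        if current < best_dist:
            best_index, best_dist = i, current
    return best_index, best_dist


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_uniform_ball(1000, 3, rng)
    index, distance = nearest_neighbor(data, [0.0, 0.0, 0.0])
    print(index, distance)
```

The scan touches every data point on every query, so its cost grows linearly in n; the average-case bounds stated in the abstract replace that linear term with O(log² n) for fixed d.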

Author information

Authors and Affiliations

Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123, Saarbrücken, Germany
Michiel Hagedoorn

Editor information

Editors and Affiliations

Dip. di Informatica e Sistemistica, Università di Roma “La Sapienza”, Via Salaria 113, 00198, Roma, Italy
Diego Calvanese & Maurizio Lenzerini
Department of Computer Science, Stanford University, Room 474 Gates Computer Science Building 4B, 94305-9045, Stanford, CA, USA
Rajeev Motwani

About this paper

Cite this paper

Hagedoorn, M. (2003). Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds) Database Theory — ICDT 2003. ICDT 2003. Lecture Notes in Computer Science, vol 2572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36285-1_29

DOI: https://doi.org/10.1007/3-540-36285-1_29
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00323-6
Online ISBN: 978-3-540-36285-2
eBook Packages: Springer Book Archive
