Abstract
We consider the problem of nearest-neighbor search for a set of n data points in d-dimensional Euclidean space. We propose a simple, practical data structure: essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the n data points are chosen independently from a d-dimensional ball under the uniform distribution. In the average case, for fixed dimension d, we achieve a query time of O(log² n) using only O(n) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in d. This improves on previously known time-space tradeoffs, all of which have a super-exponential factor of at least d^Θ(d) in either the query time or the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension d, we achieve average-case bounds of O((log² n)/B + log n) query time and O(N) storage space, where B is the block-size parameter and N = n/B. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
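To make the problem setting concrete, the following minimal sketch draws n points uniformly from a d-dimensional unit ball and answers a nearest-neighbor query by linear scan. It is only the O(n)-per-query baseline that the abstract's bounds are measured against, not the proposed graph data structure, whose construction is not given here. The function names, the unit radius, and the parameter values are assumptions made for illustration.

```python
import math
import random

def sample_uniform_ball(d, rng):
    """Draw one point uniformly from the unit ball in d dimensions (illustrative helper)."""
    # A standard Gaussian vector has a uniformly distributed direction.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    # Radius U^(1/d) makes the sample uniform in volume rather than in radius.
    r = rng.random() ** (1.0 / d)
    return [r * x / norm for x in g]

def nearest_neighbor_brute_force(points, query):
    """Return (index, Euclidean distance) of the point closest to query, by linear scan."""
    best_i, best_d2 = -1, math.inf
    for i, p in enumerate(points):
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if d2 < best_d2:
            best_i, best_d2 = i, d2
    return best_i, math.sqrt(best_d2)

# Assumed example parameters: n = 1000 points in dimension d = 3.
rng = random.Random(0)
d, n = 3, 1000
points = [sample_uniform_ball(d, rng) for _ in range(n)]
query = sample_uniform_ball(d, rng)
idx, dist = nearest_neighbor_brute_force(points, query)
print(idx, dist)
```

The scan touches every point once, so each query costs O(nd) time; the abstract's claim is that, for fixed d and this input distribution, the average query time can be brought down to O(log² n) with O(n) storage.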
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Hagedoorn, M. (2003). Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds) Database Theory — ICDT 2003. ICDT 2003. Lecture Notes in Computer Science, vol 2572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36285-1_29
DOI: https://doi.org/10.1007/3-540-36285-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00323-6
Online ISBN: 978-3-540-36285-2