
Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size

  • Conference paper
  • Database Theory — ICDT 2003 (ICDT 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2572)


Abstract

We consider the problem of nearest-neighbor search for a set of n data points in d-dimensional Euclidean space. We propose a simple, practical data structure, which is essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the n data points are chosen independently and uniformly from a d-dimensional ball. In the average case, for fixed dimension d, we achieve a query time of O(log² n) using only O(n) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in d. This is an improvement over previously known time-space tradeoffs, which all have a super-exponential factor of at least d^Θ(d) either in the query time or in the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension d, we achieve average-case bounds of O((log² n)/B + log n) query time and O(N) storage space, where B is the block-size parameter and N = n/B. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
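The abstract describes the data structure only at a high level, so the fragment below is a minimal, hypothetical Python sketch rather than the paper's algorithm: it states the exact nearest-neighbor problem as a brute-force scan and shows one plausible way a binary DAG index (each node with at most two outgoing arcs) might be traversed. All names and the routing rule here (Node, dag_query, "follow the closer child") are illustrative assumptions, not taken from the paper.

    import math
    from dataclasses import dataclass
    from typing import List, Optional, Sequence

    def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
        # Euclidean distance between two d-dimensional points.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def brute_force_nn(points: List[Sequence[float]],
                       query: Sequence[float]) -> Sequence[float]:
        # Baseline exact nearest neighbor: scan all n points, O(n * d) per
        # query. This is the cost the proposed structure aims to beat on average.
        return min(points, key=lambda p: euclidean(p, query))

    @dataclass
    class Node:
        # Hypothetical node of a binary DAG index: one stored data point and
        # at most two outgoing arcs, mirroring the abstract's description.
        point: Sequence[float]
        left: Optional["Node"] = None
        right: Optional["Node"] = None

    def dag_query(root: Optional[Node],
                  query: Sequence[float]) -> Optional[Sequence[float]]:
        # Illustrative traversal only (NOT the paper's algorithm): follow the
        # outgoing arc whose child stores the point closer to the query,
        # remembering the best candidate seen so far.
        best, node = None, root
        while node is not None:
            if best is None or euclidean(node.point, query) < euclidean(best, query):
                best = node.point
            children = [c for c in (node.left, node.right) if c is not None]
            node = min(children, key=lambda c: euclidean(c.point, query), default=None)
        return best

    # Example usage with three 2-D points:
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    print(brute_force_nn(pts, (0.9, 0.1)))   # -> (1.0, 0.0)

Only the brute-force routine is an exact solver; the DAG walk is meant purely to make the "at most two outgoing arcs per node" shape concrete.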

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hagedoorn, M. (2003). Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds) Database Theory — ICDT 2003. ICDT 2003. Lecture Notes in Computer Science, vol 2572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36285-1_29

  • DOI: https://doi.org/10.1007/3-540-36285-1_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00323-6

  • Online ISBN: 978-3-540-36285-2

  • eBook Packages: Springer Book Archive
