On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces

  • Conference paper
Database Theory — ICDT 2001 (ICDT 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1973)

Abstract

Nearest-neighbor queries in high-dimensional space are of high importance in various applications, especially in content-based indexing of multimedia data. To optimize query processing, accurate models for estimating query processing costs are needed. In this paper, we propose a new cost model for nearest-neighbor queries in high-dimensional space, which we apply to enhance the performance of high-dimensional index structures. The model is based on new insights into effects occurring in high-dimensional space and provides a closed formula for the processing cost of nearest-neighbor queries as a function of the dimensionality, the block size, and the database size. From the wide range of possible applications of our model, we select two examples: first, we use the model to prove the known linear complexity of the nearest-neighbor search problem in high-dimensional space; second, we provide a technique for optimizing the block size. For data of medium dimensionality, the optimized block size yields significant speed-ups in query processing time compared with traditional block sizes and with the linear scan.
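
For orientation, the linear scan that the abstract uses as a baseline can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's method: the function name `nearest_neighbor_linear_scan`, the Euclidean metric, and the sample points are assumptions introduced here, and the paper's cost model is not reproduced.

```python
import math


def nearest_neighbor_linear_scan(points, query):
    """Return (index, distance) of the point closest to `query`.

    Hypothetical helper for illustration: it visits every point once,
    so its cost grows linearly with the number of points.
    Euclidean distance is an assumption made for this sketch.
    """
    if not points:
        raise ValueError("points must be non-empty")
    best_index, best_dist_sq = -1, math.inf
    for i, p in enumerate(points):
        # Squared distance is enough for comparisons; take the root once at the end.
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, query))
        if dist_sq < best_dist_sq:
            best_index, best_dist_sq = i, dist_sq
    return best_index, math.sqrt(best_dist_sq)


# Made-up sample data: three points in 3-dimensional space.
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.2, 0.1, 0.0)]
index, distance = nearest_neighbor_linear_scan(points, (0.25, 0.1, 0.0))
```

The scan reads the whole data set sequentially for every query, which is why it serves as the reference point against which an index structure has to be measured.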



Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berchtold, S., Böhm, C., Keim, D., Krebs, F., Kriegel, HP. (2001). On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_28

  • DOI: https://doi.org/10.1007/3-540-44503-X_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41456-8

  • Online ISBN: 978-3-540-44503-6

  • eBook Packages: Springer Book Archive
