Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

High-Dimensional Indexing

  • Christian Böhm
  • Claudia Plant
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_804

Synonyms

Indexing for similarity search

Definition

The term high-dimensional indexing [6, 9] subsumes all techniques for indexing vector spaces that address problems specific to high-dimensional data spaces, all optimization techniques for improving index structures, and the algorithms for the various variants of similarity search (nearest neighbor queries, reverse nearest neighbor queries, range queries, similarity joins, etc.) in high-dimensional spaces. The well-known curse of dimensionality causes index selectivity to deteriorate as the dimensionality of the data space increases. This effect already sets in at 10–15 dimensions, although the exact threshold also depends on the size of the database and on the data distribution (clustering, attribute dependencies). During query processing, large parts of conventional hierarchical indexes (e.g., the R-tree) must be accessed randomly, and random access can be up to 20 times more expensive than sequential reading. Therefore, specialized...
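
The two quantitative claims above (degrading selectivity and the cost gap between random and sequential access) can be made concrete with a back-of-the-envelope calculation. The Python sketch below is illustrative only and is not taken from the entry. It assumes points uniformly distributed in the unit hypercube and hypercube-shaped queries, and it treats the stated factor of 20 as a fixed cost ratio between one random page access and one sequential page read, ignoring directory traversal and CPU costs. The function names `hypercube_edge_for_selectivity` and `break_even_page_fraction` are hypothetical.

```python
"""Back-of-the-envelope illustration of why index selectivity degrades with
dimensionality. Sketch only: assumes uniform data in [0, 1]^d and
hypercube-shaped queries (hypothetical helper names)."""


def hypercube_edge_for_selectivity(selectivity: float, dim: int) -> float:
    """Edge length a hypercube query needs so that it contains the given
    fraction of uniformly distributed points in the unit cube [0, 1]^dim.
    Volume = edge ** dim = selectivity, hence edge = selectivity ** (1/dim)."""
    if not 0.0 < selectivity <= 1.0:
        raise ValueError("selectivity must be in (0, 1]")
    if dim < 1:
        raise ValueError("dim must be >= 1")
    return selectivity ** (1.0 / dim)


def break_even_page_fraction(random_to_sequential_cost: float) -> float:
    """Fraction of data pages an index may read randomly before a full
    sequential scan becomes cheaper.

    Index cost: fraction * n_pages * cost_ratio (in sequential-read units).
    Scan cost:  n_pages. Equating both gives fraction = 1 / cost_ratio."""
    if random_to_sequential_cost <= 0:
        raise ValueError("cost ratio must be positive")
    return 1.0 / random_to_sequential_cost


if __name__ == "__main__":
    # A query returning 1% of the data: how much of each axis must it span?
    for d in (2, 10, 15, 50, 100):
        edge = hypercube_edge_for_selectivity(0.01, d)
        print(f"d={d:3d}: a 1% query spans {edge:.0%} of each axis")
    # Break-even point for the factor of 20 mentioned in the text.
    print(f"break-even page fraction at factor 20: {break_even_page_fraction(20):.0%}")
```

Under these assumptions, a query that retrieves 1% of the data spans about 10% of each axis at d = 2, but about 63% at d = 10 and 74% at d = 15, so it overlaps most page regions of a partition that can afford only a few splits per axis. With a cost factor of 20, such an index beats a full sequential scan only if it reads fewer than 5% of the data pages.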

Recommended Reading

  1. Berchtold S, Böhm C, Kriegel H-P. The pyramid-technique: towards breaking the curse of dimensionality. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 142–53.
  2. Berchtold S, Böhm C, Jagadish HV, Kriegel H-P, Sander J. Independent quantization: an index compression technique for high-dimensional data spaces. In: Proceedings of the 16th International Conference on Data Engineering; 2000. p. 577–88.
  3. Berchtold S, Böhm C, Keim DA, Kriegel H-P. A cost model for nearest neighbor search in high-dimensional data space. In: Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 1997. p. 78–86.
  4. Berchtold S, Böhm C, Keim DA, Kriegel H-P, Xu X. Optimal multidimensional query processing using tree striping. In: Proceedings of the 2nd International Conference on Data Warehousing and Knowledge Discovery; 2000. p. 244–57.
  5. Berchtold S, Keim DA, Kriegel H-P. The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22nd International Conference on Very Large Data Bases; 1996. p. 28–39.
  6. Beyer KS, Goldstein J, Ramakrishnan R, Shaft U. When is “nearest neighbor” meaningful? In: Proceedings of the 7th International Conference on Database Theory; 1999. p. 217–35.
  7. Böhm C. A cost model for query processing in high dimensional data spaces. ACM Trans Database Syst. 2000;25(2):129–78.
  8. Böhm C, Kriegel H-P. Dynamically optimizing high-dimensional index structures. In: Advances in Database Technology, Proceedings of the 7th International Conference on Extending Database Technology; 2000. p. 36–50.
  9. Böhm C, Berchtold S, Keim DA. Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv. 2001;33(3):322–73.
  10. Chang Y-C, Bergman LD, Castelli V, Li C-S, Lo M-L, Smith JR. The onion technique: indexing for linear optimization queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 391–402.
  11. Cui B, Ooi BC, Su J, Tan K-L. Indexing high-dimensional data for efficient in-memory similarity search. IEEE Trans Knowl Data Eng. 2005;17(3):339–53.
  12. Ferhatosmanoglu H, Agrawal D, Abbadi AE. Concentric hyperspaces and disk allocation for fast parallel range searching. In: Proceedings of the 15th International Conference on Data Engineering; 1999. p. 608–15.
  13. Günnemann S, Kremer H, Lenhard D, Seidl T. Subspace clustering for indexing high dimensional data: a main memory index based on local reductions and individual multi-representations. In: Proceedings of the International Conference on Extending Database Technology; 2011. p. 237–48.
  14. Guttman A. R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1984. p. 47–57.
  15. Heisterkamp DR, Peng J. Kernel vector approximation files for relevance feedback retrieval in large image databases. Multimed Tools Appl. 2005;26(2):175–89.
  16. Jin H, Ooi BC, Shen HT, Yu C, Zhou A. An adaptive and efficient dimensionality reduction algorithm for high-dimensional indexing. In: Proceedings of the 19th International Conference on Data Engineering; 2003. p. 87–98.
  17. Katayama N, Satoh S. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1997. p. 369–80.
  18. Kim C, Chhugani J, Satish N, Sedlar E, Nguyen AD, Kaldewey T, Lee VW, Brandt SA, Dubey P. FAST: fast architecture sensitive tree search on modern CPUs and GPUs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2010. p. 339–50.
  19. Leis V, Kemper A, Neumann T. The adaptive radix tree: ARTful indexing for main-memory databases. In: Proceedings of the International Conference on Data Engineering; 2013. p. 38–49.
  20. Levandoski JJ, Lomet DB, Sengupta S. The Bw-Tree: a B-tree for new hardware platforms. In: Proceedings of the 29th International Conference on Data Engineering; 2013. p. 302–13.
  21. Lin K-I, Jagadish HV, Faloutsos C. The TV-tree: an index structure for high-dimensional data. VLDB J. 1994;3(4):517–42.
  22. Moise D, Shestakov D, Gudmundsson G, Amsaleg L. Indexing and searching 100M images with map-reduce. In: Proceedings of the 3rd ACM International Conference on Multimedia Retrieval; 2013. p. 17–24.
  23. Sakurai Y, Yoshikawa M, Uemura S, Kojima H. The A-tree: an index structure for high-dimensional spaces using relative approximation. In: Proceedings of the 26th International Conference on Very Large Data Bases; 2000. p. 516–26.
  24. Weber R, Böhm K, Schek H-J. Interactive-time similarity search for large image collections using parallel VA-files. In: Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries; 2000. p. 83–92.
  25. Weber R, Schek H-J, Blott S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th International Conference on Very Large Data Bases; 1998. p. 194–205.
  26. White DA, Jain R. Similarity indexing with the SS-tree. In: Proceedings of the 12th International Conference on Data Engineering; 1996. p. 516–23.
  27. Yu C, Ooi BC, Tan K-L, Jagadish HV. Indexing the distance: an efficient method to KNN processing. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 421–30.
  28. Wang J, Wu S, Gao H, Li J, Ooi BC. Indexing multi-dimensional data in a cloud system. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2010. p. 591–602.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Munich, Munich, Germany
  2. University of Vienna, Vienna, Austria

Section editors and affiliations

  • Dimitris Papadias (1)
  1. Dept. of Computer Science and Eng., Hong Kong Univ. of Science and Technology, Kowloon, Hong Kong SAR