Efficient Approximate Indexing in High-Dimensional Feature Spaces

Santini, Simone

doi:10.1007/978-3-642-41062-8_20

Simone Santini¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8199))

Included in the following conference series:

International Conference on Similarity Search and Applications

1650 Accesses

Abstract

In this paper we present a fast approximate indexing method for high dimensional feature space that uses the error probability as an independent variable.

The idea of the algorithm is to define a low-dimensional feature space in which a significant portion of the inter-distance variance is concentrated, to search for the nearest neighborhood of the query in this space, and then to extend the search by a factor ζ to include a number of objects “near” this nearest neighborhood. We shall show that, under reasonable hypotheses on the distribution of items in the feature space, it is possible to derive a relation between the value ζ and the error probability.

We study the error probability and the complexity of the algorithm, validate the model using a data set of images, and show how the results can be used to design indexing schemes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching in xed dimensions. Journal of the ACM 45(6), 891–923 (1998)
Article MathSciNet MATH Google Scholar
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The r ^*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the ACM SIGMOD International Conference on the Management of Data (1990)
Google Scholar
Bentley, J.L.: K-D trees for semidynamic point sets. In: Proceedings of the Sixth ACM Annual Symposium on Computational Geometry (1990)
Google Scholar
Burkhard, W.: Interpolation/based index maintenance. In: Proceedings of the ACM Symposium on Principles of database systems (PODS), pp. 76–89 (1983)
Google Scholar
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Computing Surveys (CSUR) 33(3), 273–321 (2001)
Article Google Scholar
Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd VLDB (Very Large Data Bases) Conference, Athens, Greece (1997)
Google Scholar
Ciaccia, P., Patella, M.: Pac nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proceedings of the International Conference on Data Engineering, ICDE (2000)
Google Scholar
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Article Google Scholar
Gravila, D.M.: The R-Tree index optimization. In: Waugh, T., Healey, R. (eds.) Advances in GIS Research. Taylor & Francis (1994)
Google Scholar
Ordoñez, V., Kulkarni, G., Berg, T.L.: Im2text: Describing images using 1 million captioned photographs. Neural Information Processing Systems (2011)
Google Scholar
Pestov, V.: On the geometry of similarity search: dimensionality curse and concentration of measure. Information Processing Letters 73(1-2), 47–51 (2000)
Article MathSciNet Google Scholar
Seidl, T., Kriegel, H.-P.: Optimal multi-step k-nearest neighbor search. In: Proceedings of the ACM SIGMOD, International Conference on Management of Data, Seattle, USA, pp. 154–165 (1998)
Google Scholar
White, D., Jain, R.: Similarity indexing with the SS-tree. In: Proc. 12th IEEE International Conference on Data Engineering (1996)
Google Scholar

Download references

Author information

Authors and Affiliations

Escuela Politécnica Superior, Universidad Autónoma de Madrid, Spain
Simone Santini

Authors

Simone Santini
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Database Laboratory, Universidade da Coruña, Spain
Nieves Brisaboa & Oscar Pedreira &
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Pavel Zezula

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Santini, S. (2013). Efficient Approximate Indexing in High-Dimensional Feature Spaces. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_20

Download citation

DOI: https://doi.org/10.1007/978-3-642-41062-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41061-1
Online ISBN: 978-3-642-41062-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics