Skip to main content

Efficient Approximate Indexing in High-Dimensional Feature Spaces

  • Conference paper
Similarity Search and Applications (SISAP 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8199))

Included in the following conference series:

  • 1650 Accesses

Abstract

In this paper we present a fast approximate indexing method for high dimensional feature space that uses the error probability as an independent variable.

The idea of the algorithm is to define a low-dimensional feature space in which a significant portion of the inter-distance variance is concentrated, to search for the nearest neighborhood of the query in this space, and then to extend the search by a factor ζ to include a number of objects “near” this nearest neighborhood. We shall show that, under reasonable hypotheses on the distribution of items in the feature space, it is possible to derive a relation between the value ζ and the error probability.

We study the error probability and the complexity of the algorithm, validate the model using a data set of images, and show how the results can be used to design indexing schemes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching in xed dimensions. Journal of the ACM 45(6), 891–923 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  2. Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The r *-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the ACM SIGMOD International Conference on the Management of Data (1990)

    Google Scholar 

  3. Bentley, J.L.: K-D trees for semidynamic point sets. In: Proceedings of the Sixth ACM Annual Symposium on Computational Geometry (1990)

    Google Scholar 

  4. Burkhard, W.: Interpolation/based index maintenance. In: Proceedings of the ACM Symposium on Principles of database systems (PODS), pp. 76–89 (1983)

    Google Scholar 

  5. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Computing Surveys (CSUR) 33(3), 273–321 (2001)

    Article  Google Scholar 

  6. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd VLDB (Very Large Data Bases) Conference, Athens, Greece (1997)

    Google Scholar 

  7. Ciaccia, P., Patella, M.: Pac nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proceedings of the International Conference on Data Engineering, ICDE (2000)

    Google Scholar 

  8. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)

    Article  Google Scholar 

  9. Gravila, D.M.: The R-Tree index optimization. In: Waugh, T., Healey, R. (eds.) Advances in GIS Research. Taylor & Francis (1994)

    Google Scholar 

  10. Ordoñez, V., Kulkarni, G., Berg, T.L.: Im2text: Describing images using 1 million captioned photographs. Neural Information Processing Systems (2011)

    Google Scholar 

  11. Pestov, V.: On the geometry of similarity search: dimensionality curse and concentration of measure. Information Processing Letters 73(1-2), 47–51 (2000)

    Article  MathSciNet  Google Scholar 

  12. Seidl, T., Kriegel, H.-P.: Optimal multi-step k-nearest neighbor search. In: Proceedings of the ACM SIGMOD, International Conference on Management of Data, Seattle, USA, pp. 154–165 (1998)

    Google Scholar 

  13. White, D., Jain, R.: Similarity indexing with the SS-tree. In: Proc. 12th IEEE International Conference on Data Engineering (1996)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santini, S. (2013). Efficient Approximate Indexing in High-Dimensional Feature Spaces. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-41062-8_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41061-1

  • Online ISBN: 978-3-642-41062-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics