Skip to main content

Efficient Similarity Search by Combining Indexing and Caching Strategies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8939))

Abstract

A critical issue in large scale search engines is to efficiently handle sudden peaks of incoming query traffic. Research in metric spaces has addressed this problem from the point of view of creating caches that provide information to, if possible, exactly/approximately answer a query very quickly without needing to further process an index. However, one of the problems of that approach is that, if the cache is not able to provide an answer, the distances computed up to that moment are wasted, and the search must proceed through the index structure. In this paper we present an index structure that serves a twofold role: that of a cache and an index in the same structure. In this way, if we are not able to provide a quick approximate answer for the query, the distances computed up to that moment are used to query the index. We present an experimental evaluation of the performance obtained with our structure.

Partially funded by: MICIN ref. TIN2009-14560-C03-02 (PGE & FEDER), Xunta de Galicia ref. GRC2013/053 (FEDER), and CDTI-MINECO-Axencia Galega de Innovación EXP 00064563/ITC-20133062 for authors in UDC1. FONDEF IDeA CA12i10314 for M. Marín. Mincyt-Conicyt CH1204 for V. Gil-Costa.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Falchi, F., Lucchese, C., Orlando, S., Perego, R., Rabitti, F.: Caching content-based queries for robust and efficient image retrieval. In: Procs. of EDBT, pp. 780–790 (2009)

    Google Scholar 

  2. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Procs. of VLDB, pp. 426–435 (1997)

    Google Scholar 

  3. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Computing Surveys 33, 273–321 (2001)

    Article  Google Scholar 

  4. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity search. The metric space approach. Advances in Database Systems, vol. 32. Springer (2006)

    Google Scholar 

  5. Chavez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26(9), 1363–1376 (2005)

    Article  Google Scholar 

  6. Gil-Costa, V., Marin, M., Reyes, N.: Parallel query processing on distributed clustering indexes. Journal of Discrete Algorithms 7(1), 3–17 (2009)

    Article  MATH  MathSciNet  Google Scholar 

  7. Pedreira, O., Brisaboa, N.R.: Spatial selection of sparse pivots for similarity search in metric spaces. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds.) SOFSEM 2007. LNCS, vol. 4362, pp. 434–445. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  8. Bustos, B., Pedreira, O., Brisaboa, N.: A dynamic pivot selection technique for similarity search in metric spaces. In: Procs. of SISAP, pp. 105–112. IEEE Press (2008)

    Google Scholar 

  9. Ares, L.G., Brisaboa, N.R., Esteller, M.F., Pedreira, O., Ángeles, S.: Places: Optimal pivots to minimize the index size for metric access methods. In: Procs. of SISAP, pp. 74–80. IEEE Press (2009)

    Google Scholar 

  10. Falchi, F., Lucchese, C., Orlando, S., Perego, R., Rabitti, F.: A metric cache for similarity search. In: Procs. of LSDS-IR, pp. 43–50 (2008)

    Google Scholar 

  11. Falchi, F., Lucchese, C., Orlando, S., Perego, R., Rabitti, F.: Similarity caching in large-scale image retrieval. Information Processing and Management (2011)

    Google Scholar 

  12. Skopal, T., Lokoc, J., Bustos, B.: D-cache: Universal distance cache for metric access methods. Transactions on Knowledge and Data Engineering 99 (2011)

    Google Scholar 

  13. Barrios, J., Bustos, B., Skopal, T.: Snake table: A dynamic pivot table for streams of k-nn searches. In: Navarro, G., Pestov, V. (eds.) SISAP 2012. LNCS, vol. 7404, pp. 25–39. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  14. Marin, M., Gil-Costa, V., Bonacic, C.: A search engine index for multimedia content. In: Luque, E., Margalef, T., Benítez, D. (eds.) Euro-Par 2008. LNCS, vol. 5168, pp. 866–875. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  15. Figueroa, K., Navarro, G., Chávez, E.: Metric spaces library (2007), http://www.sisap.org/Metric_Space_Library.html

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brisaboa, N.R., Cerdeira-Pena, A., Gil-Costa, V., Marin, M., Pedreira, O. (2015). Efficient Similarity Search by Combining Indexing and Caching Strategies. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, JJ., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_40

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-46078-8_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46077-1

  • Online ISBN: 978-3-662-46078-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics