Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11250)

Abstract

The current era of digital data explosion calls for content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches for dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries, given that a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream so that our caching mechanism can be used more effectively in huge data spaces. We achieved significantly higher throughput than the baseline, in which neither reordering nor caching was used. Moreover, unlike some other caching techniques, our proposal does not incur any additional loss of similarity search precision. In addition to maximizing throughput, we also study the potential of trading throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that may be sacrificed.
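The general idea of reordering buffered queries so that consecutive queries are similar (and therefore more likely to reuse cached data) can be illustrated with a minimal sketch. This is not the chapter's algorithm, which is not described in the abstract; it is a simple greedy nearest-neighbour ordering, and the function name `reorder_buffer`, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
from math import dist


def reorder_buffer(queries, start):
    """Greedily order buffered query vectors (hypothetical helper).

    Starting from `start`, repeatedly pick the remaining query closest
    to the one processed last, so neighbouring queries in the output
    are similar to each other.
    """
    remaining = list(queries)
    ordered = []
    current = start
    while remaining:
        # Choose the query nearest to the most recently processed one.
        nxt = min(remaining, key=lambda q: dist(current, q))
        remaining.remove(nxt)
        ordered.append(nxt)
        current = nxt
    return ordered


# Illustrative 2-D query vectors in arrival order (made-up data).
buffered = [(9.0, 9.0), (0.0, 1.0), (8.0, 9.0), (1.0, 1.0)]
order = reorder_buffer(buffered, start=(0.0, 0.0))
print(order)
```

In this sketch, the two queries near the origin are processed back to back, followed by the two near (9, 9), instead of alternating between distant regions as they arrived. A real system would also have to bound how long a query may wait in the buffer, which is the throughput versus delay trade-off the abstract mentions.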

Acknowledgement

This work was supported by the Czech national research project GA16-18889S.

Author information

Corresponding author

Correspondence to Filip Nalepa.

Copyright information

© 2018 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Nalepa, F., Batko, M., Zezula, P. (2018). Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58384-5_3

  • DOI: https://doi.org/10.1007/978-3-662-58384-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58383-8

  • Online ISBN: 978-3-662-58384-5

  • eBook Packages: Computer Science, Computer Science (R0)
