Enhancing Similarity Search Throughput by Dynamic Query Reordering

Nalepa, Filip; Batko, Michal; Zezula, Pavel

doi:10.1007/978-3-319-44406-2_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9828)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Large volumes of multimedia data are being created today, and much of it can be searched only by content because no descriptive metadata are available. To make content-based search efficient, similarity indexing structures based on the metric-space model can be used. In our work, we focus on a scenario where similarity search is used in the context of stream processing. In particular, there is a potentially infinite sequence (stream) of query objects, and a query needs to be executed for each of them. The goal is to maximize the throughput of processed queries while maintaining an acceptable delay. We propose an approach based on dynamic reordering of the incoming queries combined with caching of recent results. We achieved up to 3.7 times higher throughput compared to the base case in which neither reordering nor caching is used.
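
The idea of reordering a query stream so that a cache of recent results is hit more often can be illustrated with a minimal sketch. It is not the authors' algorithm: the buffering policy, the parameters `buffer_size`, `max_wait`, and `reuse_radius`, and the function names are all assumptions made for illustration, and the cache is simplified to reusing the previous answer for any query that lies close to the last searched one.

```python
import math


def euclidean(a, b):
    """Stand-in metric; any metric-space distance could be substituted."""
    return math.dist(a, b)


def reorder_queries(stream, buffer_size=4, max_wait=8, distance=euclidean):
    """Yield queries so that similar queries tend to be adjacent.

    Hypothetical policy (not taken from the paper): keep up to
    `buffer_size` pending queries and emit the one nearest to the
    previously emitted query. If the oldest pending query has already
    been delayed by `max_wait` positions, emit it instead, which
    bounds the delay of every query.
    """
    pending = []          # (arrival_index, query) pairs
    source = iter(stream)
    arrival = 0           # index of the next arriving query
    emitted = 0           # number of queries emitted so far
    last = None           # most recently emitted query
    exhausted = False
    while True:
        # Refill the buffer from the stream.
        while not exhausted and len(pending) < buffer_size:
            try:
                query = next(source)
            except StopIteration:
                exhausted = True
                break
            pending.append((arrival, query))
            arrival += 1
        if not pending:
            return
        oldest = min(pending, key=lambda item: item[0])
        if last is None or emitted - oldest[0] >= max_wait:
            choice = oldest
        else:
            choice = min(pending, key=lambda item: distance(item[1], last))
        pending.remove(choice)
        last = choice[1]
        emitted += 1
        yield choice[1]


def process(queries, search, reuse_radius, distance=euclidean):
    """Run `search` per query, reusing the last result for nearby queries.

    Simplified single-entry cache (an assumption): a query within
    `reuse_radius` of the last searched query gets that query's result.
    Returns the results and the number of real searches performed.
    """
    results = []
    cached_query = None
    cached_result = None
    searches = 0
    for query in queries:
        if cached_query is not None and distance(query, cached_query) <= reuse_radius:
            result = cached_result
        else:
            result = search(query)
            searches += 1
            cached_query, cached_result = query, result
        results.append(result)
    return results, searches


# Two interleaved clusters of 2-D query points.
stream = [(0, 0), (10, 10), (0, 1), (10, 11), (0, 2), (10, 12)]
reordered = list(reorder_queries(stream))
_, searches_fifo = process(stream, search=lambda q: q, reuse_radius=2)
_, searches_reordered = process(reordered, search=lambda q: q, reuse_radius=2)
```

In this toy stream the two clusters arrive interleaved, so processing in arrival order never benefits from the cache, while the reordered sequence groups each cluster together. Lowering `max_wait` trades some of that grouping for a tighter bound on how long any query can be postponed.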

Acknowledgements

This work was supported by the Czech national research project GA16-18889S.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Filip Nalepa, Michal Batko & Pavel Zezula

Corresponding author

Correspondence to Filip Nalepa.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

About this paper

Cite this paper

Nalepa, F., Batko, M., Zezula, P. (2016). Enhancing Similarity Search Throughput by Dynamic Query Reordering. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_14

DOI: https://doi.org/10.1007/978-3-319-44406-2_14
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44405-5
Online ISBN: 978-3-319-44406-2
eBook Packages: Computer Science, Computer Science (R0)
