Fusion Strategies for Large-Scale Multi-modal Image Retrieval

Budikova, Petra; Batko, Michal; Zezula, Pavel

doi:10.1007/978-3-662-55696-2_5

Chapter
First Online: 08 August 2017

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 10430)

Abstract

Large-scale data management and retrieval in complex domains such as images, videos, or biometric data remains one of the most important and challenging information processing tasks. Even after two decades of intensive research, many questions remain to be answered before working tools become available for everyday use. In this work, we focus on the practical applicability of different multi-modal retrieval techniques. Multi-modal searching, which combines several complementary views on complex data objects, follows the human thinking process and represents a very promising retrieval paradigm. However, the rapid development of modality fusion techniques in several diverse directions and a lack of comparisons between individual approaches have resulted in a confusing situation in which the applicability of individual solutions is unclear. Aiming to improve the research community’s comprehension of this topic, we analyze and systematically categorize existing multi-modal search techniques, identify their strengths, and describe selected representatives. In the second part of the paper, we focus on the specific problem of large-scale multi-modal image retrieval on the web. We analyze the requirements of such a task, implement several applicable fusion methods, and experimentally evaluate their performance in terms of both efficiency and effectiveness. The extensive experiments provide a unique comparison of diverse approaches to modality fusion in equal settings on two large real-world datasets.

Acknowledgments

This work was supported by the Czech national research project GA16-18889S. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085.

Author information

Authors and Affiliations

Masaryk University, Brno, Czech Republic
Petra Budikova, Michal Batko & Pavel Zezula

Authors

Petra Budikova
Michal Batko
Pavel Zezula

Corresponding author

Correspondence to Michal Batko.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
Inria and LIRMM, University of Montpellier, Montpellier, France
Reza Akbarinia
Inria and LIRMM, University of Montpellier, Montpellier, France
Esther Pacitti

About this chapter

Cite this chapter

Budikova, P., Batko, M., Zezula, P. (2017). Fusion Strategies for Large-Scale Multi-modal Image Retrieval. In: Hameurlain, A., Küng, J., Wagner, R., Akbarinia, R., Pacitti, E. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIII. Lecture Notes in Computer Science, vol 10430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55696-2_5

DOI: https://doi.org/10.1007/978-3-662-55696-2_5
Published: 08 August 2017
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-55695-5
Online ISBN: 978-3-662-55696-2
eBook Packages: Computer Science, Computer Science (R0)