Fusion Strategies for Large-Scale Multi-modal Image Retrieval

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 10430)

Abstract

Large-scale data management and retrieval in complex domains such as images, videos, or biometric data remains one of the most important and challenging information processing tasks. Even after two decades of intensive research, many questions remain to be answered before working tools become available for everyday use. In this work, we focus on the practical applicability of different multi-modal retrieval techniques. Multi-modal searching, which combines several complementary views of complex data objects, follows the human thinking process and represents a very promising retrieval paradigm. However, the rapid development of modality fusion techniques in several diverse directions, together with a lack of comparisons between individual approaches, has led to a confusing situation in which the applicability of individual solutions is unclear. To improve the research community's understanding of this topic, we analyze and systematically categorize existing multi-modal search techniques, identify their strengths, and describe selected representatives. In the second part of the paper, we focus on the specific problem of large-scale multi-modal image retrieval on the web. We analyze the requirements of such a task, implement several applicable fusion methods, and experimentally evaluate their performance in terms of both efficiency and effectiveness. The extensive experiments provide a unique comparison of diverse approaches to modality fusion under equal settings on two large real-world datasets.
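To make the idea of combining complementary views concrete, the following is a minimal sketch of one common fusion style, a weighted late fusion of per-modality similarity scores. It is an illustration only and is not taken from the chapter: the function names `min_max_normalize` and `late_fusion`, the `visual_weight` parameter, and the choice of min-max normalization are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def late_fusion(
    visual: Dict[str, float],
    textual: Dict[str, float],
    visual_weight: float = 0.5,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Fuse two similarity maps (higher = more similar) into one ranking.

    Hypothetical helper: each modality is searched independently, the
    scores are normalized, and a weighted sum decides the final order.
    Objects missing from one modality contribute 0.0 for that modality.
    """
    v = min_max_normalize(visual)
    t = min_max_normalize(textual)
    fused = {
        image_id: visual_weight * v.get(image_id, 0.0)
        + (1.0 - visual_weight) * t.get(image_id, 0.0)
        for image_id in set(v) | set(t)
    }
    # Sort by descending fused score; break ties by identifier for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy inputs (invented for illustration, not experimental data).
visual_scores = {"a": 4.0, "b": 2.0, "c": 0.0}
textual_scores = {"b": 3.0, "c": 1.0, "d": 2.0}
ranking = late_fusion(visual_scores, textual_scores, visual_weight=0.5, k=4)
```

With these toy inputs, object `b` ranks first because it scores reasonably well in both modalities, whereas `a` is strong visually but has no textual evidence. Other fusion styles, such as combining modalities before or during index search, would be structured differently.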

Notes

  1. http://disa.fi.muni.cz/profiset

  2. http://www.profimedia.com

  3. http://cophir.isti.cnr.it/

Acknowledgments

This work was supported by the Czech national research project GA16-18889S. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085.

Author information

Corresponding author

Correspondence to Michal Batko.

Copyright information

© 2017 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Budikova, P., Batko, M., Zezula, P. (2017). Fusion Strategies for Large-Scale Multi-modal Image Retrieval. In: Hameurlain, A., Küng, J., Wagner, R., Akbarinia, R., Pacitti, E. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIII. Lecture Notes in Computer Science, vol 10430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55696-2_5

  • DOI: https://doi.org/10.1007/978-3-662-55696-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-55695-5

  • Online ISBN: 978-3-662-55696-2

  • eBook Packages: Computer Science (R0)
