Online Matrix Factorization for Multimodal Image Retrieval

  • Juan C. Caicedo
  • Fabio A. González
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7441)


In this paper, we propose a method to build an index for image search using multimodal information, that is, using visual features and text data simultaneously. The method combines both data sources and generates a single multimodal representation using latent factor analysis and matrix factorization. One remarkable characteristic of this multimodal representation is that it connects textual and visual content. This allows queries containing only visual content to be solved by implicitly completing the missing textual content. Another important characteristic of the method is that the multimodal representation is learned online using an efficient stochastic gradient descent formulation. Experiments were conducted on a dataset of 5,000 images to evaluate convergence speed and search performance. Experimental results show that the proposed algorithm requires only one pass through the dataset to achieve high-quality retrieval performance.
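The abstract does not give the update equations. The following is only a minimal sketch under one common formulation, and every detail of it is an assumption:

- Each image's visual vector `x_v` and text vector `x_t` are modeled as `W_v h` and `W_t h`.
- The two modalities share one nonnegative latent code `h`.
- `W_v` and `W_t` are learned in a single pass, taking one stochastic gradient step per image.
- A visual-only query is answered by inferring `h` from the visual block alone and reconstructing the missing text as `W_t h`.

The nonnegativity constraints, the ridge penalty `reg`, the step size `eta / (1 + ||h||^2)`, and the function names `infer_latent`, `online_fit` and `complete_text` are all hypothetical. They are not the authors' exact algorithm.

```python
import math
import random

# Hypothetical formulation for illustration; not the paper's exact update rules.

def reconstruct(W, h):
    """Return W h for a d x k matrix W stored as a list of rows."""
    return [math.fsum(w * hj for w, hj in zip(row, h)) for row in W]

def infer_latent(blocks, k, steps=100, reg=0.1):
    """Projected gradient descent for h >= 0 minimizing
    0.5 * sum_b ||x_b - W_b h||^2 + 0.5 * reg * ||h||^2.
    `blocks` is a list of (W_b, x_b) pairs, so a visual-only query passes one block."""
    # Step 1/L, where L (squared Frobenius norms + reg) bounds the largest
    # eigenvalue of the Hessian sum_b W_b^T W_b + reg * I.
    lip = sum(w * w for W, _ in blocks for row in W for w in row) + reg
    lr = 1.0 / max(lip, 1e-12)
    h = [0.0] * k
    for _ in range(steps):
        grad = [reg * hj for hj in h]
        for W, x in blocks:
            residual = [xi - yi for xi, yi in zip(x, reconstruct(W, h))]
            for row, r in zip(W, residual):
                for j in range(k):
                    grad[j] -= r * row[j]  # gradient of 0.5||x - W h||^2 is -W^T (x - W h)
        h = [max(0.0, hj - lr * gj) for hj, gj in zip(h, grad)]  # project onto h >= 0
    return h

def sgd_update(W, x, h, eta):
    """One stochastic gradient step on 0.5||x - W h||^2 w.r.t. W, keeping W >= 0.
    The step eta / (1 + ||h||^2) keeps each update bounded (an assumed choice)."""
    residual = [xi - yi for xi, yi in zip(x, reconstruct(W, h))]
    step = eta / (1.0 + sum(hj * hj for hj in h))
    for row, r in zip(W, residual):
        for j in range(len(h)):
            row[j] = max(0.0, row[j] + step * r * h[j])

def online_fit(stream, dv, dt, k, eta=0.5, seed=0):
    """Single pass over (visual, text) pairs; both modalities share one latent code per image."""
    rng = random.Random(seed)
    Wv = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(dv)]
    Wt = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(dt)]
    for xv, xt in stream:
        h = infer_latent([(Wv, xv), (Wt, xt)], k)  # latent code from both modalities
        sgd_update(Wv, xv, h, eta)
        sgd_update(Wt, xt, h, eta)
    return Wv, Wt

def complete_text(Wv, Wt, xv):
    """Visual-only query: infer h from the visual block alone, then predict the missing text."""
    h = infer_latent([(Wv, xv)], len(Wv[0]))
    return reconstruct(Wt, h)
```

In this sketch, a query image with no text is handled by dropping the text block from `infer_latent`. The completed text vector `complete_text(Wv, Wt, xv)` could then be compared against indexed images. The normalized step size is one way to keep single-pass learning stable without tuning a global learning rate. The paper may use a different schedule.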





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Juan C. Caicedo (1)
  • Fabio A. González (1)
  1. Universidad Nacional de Colombia, Colombia
