Multimedia Tools and Applications, Volume 42, Issue 1, pp 31–56

Crossing textual and visual content in different application scenarios

  • Julien Ah-Pine
  • Marco Bressan
  • Stephane Clinchant
  • Gabriela Csurka
  • Yves Hoppenot
  • Jean-Michel Renders

Abstract

This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that extend straightforwardly to the general multimodal scenario. Both approaches fall into the trans-media pseudo-relevance feedback category. Our first method uses a mixture model of the aggregate components, treating them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how a large variety of problems can be framed so that the proposed techniques address them: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we present how these methods can be integrated into two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.
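The second approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes cosine similarity as a stand-in for both monomodal similarities, a visual-similarity weighting of the aggregate, and a hypothetical collection layout of (image vector, text vector) pairs. The function names and the parameter k are likewise illustrative.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transmedia_similarity(query_image, candidate_text, collection, k=2):
    """Score a candidate text against a query image via pseudo-relevance feedback.

    collection is a list of (image_vector, text_vector) pairs; this layout
    is an assumption made for the sketch.
    """
    # Step 1: the k items visually closest to the query form the aggregate.
    ranked = sorted(collection,
                    key=lambda item: cosine(query_image, item[0]),
                    reverse=True)
    aggregate = ranked[:k]
    # Step 2: aggregate the textual similarities between the candidate and
    # each element of the aggregate, weighted by visual similarity
    # (the weighting scheme is an assumption, not taken from the paper).
    return sum(cosine(query_image, image) * cosine(candidate_text, text)
               for image, text in aggregate)


# Toy data: two items that look alike and share a textual term, one unrelated item.
collection = [
    ([1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.9, 0.1], [1.0, 1.0, 0.0]),
    ([0.0, 1.0], [0.0, 0.0, 1.0]),
]
query_image = [1.0, 0.0]
related_text = [1.0, 0.0, 0.0]
unrelated_text = [0.0, 0.0, 1.0]

related_score = transmedia_similarity(query_image, related_text, collection)
unrelated_score = transmedia_similarity(query_image, unrelated_text, collection)
```

In this toy setting the text that shares terms with the visual neighbours of the query image receives the higher score, which is the behaviour an image annotation or text illustration scenario would rely on.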

Keywords

Text-image information processing · Trans-media similarities · Cross-content information retrieval and browsing · Image auto-annotation · Multimedia document generation

Notes

Acknowledgements

The authors particularly want to thank INA for their contributions to our work and Florent Perronnin for his greatly appreciated help in applying some of the Generic Visual Categorizer (GVC) components. We would also like to acknowledge the following Flickr users whose photographs we reproduced here under Creative Commons licences:

Tatiana Sapateiro   http://www.flickr.com/photos/tatianasapateiro

Pedro Paulo Silva de Souza   http://www.flickr.com/photos/pedrop

Leonardo Pallotta   http://www.flickr.com/photos/groundzero

Laszlo Ilyes   http://www.flickr.com/photos/laszlo-photo

Jorge Wagner   http://www.flickr.com/photos/jorgewagner

UminDaGuma   http://www.flickr.com/photos/umindaguma

Scott Robinson   http://www.flickr.com/photos/clearlyambiguous

Gabriel Flores Romero   http://www.flickr.com/photos/gabofr

Jenny Mealing   http://www.flickr.com/photos/jennifrog

Roney   http://www.flickr.com/photos/roney

David Katarina   http://www.flickr.com/photos/davidkatarina

T. Chu   http://www.flickr.com/photos/spyderball

Bill Wilcox   http://www.flickr.com/photos/billwilcox

S2RD2   http://www.flickr.com/photos/stuardo

Fred Hsu   http://www.flickr.com/photos/fhsu

Abel Pardo López   http://www.flickr.com/photos/sancho_panza

Cat   http://www.flickr.com/photos/clspeace

Thowra_uk   http://www.flickr.com/photos/thowra

Elena Heredero   http://www.flickr.com/photos/elenaheredero

Rick McCharles   http://www.flickr.com/photos/rickmccharles

Marília Almeida   http://www.flickr.com/photos/68306118@N00

Gustavo Madico   http://www.flickr.com/photos/desdegus

Douglas Fernandes   http://www.flickr.com/photos/thejourney1972

James Preston   http://www.flickr.com/photos/jamespreston

Rodrigo Della Fávera   http://www.flickr.com/photos/rodrigofavera

Dinesh Rao   http://www.flickr.com/photos/dinrao

Marina Campos Vinhal   http://www.flickr.com/photos/marinacvinhal

Jorge Gobbi   http://www.flickr.com/photos/morrissey

Steve Taylor   http://www.flickr.com/photos/theboywiththethorninhisside/

Finally, we would also like to acknowledge the users who wrote the blog paragraphs that were used and reproduced here. These texts can be found at the following addresses:

http://realtravel.com/cuzco-journals-j1879736.html

http://realtravel.com/machu_picchu-journals-j5181463.html

http://realtravel.com/rio-journals-j4669810.html

http://www.travelpod.com/travel-blog-entries/sarah_s_america/south_america/1140114720/tpod.html

http://www.travelpod.com/travel-blog-entries/rachel_john/roundtheworld/1146006300/tpod.html

http://www.travelpod.com/travel-blog-entries/eatdessertfirst/world_tour_05/1160411340/tpod.html

http://www.travelpod.com/travel-blog-entries/idarich/rtw_2005/1140476400/tpod.html

http://www.travelpod.com/travel-blog-entries/twittg/rtw/1132765860/tpod.html

http://www.travelpod.com/travel-blog-entries/emanddave/worldtrip2006/1155492420/tpod.html


Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Julien Ah-Pine (1)
  • Marco Bressan (1)
  • Stephane Clinchant (1)
  • Gabriela Csurka (1)
  • Yves Hoppenot (1)
  • Jean-Michel Renders (1)

  1. Xerox Research Centre Europe, Meylan, France
