Multimedia Tools and Applications, Volume 60, Issue 2, pp 419–441

Visual graph modeling for scene recognition and mobile robot localization

  • Trong-Ton Pham
  • Philippe Mulhem
  • Loïc Maisonnasse
  • Eric Gaussier
  • Joo-Hwee Lim


Image retrieval and categorization may need to consider several types of visual features and the spatial information between them (e.g., different points of view of an image). This paper presents a novel approach that extends the language modeling approach from information retrieval to graph-based image retrieval and categorization. Such a versatile graph model is needed to represent the multiple points of view of images. A language model is defined on these graphs to enable fast graph matching. We present experiments conducted with several instances of the proposed model on two image collections: one composed of 3,849 tourist images and another composed of 3,633 images captured by a mobile robot. Experimental results show that the visual graph model (VGM) improves accuracy over the standard language model (LM) and outperforms the Support Vector Machine (SVM) method.
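To make the idea of "a language model defined on graphs" concrete, here is a minimal sketch under stated assumptions; it is not the authors' exact formulation. It assumes a visual graph is a multiset of concepts (e.g. visual-word ids) plus a multiset of labeled relations (source, relation, target), that concepts and relations are generated independently, and that each probability is smoothed with Jelinek-Mercer smoothing against a collection model, as in standard query-likelihood retrieval. The names `VisualGraph`, `log_likelihood`, `categorize`, and the smoothing weight `lam` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VisualGraph:
    """Hypothetical visual graph: concept counts plus labeled-relation counts."""
    concepts: Counter = field(default_factory=Counter)
    relations: Counter = field(default_factory=Counter)


def smoothed_prob(item, doc_counts, coll_counts, lam):
    """Jelinek-Mercer smoothing: mix the document and collection estimates."""
    doc_total = sum(doc_counts.values())
    coll_total = sum(coll_counts.values())
    p_doc = doc_counts[item] / doc_total if doc_total else 0.0
    p_coll = coll_counts[item] / coll_total if coll_total else 0.0
    return (1.0 - lam) * p_doc + lam * p_coll


def collection_model(graphs):
    """Pool the counts of every graph into one collection-level graph."""
    coll = VisualGraph()
    for g in graphs:
        coll.concepts.update(g.concepts)
        coll.relations.update(g.relations)
    return coll


def log_likelihood(query, doc, coll, lam=0.5):
    """log P(query graph | doc graph), assuming (as a simplification)
    that concepts and relations are generated independently."""
    score = 0.0
    for q_counts, d_counts, c_counts in (
        (query.concepts, doc.concepts, coll.concepts),
        (query.relations, doc.relations, coll.relations),
    ):
        for item, n in q_counts.items():
            p = smoothed_prob(item, d_counts, c_counts, lam)
            if p == 0.0:
                # Unseen in the whole collection: equal for every document,
                # so skipping it does not change the ranking.
                continue
            score += n * math.log(p)
    return score


def categorize(query, labeled_graphs, lam=0.5):
    """Assign the label of the training graph that best explains the query."""
    coll = collection_model(g for g, _ in labeled_graphs)
    best = max(labeled_graphs,
               key=lambda gl: log_likelihood(query, gl[0], coll, lam))
    return best[1]


# Toy usage with made-up visual words and relations.
kitchen = VisualGraph(Counter({"w1": 3, "w2": 1}),
                      Counter({("w1", "left_of", "w2"): 1}))
corridor = VisualGraph(Counter({"w3": 4}),
                       Counter({("w3", "above", "w3"): 2}))
query = VisualGraph(Counter({"w1": 2}),
                    Counter({("w1", "left_of", "w2"): 1}))
print(categorize(query, [(kitchen, "kitchen"), (corridor, "corridor")]))  # kitchen
```

Treating the relation term as a separate factor is what lets spatial information change the ranking: two images with identical visual words but different layouts receive different scores.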


Keywords: Graph theory · Information retrieval · Language model · Scene recognition · Robot localization



This work was supported by the French National Agency of Research (ANR-06-MDCA-002). Pham Trong-Ton would like to thank the Merlion programme of the French Embassy in Singapore for its support during his Ph.D. studies.


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Trong-Ton Pham (1)
  • Philippe Mulhem (2)
  • Loïc Maisonnasse (3)
  • Eric Gaussier (2)
  • Joo-Hwee Lim (4)

  1. Grenoble Institute of Technology, Laboratoire Informatique de Grenoble (LIG), Grenoble, France
  2. Multimedia Information Modeling and Retrieval, Laboratoire Informatique de Grenoble (LIG), Grenoble, France
  3. R&D Department, TecKnowMetrix, Voiron, France
  4. Computer Vision and Image Understanding, Institute for Infocomm Research (I2R), Connexis, Singapore

Personalised recommendations