Automated Annotation of Landmark Images Using Community Contributed Datasets and Web Resources

  • Gareth J. F. Jones
  • Daragh Byrne
  • Mark Hughes
  • Noel E. O’Connor
  • Andrew Salway
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6725)


A novel solution to the challenge of automatic image annotation is described. Given an image with GPS data of its location of capture, our system returns a semantically rich annotation comprising tags that both identify the landmark in the image and provide an interesting fact about it, e.g. “A view of the Eiffel Tower, which was built in 1889 for an international exhibition in Paris”. The system exploits visual and textual web mining in combination with content-based image analysis and natural language processing. In the first stage, an input image is matched to a set of community contributed images (with keyword tags) on the basis of its GPS information and image classification techniques. The depicted landmark is inferred from the keyword tags of the matched set. The system then takes advantage of the information written about landmarks on the web at large to extract a fact about the landmark in the image. We report component evaluation results from an implementation of our solution on a mobile device. Image localisation and matching achieves 93.6% classification accuracy; the selection of appropriate tags for use in annotation performs well (F1M of 0.59), and the system subsequently identifies a correct toponym for use in captioning and fact extraction in 69.0% of the tested cases; finally, fact extraction returns an interesting caption in 78% of cases.
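The first stage of the pipeline described above — restricting the community photo collection by GPS proximity and then inferring the landmark from the tags of the matched set — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the 1 km radius, the photo-record layout, and the use of a simple tag-frequency vote (in place of the paper's visual matching and tag-selection techniques) are all assumptions for the sake of the example.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two GPS points, in kilometres.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def infer_landmark(query_gps, tagged_photos, radius_km=1.0):
    # Stage 1: keep only community photos captured near the query
    # location (visual matching would further refine this set).
    nearby = [p for p in tagged_photos
              if haversine_km(*query_gps, *p["gps"]) <= radius_km]
    # Stage 2: infer the depicted landmark as the most frequent
    # keyword tag across the matched set.
    tags = Counter(t for p in nearby for t in p["tags"])
    return tags.most_common(1)[0][0] if tags else None
```

For example, a query taken near the Eiffel Tower would match the nearby community photos and return their dominant tag, while photos taken hundreds of kilometres away are filtered out before the vote.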


Keywords: web mining · geo-tagged images · landmark identification · automated image captioning



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gareth J. F. Jones (1)
  • Daragh Byrne (1, 2)
  • Mark Hughes (1, 2)
  • Noel E. O’Connor (2)
  • Andrew Salway (1)
  1. Centre for Digital Video Processing, School of Computing, Dublin City University, Dublin 9, Ireland
  2. CLARITY: Centre for Sensor Web Technologies, Dublin City University, Dublin 9, Ireland
