VIRaL: Visual Image Retrieval and Localization

Multimedia Tools and Applications

Abstract

New applications that exploit the huge volume of data in community photo collections emerge every day. Most focus on popular subsets, e.g., images containing landmarks or associated with Wikipedia articles. In this work we address the problem of accurately finding the location where a photo was taken without any metadata, that is, solely from its visual content. We also recognize landmarks where applicable, automatically linking them to Wikipedia. We show that the time is right for automating the geo-tagging process, and we show how this can work at large scale. In doing so, we exploit the redundancy of content in popular locations, but unlike most existing solutions, we do not restrict ourselves to landmarks. In other words, we can compactly represent the visual content of the thousands of images depicting, e.g., the Parthenon, and still retrieve any single, isolated, non-landmark image such as a house or graffiti on a wall. Starting from an existing geo-tagged dataset, we cluster images into sets of different views of the same scene. This mining process is very efficient, scalable, and fully automated. We then align all views in a set to one reference image and construct a 2D scene map. Our indexing scheme operates directly on scene maps. We evaluate our solution on a challenging dataset of one million urban images and provide public access to our service through our online application, VIRaL.
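The pipeline outlined in the abstract (group geo-tagged photos, align the views of a group to a reference image, merge them into a 2D scene map) can be illustrated with a minimal sketch. The abstract does not specify the clustering criterion or the alignment model, so everything below is an assumption made for illustration: photos are grouped by a greedy geographic radius test only (the visual-similarity part of the clustering is omitted), alignment is modeled as a 3x3 homography that is taken as given, and all function names are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_by_location(photos, radius_km):
    """Greedy grouping (an assumed stand-in, not the paper's method):
    each photo joins the first cluster whose center (its first member)
    lies within radius_km, otherwise it starts a new cluster."""
    clusters = []  # list of (center_latlon, [photo_ids])
    for pid, latlon in photos:
        for center, members in clusters:
            if haversine_km(center, latlon) <= radius_km:
                members.append(pid)
                break
        else:
            clusters.append((latlon, [pid]))
    return clusters

def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography H (nested row-major lists)."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def build_scene_map(views):
    """Collect the features of all views in the reference image frame.
    views: list of (H_to_reference, [(x, y, visual_word), ...])."""
    scene = []
    for H, feats in views:
        for x, y, word in feats:
            rx, ry = apply_homography(H, (x, y))
            scene.append((rx, ry, word))
    return scene
```

In this sketch the reference view would carry the identity homography, and an index built on the merged feature list stands in for one that operates on scene maps rather than on individual images. Estimating the homographies themselves (for example, with a robust estimator on matched local features) is outside the scope of the example.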

Notes

  1. http://www.wikipedia.org

  2. http://www.flickr.com

  3. http://www.panoramio.com

  4. http://www.bing.com/toolbox/blogs/maps/archive/2010/02/11/new-bing-maps-application-streetside-photos.aspx

  5. http://google-latlong.blogspot.com/2010/06/seeing-new-sights-with-photo-overlays.html

  6. http://www.historypin.com/

  7. http://viral.image.ntua.gr

  8. http://www.geonames.org

  9. We shall use the terms photo, image and view interchangeably in the following.

  10. Photo titles and user tags are the ones provided by users at the Flickr website.

  11. http://www.geonames.org/export/wikipedia-webservice.html#wikipediaSearch

  12. http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Wikipedia-World/en

  13. We have published the dataset online at http://image.ntua.gr/iva/datasets/ec1m/.

  14. http://www.image.ntua.gr/iva/research/scene_maps

Acknowledgements

This work was partially supported by the European Commission under contract FP7-215453 WeKnowIt.

Author information

Corresponding author

Correspondence to Yannis Kalantidis.

About this article

Cite this article

Kalantidis, Y., Tolias, G., Avrithis, Y. et al. VIRaL: Visual Image Retrieval and Localization. Multimed Tools Appl 51, 555–592 (2011). https://doi.org/10.1007/s11042-010-0651-7

  • DOI: https://doi.org/10.1007/s11042-010-0651-7
