VIRaL: Visual Image Retrieval and Localization

Kalantidis, Yannis; Tolias, Giorgos; Avrithis, Yannis; Phinikettos, Marios; Spyrou, Evaggelos; Mylonas, Phivos; Kollias, Stefanos

doi:10.1007/s11042-010-0651-7

Published: 16 November 2010

Volume 51, pages 555–592, (2011)

Multimedia Tools and Applications

Abstract

New applications that exploit the huge volume of data in community photo collections emerge every day. Most focus on popular subsets, e.g., images containing landmarks or associated with Wikipedia articles. In this work we address the problem of accurately finding the location where a photo was taken without any metadata, that is, solely from its visual content. We also recognize landmarks where applicable and automatically link them to Wikipedia. We show that the time is right for automating the geo-tagging process, and we show how this can work at large scale. In doing so, we exploit the redundancy of content in popular locations but, unlike most existing solutions, we do not restrict ourselves to landmarks. In other words, we can compactly represent the visual content of the thousands of images depicting, e.g., the Parthenon and still retrieve any single, isolated, non-landmark image such as a house or graffiti on a wall. Starting from an existing geo-tagged dataset, we cluster images into sets of different views of the same scene. This mining process is very efficient, scalable, and fully automated. We then align all views in a set to one reference image and construct a 2D scene map. Our indexing scheme operates directly on scene maps. We evaluate our solution on a challenging dataset of one million urban images and provide public access to our service through our online application, VIRaL.
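
The alignment and scene-map step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the alignment model or the map representation, so everything here is an assumption made for illustration: each view is taken to be related to the reference image by a 3x3 homography, each local feature is reduced to a position plus a hypothetical visual-word id, and features that land in the same grid cell of the reference frame with the same word are merged. The names `project`, `build_scene_map`, and the `cell` parameter are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Homography = List[List[float]]          # 3x3 matrix, row-major (assumed model)
Feature = Tuple[Point, int]             # (position, hypothetical visual-word id)
View = Tuple[Homography, List[Feature]]


def project(h: Homography, p: Point) -> Point:
    """Map a point from a view into the reference frame using homography h."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


def build_scene_map(views: List[View], cell: float = 1.0) -> Dict[Tuple[int, int, int], int]:
    """Merge the features of all aligned views into one 2D map.

    Keys are (visual_word, cell_x, cell_y) in the reference frame; values
    count how many features from the view set fell into that entry.
    """
    scene_map: Dict[Tuple[int, int, int], int] = {}
    for h, features in views:
        for point, word in features:
            u, v = project(h, point)
            key = (word, math.floor(u / cell), math.floor(v / cell))
            scene_map[key] = scene_map.get(key, 0) + 1
    return scene_map


# Toy data: the reference view (identity) and a second view shifted by 2 in x.
IDENTITY: Homography = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
SHIFT_X: Homography = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]

views: List[View] = [
    (IDENTITY, [((12.4, 3.7), 7), ((0.5, 0.5), 9)]),
    (SHIFT_X, [((10.5, 3.2), 7)]),  # same physical point as word 7 above
]
scene_map = build_scene_map(views)
```

In this toy case the two observations of word 7 fall into the same cell after alignment and collapse into a single map entry, which is the kind of compaction of redundant views that the abstract refers to. A real system would estimate the transformations from feature matches rather than receive them as input.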

Notes

http://www.wikipedia.org
http://www.flickr.com
http://www.panoramio.com
http://www.bing.com/toolbox/blogs/maps/archive/2010/02/11/new-bing-maps-application-streetside-photos.aspx
http://google-latlong.blogspot.com/2010/06/seeing-new-sights-with-photo-overlays.html
http://www.historypin.com/
http://viral.image.ntua.gr
http://www.geonames.org
We shall use the terms photo, image and view interchangeably in the following.
Photo titles and user tags are the ones provided by users at the Flickr website.
http://www.geonames.org/export/wikipedia-webservice.html#wikipediaSearch
http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Wikipedia-World/en
We have published the dataset online at http://image.ntua.gr/iva/datasets/ec1m/.
http://www.image.ntua.gr/iva/research/scene_maps

Acknowledgements

This work was partially supported by the European Commission under contract FP7-215453 WeKnowIt.

Author information

Authors and Affiliations

National Technical University of Athens, 9, Iroon Polytexneiou Str., Zografou, Athens, Greece
Yannis Kalantidis, Giorgos Tolias, Yannis Avrithis, Marios Phinikettos, Evaggelos Spyrou, Phivos Mylonas & Stefanos Kollias

Corresponding author

Correspondence to Yannis Kalantidis.

About this article

Cite this article

Kalantidis, Y., Tolias, G., Avrithis, Y. et al. VIRaL: Visual Image Retrieval and Localization. Multimed Tools Appl 51, 555–592 (2011). https://doi.org/10.1007/s11042-010-0651-7

Published: 16 November 2010
Issue Date: January 2011
DOI: https://doi.org/10.1007/s11042-010-0651-7
