Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment

Aubry, Mathieu; Russell, Bryan; Sivic, Josef

doi:10.1007/978-3-319-25781-5_14

Chapter
First Online: 06 July 2016

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings, and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is very difficult as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models from different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes. 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, and historical photograph) and structural changes (e.g., missing scene parts and large occluders) of the scene. We demonstrate that the proposed approach can automatically identify the correct architectural site as well as recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.
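
The abstract states that the weights of each visual element are learnt discriminatively but does not give the procedure. The following is a minimal sketch, assuming an LDA-style closed form in which the element descriptor is whitened against the statistics of generic patches. This is one common choice and not necessarily the one used in the chapter. The function names, the regularizer `reg`, and the synthetic descriptors are hypothetical and for illustration only.

```python
import numpy as np


def learn_element_detector(q, negatives, reg=1e-2):
    """Return (w, b) for a linear detector of one candidate visual element.

    q         : (d,) descriptor of the element, e.g. from a rendered view
    negatives : (n, d) descriptors of generic patches used as negative data
    reg       : ridge term (assumed value) keeping the covariance invertible
    """
    mu = negatives.mean(axis=0)
    centered = negatives - mu
    cov = centered.T @ centered / len(negatives) + reg * np.eye(q.size)
    # LDA direction: difference to the negative mean, whitened by the covariance
    w = np.linalg.solve(cov, q - mu)
    # Bias places the decision boundary halfway between q and the negative mean
    b = -0.5 * w @ (q + mu)
    return w, b


def score(w, b, x):
    """Detector response for one descriptor (d,) or a batch of them (n, d)."""
    return x @ w + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    negatives = rng.normal(size=(500, 16))   # synthetic stand-in for patch descriptors
    q = rng.normal(size=16) + 2.0            # synthetic stand-in for an element descriptor
    w, b = learn_element_detector(q, negatives)
    print("response on the element:", score(w, b, q))
    print("mean response on negatives:", score(w, b, negatives).mean())
```

Under these assumptions the detector responds positively to the element itself and negatively to the average generic patch, so candidate elements can be ranked by how well they separate from the negative data.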

Acknowledgments

We are grateful to Guillaume Seguin, Alyosha Efros, Guillaume Obozinski and Jean Ponce for their useful feedback, and to Yasutaka Furukawa for providing access to the San Marco 3D model. This work was partly supported by the EIT ICT Labs, ANR project SEMAPOLIS (ANR-13-CORD-0003), and the ERC starting grant LEAP. The work was partly carried out at IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). It was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.

Author information

Authors and Affiliations

LIGM (UMR CNRS 8049), ENPC/Université Paris-Est, 77455, Marne-la-Vallée, France
Mathieu Aubry
Adobe Research, Lexington, KY, USA
Bryan Russell
Inria, WILLOW Project-team, Département d’Informatique de l’Ecole Normale Supérieure, ENS/INRIA/CNRS UMR 8548, Paris, France
Josef Sivic

Corresponding author

Correspondence to Mathieu Aubry.

Editor information

Editors and Affiliations

Computer Science Department, Stanford University, Stanford, California, USA
Amir R. Zamir
Decisive Analytics Corporation, Arlington, Virginia, USA
Asaad Hakeem
ETH Zürich, Zürich, Switzerland
Luc Van Gool
University of Central Florida, Orlando, Florida, USA
Mubarak Shah
Facebook, Seattle, Washington, USA
Richard Szeliski

About this chapter

Cite this chapter

Aubry, M., Russell, B., Sivic, J. (2016). Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_14

DOI: https://doi.org/10.1007/978-3-319-25781-5_14
Published: 06 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25779-2
Online ISBN: 978-3-319-25781-5
eBook Packages: Computer Science, Computer Science (R0)
