Abstract
This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings, and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is difficult because the appearance and scene structure in a 2D depiction can differ substantially from the appearance and geometry of the 3D model, e.g., due to the rendering style, drawing errors, age, lighting, or seasonal changes. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models of different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes: the 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. As in object detection, both the set of visual elements and the weights of the individual features of each element are learnt discriminatively. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, and historical photograph) and structural changes (e.g., missing scene parts and large occluders). We demonstrate that the proposed approach can automatically identify the correct architectural site and recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.
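To make the idea of a discriminatively trained visual element more concrete, the sketch below shows one cheap way to train a linear detector for a single patch: the LDA-style "discriminative decorrelation" of Hariharan, Malik and Ramanan (2012), in which the weight vector is w = Σ⁻¹(x − μ), where x is the feature of the positive patch and μ and Σ are the mean and covariance of features from generic negative patches. Treat this as an assumption-laden sketch, not the chapter's exact procedure: the 4-D random vectors stand in for HOG descriptors of rendered-view patches, the ridge term and bias rule are illustrative choices, and the helper names (`mean_and_cov`, `solve`, `train_element`, `score`) are hypothetical.

```python
import random


def mean_and_cov(samples, ridge=1e-2):
    """Mean and ridge-regularised covariance of generic (negative) patch features."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        c = [s[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    for i in range(d):
        cov[i][i] += ridge  # keeps the covariance well conditioned
    return mu, cov


def solve(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for k in range(col, d + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * d
    for i in range(d - 1, -1, -1):
        x[i] = (m[i][d] - sum(m[i][k] * x[k] for k in range(i + 1, d))) / m[i][i]
    return x


def train_element(positive, mu, cov):
    """Hypothetical helper: LDA-style linear detector w = cov^-1 (positive - mu).

    The bias places the decision boundary halfway between mu and the positive.
    """
    w = solve(cov, [p - m for p, m in zip(positive, mu)])
    midpoint = [(p + m) / 2 for p, m in zip(positive, mu)]
    b = -sum(wi * ci for wi, ci in zip(w, midpoint))
    return w, b


def score(w, b, x):
    """Detection score of a patch feature x; higher means a better match."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy data: random 4-D vectors stand in for HOG features of generic patches.
rng = random.Random(0)
negatives = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
mu, cov = mean_and_cov(negatives)
element = [3.0, 3.0, 3.0, 3.0]  # feature of one rendered-view patch (synthetic)
w, b = train_element(element, mu, cov)
```

Matching an element in a painting or drawing would then amount to computing `score` for every candidate patch and keeping the strongest responses. The appeal of this formulation is that μ and Σ are estimated once from generic patches and shared by all elements, so training each new element costs a single linear solve rather than a full SVM optimization with hard-negative mining.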
Acknowledgments
We are grateful to Guillaume Seguin, Alyosha Efros, Guillaume Obozinski and Jean Ponce for their useful feedback, and to Yasutaka Furukawa for providing access to the San Marco 3D model. This work was partly supported by the EIT ICT Labs, ANR project SEMAPOLIS (ANR-13-CORD-0003), and the ERC starting grant LEAP. The work was partly carried out at IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). It was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL). The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Aubry, M., Russell, B., Sivic, J. (2016). Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_14
DOI: https://doi.org/10.1007/978-3-319-25781-5_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25779-2
Online ISBN: 978-3-319-25781-5
eBook Packages: Computer Science, Computer Science (R0)