Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings, and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is very difficult as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models from different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes. 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, and historical photograph) and structural changes (e.g., missing scene parts and large occluders) of the scene. We demonstrate that the proposed approach can automatically identify the correct architectural site as well as recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.
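
As a rough illustration of what learning an element's feature weights "in a discriminative fashion" can look like, the sketch below fits one linear detector from a single positive descriptor against a pool of negative descriptors, using a whitening (LDA-style) closed form. This is an assumption made for illustration, not the chapter's stated training procedure: the abstract does not specify the descriptor, the loss, or the solver. The names `learn_element_detector` and `score_descriptors`, the regularizer `reg`, and the random toy data are all hypothetical.

```python
import numpy as np


def learn_element_detector(positive, negatives, reg=1e-2):
    """Fit a linear detector (w, b) for one visual element.

    ASSUMPTION: an LDA-style closed form, w = (Sigma + reg*I)^-1 (positive - mu),
    where mu and Sigma are the mean and covariance of the negative descriptors.
    """
    mu = negatives.mean(axis=0)
    centered = negatives - mu
    sigma = centered.T @ centered / len(negatives)
    sigma += reg * np.eye(sigma.shape[0])  # keep the linear system well conditioned
    w = np.linalg.solve(sigma, positive - mu)
    b = -float(w @ mu)  # shift so the average negative descriptor scores zero
    return w, b


def score_descriptors(w, b, descriptors):
    """Score each row of `descriptors`; higher means more element-like."""
    return descriptors @ w + b


# Toy data standing in for patch descriptors from rendered views (hypothetical).
rng = np.random.default_rng(0)
negatives = rng.normal(size=(500, 16))
positive = rng.normal(size=16) + 2.0

w, b = learn_element_detector(positive, negatives)
positive_score = score_descriptors(w, b, positive[None, :])[0]
negative_scores = score_descriptors(w, b, negatives)
print(positive_score > negative_scores.mean())
```

The bias is chosen so that negatives score zero on average, which makes scores from different detectors roughly comparable; whether the chapter calibrates its detectors this way is not stated in the abstract.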



Acknowledgments

We are grateful to Guillaume Seguin, Alyosha Efros, Guillaume Obozinski and Jean Ponce for their useful feedback, and to Yasutaka Furukawa for providing access to the San Marco 3D model. This work was partly supported by the EIT ICT Labs, ANR project SEMAPOLIS (ANR-13-CORD-0003), and the ERC starting grant LEAP. The work was partly carried out at IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). This work was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.

Author information

Corresponding author

Correspondence to Mathieu Aubry.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Aubry, M., Russell, B., Sivic, J. (2016). Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_14

  • DOI: https://doi.org/10.1007/978-3-319-25781-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25779-2

  • Online ISBN: 978-3-319-25781-5

  • eBook Packages: Computer Science, Computer Science (R0)
