CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps

  • Paul Hongsuck Seo
  • Tobias Weyand
  • Jack Sim
  • Bohyung Han
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only a few, possibly ambiguous, cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off: using fewer but larger cells results in lower location accuracy, while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with its respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves state-of-the-art performance in location recognition on multiple benchmark datasets.
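
The intersection-and-voting idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-dimensional toy "map" of eight atomic cells, the two hand-written coarse partitionings, the hypothetical classifier scores, and the choice to combine votes by summing scores. The abstract does not specify how partitions are generated or how votes are aggregated, so this should not be read as the authors' implementation.

```python
from collections import defaultdict


def intersect_partitions(partitions):
    """Group atomic cells by the tuple of coarse classes they belong to.

    Each partition is a list mapping atomic cell index -> coarse class id.
    Every distinct tuple of coarse ids defines one fine-grained class.
    """
    fine = defaultdict(list)
    n_cells = len(partitions[0])
    for cell in range(n_cells):
        key = tuple(p[cell] for p in partitions)
        fine[key].append(cell)
    return dict(fine)


def vote(fine_classes, scores):
    """Score each fine class by summing the scores of the coarse classes
    that overlap it (summation is an assumed aggregation rule)."""
    totals = {
        key: sum(s[c] for s, c in zip(scores, key))
        for key in fine_classes
    }
    best = max(totals, key=totals.get)
    return best, totals


# Toy map: 8 atomic cells, two coarse partitionings with 2 and 3 classes.
partition_a = [0, 0, 0, 0, 1, 1, 1, 1]
partition_b = [0, 0, 1, 1, 1, 1, 2, 2]

fine_classes = intersect_partitions([partition_a, partition_b])

# Hypothetical per-classifier scores over their own coarse classes.
scores_a = [0.7, 0.3]
scores_b = [0.2, 0.5, 0.3]

best, totals = vote(fine_classes, [scores_a, scores_b])
print(len(fine_classes), "fine classes from 2 + 3 coarse classes")
print("predicted fine class:", best, "covering cells", fine_classes[best])
```

In this toy setup, two classifiers with only two and three outputs jointly distinguish four fine-grained regions, while each classifier is still trained on its own coarse classes and therefore keeps more examples per class.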

Keywords

Image geolocalization, Combinatorial partitioning, Fine-grained classification

Notes

Acknowledgment

Part of this work was performed while the first and last authors were with Google, Venice, CA. This research is partly supported by the IITP grant [2017-0-01778] and the Technology Innovation Program [10073166], funded by the Korean government (MSIT and MOTIE, respectively).

Supplementary material

474197_1_En_33_MOESM1_ESM.pdf (229 kb)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Paul Hongsuck Seo (1)
  • Tobias Weyand (2)
  • Jack Sim (2)
  • Bohyung Han (3)
  1. Department of CSE, POSTECH, Pohang, Korea
  2. Google Research, Los Angeles, USA
  3. Department of ECE & ASRI, Seoul National University, Seoul, Korea