Abstract
In this chapter, we describe a visual recognition technique for fast recognition of urban landmarks on a GPS-enabled mobile device. Most existing methods offload their computation to a server by uploading the query image, which can cause a latency of several seconds over a slow network. In contrast, our approach uploads only the approximate GPS location to a server, after which a compact, location-specific classifier is downloaded to the device and all subsequent computation is performed on the device. Our approach is supervised and involves training compact random decision forest (RDF) classifiers on a database of geo-tagged images. The feature vector for the RDF is computed by densely searching the image for the presence of a selected set of discriminative local image patches extracted from the training images. The images are rectified using detected vanishing points, and binary descriptors allow an efficient search for the discriminative patches, a step that is further accelerated using min-hash. We have evaluated our approach on representative urban datasets, where it outperforms traditional methods based on bag-of-visual-words features or direct matching of local feature descriptors, neither of which is feasible when processing must occur on a low-power mobile device.
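The dense patch search described in the abstract can be illustrated with a minimal sketch. It assumes that each image window and each discriminative patch is already summarized by a binary descriptor stored as an integer, and that the Hamming distance measures how well a patch matches a window; the abstract does not specify the descriptor or the distance. The names `hamming` and `patch_response_vector` and the 8-bit toy values are hypothetical and are not taken from the chapter. Rectification, min-hash acceleration, and the forest itself are omitted.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def patch_response_vector(
    window_descriptors: Sequence[int],
    patch_descriptors: Sequence[int],
) -> List[int]:
    """For each discriminative patch, return its smallest Hamming distance
    to any window descriptor of the query image (lower suggests the patch is present)."""
    if not window_descriptors:
        raise ValueError("query image produced no window descriptors")
    return [
        min(hamming(p, w) for w in window_descriptors)
        for p in patch_descriptors
    ]


# Toy 8-bit descriptors (hypothetical values, for illustration only).
windows = [0b10110010, 0b01101100, 0b11110000]  # densely sampled image windows
patches = [0b10110011, 0b00001111]              # selected discriminative patches
features = patch_response_vector(windows, patches)
print(features)
```

In a pipeline like the one outlined above, a vector of this kind (one entry per discriminative patch) would serve as the input to the location-specific classifier. The exhaustive minimum over all windows is the step that a hashing scheme such as min-hash would aim to speed up.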
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sinha, S.N., Hedau, V., Zitnick, C.L., Szeliski, R. (2016). A Memory Efficient Discriminative Approach for Location-Aided Recognition. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_15
DOI: https://doi.org/10.1007/978-3-319-25781-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25779-2
Online ISBN: 978-3-319-25781-5
eBook Packages: Computer Science, Computer Science (R0)