Multimedia Tools and Applications, Volume 74, Issue 2, pp 479–503

Memory efficient large-scale image-based localization

  • Guoyu Lu (Email author)
  • Nicu Sebe
  • Congfu Xu
  • Chandra Kambhamettu
Article

Abstract

Local features are widely used in image-based localization. However, large-scale 2D-to-3D matching still consumes a massive amount of memory, mainly because of the high dimensionality of the features (e.g., the 128 dimensions of a SIFT descriptor). This paper introduces a new method that reduces the dimensionality of local features in order to lower memory consumption and accelerate descriptor matching. All descriptors are projected into a lower dimensional space through newly learned matrices, which mitigates the curse of dimensionality in large-scale image-based localization. The low dimensional descriptors are then mapped into a Hamming space to further reduce the memory requirement. This study also proposes an image-based localization pipeline built on the newly learned Hamming descriptors. The learned descriptor and the localization pipeline are applied to two challenging datasets. The experimental results show that the proposed method achieves extraordinary image registration performance compared with the published results of state-of-the-art methods.
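The two stages named in the abstract (a linear projection to fewer dimensions, then a mapping into Hamming space) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how the matrices are learned or how the binarization is done, so the sketch substitutes a random Gaussian projection for the learned matrix and a simple threshold at zero for the Hamming mapping. The names `make_projection`, `project`, `binarize`, `hamming`, and `encode`, the 32-bit code length, and the toy data are all assumptions made for illustration.

```python
import random

SIFT_DIM = 128   # descriptor dimensionality named in the abstract
CODE_BITS = 32   # hypothetical code length, chosen only for this sketch


def make_projection(rows, cols, seed=0):
    """Random Gaussian matrix used as a stand-in for a learned projection."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, descriptor):
    """Matrix-vector product: one output value per matrix row."""
    return [sum(w * x for w, x in zip(row, descriptor)) for row in matrix]


def binarize(values):
    """Threshold each value at zero and pack the bits into one int."""
    code = 0
    for value in values:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a, b):
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def encode(matrix, descriptor):
    """Project a descriptor, then map it to a binary code."""
    return binarize(project(matrix, descriptor))


# Toy data: random integer vectors standing in for real SIFT descriptors.
rng = random.Random(1)
query = [rng.randint(0, 255) for _ in range(SIFT_DIM)]
perturbed = [min(255, max(0, v + rng.randint(-5, 5))) for v in query]
unrelated = [rng.randint(0, 255) for _ in range(SIFT_DIM)]

P = make_projection(CODE_BITS, SIFT_DIM)
q_code = encode(P, query)
print("distance to perturbed copy:", hamming(q_code, encode(P, perturbed)))
print("distance to unrelated vector:", hamming(q_code, encode(P, unrelated)))
```

Under the assumption that a descriptor is stored at one byte per dimension, a 128-dimensional descriptor occupies 128 bytes, whereas the 32-bit code in this sketch occupies 4 bytes, and comparing two codes reduces to an XOR followed by a bit count. A learned projection, as described in the abstract, would replace `make_projection` here.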

Keywords

  • Image-based localization
  • Large scale imagery
  • SIFT
  • Hamming descriptor
  • Dimensionality reduction

Notes

Acknowledgements

This work was financially supported by the European Master in Informatics program, RWTH Aachen University, the University of Trento, and the PhD program of the University of Delaware. The authors are grateful to Torsten Sattler and Leif Kobbelt of RWTH Aachen University for their great help in completing this work.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Guoyu Lu (1, Email author)
  • Nicu Sebe (2)
  • Congfu Xu (3)
  • Chandra Kambhamettu (1)

  1. Video/Image Modeling and Synthesis Lab, University of Delaware, Newark, USA
  2. Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
  3. Institute of Artificial Intelligence, Zhejiang University, Hangzhou, People’s Republic of China
