A Survey of Landmark Recognition Using the Bag-of-Words Framework

Intelligent Computer Graphics 2012

Part of the book series: Studies in Computational Intelligence (SCI, volume 441)

Abstract

Recent years have seen a rapid increase in the use of mobile devices. Since many mobile devices are equipped with a camera and are connected to the internet, localization in an urban environment using landmark images is gaining popularity. The idea is simple. A tourist standing near a landmark photographs it with a mobile camera, and the image or images are transmitted to a server, where they are matched against a database of landmark images for that locality. If a match is found, relevant information is sent back, such as background on the landmark, nearby transit facilities, or other important landmarks nearby. This type of application has considerable potential as a mobile city guide or navigation aid. In this paper, we examine the use of local invariant shape features and of global features such as colour and texture for the recognition task, as reported in the literature, and present various retrieval techniques. A variety of descriptors for landmark recognition and scene classification are discussed. We also provide insights into vocabulary building and weighting schemes for representing landmark images that can help boost recognition rates.
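As a rough illustration of the bag-of-words pipeline the abstract refers to, the sketch below builds a visual vocabulary, represents each image as a weighted histogram of visual words, and ranks database images against a query. It is a minimal sketch under stated assumptions, not the chapter's method: plain k-means for vocabulary building, tf-idf as the weighting scheme, cosine similarity for matching, and precomputed local descriptors (for example SIFT-like vectors) are all assumed here, and every function name is hypothetical.

```python
import numpy as np

def quantize(descriptors, centers):
    """Assign each local descriptor to its nearest visual word (Euclidean)."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k visual words with plain k-means (assumed choice)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].astype(float)
    for _ in range(iters):
        labels = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Count how often each visual word occurs in one image."""
    words = quantize(descriptors, centers)
    return np.bincount(words, minlength=len(centers)).astype(float)

def idf_weights(histograms):
    """Inverse document frequency of each visual word over the database."""
    df = (histograms > 0).sum(axis=0)
    return np.log(len(histograms) / np.maximum(df, 1))

def weight(histograms, idf):
    """Apply tf-idf weighting and L2-normalise each row."""
    histograms = np.atleast_2d(histograms)
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1.0)
    w = tf * idf
    return w / np.maximum(np.linalg.norm(w, axis=1, keepdims=True), 1e-12)

def rank(query_vec, db_vecs):
    """Return database indices sorted by decreasing cosine similarity."""
    return np.argsort(-(db_vecs @ query_vec))
```

In a server-side setting, the vocabulary and the idf weights would typically be computed once from the database images, and the same idf vector reused to weight each incoming query histogram before ranking.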

Author information

Corresponding author

Correspondence to Priyadarshi Bhattacharya.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bhattacharya, P., Gavrilova, M. (2013). A Survey of Landmark Recognition Using the Bag-of-Words Framework. In: Plemenos, D., Miaoulis, G. (eds) Intelligent Computer Graphics 2012. Studies in Computational Intelligence, vol 441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31745-3_13

  • DOI: https://doi.org/10.1007/978-3-642-31745-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31744-6

  • Online ISBN: 978-3-642-31745-3

  • eBook Packages: Engineering, Engineering (R0)
