Abstract
Recent years have seen a rapid increase in the use of mobile devices. Because many of these devices are equipped with a camera and connected to the internet, localization in urban environments using landmark images is gaining popularity. The idea is simple: a tourist photographs a nearby landmark with a mobile camera, and the image or images are transmitted to a server, where they are matched against a database of landmark images for that locality. If a match is found, relevant information is sent back, such as background on the landmark, nearby transit facilities, or other important landmarks in the vicinity. This type of application has tremendous potential as a mobile city guide or navigation aid. In this paper, we investigate the use of local invariant shape features and of global features such as colour and texture for the recognition task, as reported in the literature, and present various retrieval techniques. A variety of descriptors for landmark recognition and scene classification are discussed. We also provide insights into vocabulary building and weighting schemes for representing landmark images that can help boost recognition rates.
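To make the retrieval idea in the abstract concrete, the following is a minimal sketch of a bag-of-visual-words pipeline with tf-idf weighting. It is an illustration under stated assumptions, not the method of any particular surveyed system: the 2-D "descriptors", the landmark names, the farthest-point seeding of k-means, and all function names are hypothetical, and real systems would use high-dimensional local descriptors and much larger vocabularies.

```python
import math
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(desc, vocab):
    """Index of the visual word (centroid) closest to a descriptor."""
    return min(range(len(vocab)), key=lambda i: sq_dist(desc, vocab[i]))

def build_vocabulary(descriptors, k, iters=10):
    """Toy k-means; seeds are picked by farthest-point selection (an assumption)."""
    vocab = [descriptors[0]]
    while len(vocab) < k:
        vocab.append(max(descriptors,
                         key=lambda d: min(sq_dist(d, v) for v in vocab)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, vocab)].append(d)
        # Recompute each centroid; keep the old one if its cluster is empty.
        vocab = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else vocab[i]
            for i, members in enumerate(clusters)
        ]
    return vocab

def word_counts(descriptors, vocab):
    """Histogram of visual-word occurrences for one image."""
    return Counter(nearest(d, vocab) for d in descriptors)

def idf_weights(all_counts, k):
    """Inverse document frequency: words seen in every image get weight 0."""
    n = len(all_counts)
    weights = []
    for w in range(k):
        df = sum(1 for c in all_counts if c[w] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights

def tfidf(counts, idf):
    """L2-normalised tf-idf vector for one image."""
    total = sum(counts.values())
    vec = [counts[w] / total * idf[w] for w in range(len(idf))]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

# Hypothetical database: two landmarks, each with a few toy 2-D descriptors.
database = {
    "clock_tower": [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1)],
    "bridge": [(9.9, 0.0), (10.0, 0.1), (10.1, 0.0), (5.1, 5.0)],
}
k = 3
pooled = [d for descs in database.values() for d in descs]
vocab = build_vocabulary(pooled, k)
counts = {name: word_counts(descs, vocab) for name, descs in database.items()}
idf = idf_weights(list(counts.values()), k)
db_vecs = {name: tfidf(c, idf) for name, c in counts.items()}

# Query image: mostly descriptors that resemble the first landmark.
query = [(0.05, 0.05), (0.1, 0.1), (5.0, 5.0), (9.8, 0.1)]
q_vec = tfidf(word_counts(query, vocab), idf)
scores = {name: sum(a * b for a, b in zip(q_vec, v)) for name, v in db_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup the visual word shared by both landmarks receives an idf weight of zero, so only the distinctive words contribute to the cosine score, which is the intuition behind weighting schemes of this kind.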
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bhattacharya, P., Gavrilova, M. (2013). A Survey of Landmark Recognition Using the Bag-of-Words Framework. In: Plemenos, D., Miaoulis, G. (eds) Intelligent Computer Graphics 2012. Studies in Computational Intelligence, vol 441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31745-3_13
DOI: https://doi.org/10.1007/978-3-642-31745-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31744-6
Online ISBN: 978-3-642-31745-3
eBook Packages: Engineering, Engineering (R0)