Abstract
Successful large scale object instance retrieval systems are typically based on accurate matching of local descriptors, such as SIFT. However, these local descriptors are often not sufficiently distinctive to prevent false correspondences, as they only consider the gradient appearance of the local patch, without being able to “see the big picture”.
We describe a method, SemanticSIFT, which takes account of local image semantic content (such as grass and sky) in matching, and thereby eliminates many false matches. We show that this enhanced descriptor can be employed in standard large scale inverted file systems with the following benefits: improved precision (as false retrievals are suppressed); an almost two-fold speedup in retrieval speed (as posting lists are shorter on average); and, depending on the target application, a 20 % decrease in memory requirements (since unrequired ‘semantic’ words can be removed). Furthermore, we also introduce a fast, and near state of the art, semantic segmentation algorithm.
Quantitative and qualitative results on standard benchmark datasets (Oxford Buildings 5 k and 105 k) demonstrate the effectiveness of our approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Cummins, M., Newman, P.: Highly scalable appearance-only SLAM - FAB-MAP 2.0. In: RSS (2009)
Schindler, G., Brown, M., Szeliski, R.: City-scale location recognition. In: Proceedings of the CVPR (2007)
Knopp, J., Sivic, J., Pajdla, T.: Avoiding confusing features in place recognition. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 748–761. Springer, Heidelberg (2010)
Torii, A., Sivic, J., Pajdla, T., Okutomi, M.: Visual place recognition with repetitive structures. In: Proceedings of the CVPR (2013)
Quack, T., Leibe, B., Van Gool, L.: World-scale mining of objects and events from community photo collections. In: Proceedings of the CIVR (2008)
Gammeter, S., Bossard, L., Quack, T., Van Gool, L.: I know what you did last summer: object-level auto-annotation of holiday snaps. In: Proceedings of the ICCV (2009)
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proceedings of the CVPR, pp. 2161–2168 (2006)
Shen, X., Lin, Z., Brandt, J., Wu, Y.: Mobile product image search by automatic query object extraction. In: Proceedings of the CVPR (2012)
Romberg, S., Lienhart, R.: Bundle min-hashing for logo recognition. In: ACM ICMR (2013)
Google Goggles. http://www.google.com/mobile/goggles
Sivic, J., Zisserman, A.: Efficient visual search of videos cast as text retrieval. IEEE PAMI 31, 591–606 (2009)
Agarwal, S., Snavely, N., Simon, I., Seitz, S.M., Szeliski, R.: Building Rome in a day. In: Proceedings of the ICCV (2009)
Lowe, D.: Distinctive image features from scale-invariant keypoints. IJCV 60, 91–110 (2004)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the CVPR (2007)
Jégou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 304–317. Springer, Heidelberg (2008)
Jégou, H., Douze, M., Schmid, C.: Exploiting descriptor distances for precise image search. Technical report, INRIA (2011)
Aly, M., Munich, M., Perona, P.: Compactkdt: compact signatures for accurate large scale object recognition. In: IEEE Workshop on Applications of Computer Vision (2012)
Arandjelović, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: Proceedings of the CVPR (2012)
Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: automatic query expansion with a generative feature model for object retrieval. In: Proceedings of the ICCV (2007)
Chum, O., Mikulik, A., Per\(\check{\rm d}\)och, M., Matas, J.: Total recall II: query expansion revisited. In: Proceedings of the CVPR (2011)
Shen, X., Lin, Z., Brandt, J., Avidan, S., Wu, Y.: Object retrieval and localization with spatially-constrained similarity measure and k-NN reranking. In: Proceedings of the CVPR (2012)
Qin, D., Gammeter, S., Bossard, L., Quack, T., Van Gool, L.: Hello neighbor: accurate object retrieval with k-reciprocal nearest neighbors. In: Proceedings of the CVPR (2011)
Simonyan, K., Vedaldi, A., Zisserman, A.: Descriptor learning using convex optimisation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 243–256. Springer, Heidelberg (2012)
Winder, S., Hua, G., Brown, M.: Picking the best daisy. In: Proceedings of the CVPR, pp. 178–185 (2009)
Ladicky, L., Sturgess, P., Russell, C., Sengupta, S., Bastanlar, Y., Clocksin, W., Torr, P.H.S.: Joint optimisation for object class segmentation and dense stereo reconstruction. IJCV 100(2), 122–133 (2012)
Haene, C., Zach, C., Cohen, A., Angst, R., Pollefeys, M.: Joint 3D scene reconstruction and class segmentation. In: Proceedings of the CVPR (2013)
Castle, R.O., Klein, G., Murray, D.W.: Combining monoSLAM with object recognition for scene augmentation using a wearable camera. Image Vis. Comput. 28(11), 1548–1556 (2010)
Civera, J., Gálvez-López, D., Riazuelo, L., Tardós, J.D., Montiel, J.M.M.: Towards semantic SLAM using a monocular camera. In: IEEE Intelligent Robots and Systems (IROS) (2011)
Turcot, T., Lowe, D.G.: Better matching with fewer features: the selection of useful features in large database recognition problems. In: ICCV Workshop on Emergent Issues in Large Amounts of Visual Data (WS-LAVD) (2009)
Wu, Z., Ke, Q., Isard, M., Sun, J.: Bundling features for large scale partial-duplicate web image search. In: Proceedings of the CVPR (2009)
Fernando, B., Tuytelaars, T.: Mining multiple queries for image retrieval: on-the-fly learning of an object-specific mid-level representation. In: Proceedings of the ICCV (2013)
Wang, J.Z., Li, J., Wiederhold, G.: SIMPLIcity: semantics-sensitive integrated matching for picture libraries. IEEE PAMI 23, 947–964 (2001)
Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.: Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part IV. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
Li, J., Wang, J.Z.: Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE PAMI 25(9), 1075–1088 (2003)
Munoz, D., Bagnell, J.A., Hebert, M.: Stacked hierarchical labeling. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 57–70. Springer, Heidelberg (2010)
Lempitsky, V., Vedaldi, A., Zisserman, A.: A pylon model for semantic segmentation. In: NIPS (2011)
Ladicky, L., Russell, C., Kohli, P., Torr, P.H.S.: Associative hierarchical random fields. IEEE PAMI 36(6), 1056–1077 (2014)
Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: Proceedings of the ICCV (2009)
Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: Proceedings of the CVPR (2008)
Felzenszwalb, P.F., Veksler, O.: Tiered scene labelling with dynamic programming. In: Proceedings of the CVPR (2010)
Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: Proceedings of the CVPR (2006)
Leordeanu, M., Sukthankar, R., Sminchisescu, C.: Efficient closed-form solution to generalized boundary detection. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 516–529. Springer, Heidelberg (2012)
Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. In: Proceedings of the CVPR (2010)
Arandjelović, R., Zisserman, A.: Fast semantic segmentation code (2014). http://www.robots.ox.ac.uk/~vgg/software/fast_semantic_segmentation
Tighe, J., Lazebnik, S.: Superparsing: scalable nonparametric image parsing with superpixels. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 352–365. Springer, Heidelberg (2010)
Maire, M., Arbelaez, P., Fowlkes, C., Malik, J.: Using contours to detect and localize junctions in natural images. In: Proceedings of the CVPR (2008)
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: From contours to regions: an empirical evaluation. In: Proceedings of the CVPR (2009)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: improving particular object retrieval in large scale image databases. In: Proceedings of the CVPR (2008)
Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: the benefit of pca and whitening. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 774–787. Springer, Heidelberg (2012)
Jégou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: Proceedings of the CVPR (2009)
Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. IJCV 1, 63–86 (2004)
Tolias, G., Jégou, H.: Local visual query expansion: exploiting an image collection to refine local descriptors. Technical report RR-8325, INRIA (2013)
Shotton, J., Winn, J.M., Rother, C., Criminisi, A.: TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 1–15. Springer, Heidelberg (2006)
Arandjelović, R., Zisserman, A.: Parissculpt360 annotations (2014). http://www.robots.ox.ac.uk/~vgg/data/data-various.html
Arandjelović, R., Zisserman, A.: Smooth object retrieval using a bag of boundaries. In: Proceedings of the ICCV (2011)
Stewénius, H., Gunderson, S.H., Pilet, J.: Size matters: exhaustive geometric verification for image retrieval accepted for ECCV 2012. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 674–687. Springer, Heidelberg (2012)
van de Sande, K.E.A., Gevers, T., Snoek, C.G.M.: Evaluating color descriptors for object and scene recognition. IEEE PAMI 32, 1582–1596 (2010)
Philbin, J., Isard, M., Sivic, J., Zisserman, A.: Descriptor learning for efficient retrieval. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part III. LNCS, vol. 6313, pp. 677–691. Springer, Heidelberg (2010)
Simonyan, K., Vedaldi, A., Zisserman, A.: Learning local feature descriptors using convex optimisation. IEEE PAMI 36(8), 1573–1585 (2014)
Acknowledgement
We are grateful for financial support from ERC grant VisRec no. 228180 and a Royal Society Wolfson Research Merit Award.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Arandjelović, R., Zisserman, A. (2015). Visual Vocabulary with a Semantic Twist. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-16865-4_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer ScienceComputer Science (R0)