Visual Vocabulary with a Semantic Twist

  • Conference paper
Computer Vision – ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

Successful large scale object instance retrieval systems are typically based on accurate matching of local descriptors, such as SIFT. However, these local descriptors are often not sufficiently distinctive to prevent false correspondences, as they only consider the gradient appearance of the local patch, without being able to “see the big picture”.

We describe a method, SemanticSIFT, which takes account of local image semantic content (such as grass and sky) in matching, and thereby eliminates many false matches. We show that this enhanced descriptor can be employed in standard large scale inverted file systems with the following benefits: improved precision (as false retrievals are suppressed); an almost two-fold speedup in retrieval (as posting lists are shorter on average); and, depending on the target application, a 20% decrease in memory requirements (since unneeded ‘semantic’ words can be removed). We also introduce a fast, near state-of-the-art semantic segmentation algorithm.

Quantitative and qualitative results on standard benchmark datasets (Oxford Buildings 5k and 105k) demonstrate the effectiveness of our approach.
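
The effect of semantic content on an inverted file can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, purely for illustration, that each feature carries a quantised visual word and a single semantic label, and that the inverted file is keyed by the (word, label) pair. The function names `build_index` and `query`, the labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_index(images, use_semantics):
    """Build an inverted file: key -> set of image ids containing that key.

    images maps image_id -> list of (visual_word, semantic_label) features.
    With use_semantics=True the key is the (word, label) pair (an assumed
    encoding, chosen only for this sketch); otherwise it is the word alone.
    """
    index = defaultdict(set)
    for image_id, features in images.items():
        for word, label in features:
            key = (word, label) if use_semantics else word
            index[key].add(image_id)
    return index


def query(index, features, use_semantics):
    """Count, per database image, how many query features hit its postings."""
    votes = defaultdict(int)
    for word, label in features:
        key = (word, label) if use_semantics else word
        for image_id in index.get(key, ()):
            votes[image_id] += 1
    return dict(votes)


# Toy database: visual word 7 occurs on a building and on grass.
DATABASE = {
    "tower": [(7, "building"), (3, "building")],
    "meadow": [(7, "grass"), (5, "grass")],
}
QUERY = [(7, "building"), (3, "building")]

plain = query(build_index(DATABASE, False), QUERY, False)
semantic = query(build_index(DATABASE, True), QUERY, True)
print(plain)     # "meadow" receives a false vote through word 7
print(semantic)  # the semantic key suppresses that vote
```

In this toy setting the posting list for word 7 shrinks from two images to one once the label is part of the key, which mirrors the abstract's point that shorter posting lists mean fewer false votes and less work per query.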

Acknowledgement

We are grateful for financial support from ERC grant VisRec no. 228180 and a Royal Society Wolfson Research Merit Award.

Author information

Corresponding author

Correspondence to Relja Arandjelović.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Arandjelović, R., Zisserman, A. (2015). Visual Vocabulary with a Semantic Twist. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_12

  • DOI: https://doi.org/10.1007/978-3-319-16865-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16864-7

  • Online ISBN: 978-3-319-16865-4

  • eBook Packages: Computer Science, Computer Science (R0)
