VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval

  • Conference paper
Computer Vision – ACCV 2012 (ACCV 2012)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7725)

Abstract

This paper addresses the problem of object category retrieval in large unannotated image datasets. Our aim is to enable both fast learning of an object category model, and fast retrieval over the dataset. With these elements we show that new visual concepts can be learnt on-the-fly, given a text description, and so images of that category can then be retrieved from the dataset in realtime.

To this end we compare state-of-the-art encoding methods and introduce a novel cascade retrieval architecture, with a focus on achieving the best trade-off between three important performance measures for a realtime system of this kind, namely: (i) class accuracy, (ii) memory footprint, and (iii) speed.

We show that an on-the-fly system is possible and compare its performance (using noisy training images) to that of using carefully curated images. For this evaluation we use the VOC 2007 dataset together with 100k images from ImageNet to act as distractors.
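
The abstract does not describe the cascade in detail, so the following is only a minimal sketch of a generic two-stage retrieval cascade, not the architecture from the paper. It assumes a linear scoring model at both stages; the function name `cascade_retrieve` and all of its parameters are hypothetical. A cheap model over compact features scores every image, and a more accurate model over larger features re-ranks only the resulting shortlist, which is one common way to trade accuracy against memory footprint and speed.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(a * b for a, b in zip(w, x))


def cascade_retrieve(
    cheap_w: Sequence[float],
    cheap_feats: Sequence[Sequence[float]],
    full_w: Sequence[float],
    full_feats: Sequence[Sequence[float]],
    shortlist_size: int,
    top_k: int,
) -> List[int]:
    """Return indices of the top_k images after a two-stage ranking.

    All names here are illustrative assumptions, not the paper's API.
    """
    # Stage 1: score every image with the cheap model on compact features
    # and keep only the highest-scoring shortlist_size candidates.
    cheap_scores = [dot(cheap_w, f) for f in cheap_feats]
    shortlist = sorted(
        range(len(cheap_feats)), key=lambda i: cheap_scores[i], reverse=True
    )[:shortlist_size]

    # Stage 2: re-score only the shortlist with the more expensive model.
    reranked = sorted(
        shortlist, key=lambda i: dot(full_w, full_feats[i]), reverse=True
    )
    return reranked[:top_k]


# Toy data: four images, 1-D compact features and 2-D full features.
cheap_feats = [[1.0], [0.9], [0.1], [0.8]]
full_feats = [[0.2, 0.1], [0.9, 0.8], [1.0, 1.0], [0.5, 0.4]]
result = cascade_retrieve([1.0], cheap_feats, [1.0, 1.0], full_feats,
                          shortlist_size=3, top_k=2)
print(result)
```

In this toy data, image 2 would score highest under the full model but is dropped by the first stage, which illustrates the accuracy cost a cascade pays for scoring fewer images with the expensive model.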

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chatfield, K., Zisserman, A. (2013). VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37444-9_34

  • DOI: https://doi.org/10.1007/978-3-642-37444-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37443-2

  • Online ISBN: 978-3-642-37444-9

  • eBook Packages: Computer Science, Computer Science (R0)
