Feature Learning for the Image Retrieval Task

  • Conference paper
Computer Vision - ACCV 2014 Workshops (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9010)

Abstract

In this paper we propose a generic framework for optimizing image feature encoders for image retrieval. Our approach uses a triplet-based objective: for a given query image, it compares the query's similarity score with a matching image against its score with a non-matching image, and penalizes triplets that give the higher score to the non-matching image. We solve the resulting problem with stochastic gradient descent and provide the required gradient expressions for generic encoder parameters. We apply the resulting algorithm to learn the power normalization parameters commonly used to condition image features. We also propose a modification to codebook-based feature encoders that weights each local descriptor as a function of its distance to the assigned codeword before the descriptors are aggregated during encoding. Using the VLAD feature encoder, we show experimentally that both the optimized power normalization and the local descriptor weighting yield improvements on a standard dataset.
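The triplet objective and its optimization by stochastic gradient descent can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions that the abstract does not state: a single scalar exponent `alpha`, the power normalization form sign(x)|x|^alpha, a plain inner product as the similarity score, a hinge loss with a `margin` parameter, and clipping of `alpha` to a fixed range. All function names are hypothetical.

```python
import numpy as np

def power_normalize(x, alpha):
    """Signed power normalization: sign(x) * |x|**alpha (assumed form)."""
    return np.sign(x) * np.abs(x) ** alpha

def power_normalize_grad(x, alpha):
    """Derivative of power_normalize with respect to alpha."""
    absx = np.abs(x)
    # Replace zeros by 1 so the log is defined; sign(0) = 0 zeroes those terms.
    safe = np.where(absx > 0, absx, 1.0)
    return np.sign(x) * safe ** alpha * np.log(safe)

def score_and_grad(a, b, alpha):
    """Inner-product similarity of normalized features and its alpha-derivative."""
    fa, fb = power_normalize(a, alpha), power_normalize(b, alpha)
    ga, gb = power_normalize_grad(a, alpha), power_normalize_grad(b, alpha)
    return fa @ fb, ga @ fb + fa @ gb  # product rule

def triplet_loss_and_grad(q, p, n, alpha, margin=1.0):
    """Hinge loss on (query, matching, non-matching); margin is an assumption."""
    s_pos, g_pos = score_and_grad(q, p, alpha)
    s_neg, g_neg = score_and_grad(q, n, alpha)
    loss = max(0.0, margin + s_neg - s_pos)
    grad = (g_neg - g_pos) if loss > 0 else 0.0
    return loss, grad

def sgd(triplets, alpha=1.0, lr=1e-3, epochs=5, seed=0):
    """Plain SGD over triplets; clipping alpha to [0.05, 1] is an assumption."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(triplets)):
            q, p, n = triplets[i]
            _, g = triplet_loss_and_grad(q, p, n, alpha)
            alpha = float(np.clip(alpha - lr * g, 0.05, 1.0))
    return alpha
```

A triplet contributes a gradient only while the non-matching image scores within `margin` of the matching one, so correctly ranked triplets leave `alpha` unchanged.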

Notes

  1. Although the \(l_2\) normalization commonly applied to local descriptors limits the effective volume of each cell, it reduces the dimensionality by only one dimension, so \(l_2\)-normalized data is still high-dimensional. The question remains whether pruning mechanisms other than those proposed herein exist that better take into account the constraints on the data layout.

Acknowledgement

This work was partially supported by the FP7 European integrated project AXES.

Author information

Corresponding author

Correspondence to Joaquin Zepeda.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Rana, A., Zepeda, J., Perez, P. (2015). Feature Learning for the Image Retrieval Task. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9010. Springer, Cham. https://doi.org/10.1007/978-3-319-16634-6_12

  • DOI: https://doi.org/10.1007/978-3-319-16634-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16633-9

  • Online ISBN: 978-3-319-16634-6

  • eBook Packages: Computer Science, Computer Science (R0)
