Feature Learning for the Image Retrieval Task

Rana, Aakanksha; Zepeda, Joaquin; Perez, Patrick

doi:10.1007/978-3-319-16634-6_12

Aakanksha Rana¹⁵,
Joaquin Zepeda¹⁵ &
Patrick Perez¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9010))

Included in the following conference series:

Asian Conference on Computer Vision

1406 Accesses
3 Citations

Abstract

In this paper we propose a generic framework for the optimization of image feature encoders for image retrieval. Our approach uses a triplet-based objective that compares, for a given query image, the similarity scores of an image with a matching and a non-matching image, penalizing triplets that give a higher score to the non-matching image. We use stochastic gradient descent to address the resulting problem and provide the required gradient expressions for generic encoder parameters, applying the resulting algorithm to learn the power normalization parameters commonly used to condition image features. We also propose a modification to codebook-based feature encoders that consists of weighting the local descriptors as a function of their distance to the assigned codeword before aggregating them as part of the encoding process. Using the VLAD feature encoder, we show experimentally that our proposed optimized power normalization method and local descriptor weighting method yield improvements on a standard dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Although \(l_2\) normalization commonly applied to local descriptors limits the effective volume of each cell, one should note that \(l_2\) normalization amounts to a reduction of dimensionality by one dimension, and that \(l_2\)-normalized data is still high-dimensional. Yet the question still remains on whether pruning mechanisms other than those proposed herein exist that better take into account the constraints on the data layout.

References

Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British Machine Vision Conference, pp. 76.1–76.12. British Machine Vision Association (2011)
Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004)
Article Google Scholar
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A comparison of affine region detectors. Int. J. Comput. Vision 65, 43–72 (2005)
Article Google Scholar
Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: International Conference on Computer Vision, pp. 2–9 (2003)
Google Scholar
Perronnin, F., Sánchez, J., Mensink, T.: Improving the fisher kernel for large-scale image classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 143–156. Springer, Heidelberg (2010)
Chapter Google Scholar
Delhumeau, J., Gosselin, P.H., Jégou, H., Pérez, P.: Revisiting the VLAD image representation. In: Proceedings of ACM International Conference on Multimedia, vol. 21, pp. 653–656 (2013)
Google Scholar
Sydorov, V., Sakurada, M., Lampert, C.: Deep fisher kernels - end to end learning of the fisher kernel GMM parameters. In: Computer Vision and Pattern Recognition (2014)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of Neural Information Processing Systems, pp. 1–9 (2012)
Google Scholar
Oquab, M., Bottou, L.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of Computer Vision and Pattern Recognition (2014)
Google Scholar
Mensink, T., Verbeek, J., Perronnin, F., Csurka, G.: Metric learning for large scale image classification: generalizing to new classes at Near-Zero cost. Pattern Anal. Mach. Intell. 34, 1704–1716 (2012)
Article Google Scholar
Brown, M., Hua, G., Winder, S.: Discriminative learning of local image descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 33, 43–57 (2011)
Article Google Scholar
Simonyan, K., Vedaldi, A., Zisserman, A.: Descriptor learning using convex optimisation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 243–256. Springer, Heidelberg (2012)
Chapter Google Scholar
Jegou, H., Douze, M., Schmid, C., Perez, P.: Aggregating local descriptors into a compact image representation. In: Proceedings of Computer Vision and Pattern Recognition, pp. 3304–3311 (2010)
Google Scholar
Arandjelovic, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: Proceedings of Computer Vision and Pattern Recognition (2012)
Google Scholar
Jegou, H., Perronnin, F., Douze, M., Jorge, S., Patrick, P., Schmid, C.: Aggregating local image descriptors into compact codes. IEEE Trans. Pattern Anal. Mach. Intell., 1–12 (2011)
Google Scholar
Jegou, H.: INRIA Holidays dataset (2014)
Google Scholar
Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree (2006)
Google Scholar
Chechik, G., Shalit, U.: Large scale online learning of image similarity through ranking. J. Mach. Learn. Res. 11, 1109–1135 (2010)
MATH MathSciNet Google Scholar
Bottou, L.: Stochastic gradient descent tricks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, 2nd edn, pp. 421–436. Springer, Heidelberg (2012)
Chapter Google Scholar
Avila, S., Thome, N., Cord, M., Valle, E., de A. Araujo, A.: BOSSA: extended bow formalism for image classification. In: 2011 18th IEEE International Conference on Image Processing (ICIP), pp. 2909–2912 (2011)
Google Scholar
Avila, S., Thome, N., Cord, M., Valle, E., De A. AraúJo, A.: Pooling in image representation: the visual codeword point of view. Comput. Vis. Image Underst. 117, 453–465 (2013)
Article Google Scholar

Download references

Acknowledgement

This work was partially supported by the FP7 European integrated project AXES.

Author information

Authors and Affiliations

Technicolor R&I, 975 Avenue des Champs Blancs, CS 17616, 35576, Cesson Sevigne, France
Aakanksha Rana, Joaquin Zepeda & Patrick Perez

Authors

Aakanksha Rana
View author publications
You can also search for this author in PubMed Google Scholar
Joaquin Zepeda
View author publications
You can also search for this author in PubMed Google Scholar
Patrick Perez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Joaquin Zepeda .

Editor information

Editors and Affiliations

Center for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
C. V. Jawahar
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shiguang Shan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Rana, A., Zepeda, J., Perez, P. (2015). Feature Learning for the Image Retrieval Task. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science(), vol 9010. Springer, Cham. https://doi.org/10.1007/978-3-319-16634-6_12

Download citation

DOI: https://doi.org/10.1007/978-3-319-16634-6_12
Published: 12 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16633-9
Online ISBN: 978-3-319-16634-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics