Abstract
In this paper we propose a generic framework for optimizing image feature encoders for image retrieval. Our approach uses a triplet-based objective that, for a given query image, compares its similarity score with a matching image against its score with a non-matching image, penalizing triplets in which the non-matching image receives the higher score. We solve the resulting problem with stochastic gradient descent, provide the required gradient expressions for generic encoder parameters, and apply the resulting algorithm to learn the power normalization parameters commonly used to condition image features. We also propose a modification to codebook-based feature encoders in which each local descriptor is weighted as a function of its distance to the assigned codeword before being aggregated during encoding. Using the VLAD feature encoder, we show experimentally that both the optimized power normalization and the local descriptor weighting yield improvements on a standard dataset.
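As an illustration of the descriptor weighting described above, the following Python sketch computes a VLAD encoding (hard assignment of each local descriptor to its nearest codeword, residuals summed per codeword, blocks concatenated) in which each residual is first scaled by a function of its distance to the codeword. The text does not specify the weighting function, so `weight_fn` is a pluggable parameter, and `hard_prune` (keep descriptors within a distance `tau` of their codeword, discard the rest) is a hypothetical choice for illustration only. The names and the toy data are assumptions, and the subsequent power and \(l_2\) normalization steps are omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def weighted_vlad(
    descriptors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    weight_fn: Callable[[float], float] = lambda d: 1.0,
) -> Vector:
    """VLAD where each residual x - c_k is scaled by weight_fn(||x - c_k||).

    With the default constant weight of 1 this reduces to plain VLAD.
    """
    dim = len(codebook[0])
    # One residual accumulator per codeword.
    blocks = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        # Hard assignment to the nearest codeword.
        k = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
        c = codebook[k]
        w = weight_fn(math.sqrt(sq_dist(x, c)))
        for i in range(dim):
            blocks[k][i] += w * (x[i] - c[i])
    # Concatenate the K per-codeword blocks into one K * dim vector.
    return [value for block in blocks for value in block]


def hard_prune(tau: float) -> Callable[[float], float]:
    """Hypothetical weighting: weight 1 within distance tau, 0 beyond it."""
    return lambda d: 1.0 if d <= tau else 0.0


# Toy example: two 2-D codewords and three local descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 3.0], [11.0, 10.0]]
plain = weighted_vlad(descriptors, codebook)
pruned = weighted_vlad(descriptors, codebook, hard_prune(2.0))
```

Here `plain` is `[1.0, 3.0, 1.0, 0.0]`, while `pruned` discards the descriptor lying at distance 3 from its codeword and gives `[1.0, 0.0, 1.0, 0.0]`. Passing the weighting as a function keeps the aggregation loop unchanged, so hard pruning and smooth distance-based weights can be compared with the same code.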
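The triplet objective and its use for learning power normalization parameters can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' exact formulation: it assumes a hinge loss with a margin, an inner-product similarity between encodings that are power-normalized (\(z_j = \mathrm{sign}(x_j)|x_j|^{\alpha_j}\), one exponent \(\alpha_j\) per dimension) and then \(l_2\)-normalized, and plain SGD over (query, matching, non-matching) triplets. The function names, the margin, the learning rate and the positivity clamp on \(\alpha\) are all hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Triplet = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def power_l2(x: Sequence[float], alpha: Sequence[float]) -> Tuple[Vector, Vector, float]:
    """Per-dimension power normalization followed by l2 normalization.

    Returns (y, dz, norm) with z_i = sign(x_i)|x_i|^alpha_i, y = z / ||z||,
    and dz_i = dz_i/dalpha_i = z_i * ln|x_i| (0 where x_i == 0).
    Assumes x has a non-zero entry so that ||z|| > 0.
    """
    z = [math.copysign(abs(xi) ** ai, xi) if xi != 0 else 0.0 for xi, ai in zip(x, alpha)]
    dz = [zi * math.log(abs(xi)) if xi != 0 else 0.0 for xi, zi in zip(x, z)]
    norm = math.sqrt(sum(zi * zi for zi in z))
    return [zi / norm for zi in z], dz, norm


def score_and_grad(a: Sequence[float], b: Sequence[float],
                   alpha: Sequence[float]) -> Tuple[float, Vector]:
    """Similarity s = y_a . y_b and its gradient with respect to alpha."""
    ya, dza, na = power_l2(a, alpha)
    yb, dzb, nb = power_l2(b, alpha)
    s = sum(u * v for u, v in zip(ya, yb))
    # Chain rule with dy/dz = (I - y y^T) / ||z|| for each encoding.
    grad = [
        dza[j] * (yb[j] - s * ya[j]) / na + dzb[j] * (ya[j] - s * yb[j]) / nb
        for j in range(len(alpha))
    ]
    return s, grad


def triplet_loss_and_grad(q: Sequence[float], pos: Sequence[float], neg: Sequence[float],
                          alpha: Sequence[float], margin: float = 0.1) -> Tuple[float, Vector]:
    """Hinge loss max(0, margin - s(q, pos) + s(q, neg)) and its (sub)gradient."""
    s_pos, g_pos = score_and_grad(q, pos, alpha)
    s_neg, g_neg = score_and_grad(q, neg, alpha)
    loss = margin - s_pos + s_neg
    if loss <= 0.0:
        return 0.0, [0.0] * len(alpha)
    return loss, [gn - gp for gp, gn in zip(g_pos, g_neg)]


def sgd_power_exponents(triplets: Sequence[Triplet], alpha0: Sequence[float],
                        lr: float = 0.05, epochs: int = 20,
                        margin: float = 0.1, seed: int = 0) -> Vector:
    """Plain SGD on the exponents, visiting triplets in shuffled order each epoch."""
    rng = random.Random(seed)
    alpha = list(alpha0)
    order = list(range(len(triplets)))
    for _ in range(epochs):
        rng.shuffle(order)
        for t in order:
            _, grad = triplet_loss_and_grad(*triplets[t], alpha, margin)
            # Hypothetical safeguard: keep every exponent strictly positive.
            alpha = [max(a - lr * g, 1e-3) for a, g in zip(alpha, grad)]
    return alpha
```

Since \(\partial\,\mathrm{sign}(x)|x|^{\alpha}/\partial\alpha = \mathrm{sign}(x)|x|^{\alpha}\ln|x|\), only the chain through the \(l_2\) normalization needs care; a central finite difference on `triplet_loss_and_grad` is a quick way to check the analytic gradient.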
Notes
1. Although the \(l_2\) normalization commonly applied to local descriptors limits the effective volume of each cell, it reduces the dimensionality by only one, so \(l_2\)-normalized data remains high-dimensional. The question nonetheless remains whether pruning mechanisms other than those proposed herein could better account for the constraints on the data layout.
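The dimensionality argument in this note can be written out explicitly: \(l_2\) normalization maps every non-zero descriptor in \(\mathbb{R}^d\) onto the unit sphere, which is a \((d-1)\)-dimensional manifold. For example, 128-dimensional descriptors such as SIFT still occupy a 127-dimensional set after normalization.

```latex
\[
  \mathbf{x} \mapsto \frac{\mathbf{x}}{\|\mathbf{x}\|_2} \in
  \mathcal{S}^{d-1} = \{\mathbf{u} \in \mathbb{R}^d : \|\mathbf{u}\|_2 = 1\},
  \qquad \dim \mathcal{S}^{d-1} = d - 1 .
\]
```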
Acknowledgement
This work was partially supported by the FP7 European integrated project AXES.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rana, A., Zepeda, J., Perez, P. (2015). Feature Learning for the Image Retrieval Task. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9010. Springer, Cham. https://doi.org/10.1007/978-3-319-16634-6_12
DOI: https://doi.org/10.1007/978-3-319-16634-6_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16633-9
Online ISBN: 978-3-319-16634-6
eBook Packages: Computer Science, Computer Science (R0)