International Journal of Computer Vision, Volume 127, Issue 11–12, pp 1723–1737

Understanding and Improving Kernel Local Descriptors

  • Arun Mukundan
  • Giorgos Tolias
  • Andrei Bursuc
  • Hervé Jégou
  • Ondřej Chum


We propose a multiple-kernel local-patch descriptor based on efficient match kernels from pixel gradients. It combines two parametrizations of gradient position and direction, each providing robustness to a different type of patch mis-registration: the polar parametrization to noise in the detected dominant orientation of the patch, and the Cartesian parametrization to imprecise localization of the feature point. Whitening of the descriptor space, learned with or without supervision, significantly improves performance. We analyze the effect of the whitening on patch similarity and demonstrate its semantic meaning. Our unsupervised variant is the best-performing descriptor constructed without the need for labeled data. Despite its simplicity, the proposed descriptor competes well with deep learning approaches on a number of different tasks.
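To make the two parametrizations concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, positions are taken relative to the patch center, and the polar variant is assumed to measure the gradient direction relative to the position angle. The sketch only computes the per-pixel parameters; it does not build the match kernels or the descriptor.

```python
import math


def cartesian_params(x, y, gx, gy):
    """Cartesian parametrization (hypothetical helper).

    Returns (x, y, theta): pixel position relative to the patch center
    and the absolute gradient direction.
    """
    theta = math.atan2(gy, gx)
    return x, y, theta


def polar_params(x, y, gx, gy):
    """Polar parametrization (hypothetical helper).

    Returns (rho, phi, theta_rel): radius, position angle, and the
    gradient direction measured relative to the position angle.
    The relative angle is an assumption made for this sketch.
    """
    rho = math.hypot(x, y)
    phi = math.atan2(y, x)
    theta = math.atan2(gy, gx)
    # Wrap the angular difference into (-pi, pi].
    theta_rel = math.atan2(math.sin(theta - phi), math.cos(theta - phi))
    return rho, phi, theta_rel


def rotate(u, v, angle):
    """Rotate the 2D vector (u, v) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return c * u - s * v, s * u + c * v


# One pixel of a patch: position (3, 4), gradient pointing along +x.
x, y, gx, gy = 3.0, 4.0, 1.0, 0.0

# Simulate an error in the dominant orientation: rotate the whole patch.
alpha = 0.3
xr, yr = rotate(x, y, alpha)
gxr, gyr = rotate(gx, gy, alpha)
before_polar = polar_params(x, y, gx, gy)
after_polar = polar_params(xr, yr, gxr, gyr)

# Simulate an error in the feature location: shift the patch center.
dx, dy = 0.5, -0.25
before_cart = cartesian_params(x, y, gx, gy)
after_cart = cartesian_params(x + dx, y + dy, gx, gy)
```

Under these assumptions, an orientation error leaves the radius and the relative gradient angle unchanged and only shifts the position angle by the rotation, while a localization error leaves the absolute gradient direction unchanged and shifts every Cartesian position by the same offset. This is one way to read the claim that each parametrization is robust to a different type of mis-registration.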



The authors were supported by the OP VVV funded Project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics” and MSMT LL1303 ERC-CZ Grant. Arun Mukundan was supported by the CTU student Grant SGS17/185/OHK3/3T/13. We would like to thank Karel Lenc and Vassileios Balntas for their valuable help with the HPatches benchmark, and Dmytro Mishkin for providing the L2Net and HardNet descriptors for the HPatches dataset.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. VRG, FEE, CTU in Prague, Prague, Czech Republic
  2. Valeo.ai, Strašnice, Paris, France
  3. Facebook AI Research, Prague, Paris, France
