
Attention-Based Ensemble for Deep Metric Learning

  • Wonsik Kim
  • Bhavya Goyal
  • Kunal Chawla
  • Jungmin Lee
  • Keunjoo Kwon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

Deep metric learning aims to learn an embedding function, modeled as a deep neural network, that places semantically similar images close together and dissimilar images far apart in the learned embedding space. Recently, ensemble methods have been applied to deep metric learning to yield state-of-the-art results. One important requirement for an ensemble is that its learners be diverse in their feature embeddings. To this end, we propose an attention-based ensemble, which uses multiple attention masks so that each learner can attend to different parts of the object. We also propose a divergence loss, which encourages diversity among the learners. The proposed method is applied to the standard benchmarks of deep metric learning, and experimental results show that it outperforms the state-of-the-art methods by a significant margin on image retrieval tasks.
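The two ideas in the abstract — per-learner attention masks over a shared feature map, and a divergence loss that penalizes agreement between learners on the same image — can be illustrated with a toy NumPy sketch. The mask-weighted pooling, the shared projection `proj`, and the hinge-at-zero form of the penalty are illustrative assumptions here, not the paper's exact formulation:

```python
import numpy as np

def attention_ensemble_embed(feature_map, masks, proj):
    """feature_map: (H*W, C) spatial features from a shared CNN trunk.
    masks: (M, H*W) attention masks, one per learner, values in [0, 1].
    proj: (C, D) shared projection into the embedding space.
    Returns (M, D): one L2-normalized embedding per learner."""
    attended = masks @ feature_map   # (M, C) mask-weighted pooling per learner
    emb = attended @ proj            # (M, D) project to embedding space
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def divergence_loss(emb):
    """Encourage diversity: penalize cosine similarity between the
    embeddings that different learners produce for the same image."""
    m = emb.shape[0]
    sims = emb @ emb.T                          # (M, M) pairwise similarities
    off_diag = sims[~np.eye(m, dtype=bool)]     # drop self-similarities
    return np.maximum(off_diag, 0.0).mean()     # hinge: only penalize overlap

# Toy usage on random data.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(16, 8))   # 4x4 spatial grid, 8 channels
masks = rng.uniform(size=(3, 16))        # 3 learners
proj = rng.normal(size=(8, 4))           # embed into 4 dimensions
emb = attention_ensemble_embed(feature_map, masks, proj)
loss = divergence_loss(emb)
```

Minimizing `loss` alongside the usual metric-learning objective pushes the learners' embeddings of one image apart, which is what makes them attend to different parts of the object rather than collapsing to the same solution.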

Keywords

Attention · Ensemble · Deep metric learning

Supplementary material

Supplementary material 1 (PDF, 181 KB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Wonsik Kim (1)
  • Bhavya Goyal (1)
  • Kunal Chawla (1)
  • Jungmin Lee (1)
  • Keunjoo Kwon (1)
  1. Samsung Research, Samsung Electronics, Seoul, Korea
