Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition

  • Huajie Jiang
  • Ruiping Wang
  • Shiguang Shan
  • Xilin Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of those classes, which is achieved by exploiting semantic information and auxiliary datasets. Most recent ZSL approaches focus on learning visual-semantic embeddings to transfer knowledge from the auxiliary datasets to the novel classes. However, few works study whether the semantic information is sufficiently discriminative for the recognition task. To tackle this problem, we propose a coupled dictionary learning approach that aligns the visual and semantic structures via class prototypes, where the discriminative information in the visual space is utilized to improve the less discriminative semantic space. Zero-shot recognition can then be performed in different spaces by a simple nearest-neighbor approach using the learned class prototypes. Extensive experiments on four benchmark datasets show the effectiveness of the proposed approach.
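To make the prototype-based recognition step concrete, below is a minimal numpy sketch. It is not the paper's coupled dictionary learning: a simple ridge regression from class semantic vectors to seen-class visual prototypes stands in for the structure-alignment step, and all names (X_seen, y_seen, S_seen, S_unseen, X_test) are hypothetical placeholders for seen-class visual features, their labels, class semantic vectors, unseen-class semantics, and test features.

```python
# Sketch only: ridge regression replaces the paper's coupled dictionary
# learning; it illustrates prototype synthesis + nearest-prototype ZSL.
import numpy as np

def class_prototypes(X, y):
    """Mean visual feature per seen class -> (sorted labels, prototypes)."""
    labels = np.unique(y)
    return labels, np.stack([X[y == c].mean(axis=0) for c in labels])

def fit_semantic_to_visual(S, P, lam=1.0):
    """Ridge map W so that S @ W approximates the visual prototypes P.
    Rows of S must follow the same class order as the rows of P."""
    d = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(d), S.T @ P)

def predict_unseen(X_test, S_unseen, W, unseen_labels):
    """Nearest-prototype (cosine) classification in the visual space."""
    P_unseen = S_unseen @ W  # synthesized prototypes of unseen classes
    Xn = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    Pn = P_unseen / np.linalg.norm(P_unseen, axis=1, keepdims=True)
    return unseen_labels[np.argmax(Xn @ Pn.T, axis=1)]

# Usage with hypothetical arrays:
#   X_seen (n, d_v) visual features, y_seen (n,) labels,
#   S_seen (c_seen, d_s) / S_unseen (c_unseen, d_s) class semantic vectors.
# _, P_seen = class_prototypes(X_seen, y_seen)
# W = fit_semantic_to_visual(S_seen, P_seen)
# y_pred = predict_unseen(X_test, S_unseen, W, np.arange(S_unseen.shape[0]))
```

The key design point the sketch preserves is that classification reduces to a nearest-neighbor search against class prototypes; how those prototypes are obtained (here, a ridge map; in the paper, coupled dictionaries that align visual and semantic structures) is the part the paper contributes.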

Keywords

Zero-shot learning · Visual-semantic structures · Coupled dictionary learning · Class prototypes

Acknowledgements

This work is partially supported by the Natural Science Foundation of China under contract Nos. 61390511 and 61772500, the 973 Program under contract No. 2015CB351802, the Frontier Science Key Research Project of CAS under No. QYZDJ-SSW-JSC009, and the Youth Innovation Promotion Association CAS under No. 2015085.

Supplementary material

Supplementary material 1: 474197_1_En_8_MOESM1_ESM.pdf (PDF, 1.8 MB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  2. Shanghai Institute of Microsystem and Information Technology, CAS, Shanghai, China
  3. ShanghaiTech University, Shanghai, China
  4. University of Chinese Academy of Sciences, Beijing, China
