Co-consistent Regularization with Discriminative Feature for Zero-Shot Learning

  • Yanling Tian
  • Weitong Zhang
  • Qieshi Zhang (email author)
  • Jun Cheng
  • Pengyi Hao
  • Gang Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11301)


With the development of deep learning, zero-shot learning (ZSL) deserves more attention. To address the problems of projection domain shift and discriminative feature extraction, we propose an end-to-end framework that differs from traditional ZSL methods in two respects: (1) we use a cascaded network to automatically locate discriminative regions, which extracts latent features more effectively and contributes to the representation of key semantic attributes; (2) our framework carries out both the mapping into the visual-semantic embedding space and the dot-product computation within the deep learning framework. In addition, a joint loss function is designed to impose a regularization constraint on the whole method and to provide supervised learning, which enhances generalization on the test set. We conduct experiments on the Animals with Attributes 2 (AwA2), Caltech-UCSD Birds 200-2011 (CUB) and SUN datasets, on which the method achieves better results than state-of-the-art methods.
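As a rough illustration of the general idea of mapping into a visual-semantic embedding space and scoring with a dot product, the following sketch projects a visual feature into an attribute space and picks the class whose attribute vector gives the largest dot product. It is not the authors' network: the linear projection `W`, the feature values, the class names, and the helper functions are all hypothetical placeholders, and a real system would learn the projection from seen classes.

```python
def project(feature, weights):
    """Map a visual feature (length d) into the semantic space (length k)
    using a k x d matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict(feature, weights, class_attributes):
    """Score each class by the dot product between the projected feature
    and that class's attribute vector; return the best class and all scores."""
    embedded = project(feature, weights)
    scores = {name: dot(embedded, attrs) for name, attrs in class_attributes.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy values, chosen only to make the example concrete.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
unseen_classes = {
    "zebra": [1.0, 0.0],  # assumed attribute vector
    "whale": [0.0, 1.0],  # assumed attribute vector
}

label, scores = predict([0.9, 0.1, 0.2], W, unseen_classes)
print(label, scores)
```

Because classes are described only by their attribute vectors, the same scoring step can be applied to classes that had no training images, which is the setting ZSL targets.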


Zero-shot learning (ZSL); Discriminative region; Regularization; Projection domain shift; Supervised learning



This work was supported by National Natural Science Foundation of China (61772508, 61801428, U1713213), National Key R&D Program of China (2017YFB1402100), Zhejiang Provincial Natural Science Foundation (LY18F020034), Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6101, 2017JM6060, 2017JQ6077, 2017JM6103), Guangdong Technology Project (2016B010108010, 2016B010125003, 2017B010110007), CAS Key Technology Talent Program, Shenzhen Engineering Laboratory for 3D Content Generating Technologies ([2017]476), Shenzhen Technology Project (JCYJ 20170413152535587, JSGG20160331185256983, JSGG20160229115709109), Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, CAS (2014DP173025), Fundamental Research Funds for the Central Universities (GK201703060, GK201801004), Teaching Reform and Research Project of Shaanxi Normal University (17JG33).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yanling Tian (1, 2, 3)
  • Weitong Zhang (1, 2, 3)
  • Qieshi Zhang (1, 2), email author
  • Jun Cheng (1, 2)
  • Pengyi Hao (4)
  • Gang Lu (3)
  1. Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. The Chinese University of Hong Kong, Hong Kong, China
  3. School of Computer Science, Shaanxi Normal University, Xi’an, China
  4. Zhejiang University of Technology, Hangzhou, China
