A Joint Generative Model for Zero-Shot Learning

  • Rui Gao
  • Xingsong Hou (corresponding author)
  • Jie Qin
  • Li Liu
  • Fan Zhu
  • Zhao Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)


Zero-shot learning (ZSL) is a challenging task because no data from unseen classes are available during training. Existing methods tend to be strongly biased towards seen classes, a problem also known as domain shift. To mitigate the gap between seen and unseen class data, we propose a joint generative model that synthesizes features as a replacement for unseen data. With the generated features, the conventional ZSL problem can be tackled in a supervised way. Specifically, our framework integrates Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN), both conditioned on class-level semantic attributes, to generate features based on element-wise and holistic reconstruction. A categorization network acts as an additional guide, steering the generator towards features that benefit the subsequent classification task. Moreover, we propose a perceptual reconstruction loss to preserve semantic similarities. Experimental results on five benchmarks show that our framework outperforms state-of-the-art approaches in both the conventional ZSL and generalized ZSL settings.
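The abstract names several loss terms without giving their exact form, so the following is only a minimal sketch of how such terms could be combined into one generator objective. Everything in it is an assumption for illustration: the function names, the one-hidden-layer categorization network with weights `W1` and `W2`, the use of mean squared error for both reconstruction terms, the non-saturating adversarial term, and the equal default weights. It is not the authors' formulation.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch."""
    per_sample = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    return float(np.mean(per_sample))

def elementwise_recon(x, x_hat):
    """Element-wise reconstruction: mean squared error over feature entries."""
    return float(np.mean((x - x_hat) ** 2))

def hidden(x, W1):
    """Hidden activations of a hypothetical one-layer categorization network."""
    return np.maximum(x @ W1, 0.0)

def perceptual_recon(x, x_hat, W1):
    """Assumed perceptual term: distance between hidden activations
    of real and generated features, rather than between raw features."""
    return float(np.mean((hidden(x, W1) - hidden(x_hat, W1)) ** 2))

def classification_loss(x_hat, labels, W1, W2):
    """Cross-entropy of the categorization network on generated features."""
    logits = hidden(x_hat, W1) @ W2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))

def adversarial_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss; d_fake holds discriminator outputs in (0, 1]."""
    return float(-np.mean(np.log(d_fake + eps)))

def generator_objective(x, x_hat, mu, logvar, labels, d_fake, W1, W2,
                        weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the five terms; the weights are placeholders."""
    terms = (
        elementwise_recon(x, x_hat),
        kl_to_standard_normal(mu, logvar),
        adversarial_loss(d_fake),
        classification_loss(x_hat, labels, W1, W2),
        perceptual_recon(x, x_hat, W1),
    )
    return float(sum(w * t for w, t in zip(weights, terms)))

# Toy usage with random stand-ins for real features and network outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                      # "real" visual features
x_hat = x + 0.1 * rng.normal(size=x.shape)        # "generated" features
mu, logvar = rng.normal(size=(8, 4)), np.zeros((8, 4))
labels = rng.integers(0, 5, size=8)
d_fake = rng.uniform(0.1, 0.9, size=8)
W1, W2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 5))
total = generator_objective(x, x_hat, mu, logvar, labels, d_fake, W1, W2)
```

In this sketch the perceptual term compares features in the hidden space of the categorization network, which is one plausible way to read "preserve semantic similarities"; a real system would train all networks jointly in a framework with automatic differentiation.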


  • Zero-shot learning
  • Variational autoencoder
  • Generative adversarial network
  • Perceptual reconstruction



This work was supported in part by the NSFC under Grants 61872286, U1531141, 61732008, 61772407 and 61701391, the National Key R&D Program of China under Grant 2017YFF0107700, the National Science Foundation of Shaanxi Province under Grant 2018JM6092, and the Guangdong Provincial Science and Technology Plan Project under Grants 2017A010101006 and 2016A010101005.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Rui Gao (1)
  • Xingsong Hou (1, 5), corresponding author
  • Jie Qin (2)
  • Li Liu (3)
  • Fan Zhu (3)
  • Zhao Zhang (4)

  1. School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  2. Computer Vision Laboratory, ETH Zurich, Zürich, Switzerland
  3. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
  4. Soochow University, Suzhou, China
  5. Guangdong Xi’an Jiaotong University Academy, Guangdong, China
