Triple discriminator generative adversarial network for zero-shot image classification

Abstract

One key challenge in zero-shot classification (ZSC) is exploring the knowledge hidden in unseen classes. Generative methods such as generative adversarial networks (GANs) are typically employed to generate visual information for unseen classes. However, most of these methods exploit global semantic features while neglecting the discriminative differences among local semantic features when synthesizing images, which may lead to sub-optimal results. Local semantic information can, in fact, provide more discriminative knowledge than global information. To this end, this paper presents a new triple discriminator GAN for ZSC, called TDGAN, which incorporates a text-reconstruction network into a dual discriminator GAN (D2GAN) to realize a cross-modal mapping from text descriptions to their visual representations. The text-reconstruction network focuses on key text descriptions to align semantic relationships, so that the synthetic visual features represent images effectively. Sharma-Mittal entropy is exploited in the loss function to bring the distribution of synthetic classes as close as possible to the distribution of real classes. Extensive experiments on the Caltech-UCSD Birds-2011 and North America Birds datasets demonstrate that the proposed TDGAN method consistently yields competitive performance compared with several state-of-the-art ZSC methods.
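For readers unfamiliar with Sharma-Mittal entropy, the following is a minimal sketch of its standard textbook definition for a discrete distribution, H(p) = ((Σ p_i^α)^((1−β)/(1−α)) − 1) / (1−β). It is not the TDGAN loss itself, whose exact form the abstract does not give. The function names and the parameter names `alpha` (order) and `beta` (degree) are illustrative assumptions. The sketch also shows two well-known special cases: the entropy reduces to Tsallis entropy when β = α and approaches Rényi entropy as β → 1.

```python
import math


def _power_sum(p, alpha):
    # Sum of p_i ** alpha over the support of the distribution.
    return sum(pi ** alpha for pi in p if pi > 0)


def sharma_mittal_entropy(p, alpha, beta):
    """Standard Sharma-Mittal entropy of a discrete distribution p.

    alpha (order) and beta (degree) are illustrative parameter names;
    this is the generic definition, not the loss used in the paper.
    """
    if alpha <= 0 or alpha == 1 or beta == 1:
        raise ValueError("requires alpha > 0, alpha != 1 and beta != 1")
    exponent = (1 - beta) / (1 - alpha)
    return (_power_sum(p, alpha) ** exponent - 1) / (1 - beta)


def tsallis_entropy(p, alpha):
    # Special case of Sharma-Mittal entropy with beta == alpha.
    return (_power_sum(p, alpha) - 1) / (1 - alpha)


def renyi_entropy(p, alpha):
    # Limit of Sharma-Mittal entropy as beta approaches 1.
    return math.log(_power_sum(p, alpha)) / (1 - alpha)


uniform = [0.25, 0.25, 0.25, 0.25]
sm_tsallis_case = sharma_mittal_entropy(uniform, alpha=2.0, beta=2.0)
sm_near_renyi = sharma_mittal_entropy(uniform, alpha=2.0, beta=1.0 + 1e-6)
```

Because the two parameters interpolate between several familiar entropies, a loss built on this family can in principle be tuned to weight distribution mismatches differently; how TDGAN sets or uses these parameters is described in the full paper, not here.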

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61771329, 61632018).

Author information

Corresponding author

Correspondence to Zhong Ji.

About this article

Cite this article

Ji, Z., Yan, J., Wang, Q. et al. Triple discriminator generative adversarial network for zero-shot image classification. Sci. China Inf. Sci. 64, 120101 (2021). https://doi.org/10.1007/s11432-020-3032-8

Keywords

  • zero-shot classification
  • generative adversarial nets
  • text reconstruction
  • Sharma-Mittal entropy