
Estimation of Near-Instance-Level Attribute Bottleneck for Zero-Shot Learning

Published in: International Journal of Computer Vision

Abstract

Zero-Shot Learning (ZSL) transfers knowledge from seen classes to unseen classes by establishing connections between the visual and semantic spaces. Traditional ZSL methods identify novel classes through class-level attribute vectors, which implies an information bottleneck: a single class-level attribute vector serves as the fitting target during training, disregarding the individual variations within a class. Moreover, the attributes used for training carry no location information and are prone to mismatch with local regions of the visual features. To this end, we introduce a Near-Instance-Level Attribute Bottleneck (IAB) that adjusts both class-level attribute vectors and visual features during training so that they better reflect their natural correspondences. Specifically, our Near-Instance-Wise Attribute Adaptation (NAA) modifies class attribute vectors into multiple attribute basis vectors, spanning a subspace that is more relevant to instance-level samples. Additionally, our Vision Attribute Relation Strengthening (VARS) module searches for attribute-related regions within the visual features, supplying location information during training. The proposed method is evaluated on four ZSL benchmarks and proves superior or competitive to state-of-the-art methods in both the ZSL and the more challenging Generalized Zero-Shot Learning (GZSL) settings. Extensive experiments corroborate the promise of this direction for ZSL, i.e., enhancing the visual-semantic relationships formed during training with a simple model structure. Code is available at: https://github.com/LanchJL/IAB-GZSL.
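To make the two modules concrete, below is a minimal PyTorch sketch of the ideas as the abstract describes them. The class names, tensor shapes, and parameterizations here are illustrative assumptions, not the authors' implementation; the official code is at the repository linked above.

```python
# Illustrative sketch only: NAA expands a class attribute vector into several
# basis vectors spanning a near-instance-level subspace; VARS attends over
# spatial locations with one query per attribute. All shapes and parameter
# choices below are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NearInstanceAttributeAdaptation(nn.Module):
    """Hypothetical NAA: expand each class-level attribute vector into K
    basis vectors whose span approximates instance-level attributes."""

    def __init__(self, num_attributes: int, num_bases: int = 4):
        super().__init__()
        # One learned offset per basis vector, shared across classes (assumed).
        self.offsets = nn.Parameter(torch.zeros(num_bases, num_attributes))

    def forward(self, class_attributes: torch.Tensor) -> torch.Tensor:
        # class_attributes: (batch, num_attributes) -> (batch, K, num_attributes)
        return class_attributes.unsqueeze(1) + self.offsets.unsqueeze(0)


class VisionAttributeRelationStrengthening(nn.Module):
    """Hypothetical VARS: attend over spatial feature locations with one
    query per attribute, localizing attribute-related regions."""

    def __init__(self, feature_dim: int, num_attributes: int):
        super().__init__()
        self.attr_queries = nn.Parameter(torch.randn(num_attributes, feature_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feature_dim, H, W) from a CNN backbone.
        b, c, h, w = features.shape
        flat = features.flatten(2)                  # (b, c, h*w)
        attn = torch.einsum("ac,bcl->bal", self.attr_queries, flat)
        attn = F.softmax(attn / c ** 0.5, dim=-1)   # attention over locations
        # Attribute-wise pooled visual features: (b, num_attributes, c)
        return torch.einsum("bal,bcl->bac", attn, flat)
```

Under these assumptions, a ZSL head could score an image by comparing the attribute-wise visual features pooled by VARS against each class's NAA attribute bases; the actual objectives and architecture are specified in the paper.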


Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62371235, 62072246, and 61929104, partly by the Natural Science Foundation of Jiangsu Province under Grant No. BK20201306, and partly by the “111 Program” under Grant No. B13022.

Author information

Corresponding author

Correspondence to Haofeng Zhang.

Additional information

Communicated by Massimiliano Mancini.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, C., Shen, Y., Chen, D. et al. Estimation of Near-Instance-Level Attribute Bottleneck for Zero-Shot Learning. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02021-x
