
Estimation of Near-Instance-Level Attribute Bottleneck for Zero-Shot Learning

Published in: International Journal of Computer Vision

Abstract

Zero-Shot Learning (ZSL) transfers knowledge from seen classes to unseen classes by establishing connections between the visual and semantic spaces. Traditional ZSL methods identify novel classes through class-level attribute vectors, which implies an information bottleneck: a single class-level attribute vector serves as the fitting target during training, disregarding the individual variations within a class. Moreover, the attributes used for training carry no location information and are prone to mismatch with local regions of the visual features. To this end, we introduce a Near-Instance-Level Attribute Bottleneck (IAB) that adjusts both class-level attribute vectors and visual features during training so that they better reflect their natural correspondences. Specifically, our Near-Instance-Wise Attribute Adaptation (NAA) modifies class attribute vectors into multiple attribute basis vectors, spanning a subspace that is more relevant to instance-level samples. Additionally, our Vision Attribute Relation Strengthening (VARS) module searches for attribute-related regions within the visual features, supplying location information during training. The proposed method is evaluated on four ZSL benchmarks and proves superior or competitive to state-of-the-art methods in both the ZSL and the more challenging Generalized Zero-Shot Learning (GZSL) settings. Extensive experiments corroborate the promise of this direction for ZSL, i.e., enhancing the visual-semantic relationships formed during training with a simple model structure. Code is available at: https://github.com/LanchJL/IAB-GZSL.
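To make the two modules concrete, below is a minimal PyTorch sketch of the ideas as the abstract describes them. The class names, tensor shapes, and parameterizations here are illustrative assumptions, not the authors' implementation; the official code is at the repository linked above.

```python
# Illustrative sketch only: NAA expands a class attribute vector into several
# basis vectors spanning a near-instance-level subspace; VARS attends over
# spatial locations with one query per attribute. All shapes and parameter
# choices below are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NearInstanceAttributeAdaptation(nn.Module):
    """Hypothetical NAA: expand each class-level attribute vector into K
    basis vectors whose span approximates instance-level attributes."""

    def __init__(self, num_attributes: int, num_bases: int = 4):
        super().__init__()
        # One learned offset per basis vector, shared across classes (assumed).
        self.offsets = nn.Parameter(torch.zeros(num_bases, num_attributes))

    def forward(self, class_attributes: torch.Tensor) -> torch.Tensor:
        # class_attributes: (batch, num_attributes) -> (batch, K, num_attributes)
        return class_attributes.unsqueeze(1) + self.offsets.unsqueeze(0)


class VisionAttributeRelationStrengthening(nn.Module):
    """Hypothetical VARS: attend over spatial feature locations with one
    query per attribute, localizing attribute-related regions."""

    def __init__(self, feature_dim: int, num_attributes: int):
        super().__init__()
        self.attr_queries = nn.Parameter(torch.randn(num_attributes, feature_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feature_dim, H, W) from a CNN backbone.
        b, c, h, w = features.shape
        flat = features.flatten(2)                  # (b, c, h*w)
        attn = torch.einsum("ac,bcl->bal", self.attr_queries, flat)
        attn = F.softmax(attn / c ** 0.5, dim=-1)   # attention over locations
        # Attribute-wise pooled visual features: (b, num_attributes, c)
        return torch.einsum("bal,bcl->bac", attn, flat)
```

Under these assumptions, a ZSL head could score an image by comparing the attribute-wise visual features pooled by VARS against each class's NAA attribute bases; the actual objectives and architecture are specified in the paper.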


Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62371235, 62072246, and 61929104, partly by the Natural Science Foundation of Jiangsu Province under Grant No. BK20201306, and partly by the “111 Program” under Grant No. B13022.

Author information

Corresponding author

Correspondence to Haofeng Zhang.

Additional information

Communicated by Massimiliano Mancini.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, C., Shen, Y., Chen, D. et al. Estimation of Near-Instance-Level Attribute Bottleneck for Zero-Shot Learning. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02021-x
