Few-shot text classification by leveraging bi-directional attention and cross-class knowledge

Abstract

Few-shot text classification addresses the setting in which a model must classify newly incoming query instances after acquiring knowledge from only a few support instances. In this paper, we investigate few-shot text classification under a metric-based meta-learning framework. Although the representations of the query and support instances are the key to classification, existing studies handle them independently in the text encoding stage. To better capture the classification features, we propose to exploit their interaction with an adapted bi-directional attention mechanism. Moreover, in contrast to previous approaches that encode different classes individually, we leverage the underlying cross-class knowledge for classification. To this end, we design the learning objective to incorporate a large margin loss, which is expected to shorten intra-class distances while enlarging inter-class distances. To validate the design, we conduct extensive experiments on three datasets, and the results demonstrate that our solution outperforms its state-of-the-art competitors. Detailed analyses further reveal that the bi-directional attention and the cross-class knowledge both contribute to the overall performance.
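The components named in the abstract (bi-directional attention between query and support representations, metric-based classification, and a large margin loss) can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed simplifications, not the paper's implementation: the function names, the mean-pooled prototypes, and the margin-subtraction form of the loss are our assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(query, support):
    """Let query tokens attend over support tokens and vice versa.

    query:   (lq, d) token embeddings of a query instance
    support: (ls, d) token embeddings of a support instance
    """
    sim = query @ support.T                # (lq, ls) similarity matrix
    q2s = softmax(sim, axis=1) @ support   # query features enriched by support
    s2q = softmax(sim.T, axis=1) @ query   # support features enriched by query
    return q2s, s2q

def prototypes(support_feats, labels, n_classes):
    """Mean-pool support features per class (prototypical-network style)."""
    return np.stack([support_feats[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def large_margin_logits(query_feat, protos, margin=0.0, true_class=None):
    """Negative squared Euclidean distances serve as logits; during training,
    subtracting a margin from the true class demands that it win by at least
    `margin`, shrinking intra-class and stretching inter-class distances."""
    d2 = ((protos - query_feat) ** 2).sum(axis=1)
    logits = -d2
    if true_class is not None:
        logits[true_class] -= margin
    return logits

# Toy 2-way episode with 2-d "sentence" features.
support_feats = np.array([[1.0, 0.0], [0.9, 0.1],   # class 0
                          [0.0, 1.0], [0.1, 0.9]])  # class 1
labels = np.array([0, 0, 1, 1])
protos = prototypes(support_feats, labels, n_classes=2)

query = np.array([0.8, 0.2])
pred = int(np.argmax(large_margin_logits(query, protos)))  # → 0 (nearest prototype)
```

At inference the margin is dropped and classification reduces to nearest-prototype matching; during training the margin term makes the objective demand a gap between the true class and the runner-up.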


Acknowledgements

This work was partially supported by National Natural Science Foundation of China (Grant Nos. 61872446, U19B2024), Natural Science Foundation of Hunan Province (Grant No. 2019JJ20024), and the Science and Technology Innovation Program of Hunan Province (Grant No. 2020RC4046).

Author information

Corresponding author

Correspondence to Xiang Zhao.

About this article

Cite this article

Pang, N., Zhao, X., Wang, W. et al. Few-shot text classification by leveraging bi-directional attention and cross-class knowledge. Sci. China Inf. Sci. 64, 130103 (2021). https://doi.org/10.1007/s11432-020-3055-1

Keywords

  • text classification
  • meta-learning
  • few-shot learning
  • bi-directional attention
  • cross-class knowledge