Dual-Attention Graph Convolutional Network

  • Xueya Zhang
  • Tong Zhang (corresponding author)
  • Wenting Zhao
  • Zhen Cui
  • Jian Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12047)


Graph convolutional networks (GCNs) have shown a powerful ability to represent text structure and effectively facilitate the task of text classification. However, challenges remain in adapting GCNs to learn discriminative features from texts, mainly because of the graph variants incurred by textual complexity and diversity. In this paper, we propose a dual-attention GCN that models the structural information of various texts and tackles the graph-variant problem by embedding two types of attention mechanisms, i.e. connection-attention and hop-attention, into the classic GCN. To encode the various connection patterns between neighbouring words, connection-attention adaptively imposes different weights on each word's neighbourhood, which captures short-term dependencies. Hop-attention, in turn, applies scaled coefficients to different scopes during the graph diffusion process so that the model learns more about the distribution of context, capturing long-term semantics in an adaptive way. Extensive experiments are conducted on five widely used datasets to evaluate our dual-attention GCN, and the achieved state-of-the-art performance verifies the effectiveness of the dual-attention mechanisms.
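The two mechanisms summarised above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a GAT-style scoring function for connection-attention (per-neighbourhood softmax weights masked by the adjacency matrix) and learned scalar coefficients over successive diffusion hops for hop-attention; all names, shapes, and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def connection_attention(H, A, W, a_src, a_dst):
    """One layer of neighbourhood attention (GAT-style scoring, assumed).

    H: (N, d) word features, A: (N, N) adjacency with self-loops,
    W: (d, d) projection, a_src/a_dst: (d,) score vectors.
    """
    Z = H @ W                             # projected word features
    e = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]  # raw pairwise scores
    e = np.where(A > 0, e, -1e9)          # mask out non-neighbours
    alpha = softmax(e, axis=1)            # weights over each word's neighbourhood
    return alpha @ Z                      # attention-weighted aggregation

def hop_attention(H, A, W, a_src, a_dst, hop_logits):
    """Combine several diffusion hops with learned scalar coefficients."""
    gamma = softmax(np.asarray(hop_logits, dtype=float))
    out, X = 0.0, H
    for g in gamma:                       # hop k uses k rounds of diffusion
        X = connection_attention(X, A, W, a_src, a_dst)
        out = out + g * X                 # scaled contribution of this scope
    return out

# Toy example: a 4-word chain graph with 8-dimensional features.
N, d = 4, 8
A = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
H = rng.standard_normal((N, d))
W = rng.standard_normal((d, d)) * 0.1
out = hop_attention(H, A, W,
                    rng.standard_normal(d), rng.standard_normal(d),
                    hop_logits=[0.5, 0.3, 0.2])
print(out.shape)  # (4, 8)
```

Connection-attention thus covers immediate neighbours (short-term dependencies), while the hop coefficients let the model weight how far context should diffuse (long-term semantics).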


Keywords: Dual-attention · Graph convolutional networks · Text classification



This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772276, 61972204, 61906094), the Natural Science Foundation of Jiangsu Province (Grant No. BK20190452), and the Fundamental Research Funds for the Central Universities (No. 30919011232).



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Xueya Zhang (1)
  • Tong Zhang (1) (corresponding author)
  • Wenting Zhao (1)
  • Zhen Cui (1)
  • Jian Yang (1)

  1. Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
