Multimodal Tweet Sentiment Classification Algorithm Based on Attention Mechanism

  • Peiyu Zou
  • Shuangtao Yang
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 967)


With the rapid development of the Internet, multimodal sentiment classification has become an important task in natural language processing research. In this paper, we focus on the sentiment classification of tweets that contain both text and images, and propose a multimodal sentiment classification method for such tweets. In this method, a bidirectional LSTM model is used to extract text modality features and a VGG-16 model is used to extract image modality features. Once all features are extracted, a new multimodal feature fusion algorithm based on the attention mechanism fuses the text and image features. The proposed fusion method can assign different weights to the modalities according to their importance. We evaluated the proposed method on the Chinese Weibo dataset and the SentiBank Twitter dataset. The experimental results show that the proposed method outperforms models that use only a single modality feature, and that attention-based fusion is more efficient than directly summing or concatenating features from different modalities.
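
The abstract does not give the fusion equations, so the following is only a minimal sketch of one common way to weight two modalities with attention, not the authors' formulation. It assumes the text and image features have already been projected to the same dimension; the function name `attention_fuse`, the parameters `W`, `b`, `v`, and the random stand-in feature vectors are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_fuse(text_feat, image_feat, W, b, v):
    """Fuse two same-sized modality vectors with softmax attention weights.

    Hypothetical scoring function: score_m = v . tanh(W^T f_m + b).
    Returns the weighted sum of the features and the two weights.
    """
    feats = np.stack([text_feat, image_feat])   # shape (2, d)
    scores = np.tanh(feats @ W + b) @ v         # shape (2,), one score per modality
    weights = softmax(scores)                   # shape (2,), non-negative, sums to 1
    fused = weights @ feats                     # shape (d,), weighted sum
    return fused, weights


# Random stand-ins for the real features (assumption: both projected to d dims).
rng = np.random.default_rng(0)
d, h = 8, 4
text_feat = rng.normal(size=d)    # would come from a bidirectional LSTM
image_feat = rng.normal(size=d)   # would come from VGG-16
W = rng.normal(size=(d, h))       # attention parameters, learned in practice
b = np.zeros(h)
v = rng.normal(size=h)

fused, weights = attention_fuse(text_feat, image_feat, W, b, v)

# The two baselines the abstract compares against:
summed = text_feat + image_feat                          # direct summation
concatenated = np.concatenate([text_feat, image_feat])   # direct concatenation
```

Unlike summation, which treats both modalities equally, the attention weights let the fused vector lean toward whichever modality scores higher for a given tweet; in a trained model `W`, `b`, and `v` would be learned jointly with the classifier.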


Keywords: Multimodal; Sentiment classification; Attention mechanism



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Northeast Agricultural University, Harbin, China
  2. Lenovo AI Lab, Beijing, China
