
Transfer Learning in Sentiment Classification with Deep Neural Networks

  • Andrea Pagliarani
  • Gianluca Moro (email author)
  • Roberto Pasolini
  • Giacomo Domeniconi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 976)

Abstract

Cross-domain sentiment classifiers aim to predict the polarity (i.e. sentiment orientation) of target text documents by reusing a knowledge model learnt from a different source domain. Distinct domains are typically heterogeneous in language, so transfer learning techniques are advisable to support knowledge transfer from source to target. Deep neural networks have recently reached the state of the art in many NLP tasks, including in-domain sentiment classification, but few deep approaches address transfer learning and cross-domain sentiment classification. This paper extends the investigation started in a previous work [1], where an unsupervised deep approach for text mining, called Paragraph Vector (PV), achieved cross-domain accuracy equivalent to that of a method based on Markov Chain (MC), developed ad hoc for cross-domain sentiment classification. In this work, Gated Recurrent Unit (GRU) is added to the comparison, showing that memory units are beneficial for cross-domain classification when enough training data are available. Moreover, the knowledge models learnt from the source domain are fine-tuned on small samples of target instances to foster transfer learning. PV is almost unaffected by fine-tuning, because it is already able to capture word semantics without supervision. On the other hand, fine-tuning boosts the cross-domain performance of GRU. The smaller the training set used, the greater the improvement in accuracy.
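
The source-then-target protocol described above can be illustrated with a minimal sketch. It is not the authors' implementation: a bag-of-words logistic regression stands in for GRU and PV so that the example needs only the Python standard library, and the reviews, the function names and the `BIAS` feature are all hypothetical. The sketch trains on a source domain, evaluates on a target domain, then continues training a copy of the model on a small labelled target sample and evaluates again.

```python
import math
from collections import defaultdict

BIAS = "<bias>"  # hypothetical name for the intercept feature

def tokens(text):
    return text.lower().split()

def score(weights, text):
    # Linear score: intercept plus one weight per token (unseen tokens count 0).
    return weights.get(BIAS, 0.0) + sum(weights.get(t, 0.0) for t in tokens(text))

def predict(weights, text):
    return 1 if score(weights, text) > 0 else 0

def train(weights, examples, epochs=20, lr=0.5):
    """Logistic-regression SGD; updates `weights` in place and returns it."""
    for _ in range(epochs):
        for text, label in examples:
            p = 1.0 / (1.0 + math.exp(-score(weights, text)))
            step = lr * (label - p)
            weights[BIAS] += step
            for t in tokens(text):
                weights[t] += step
    return weights

def accuracy(weights, examples):
    return sum(predict(weights, t) == y for t, y in examples) / len(examples)

# Toy data (assumption): book reviews as source, electronics reviews as target.
source_train = [("great gripping story", 1), ("terrible boring story", 0)]
target_sample = [("reliable fast device", 1), ("flimsy slow device", 0)]
target_test = [
    ("reliable fast charger", 1),
    ("flimsy slow charger", 0),
    ("great fast", 1),
    ("terrible slow", 0),
]

# 1) Learn the knowledge model on the source domain only.
source_model = train(defaultdict(float), source_train)
acc_source_only = accuracy(source_model, target_test)

# 2) Fine-tune a copy of the source model on a small target sample.
fine_tuned = train(defaultdict(float, source_model), target_sample)
acc_fine_tuned = accuracy(fine_tuned, target_test)

print(acc_source_only, acc_fine_tuned)
```

On this toy data the source-only model gives every target review made only of unseen words the same score, so it cannot separate them, while the fine-tuned copy has learnt weights for the target vocabulary. The sketch says nothing about how GRU or PV behave; it only shows the order of the training and evaluation steps.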

Keywords

Transfer learning, Cross-domain, Deep learning, Fine-tuning, Sentiment analysis, Big Data

References

  1. Domeniconi, G., Moro, G., Pagliarani, A., Pasolini, R.: On deep learning in cross-domain sentiment classification. In: Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR, INSTICC, vol. 1, pp. 50–60. SciTePress (2017)
  2. Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In: Aggarwal, C., Zhai, C. (eds.) Mining Text Data, pp. 415–463. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_13
  3. Domeniconi, G., Moro, G., Pagliarani, A., Pasolini, R.: Learning to predict the stock market Dow Jones index detecting and mining relevant tweets. In: Fred, A.L.N., Filipe, J. (eds.) Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Funchal, Madeira, Portugal, 1–3 November 2017, vol. 1, pp. 165–172. SciTePress (2017)
  4. Domeniconi, G., Moro, G., Pagliarani, A., Pasini, K., Pasolini, R.: Job recommendation from semantic similarity of Linkedin users’ skills. In: Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods: ICPRAM, INSTICC, vol. 1, pp. 270–277. SciTePress (2016)
  5. Lena, P.D., Domeniconi, G., Margara, L., Moro, G.: GOTA: GO term annotation of biomedical literature. BMC Bioinform. 16, 346 (2015)
  6. Domeniconi, G., Moro, G., Pasolini, R., Sartori, C.: Iterative refining of category profiles for nearest centroid cross-domain text classification. In: Fred, A., Dietz, J.L.G., Aveiro, D., Liu, K., Filipe, J. (eds.) IC3K 2014. CCIS, vol. 553, pp. 50–67. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25840-9_4
  7. Shrivastava, A., Malisiewicz, T., Gupta, A., Efros, A.A.: Data-driven visual similarity for cross-domain image matching. ACM Trans. Graph. 30, 154:1–154:10 (2011)
  8. Domeniconi, G., Masseroli, M., Moro, G., Pinoli, P.: Cross-organism learning method to discover new gene functionalities. Comput. Meth. Progr. Biomed. 126, 20–34 (2016)
  9. Domeniconi, G., Masseroli, M., Moro, G., Pinoli, P.: Random perturbations of term weighted gene ontology annotations for discovering gene unknown functionalities. In: Fred, A., Dietz, J.L.G., Aveiro, D., Liu, K., Filipe, J. (eds.) IC3K 2014. CCIS, vol. 553, pp. 181–197. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25840-9_12
  10. Domeniconi, G., Masseroli, M., Moro, G., Pinoli, P.: Discovering new gene functionalities from random perturbations of known gene ontological annotations. In: KDIR 2014 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, Rome, Italy, 21–24 October 2014, pp. 107–116. SciTePress (2014)
  11. Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment TreeBank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642. Association for Computational Linguistics, Stroudsburg (2013)
  12. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, vol. 32, pp. II-1188–II-1196. JMLR.org (2014)
  13. Zhang, X., LeCun, Y.: Text understanding from scratch. CoRR abs/1502.01710 (2015)
  14. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: EMNLP, pp. 1422–1432. The Association for Computational Linguistics (2015)
  15. Domeniconi, G., Moro, G., Pagliarani, A., Pasolini, R.: Markov chain based method for in-domain and cross-domain sentiment classification. In: Fred, A.L.N., Dietz, J.L.G., Aveiro, D., Liu, K., Filipe, J. (eds.) KDIR 2015 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, part of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015), Lisbon, Portugal, 12–14 November 2015, vol. 1, pp. 127–137. SciTePress (2015)
  16. Domeniconi, G., Moro, G., Pagliarani, A., Pasolini, R.: Cross-domain sentiment classification via polarity-driven state transitions in a Markov model. In: Fred, A., Dietz, J.L.G., Aveiro, D., Liu, K., Filipe, J. (eds.) IC3K 2015. CCIS, vol. 631, pp. 118–138. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-52758-1_8
  17. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734. Association for Computational Linguistics (2014)
  18. Daumé III, H., Marcu, D.: Domain adaptation for statistical classifiers. J. Artif. Intell. Res. 26, 101–126 (2006)
  19. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010)
  20. Aue, A., Gamon, M.: Customizing sentiment classifiers to new domains: a case study. In: Proceedings of Recent Advances in Natural Language Processing (RANLP) (2005)
  21. Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Carroll, J.A., van den Bosch, A., Zaenen, A. (eds.) ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, 23–30 June 2007, pp. 440–447. The Association for Computational Linguistics (2007)
  22. Pan, S.J., Ni, X., Sun, J., Yang, Q., Chen, Z.: Cross-domain sentiment classification via spectral feature alignment. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, 26–30 April 2010, pp. 751–760. ACM (2010)
  23. He, Y., Lin, C., Alani, H.: Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In: Lin, D., Matsumoto, Y., Mihalcea, R. (eds.) The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19–24 June 2011, Portland, Oregon, USA, pp. 123–131. The Association for Computer Linguistics (2011)
  24. Bollegala, D., Weir, D.J., Carroll, J.A.: Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans. Knowl. Data Eng. 25, 1719–1731 (2013)
  25. Zhang, Y., Hu, X., Li, P., Li, L., Wu, X.: Cross-domain sentiment classification-feature divergence, polarity divergence or both? Pattern Recogn. Lett. 65, 44–50 (2015)
  26. Franco-Salvador, M., Cruz, F.L., Troyano, J.A., Rosso, P.: Cross-domain polarity classification using a knowledge-enhanced meta-classifier. Knowl.-Based Syst. 86, 46–56 (2015)
  27. Bollegala, D., Mu, T., Goulermas, J.Y.: Cross-domain sentiment classification using sentiment sensitive embeddings. IEEE Trans. Knowl. Data Eng. 28, 398–410 (2016)
  28. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015)
  29. dos Santos, C.N., Gatti, M.: Deep convolutional neural networks for sentiment analysis of short texts. In: Hajic, J., Tsujii, J. (eds.) COLING 2014, 25th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 23–29 August 2014, Dublin, Ireland, pp. 69–78. ACL (2014)
  30. Kumar, A., et al.: Ask me anything: dynamic memory networks for natural language processing. In: Balcan, M., Weinberger, K.Q. (eds.) Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, 19–24 June 2016. JMLR Workshop and Conference Proceedings, vol. 48, pp. 1378–1387. JMLR.org (2016)
  31. Wang, X., Jiang, W., Luo, Z.: Combination of convolutional and recurrent neural network for sentiment analysis of short texts. In: Calzolari, N., Matsumoto, Y., Prasad, R. (eds.) COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, Osaka, Japan, 11–16 December 2016, pp. 2428–2437. ACL (2016)
  32. Chen, T., Xu, R., He, Y., Xia, Y., Wang, X.: Learning user and product distributed representations using a sequence model for sentiment analysis. IEEE Comp. Int. Mag. 11, 34–44 (2016)
  33. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
  34. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, 28 June–2 July 2011, pp. 513–520. Omnipress (2011)
  35. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
  36. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
  37. Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 6, 107–116 (1998)
  38. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, pp. 45–50. ELRA (2010). http://is.muni.cz/publication/884893/en
  39. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems: 27th Annual Conference on Neural Information Processing Systems 2013, 5–8 December 2013, Lake Tahoe, Nevada, United States, vol. 26, pp. 3111–3119 (2013)
  40. Domeniconi, G., Moro, G., Pasolini, R., Sartori, C.: A comparison of term weighting schemes for text classification and sentiment analysis with a supervised variant of tf.idf. In: Helfert, M., Holzinger, A., Belo, O., Francalanci, C. (eds.) DATA 2015. CCIS, vol. 584, pp. 39–58. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30162-4_4
  41. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, D.M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. JMLR Proceedings, vol. 9, pp. 249–256. JMLR.org (2010)
  42. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014)
  43. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, 25–29 October 2014, Doha, Qatar, A Meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1532–1543. ACL (2014)
  44. Peters, M.E., et al.: Deep contextualized word representations. In: Walker, M.A., Ji, H., Stent, A. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018 (Long Papers), New Orleans, Louisiana, USA, 1–6 June 2018, vol. 1, pp. 2227–2237. Association for Computational Linguistics (2018)
  45. Graves, A., et al.: Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016)
  46. Moro, G., Pagliarani, A., Pasolini, R., Sartori, C.: Cross-domain & in-domain sentiment analysis with memory-based deep neural networks. In: Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR, INSTICC, vol. 1. SciTePress (2018)
  47. Domeniconi, G., Semertzidis, K., Moro, G., Lopez, V., Kotoulas, S., Daly, E.M.: Identifying conversational message threads by integrating classification and data clustering. In: Francalanci, C., Helfert, M. (eds.) DATA 2016. CCIS, vol. 737, pp. 25–46. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62911-7_2
  48. Domeniconi, G., Semertzidis, K., López, V., Daly, E.M., Kotoulas, S., Moro, G.: A novel method for unsupervised and supervised conversational message thread detection. In: DATA, pp. 43–54. SciTePress (2016)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Andrea Pagliarani (1)
  • Gianluca Moro (1, email author)
  • Roberto Pasolini (1)
  • Giacomo Domeniconi (2)

  1. Department of Computer Science and Engineering, University of Bologna, Cesena, Italy
  2. IBM - Watson Research Center, Yorktown Heights, USA
