Abstract
We propose a multi-task learning (MTL) framework for natural language understanding tasks such as sentiment and topic classification. We use a bidirectional transformer-based architecture to generate encoded representations of the input, followed by task-specific layers for classification. MTL frameworks train a set of related tasks in parallel, as a form of additional regularization, to improve the generalizability of the trained model over the individual tasks. We introduce a task-specific auxiliary problem, built with the k-means clustering algorithm, that is trained in parallel with the main task to reduce the model's generalization error; POS tagging is also used as an auxiliary task. We further train multiple benchmark classification datasets in parallel to improve the effectiveness of our bidirectional transformer-based network across all of them. Our proposed MTL-based transformer network improves the state-of-the-art overall accuracy on the Movie Review (MR), AG News, and Stanford Sentiment Treebank (SST-2) corpora by 6%, 1.4%, and 3.3%, respectively.
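The auxiliary-task idea in the abstract can be sketched as follows: sentence embeddings produced by the shared encoder are clustered with k-means, and each sentence's cluster ID becomes the target of an auxiliary classification head trained alongside the main task. This is a minimal illustrative sketch, not the authors' released code; all function and variable names here are assumptions.

```python
# Sketch: deriving auxiliary labels for MTL by clustering encoder
# embeddings with plain k-means. The embeddings below are toy
# stand-ins for the bidirectional transformer's sentence encodings.

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means_labels(points, k, iters=10):
    """Plain k-means; initializes centroids from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid for each embedding
        labels = [min(range(k), key=lambda c: squared_dist(p, centroids[c]))
                  for p in points]
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

# Two well-separated groups of toy "embeddings".
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
aux_labels = k_means_labels(embeddings, k=2)
# Sentences whose embeddings fall in the same cluster share an
# auxiliary label, which the auxiliary head is trained to predict.
```

In the paper's setup this auxiliary loss is combined with the main classification loss, so the shared encoder receives gradient signal from both objectives.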
A. Kumar, C. Pandey and N. Jain—Equal Contribution.
© 2019 Springer Nature Switzerland AG
Cite this paper
Tripathi, S., Singh, C., Kumar, A., Pandey, C., Jain, N. (2019). Bidirectional Transformer Based Multi-Task Learning for Natural Language Understanding. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds.) Natural Language Processing and Information Systems. NLDB 2019. Lecture Notes in Computer Science, vol. 11608. Springer, Cham. https://doi.org/10.1007/978-3-030-23281-8_5
Print ISBN: 978-3-030-23280-1
Online ISBN: 978-3-030-23281-8