
A Word Embedding Transfer Model for Robust Text Categorization

  • Conference paper
  • In: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2018, NLP-NABD 2018)

Abstract

It is common to fine-tune pre-trained word embeddings for text categorization. However, we find that fine-tuning does not guarantee improvement across text categorization datasets, and it can introduce a considerable number of parameters into the model. In this paper, we study new transfer methods to address these problems, and propose "Robustness of OOVs" as a perspective for reducing memory consumption further. The experimental results show that the proposed method is a good alternative to fine-tuning on large datasets.
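The parameter-count concern in the abstract can be made concrete. Below is a minimal PyTorch sketch, not the paper's actual architecture: it assumes a frozen pre-trained embedding table adapted through a small learned linear map (one plausible transfer method) and contrasts its trainable-parameter count with full fine-tuning. All names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: a 100k-word vocabulary with 300-d vectors.
vocab_size, emb_dim = 100_000, 300
pretrained = torch.randn(vocab_size, emb_dim)  # stands in for GloVe/word2vec vectors

# (a) Fine-tuning: the whole embedding table is trainable, adding
# vocab_size * emb_dim (here 30M) parameters to the model.
finetuned = nn.Embedding.from_pretrained(pretrained, freeze=False)

# (b) Transfer (hypothetical variant): the table stays frozen; only a small
# d x d linear map is learned, adding emb_dim * emb_dim (here 90k) parameters.
frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
transfer = nn.Linear(emb_dim, emb_dim, bias=False)

tokens = torch.randint(0, vocab_size, (2, 16))  # a toy batch of token ids
adapted = transfer(frozen(tokens))              # embeddings adapted without fine-tuning

n_ft = sum(p.numel() for p in finetuned.parameters() if p.requires_grad)
n_tr = sum(p.numel() for p in transfer.parameters())
print(f"fine-tuning trains {n_ft:,} params; transfer trains {n_tr:,}")
```

Under these assumed sizes, fine-tuning trains 30,000,000 parameters while the transfer map trains 90,000, which is the scale of saving the abstract alludes to.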



Author information

Correspondence to Yiming Zhang.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, Y., Wang, J., Deng, W., Lu, Y. (2018). A Word Embedding Transfer Model for Robust Text Categorization. In: Sun, M., Liu, T., Wang, X., Liu, Z., Liu, Y. (eds.) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL/NLP-NABD 2018. Lecture Notes in Computer Science, vol. 11221. Springer, Cham. https://doi.org/10.1007/978-3-030-01716-3_26


  • DOI: https://doi.org/10.1007/978-3-030-01716-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01715-6

  • Online ISBN: 978-3-030-01716-3

  • eBook Packages: Computer Science (R0)
