
Addressing Unseen Word Problem in Text Classification

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2018)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10859))

Abstract

The word-based Deep Neural Network (DNN) approach to text classification suffers performance issues due to its limited vocabulary. Character-based Convolutional Neural Network (CNN) models were proposed to address this issue, but character-based models do not inherently capture the sequential relationship of words in text. Hence, there is scope for further improvement: addressing the unseen-word problem through a character model while maintaining sequential context through a word-based model. In this work, we propose methods that combine character- and word-based models for efficient text classification. The methods are evaluated on several benchmark datasets and compared against state-of-the-art results.
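The combination the abstract describes — word vectors for in-vocabulary words, character-level features as a fallback for unseen words — can be illustrated with a minimal sketch. Everything below (the toy vocabulary, the n-gram hashing fallback, all names) is illustrative and assumed, not the paper's actual method; the hashing fallback is in the spirit of character n-gram embeddings such as CHARAGRAM.

```python
import zlib

import numpy as np

DIM = 8  # toy embedding dimension
rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; a real system would load word2vec-style vectors.
word_vecs = {w: rng.standard_normal(DIM) for w in ["the", "movie", "was", "good"]}

def char_ngrams(word, n=3):
    """Character trigrams with boundary markers, e.g. '<movie>' -> '<mo', 'mov', ..."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def char_embed(word, dim=DIM, buckets=1000):
    """Hash each character n-gram into a bucket and average deterministic
    per-bucket vectors, so ANY string, seen or unseen, gets a representation."""
    vec = np.zeros(dim)
    grams = char_ngrams(word)
    for g in grams:
        idx = zlib.crc32(g.encode()) % buckets  # stable hash, reproducible runs
        vec += np.random.default_rng(idx).standard_normal(dim)
    return vec / max(len(grams), 1)

def embed(word):
    """Word-level vector when the word is known; character-level fallback otherwise."""
    return word_vecs[word] if word in word_vecs else char_embed(word)

# "moviee" is an unseen typo: the word model alone would map it to a single
# UNK vector, while the character fallback still yields a word-specific vector.
sentence = ["the", "moviee", "was", "good"]
matrix = np.stack([embed(w) for w in sentence])  # shape (4, DIM); feed to a CNN/RNN
```

Note the design point the abstract hinges on: the sentence matrix keeps word order, so a downstream word-level sequence model retains sequential context, while the per-word fallback handles out-of-vocabulary tokens.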



Author information

Correspondence to Promod Yenigalla.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Yenigalla, P., Kar, S., Singh, C., Nagar, A., Mathur, G. (2018). Addressing Unseen Word Problem in Text Classification. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_36


  • DOI: https://doi.org/10.1007/978-3-319-91947-8_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science, Computer Science (R0)
