SemVec: Semantic Features Word Vectors Based Deep Learning for Improved Text Classification

Odeh, Feras; Taweel, Adel

doi:10.1007/978-3-030-04070-3_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11324)

Included in the following conference series:

International Conference on Theory and Practice of Natural Computing

Abstract

Semantic word representation is a core building block in many deep learning systems. Most word representation techniques are based on the angle or distance between words, word analogies, and statistical information. However, popular models ignore word morphology by representing each word with a distinct vector, which limits their ability to represent rare words in languages with large vocabularies. This paper proposes a dynamic model, named SemVec, that represents each word as a vector of both domain and semantic features. Depending on the problem domain, semantic features can be added or removed to generate a word representation enriched with domain knowledge. The proposed method is evaluated on the classification of adverse drug reaction (ADR) tweets and text. Results show that SemVec improves the precision of ADR detection by 15.28% over other state-of-the-art deep learning methods, with a comparable recall score.
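The idea of a word vector that combines a base representation with pluggable semantic features can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the two lexicons, and the names `enrich`, `is_negative`, and `is_drug` are all hypothetical and chosen only to show how features might be added or removed per domain.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy resources, invented for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "headache": [0.2, -0.1, 0.7],
    "great": [0.5, 0.3, -0.2],
}
NEGATIVE_WORDS = {"headache", "nausea", "dizzy"}
DRUG_NAMES = {"ibuprofen", "aspirin"}

# A semantic feature maps a word to a single numeric value.
FeatureFn = Callable[[str], float]


def is_negative(word: str) -> float:
    """1.0 if the word appears in the (assumed) negative-polarity lexicon."""
    return 1.0 if word in NEGATIVE_WORDS else 0.0


def is_drug(word: str) -> float:
    """1.0 if the word appears in the (assumed) drug-name lexicon."""
    return 1.0 if word in DRUG_NAMES else 0.0


def enrich(word: str, features: Sequence[FeatureFn], dim: int = 3) -> List[float]:
    """Concatenate a base embedding with the selected semantic features.

    Unknown words fall back to a zero vector of length `dim`, so the
    semantic features still carry some signal for rare words.
    """
    key = word.lower()
    base = TOY_EMBEDDINGS.get(key, [0.0] * dim)
    return list(base) + [feature(key) for feature in features]


# Features are chosen per problem domain by editing this list.
vector = enrich("headache", [is_negative, is_drug])
```

Changing the list passed to `enrich` adds or removes features without touching the base embedding, which mirrors the "dynamic" aspect described in the abstract. In a real pipeline, the resulting vectors would feed a classifier rather than being inspected directly.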

Notes

1. http://keras.io/

Author information

Authors and Affiliations

Birzeit University, Birzeit, Palestine
Feras Odeh & Adel Taweel

Corresponding author

Correspondence to Feras Odeh.

Editor information

Editors and Affiliations

School of Business, University College Dublin, Dublin, Ireland
David Fagan
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
School of Business, University College Dublin, Dublin, Ireland
Michael O'Neill
Escuela Politécnica, University of Extremadura, Caceres, Spain
Miguel A. Vega-Rodríguez

About this paper

Cite this paper

Odeh, F., Taweel, A. (2018). SemVec: Semantic Features Word Vectors Based Deep Learning for Improved Text Classification. In: Fagan, D., Martín-Vide, C., O'Neill, M., Vega-Rodríguez, M.A. (eds) Theory and Practice of Natural Computing. TPNC 2018. Lecture Notes in Computer Science, vol 11324. Springer, Cham. https://doi.org/10.1007/978-3-030-04070-3_35

DOI: https://doi.org/10.1007/978-3-030-04070-3_35
Published: 22 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04069-7
Online ISBN: 978-3-030-04070-3
eBook Packages: Computer Science, Computer Science (R0)
