Abstract
Word representations, as basic elements of document representation, play a crucial role in natural language processing. Topic models and word embedding models have both made great progress on word representation. Some studies combine the two kinds of models, and most of them assume that the semantics of the context depends on the semantics of the current word and its topic. This paper proposes a topic-enhanced word vectors model (TEWV), which strengthens the representation capability of word vectors by integrating topic information with the semantics of the context. Unlike previous work, TEWV assumes that the semantics of the current word depends on the semantics of the context and the topic, which is more consistent with the dependency relationships found in ordinary language use. Experimental results on the 20NewsGroup dataset show that our approach outperforms state-of-the-art methods.
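The core assumption above — that the current word is predicted from its context words together with a topic — can be illustrated with a minimal CBOW-style sketch. Everything here (the names, dimensions, the combine-by-addition choice, and the simplified training step that only updates the output embeddings) is an illustrative assumption, not the authors' actual TEWV implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, D = 10, 3, 8            # vocabulary size, number of topics, embedding dim

W_in  = rng.normal(0, 0.1, (V, D))   # input word embeddings
T     = rng.normal(0, 0.1, (K, D))   # topic embeddings
W_out = rng.normal(0, 0.1, (V, D))   # output (prediction) embeddings

def predict_word_probs(context_ids, topic_id):
    """P(current word | context words, topic) via a softmax over the vocabulary."""
    # Combine context semantics and topic, mirroring TEWV's dependency assumption.
    h = W_in[context_ids].mean(axis=0) + T[topic_id]
    scores = W_out @ h
    e = np.exp(scores - scores.max())
    return e / e.sum()

def train_step(context_ids, topic_id, target_id, lr=0.1):
    """One gradient step on the negative log-likelihood of the target word."""
    global W_out
    h = W_in[context_ids].mean(axis=0) + T[topic_id]
    p = predict_word_probs(context_ids, topic_id)
    grad = p.copy()
    grad[target_id] -= 1.0                 # dL/dscores for softmax cross-entropy
    W_out -= lr * np.outer(grad, h)        # simplified: update output embeddings only

# Toy usage: make word 4 likely given context words [1, 2] under topic 0.
before = predict_word_probs([1, 2], 0)[4]
for _ in range(50):
    train_step([1, 2], 0, 4)
after = predict_word_probs([1, 2], 0)[4]
assert after > before                      # probability of the target word rises
```

A full model would also propagate gradients into the word and topic embeddings and use negative sampling rather than a dense softmax; this sketch only shows how conditioning the current word on both context and topic changes the prediction.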
Acknowledgments
We thank the anonymous mentor provided by SMP for careful proofreading. This work was supported by the National Natural Science Foundation of China (61632011, 61573231, 61672331, 61432011).
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
Cite this paper
Li, D., Li, Y., Wang, S. (2017). Topic Enhanced Word Vectors for Documents Representation. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_14
DOI: https://doi.org/10.1007/978-981-10-6805-8_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6804-1
Online ISBN: 978-981-10-6805-8