
Topic Enhanced Word Vectors for Documents Representation

  • Conference paper
Social Media Processing (SMP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)


Abstract

Word representation, as a basic element of document representation, plays a crucial role in natural language processing. Topic models and word embedding models have both made great progress on word representation, and several studies combine the two; most of these assume that the semantics of the context depends on the semantics and the topic of the current word. This paper proposes a topic enhanced word vectors model (TEWV), which enhances the representation capability of word vectors by integrating topic information with the semantics of the context. Unlike previous work, TEWV assumes that the semantics of the current word depends on the semantics of the context and on the topic, a dependency direction that is more consistent with common intuition. Experimental results on the 20 Newsgroups dataset show that our approach achieves better performance than state-of-the-art methods.
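To make the dependency direction concrete, the sketch below is a minimal illustration (not the authors' code) of predicting the current word from the average of its context word vectors plus a topic vector, as the abstract describes; the vocabulary size, embedding dimensions, the additive combination, and all variable names are assumptions made only for illustration.

```python
# Illustrative sketch, assuming an additive context + topic combination.
# The actual TEWV objective is defined in the paper body, not in this abstract.
import numpy as np

V, K, D = 5000, 50, 100          # vocab size, number of topics, embedding dim (assumed)
rng = np.random.default_rng(0)

W_in = rng.normal(scale=0.1, size=(V, D))    # input (context) word vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # output word vectors
T = rng.normal(scale=0.1, size=(K, D))       # topic vectors

def predict_current_word(context_ids, topic_id):
    """Score every vocabulary word as the 'current word' given context words and a topic."""
    h = W_in[context_ids].mean(axis=0) + T[topic_id]   # context semantics + topic
    logits = W_out @ h
    logits -= logits.max()                             # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs

# Example: distribution over the vocabulary for one position
p = predict_current_word(context_ids=[10, 42, 77, 3], topic_id=7)
print(p.argmax(), p.max())
```

By contrast, the earlier combined models the abstract refers to condition in the opposite direction, using the current word (and its topic) to predict the surrounding context words.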


Notes

  1. http://github.com/LoveMercy/Topic-Enhanced-Word-Vector.

  2. http://qwone.com/~jason/20Newsgroups/.


Acknowledgments

We thank the anonymous mentor provided by SMP for careful proofreading. This work was supported by the National Natural Science Foundation of China (61632011, 61573231, 61672331, 61432011).

Author information


Corresponding author

Correspondence to Suge Wang.



Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, D., Li, Y., Wang, S. (2017). Topic Enhanced Word Vectors for Documents Representation. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_14

  • DOI: https://doi.org/10.1007/978-981-10-6805-8_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6804-1

  • Online ISBN: 978-981-10-6805-8

  • eBook Packages: Computer Science, Computer Science (R0)
