
Topic Enhanced Word Vectors for Documents Representation

  • Conference paper
Social Media Processing (SMP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)


Abstract

Word representation, as a basic element of document representation, plays a crucial role in natural language processing. Topic models and word embedding models have both made great progress on word representation, and several studies combine the two; most of these assume that the semantics of the context depends on the semantics and the topic of the current word. This paper proposes a topic enhanced word vectors model (TEWV), which enhances the representation capability of word vectors by integrating topic information with the semantics of the context. Unlike previous work, TEWV assumes that the semantics of the current word depends on the semantics of the context and on the topic, a dependency direction that is more consistent with common intuition. Experimental results on the 20 Newsgroups dataset show that our approach achieves better performance than state-of-the-art methods.
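To make the dependency direction concrete, the sketch below is a minimal illustration (not the authors' code) of predicting the current word from the average of its context word vectors plus a topic vector, as the abstract describes; the vocabulary size, embedding dimensions, the additive combination, and all variable names are assumptions made only for illustration.

```python
# Illustrative sketch, assuming an additive context + topic combination.
# The actual TEWV objective is defined in the paper body, not in this abstract.
import numpy as np

V, K, D = 5000, 50, 100          # vocab size, number of topics, embedding dim (assumed)
rng = np.random.default_rng(0)

W_in = rng.normal(scale=0.1, size=(V, D))    # input (context) word vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # output word vectors
T = rng.normal(scale=0.1, size=(K, D))       # topic vectors

def predict_current_word(context_ids, topic_id):
    """Score every vocabulary word as the 'current word' given context words and a topic."""
    h = W_in[context_ids].mean(axis=0) + T[topic_id]   # context semantics + topic
    logits = W_out @ h
    logits -= logits.max()                             # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs

# Example: distribution over the vocabulary for one position
p = predict_current_word(context_ids=[10, 42, 77, 3], topic_id=7)
print(p.argmax(), p.max())
```

By contrast, the earlier combined models the abstract refers to condition in the opposite direction, using the current word (and its topic) to predict the surrounding context words.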


Notes

  1. http://github.com/LoveMercy/Topic-Enhanced-Word-Vector.

  2. http://qwone.com/~jason/20Newsgroups/.


Acknowledgments

We thank the anonymous mentor provided by SMP for careful proofreading. This work was supported by the National Natural Science Foundation of China (61632011, 61573231, 61672331, 61432011).

Author information


Corresponding author

Correspondence to Suge Wang.



Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, D., Li, Y., Wang, S. (2017). Topic Enhanced Word Vectors for Documents Representation. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_14

  • DOI: https://doi.org/10.1007/978-981-10-6805-8_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6804-1

  • Online ISBN: 978-981-10-6805-8

  • eBook Packages: Computer Science, Computer Science (R0)
