Abstract
Most sentence embedding models represent each sentence using only its surface words, which leaves them unable to discriminate among the ubiquitous homonymous and polysemous words. To improve discriminative power, we employ a conceptualization model to assign associated concepts to each sentence in the text corpus and learn conceptual sentence embeddings (CSE). The resulting sentence representations are more expressive than widely used document representations such as latent topic models, especially for short text. In the experiments, we evaluate the CSE models on two tasks, text classification and information retrieval. The experimental results show that the proposed models outperform typical sentence embedding models.
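The core idea in the abstract, attaching isA concepts to a sentence's words and folding them into the sentence representation, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the toy taxonomy, the random vectors, and the simple averaging rule are illustrative stand-ins, not the paper's actual CSE training procedure (which learns the embeddings jointly, in the spirit of paragraph vectors).

```python
# Sketch of conceptual sentence embedding: each word contributes its own
# vector plus the vectors of the concepts it evokes in a (toy) isA taxonomy.
# All names and values here are hypothetical illustrations.
import random

random.seed(0)
DIM = 8  # embedding dimensionality (arbitrary for this sketch)

# Toy isA taxonomy in the spirit of Probase (hypothetical entries).
taxonomy = {
    "apple": ["fruit", "company"],
    "microsoft": ["company"],
    "banana": ["fruit"],
}

def new_vec():
    """Random initial embedding; a real model would train these."""
    return [random.uniform(-0.5, 0.5) for _ in range(DIM)]

word_vecs = {}     # lazily created word embeddings
concept_vecs = {}  # lazily created concept embeddings

def embed(tokens):
    """Average the word vectors together with the concept vectors
    of every concept the taxonomy assigns to those words."""
    parts = []
    for t in tokens:
        parts.append(word_vecs.setdefault(t, new_vec()))
        for c in taxonomy.get(t, []):
            parts.append(concept_vecs.setdefault(c, new_vec()))
    acc = [0.0] * DIM
    for p in parts:
        for i in range(DIM):
            acc[i] += p[i] / len(parts)
    return acc

s1 = embed("apple released a phone".split())
s2 = embed("banana is a fruit".split())
```

Because "apple" pulls in both the "fruit" and "company" concept vectors, two sentences sharing that surface word can still receive different representations once the conceptualization step weights concepts by context, which is the discriminative effect the paper targets.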
Notes
1. Probase data is available at http://probase.msra.cn/dataset.aspx.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61132009 and 61201351) and the National Hi-Tech Research and Development Program of China (863 Program, Grant No. 2015AA015404).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Y., Huang, H., Feng, C., Zhou, Q., Gu, J. (2016). Conceptual Sentence Embeddings. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39936-2
Online ISBN: 978-3-319-39937-9
eBook Packages: Computer Science (R0)