Attention-Based Chinese Word Embedding

Liang, Yiyuan; Zhang, Wei; Yang, Kehua

doi:10.1007/978-3-030-00015-8_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11066)

Included in the following conference series:

International Conference on Cloud Computing and Security

Abstract

Recent studies have shown that the internal composition of a Chinese word provides rich semantic information for its representation. A Chinese word consists of one or more characters, each of which carries semantic information, and some characters have multiple meanings. Moreover, the characters that make up a word contribute unequally to its meaning. To account for this, the paper proposes a new attention-based model, ACWE, for learning Chinese word representations. In addition, the “HIT IR-Lab Tongyici Cilin (Extended Version)” is used to calculate the semantic similarity between Chinese characters and words, which reduces the impact of data sparseness and improves the quality of the learned representations. ACWE is evaluated on a word similarity task and an analogical reasoning task, and the experimental results show that it outperforms the existing baseline models.
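
The abstract does not give the ACWE equations, so the following is only a minimal sketch of the general idea it describes: characters contribute unequally to a word, so each character vector is weighted before being combined with the word vector. The function names, the softmax weighting, the equal blend of word and character parts, and the toy numbers are all assumptions made for illustration, not the authors' formulation. The `char_scores` values stand in for character-to-word similarity scores such as those a thesaurus like Tongyici Cilin might supply.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def compose_word_vector(word_vec, char_vecs, char_scores):
    """Blend a word vector with an attention-weighted sum of its character vectors.

    char_scores are hypothetical character-to-word similarity scores;
    how they are obtained is an assumption of this sketch.
    """
    weights = softmax(char_scores)
    dim = len(word_vec)
    char_part = [
        sum(w * vec[d] for w, vec in zip(weights, char_vecs))
        for d in range(dim)
    ]
    # Equal blend of word-level and character-level information (an assumed choice).
    return [0.5 * (word_vec[d] + char_part[d]) for d in range(dim)]

# Toy two-character word with 3-dimensional vectors (made-up numbers).
word_vec = [0.2, 0.4, 0.6]
char_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
char_scores = [2.0, 0.0]  # first character assumed closer in meaning to the word
weights = softmax(char_scores)
vector = compose_word_vector(word_vec, char_vecs, char_scores)
```

With these toy scores the first character receives the larger weight, so the composed vector leans toward it. Giving every character the same score reduces the character part to a plain average.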

References

Mei, J., Zheng, Y., Gao, Y., Yin, H.: TongYiCiCiLin. The Commercial Press, Shanghai (1984)
Cao, S., Lu, W., Zhou, J., Li, X.: cw2vec: Learning Chinese word embeddings with stroke n-gram information. Association for the Advancement of Artificial Intelligence, pp. 158–160 (2018)
Li, Z.: Parsing the internal structure of words: a new paradigm for Chinese word segmentation. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 1405–1414 (2011)
Li, M., Zong, C., Ng, H.T.: Automatic evaluation of Chinese translation output: word-level or character-level. In: Proceedings of ACL, pp. 159–164 (2011)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS, pp. 3111–3119 (2013)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv, pp. 131–145 (2013)
Botha, J.A., Blunsom, P.: Compositional morphology for word representations and language modelling. In: Proceedings of ICML, pp. 1899–1907 (2014)
Hermann, K.M., Blunsom, P.: Multilingual models for compositional distributed semantics. arXiv preprint arXiv, pp. 4–14 (2014)
Chen, X., Xu, L., Liu, Z., Sun, M., Luan, H.: Joint learning of character and word embeddings. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), pp. 101–115 (2015)
Li, Y., Li, W., Sun, F., Li, S.: Component-enhanced Chinese character embeddings. arXiv preprint arXiv, pp. 8–15 (2015)
Lai, S., Liu, K., Xu, L., Zhao, J.: How to Generate a Good Word Embedding. arXiv, pp. 7–18 (2015)
Myers, J.L., Well, A., Lorch, R.F.: Research design and statistical analysis. pp. 29–41 (2010)
Xu, J., Liu, J., Zhang, L., Chen, H.: Improve Chinese word embeddings by exploiting internal structure. In: Proceedings of NAACL-HLT 2016, pp. 1041–1050 (2016)
Chen, X., Jin, P., McCarthy, D., Carroll, J.: Integrating character representations into Chinese word embedding. Chinese Lexical Semantics. LNCS (LNAI), vol. 10085, pp. 335–349. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49508-8_32
Cao, K., Rei, M.: A joint model for word embedding and word morphology. In: Proceedings of the 1st Workshop on Representation Learning for NLP, pp. 18–26 (2016)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching Word Vectors with Subword Information. arXiv, pp. 7–16 (2016)
Jin, P., Wu, Y.: Semeval-2012 task 4: evaluating Chinese word similarity. In: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 374–377 (2012)
Su, T.-R., Lee, H.-Y.: Learning Chinese Word Representations From Glyphs Of Characters. arXiv, pp. 17–28 (2017)
Zamani, H., Croft, W.B.: Relevance-based Word Embedding. arXiv, pp. 5–17 (2017)
Yu, J., Jian, X., Xin, H., Song, Y.: Joint embeddings of Chinese words, characters, and fine-grained subcharacter components. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 286–291 (2017)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Cogn. Model. 3–5 (1988)

Author information

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Yiyuan Liang, Wei Zhang & Kehua Yang

Corresponding author

Correspondence to Wei Zhang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Zhaoqing Pan
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Elisa Bertino

About this paper

Cite this paper

Liang, Y., Zhang, W., Yang, K. (2018). Attention-Based Chinese Word Embedding. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11066. Springer, Cham. https://doi.org/10.1007/978-3-030-00015-8_24

DOI: https://doi.org/10.1007/978-3-030-00015-8_24
Published: 13 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00014-1
Online ISBN: 978-3-030-00015-8
eBook Packages: Computer Science, Computer Science (R0)
