Abstract
Recent studies have shown that the internal composition of Chinese words provides rich semantic information for Chinese word representation. A Chinese word consists of one or more characters, each of which carries semantic information, and some characters have multiple meanings. Moreover, the characters that make up a word contribute unequally to its meaning. To address this, this paper proposes a new attention-based model (ACWE) for learning Chinese word representations. In addition, the “HIT IR-Lab Tongyici Cilin (Extended Version)” is used to calculate the semantic similarity between Chinese characters and words, which reduces the impact of data sparseness and improves the quality of the learned representations. We evaluate ACWE on a word similarity task and an analogical reasoning task, and the experimental results show that it outperforms the existing baselines.
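The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of weighting characters by their semantic contribution to a word. Everything in it is an assumption made for illustration: the function names `softmax` and `compose_word_vector`, the use of a softmax over character-to-word similarity scores as the attention weights, and the equal averaging of the word vector with the weighted character vector. The similarity scores stand in for values that would be derived from a thesaurus such as the Tongyici Cilin; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_word_vector(word_vec, char_vecs, char_word_sims):
    """Combine a word vector with an attention-weighted sum of its character vectors.

    word_vec:        list of floats, the word's own embedding
    char_vecs:       list of character embeddings, same dimension as word_vec
    char_word_sims:  one hypothetical similarity score per character
                     (assumed to come from a thesaurus-based measure)
    """
    if len(char_vecs) != len(char_word_sims):
        raise ValueError("need exactly one similarity score per character")
    weights = softmax(char_word_sims)
    dim = len(word_vec)
    # Attention-weighted sum of the character vectors, one dimension at a time.
    char_part = [
        sum(w * c[d] for w, c in zip(weights, char_vecs)) for d in range(dim)
    ]
    # Assumed composition rule: average the word part and the character part.
    return [0.5 * (word_vec[d] + char_part[d]) for d in range(dim)]


if __name__ == "__main__":
    # Toy two-character word with made-up vectors and similarity scores.
    word = [0.2, 0.4]
    chars = [[1.0, 0.0], [0.0, 1.0]]
    sims = [0.9, 0.1]  # the first character is assumed closer in meaning to the word
    print(compose_word_vector(word, chars, sims))
```

In this sketch a character with a higher similarity score pulls the composed vector more strongly toward its own embedding, which mirrors the abstract's point that characters contribute unequally to a word's meaning.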
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Liang, Y., Zhang, W., Yang, K. (2018). Attention-Based Chinese Word Embedding. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11066. Springer, Cham. https://doi.org/10.1007/978-3-030-00015-8_24
DOI: https://doi.org/10.1007/978-3-030-00015-8_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00014-1
Online ISBN: 978-3-030-00015-8
eBook Packages: Computer Science, Computer Science (R0)