Abstract
Word vectors and topic models can help retrieve information semantically to some extent, but several problems remain. (1) Antonyms receive high similarity scores when clustered with word vectors. (2) The space of named entities, such as person, location, and organization names, is unbounded, while the number of occurrences of any specific named entity in a corpus is limited; as a result, the vectors for these named entities are not fully trained. To overcome these problems, this paper proposes a word vector computation model based on implicit expression. Words with the same meaning are implicitly expressed based on a dictionary and part of speech. With implicit expression, the sparsity of the corpus is reduced and word vectors are trained more thoroughly.
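The abstract does not give the authors' implementation, but the preprocessing idea it describes can be illustrated with a minimal sketch: named entities are collapsed into shared class tokens (so rare names contribute to one well-trained vector) and synonym variants are mapped to a canonical dictionary headword before word-vector training. The mapping tables, tokens, and function names below are hypothetical illustrations, not the paper's code.

# Minimal sketch of "implicit expression" preprocessing, assuming a
# synonym dictionary and a named-entity lookup are available upstream.
# All data here is hypothetical example content.

# Hypothetical synonym dictionary: variant -> canonical headword.
SYNONYMS = {
    "begin": "start",
    "commence": "start",
    "large": "big",
}

# Hypothetical named-entity lookup: token -> entity class.
ENTITY_CLASS = {
    "Alice": "PER",
    "Beijing": "LOC",
    "Acme": "ORG",
}

def implicit_express(tokens):
    """Replace each token by its implicit expression:
    an entity-class placeholder if it is a named entity,
    else its canonical synonym, else the token itself."""
    out = []
    for tok in tokens:
        if tok in ENTITY_CLASS:
            out.append("<" + ENTITY_CLASS[tok] + ">")  # e.g. Alice -> <PER>
        else:
            out.append(SYNONYMS.get(tok, tok))         # e.g. commence -> start
    return out

if __name__ == "__main__":
    sentence = ["Alice", "will", "commence", "work", "at", "Acme", "in", "Beijing"]
    print(implicit_express(sentence))
    # ['<PER>', 'will', 'start', 'work', 'at', '<ORG>', 'in', '<LOC>']

Sentences transformed this way share vocabulary entries that raw text would split across many rare surface forms, which is the sparsity reduction the abstract refers to; the transformed corpus would then be fed to an ordinary word-vector trainer.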
Acknowledgement
This work is supported by the National Natural Science Foundation of China (Grant Nos. 91646201 and 91224008) and by the National Basic Research Program of China (973 Program, No. 2012CB719705).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Wang, X., Zhang, H. (2018). Word Vector Computation Based on Implicit Expression. In: Abawajy, J., Choo, K.-K., Islam, R. (eds) International Conference on Applications and Techniques in Cyber Security and Intelligence. ATCI 2017. Advances in Intelligent Systems and Computing, vol 580. Springer, Cham. https://doi.org/10.1007/978-3-319-67071-3_12
DOI: https://doi.org/10.1007/978-3-319-67071-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67070-6
Online ISBN: 978-3-319-67071-3
eBook Packages: Engineering (R0)