Abstract
Word embedding models offer continuous vector representations that can capture rich semantics of word co-occurrence patterns. Although these models have improved the state of the art on a number of NLP tasks, many open research questions remain. We study the semantic consistency and alignment of these models and show that their local properties are sensitive to even slight variations in the training datasets and parameters. We propose a solution that improves the alignment of different word embedding models by leveraging carefully generated synthetic data points. Our approach leads to substantial improvements in recovering consistent and richer embeddings of local semantics.
This work was sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
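The sensitivity of local properties described in the abstract can be made concrete with a small measurement. The sketch below is not the authors' method. It assumes two embedding matrices whose rows correspond to the same vocabulary, and it uses two standard tools chosen here for illustration: the mean Jaccard overlap of cosine k-nearest-neighbour sets, and an orthogonal Procrustes rotation as a baseline global alignment. The function names (`knn_indices`, `neighborhood_overlap`, `procrustes_align`) are hypothetical.

```python
import numpy as np


def knn_indices(emb, k):
    """Indices of the k nearest neighbours (by cosine similarity) of every row."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def neighborhood_overlap(emb_a, emb_b, k=10):
    """Mean Jaccard overlap of the k-NN sets of the same words in two embeddings."""
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    scores = []
    for row_a, row_b in zip(nn_a, nn_b):
        set_a, set_b = set(row_a.tolist()), set(row_b.tolist())
        scores.append(len(set_a & set_b) / len(set_a | set_b))
    return float(np.mean(scores))


def procrustes_align(source, target):
    """Orthogonal matrix R minimising the Frobenius norm of source @ R - target."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 50))  # stand-in for a trained embedding (assumption)

    # A second "model" that differs only by a rotation of the space.
    q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
    rotated = base @ q

    # A third "model" whose vectors are perturbed, mimicking retraining noise.
    perturbed = base + rng.normal(scale=0.5, size=base.shape)

    print("overlap, rotated:  ", neighborhood_overlap(base, rotated))
    print("overlap, perturbed:", neighborhood_overlap(base, perturbed))

    r = procrustes_align(perturbed, base)
    print("overlap after Procrustes:", neighborhood_overlap(base, perturbed @ r))
```

Because cosine neighbourhoods are invariant under rotation, the Procrustes step leaves the overlap of the perturbed model unchanged. This illustrates why a purely global alignment cannot by itself repair local inconsistencies, which is the kind of gap the abstract says the proposed approach targets.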
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Şahin, C.Ş., Caceres, R.S., Oselio, B., Campbell, W.M. (2017). Challenges and Solutions with Alignment and Enrichment of Word Embedding Models. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_31
DOI: https://doi.org/10.1007/978-3-319-59569-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)