Challenges and Solutions with Alignment and Enrichment of Word Embedding Models

Şahin, Cem Şafak; Caceres, Rajmonda S.; Oselio, Brandon; Campbell, William M.

doi:10.1007/978-3-319-59569-6_31


Conference paper
First Online: 02 June 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)

Abstract

Word embedding models offer continuous vector representations that can capture rich semantics of word co-occurrence patterns. Although these models have improved the state of the art on a number of NLP tasks, many open research questions remain. We study the semantic consistency and alignment of these models and show that their local properties are sensitive to even slight variations in the training datasets and parameters. We propose a solution that improves the alignment of different word embedding models by leveraging carefully generated synthetic data points. Our approach leads to substantial improvements in recovering consistent and richer embeddings of local semantics.
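To make the notions of "alignment" and "local properties" concrete, the following is a minimal sketch and not the authors' method (the abstract does not describe their synthetic-data procedure in enough detail to reproduce it). It assumes two embeddings of the same vocabulary stored as row-aligned NumPy matrices, uses orthogonal Procrustes as a stand-in alignment technique, and scores local consistency as the mean Jaccard overlap of each word's k nearest neighbors. The function names `procrustes_align`, `knn_indices`, and `neighborhood_overlap`, and the random toy data, are hypothetical.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix R minimizing ||X @ R - Y||_F (rows are word vectors)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def knn_indices(E, k):
    """Indices of the k nearest neighbors (cosine similarity) of each row, excluding itself."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = En @ En.T
    np.fill_diagonal(sims, -np.inf)  # a word is never its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def neighborhood_overlap(A, B, k=5):
    """Mean Jaccard overlap between the k-NN sets of the same word in two embeddings."""
    scores = []
    for a, b in zip(knn_indices(A, k), knn_indices(B, k)):
        sa, sb = set(a.tolist()), set(b.tolist())
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))

# Toy data (assumption): embedding 2 is a rotated, noisy copy of embedding 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random orthogonal matrix
Y = X @ Q + 0.05 * rng.normal(size=X.shape)

R = procrustes_align(X, Y)
unaligned_error = np.linalg.norm(X - Y) / np.linalg.norm(Y)
aligned_error = np.linalg.norm(X @ R - Y) / np.linalg.norm(Y)
overlap = neighborhood_overlap(X, Y, k=5)
print(f"relative error before/after alignment: {unaligned_error:.3f} / {aligned_error:.3f}")
print(f"mean 5-NN Jaccard overlap: {overlap:.3f}")
```

A global rotation leaves cosine neighborhoods unchanged, so the overlap score isolates local disagreement that a linear alignment of this kind cannot repair; that is the sort of inconsistency the abstract says arises from small changes in training data and parameters.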

This work was sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.



Author information

Authors and Affiliations

MIT Lincoln Laboratory, Lexington, MA, USA
Cem Şafak Şahin, Rajmonda S. Caceres, Brandon Oselio & William M. Campbell

Corresponding author

Correspondence to Cem Şafak Şahin.

Editor information

Editors and Affiliations

Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
University of Liège, Liège, Belgium
Ashwin Ittoo
Japan Advanced Institute of Science and Technology, Nomi, Japan
Le Minh Nguyen
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais

Cite this paper

Şahin, C.Ş., Caceres, R.S., Oselio, B., Campbell, W.M. (2017). Challenges and Solutions with Alignment and Enrichment of Word Embedding Models. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_31

DOI: https://doi.org/10.1007/978-3-319-59569-6_31
Published: 02 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)
