Challenges and Solutions with Alignment and Enrichment of Word Embedding Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)

Abstract

Word embedding models offer continuous vector representations that can capture rich semantics of word co-occurrence patterns. Although these models have improved the state of the art on a number of NLP tasks, many open research questions remain. We study the semantic consistency and alignment of these models and show that their local properties are sensitive to even slight variations in the training datasets and parameters. We propose a solution that improves alignment of different word embedding models by leveraging carefully generated synthetic data points. Our approach leads to substantial improvements in recovering consistent and richer embeddings of local semantics.
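As a rough illustration of what "local properties" of an embedding can mean, the sketch below compares the k-nearest-neighbor sets of each word across two embeddings of the same vocabulary. It is not the method of the paper: the Jaccard-overlap score, the function names `nearest_neighbors` and `neighborhood_overlap`, and the random toy matrices are all assumptions made for this example.

```python
import numpy as np

def nearest_neighbors(emb, k):
    """Indices of the k nearest rows (by cosine similarity) for every row of emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]

def neighborhood_overlap(emb_a, emb_b, k=5):
    """Mean Jaccard overlap of k-NN sets; rows of emb_a and emb_b must index the same words.

    Hypothetical consistency score for illustration, not the paper's metric.
    """
    nn_a = nearest_neighbors(emb_a, k)
    nn_b = nearest_neighbors(emb_b, k)
    scores = []
    for row_a, row_b in zip(nn_a, nn_b):
        set_a, set_b = set(row_a.tolist()), set(row_b.tolist())
        scores.append(len(set_a & set_b) / len(set_a | set_b))
    return float(np.mean(scores))

# Toy data standing in for two trained models (assumption: 50 words, 16 dimensions).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
rotated = base @ rotation                               # same geometry, different axes
noisy = base + 0.5 * rng.normal(size=base.shape)        # perturbed geometry

overlap_rotated = neighborhood_overlap(base, rotated)
overlap_noisy = neighborhood_overlap(base, noisy)
```

A pure rotation leaves cosine neighborhoods unchanged, so the score stays at its maximum, while perturbing the vectors lowers it. A score of this kind is one way to quantify how much local structure two models share before or after any alignment step.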

This work was sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Author information

Corresponding author

Correspondence to Cem Şafak Şahin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Şahin, C.Ş., Caceres, R.S., Oselio, B., Campbell, W.M. (2017). Challenges and Solutions with Alignment and Enrichment of Word Embedding Models. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_31

  • DOI: https://doi.org/10.1007/978-3-319-59569-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59568-9

  • Online ISBN: 978-3-319-59569-6

  • eBook Packages: Computer Science, Computer Science (R0)
