
Learning Unsupervised Word Mapping via Maximum Mean Discrepancy

  • Conference paper
  • Published in: Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11838)

Abstract

Cross-lingual word embeddings aim to capture common linguistic regularities across languages. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through a simple linear transformation (word mapping). In this work, we focus on learning such a word mapping without any supervision signal. Most previous work on this task adopts adversarial training or parametric metrics to perform distribution matching, which typically requires a sophisticated alternating optimization process, either in the form of a min-max game or intermediate density estimation. This alternating optimization is relatively difficult and unstable. To avoid it, we propose to learn an unsupervised word mapping by directly minimizing the maximum mean discrepancy (MMD) between the distribution of the transferred embeddings and that of the target embeddings. Extensive experimental results show that our proposed model substantially outperforms several state-of-the-art unsupervised systems, and even achieves competitive performance compared to supervised methods. Further analysis demonstrates the effectiveness of our approach in improving stability.
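The objective described in the abstract can be sketched as follows: a biased estimator of squared MMD with a Gaussian kernel, evaluated on a transferred source space `X @ W` against a target space `Y`. Note this is a minimal illustration, not the paper's implementation; the mapping `W` (here an identity placeholder), the sample sizes, and the bandwidth `sigma` are all assumptions for the sketch.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Pairwise Gaussian (RBF) kernel matrix between rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(x, y, sigma=7.0):
    """Biased estimate of squared maximum mean discrepancy between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # stand-in for source-language embeddings
Y = rng.normal(size=(200, 50))  # stand-in for target-language embeddings
W = np.eye(50)                  # placeholder linear mapping (identity)

# Minimizing this quantity w.r.t. W (e.g. by gradient descent) is the
# distribution-matching objective the abstract describes.
print(mmd2(X @ W, Y))
```

Because the kernel embedding avoids a discriminator or an explicit density model, the loss is a single closed-form expression in `W`, which is what removes the alternating optimization the abstract criticizes.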


Notes

  1. In the experiment, we can observe that the eigenvalues of the matrix \(\mathbf {W}\) all have a modulus close to 1.

  2. We train a specific compression network separately for each language pair.

  3. We also tried CSLS retrieval, and the results show that our approach achieves consistent improvements over the baselines. Due to page limitations, we only report results with cosine similarity.

  4. Due to page limitations, for each language pair we only show results in one direction, because the conclusions drawn from the other direction are the same. For example, we show EN-FR and omit FR-EN. The same applies to Tables 3, 4, and Fig. 1.
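The observation in Note 1 (all eigenvalues of \(\mathbf {W}\) have modulus close to 1) is the signature of a near-orthogonal mapping, and it is easy to reproduce numerically. The check below uses a random orthogonal matrix as a stand-in for the learned \(\mathbf {W}\); that substitution is an assumption of this sketch, not the paper's trained mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal matrix as a stand-in for the learned mapping W.
# QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
A = rng.normal(size=(50, 50))
Q, _ = np.linalg.qr(A)

# Eigenvalues of an orthogonal matrix lie on the unit circle,
# so their moduli are all exactly 1 (up to floating-point error).
moduli = np.abs(np.linalg.eigvals(Q))
print(moduli.min(), moduli.max())  # both close to 1
```

Observing this property empirically on a learned \(\mathbf {W}\) suggests the mapping approximately preserves distances between embeddings, consistent with the common orthogonality constraint in supervised word-mapping work.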


Author information

Corresponding author

Correspondence to Pengcheng Yang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, P., Luo, F., Wu, S., Xu, J., Zhang, D. (2019). Learning Unsupervised Word Mapping via Maximum Mean Discrepancy. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol. 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_23


  • DOI: https://doi.org/10.1007/978-3-030-32233-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32232-8

  • Online ISBN: 978-3-030-32233-5

