
Learning Unsupervised Word Mapping via Maximum Mean Discrepancy

  • Conference paper
  • Published in: Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11838)

Abstract

Cross-lingual word embeddings aim to capture common linguistic regularities across languages. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through a simple linear transformation (word mapping). In this work, we focus on learning such a word mapping without any supervision signal. Most previous work on this task adopts adversarial training or parametric metrics to perform distribution matching, which typically requires a sophisticated alternating optimization process, either in the form of a min-max game or intermediate density estimation. This alternating optimization is relatively difficult and unstable. To avoid it, we propose to learn an unsupervised word mapping by directly minimizing the maximum mean discrepancy (MMD) between the distribution of the transferred embeddings and that of the target embeddings. Extensive experimental results show that our proposed model substantially outperforms several state-of-the-art unsupervised systems, and even achieves competitive performance compared to supervised methods. Further analysis demonstrates the effectiveness of our approach in improving stability.
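The objective described in the abstract can be sketched as follows: a biased estimator of squared MMD with a Gaussian kernel, evaluated on a transferred source space `X @ W` against a target space `Y`. Note this is a minimal illustration, not the paper's implementation; the mapping `W` (here an identity placeholder), the sample sizes, and the bandwidth `sigma` are all assumptions for the sketch.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Pairwise Gaussian (RBF) kernel matrix between rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(x, y, sigma=7.0):
    """Biased estimate of squared maximum mean discrepancy between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # stand-in for source-language embeddings
Y = rng.normal(size=(200, 50))  # stand-in for target-language embeddings
W = np.eye(50)                  # placeholder linear mapping (identity)

# Minimizing this quantity w.r.t. W (e.g. by gradient descent) is the
# distribution-matching objective the abstract describes.
print(mmd2(X @ W, Y))
```

Because the kernel embedding avoids a discriminator or an explicit density model, the loss is a single closed-form expression in `W`, which is what removes the alternating optimization the abstract criticizes.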


Notes

  1. In the experiment, we can observe that the eigenvalues of the matrix \(\mathbf {W}\) all have a modulus close to 1.

  2. We train a specific compression network separately for each language pair.

  3. We also tried CSLS retrieval, and the results show that our approach achieves consistent improvements over the baselines. Due to page limitations, we only report results with cosine similarity.

  4. Due to page limitations, for each language pair we only show results in one direction, because the conclusions drawn from the other direction are the same. For example, we show EN-FR and omit FR-EN. The same applies to Tables 3, 4, and Fig. 1.
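The observation in Note 1 (all eigenvalues of \(\mathbf {W}\) have modulus close to 1) is the signature of a near-orthogonal mapping, and it is easy to reproduce numerically. The check below uses a random orthogonal matrix as a stand-in for the learned \(\mathbf {W}\); that substitution is an assumption of this sketch, not the paper's trained mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal matrix as a stand-in for the learned mapping W.
# QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
A = rng.normal(size=(50, 50))
Q, _ = np.linalg.qr(A)

# Eigenvalues of an orthogonal matrix lie on the unit circle,
# so their moduli are all exactly 1 (up to floating-point error).
moduli = np.abs(np.linalg.eigvals(Q))
print(moduli.min(), moduli.max())  # both close to 1
```

Observing this property empirically on a learned \(\mathbf {W}\) suggests the mapping approximately preserves distances between embeddings, consistent with the common orthogonality constraint in supervised word-mapping work.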


Author information

Corresponding author

Correspondence to Pengcheng Yang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, P., Luo, F., Wu, S., Xu, J., Zhang, D. (2019). Learning Unsupervised Word Mapping via Maximum Mean Discrepancy. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol. 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_23


  • DOI: https://doi.org/10.1007/978-3-030-32233-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32232-8

  • Online ISBN: 978-3-030-32233-5

