Abstract
Cross-lingual word embeddings aim to capture common linguistic regularities across languages. Recently, it has been shown that these embeddings can be learned effectively by aligning two disjoint monolingual vector spaces through a simple linear transformation (word mapping). In this work, we focus on learning such a word mapping without any supervision signal. Most previous work on this task adopts adversarial training or parametric metrics to perform distribution matching, which typically requires a sophisticated alternating optimization process, either in the form of a min-max game or intermediate density estimation. This alternating optimization is relatively difficult and unstable. To avoid it, we propose to learn the unsupervised word mapping by directly minimizing the maximum mean discrepancy (MMD) between the distribution of the transferred embeddings and that of the target embeddings. Extensive experimental results show that our proposed model substantially outperforms several state-of-the-art unsupervised systems, and even achieves competitive performance compared to supervised methods. Further analysis demonstrates the effectiveness of our approach in improving stability.
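The objective described in the abstract, minimizing the MMD between the mapped source embedding distribution and the target embedding distribution, can be illustrated with a Gaussian-kernel MMD estimator. This is a minimal sketch on fabricated toy data; the mapping `W`, the dimensions, and the kernel bandwidth below are assumptions for illustration, not the paper's actual setup:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of a and b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy between samples x and y.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
# Toy "source" embeddings with anisotropic spread, so a rotation changes the distribution.
src = rng.normal(size=(500, 4)) * np.array([3.0, 1.0, 0.5, 0.2])
# A hypothetical ground-truth orthogonal mapping.
W = np.linalg.qr(rng.normal(size=(4, 4)))[0]
# Toy "target" embeddings: the mapped source sample plus a little noise.
tgt = src @ W.T + 0.01 * rng.normal(size=(500, 4))

# After applying W, the two samples are nearly identically distributed,
# so the MMD estimate is much smaller than before mapping.
print(mmd2(src @ W.T, tgt))
print(mmd2(src, tgt))
```

In an actual training loop one would treat `mmd2(src @ W.T, tgt)` as the loss and optimize `W` by gradient descent, which is the distribution-matching step the abstract contrasts with adversarial training.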
Notes
1. In the experiment, we observe that the eigenvalues of the matrix \(\mathbf {W}\) all have a modulus close to 1.
2. We train a specific compression network separately for each language pair.
3. We also tried CSLS retrieval; the results show that our approach achieves consistent improvements over the baselines. Due to page limitations, we only report results with cosine similarity.
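Footnote 1 reports that the eigenvalues of the learned mapping \(\mathbf {W}\) all have a modulus close to 1, i.e. the mapping is nearly orthogonal. A minimal sketch of how one might check this; the matrix here is fabricated (a perturbed random orthogonal matrix), since the paper's learned mapping is not available:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a learned mapping: a random orthogonal matrix plus a small perturbation.
Q = np.linalg.qr(rng.normal(size=(5, 5)))[0]
W = Q + 0.005 * rng.normal(size=(5, 5))

# Moduli of the (possibly complex) eigenvalues of W.
moduli = np.abs(np.linalg.eigvals(W))
print(moduli)  # each modulus is close to 1 when W is near-orthogonal
```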
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Yang, P., Luo, F., Wu, S., Xu, J., Zhang, D. (2019). Learning Unsupervised Word Mapping via Maximum Mean Discrepancy. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science(), vol 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_23
Print ISBN: 978-3-030-32232-8
Online ISBN: 978-3-030-32233-5