Topic-aware pivot language approach for statistical machine translation

Journal of Zhejiang University SCIENCE C

Abstract

The pivot language approach for statistical machine translation (SMT) is an effective way to ease the resource bottleneck for certain language pairs. However, conventional implementations make little use of pivot-side context information, which leads to erroneous estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that exploit different levels of pivot-side context. The first method uses document-level context, assuming that bridged phrase pairs should have similar document-level topic distributions. The second method focuses on local context. It rests on two assumptions: that the sense of a phrase can be reflected by its local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that combines the two methods to further improve system performance. Experimental results on French-Spanish and French-German translation, using English as the pivot language, demonstrate the effectiveness of topic-based context in pivot-based SMT.
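As an illustration of the idea summarized above, the following minimal sketch bridges a source-pivot phrase table and a pivot-target phrase table in the conventional way, by summing p(pivot | source) · p(target | pivot) over shared pivot phrases, and optionally down-weights bridges whose two phrase pairs disagree in topic distribution. The table layout, the function names `triangulate` and `topic_similarity`, the Hellinger-based similarity, the toy probabilities, and the two-topic vectors are all assumptions made for this example; the abstract does not specify the similarity measure or how it enters the model, so this should not be read as the authors' formulation.

```python
import math
from collections import defaultdict


def topic_similarity(p, q):
    """Similarity in [0, 1] between two topic distributions.

    Assumption: 1 minus the Hellinger distance; the abstract does not
    say which measure is used.
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def triangulate(src_pivot, pivot_tgt, topic_aware=False):
    """Bridge two phrase tables through shared pivot phrases.

    src_pivot: {src: {pivot: (p(pivot | src), topic_distribution)}}
    pivot_tgt: {pivot: {tgt: (p(tgt | pivot), topic_distribution)}}
    Returns {src: {tgt: p(tgt | src)}}, renormalized per source phrase.
    """
    bridged = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for pivot, (p_ps, topics_sp) in pivots.items():
            for tgt, (p_tp, topics_pt) in pivot_tgt.get(pivot, {}).items():
                # Conventional triangulation uses weight 1.0 for every bridge.
                weight = topic_similarity(topics_sp, topics_pt) if topic_aware else 1.0
                bridged[src][tgt] += p_ps * p_tp * weight
    result = {}
    for src, tgts in bridged.items():
        total = sum(tgts.values())
        if total > 0:
            result[src] = {t: v / total for t, v in tgts.items()}
    return result


# Toy data (hypothetical): topics are [finance, nature].
src_pivot = {"banque": {"bank": (1.0, [0.9, 0.1])}}
pivot_tgt = {"bank": {"banco": (0.5, [0.9, 0.1]),
                      "orilla": (0.5, [0.1, 0.9])}}

plain = triangulate(src_pivot, pivot_tgt)
aware = triangulate(src_pivot, pivot_tgt, topic_aware=True)
```

In this toy case the ambiguous pivot phrase "bank" makes conventional triangulation split the probability mass evenly between "banco" and "orilla", whereas the topic-weighted variant favors "banco" because its phrase pair shares the finance-heavy topic distribution of the French-English pair.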

Author information

Corresponding author

Correspondence to Jin-song Su.

Additional information

Project supported by the National High-Tech R&D Program of China (No. 2012BAH14F03), the National Natural Science Foundation of China (Nos. 61005052 and 61303082), the Research Fund for the Doctoral Program of Higher Education of China (No. 20120121120046), the Natural Science Foundation of Fujian Province of China (No. 2011J01360), and the Fundamental Research Funds for the Central Universities, China (No. 2010121068).

About this article

Cite this article

Su, Js., Shi, Xd., Huang, Yz. et al. Topic-aware pivot language approach for statistical machine translation. J. Zhejiang Univ. - Sci. C 15, 241–253 (2014). https://doi.org/10.1631/jzus.C1300208

  • DOI: https://doi.org/10.1631/jzus.C1300208
