Cross-Linguistic Projection for French-Vietnamese Named Entity Translation

  • Ngoc Tan Le
  • Fatiha Sadat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)

Abstract

High-quality translation is a time-consuming and expensive process. Named entity (NE) translation, including the translation of proper names, remains an important task in multilingual natural language processing. Most gold-standard corpora are available for English but not for under-resourced languages such as Vietnamese, and for Asian languages the task remains problematic. This paper focuses on a named entity translation approach based on cross-linguistic projection for French-Vietnamese, a low-resource language pair. We incrementally apply a cross-projection method to a small annotated parallel corpus, combining surface string matching measures based on probabilistic string edit distance similarity with an additional syllable-consistency score, computed between the source term and the target term after a syllabification process. Evaluations on the French-Vietnamese pair show good accuracy, with a BLEU gain of more than 4 points when translating bilingual named entity pairs.
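
The two scores mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain Levenshtein distance stands in for the probabilistic string edit distance, the French syllable count is a rough vowel-group heuristic, Vietnamese syllables are assumed to be separated by spaces or hyphens, and the function names and the weight `alpha` are hypothetical.

```python
import re
import unicodedata

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def string_similarity(src: str, tgt: str) -> float:
    """Edit distance normalised to [0, 1]; a stand-in for the probabilistic measure."""
    longest = max(len(src), len(tgt))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(src.lower(), tgt.lower()) / longest


def count_french_syllables(term: str) -> int:
    """Rough heuristic (assumption): one syllable per group of vowel letters."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return max(1, len(VOWEL_GROUP.findall(stripped)))


def count_vietnamese_syllables(term: str) -> int:
    """Assumption: syllables are delimited by whitespace or hyphens."""
    parts = [p for p in re.split(r"[\s\-]+", term.strip()) if p]
    return max(1, len(parts))


def syllable_consistency(src: str, tgt: str) -> float:
    """Ratio of the smaller syllable count to the larger one, in (0, 1]."""
    n_src = count_french_syllables(src)
    n_tgt = count_vietnamese_syllables(tgt)
    return min(n_src, n_tgt) / max(n_src, n_tgt)


def pair_score(src: str, tgt: str, alpha: float = 0.5) -> float:
    """Linear mix of both features; alpha is a hypothetical weight."""
    return alpha * string_similarity(src, tgt) + (1 - alpha) * syllable_consistency(src, tgt)


if __name__ == "__main__":
    print(pair_score("Paris", "Pa-ri"))
```

A candidate French-Vietnamese pair with a higher `pair_score` would be preferred when projecting annotations across the parallel corpus; how the paper actually combines and thresholds the two features is not stated in the abstract.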

Keywords

Named entity; Bilingual corpus; Cross-projection; Named entity translation; French-Vietnamese

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Université du Québec à Montréal, Montreal, Canada
