
Syntax-Based Pre-reordering for Chinese-to-Japanese Statistical Machine Translation

  • Dan Han
  • Pascual Martínez-Gómez
  • Yusuke Miyao
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

There are additional difficulties associated with the translation of language pairs that have different word orders. In this chapter, we introduce some of these difficulties and describe two syntax-based approaches to addressing them. First, we describe an approach that exploits regularities in the differences in phrase-head locations between Chinese and Japanese, and we formalize rules that reorder branches of constituency trees. Second, we propose an approach that compensates for the differences in the typical locations of the subject (S), the verb (V), and the object (O) between Chinese (SVO) and Japanese (SOV), and we devise rules that reorder word blocks derived from dependency trees. Both approaches are implemented as pre-reordering methods, and we evaluate their impact on the translation quality of a phrase-based machine translation system in the news and patent domains. Because these approaches rely on syntactic structures that are automatically extracted by parsers, they are sensitive to parse errors. We analyze the effect of these parse errors and obtain upper bounds on the translation performance that these syntax-based pre-reordering methods can achieve.
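The two reordering ideas summarized above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the `Node` and `Token` structures, the assumption that each phrase's head child index is already known (for example, from head-percolation rules), and the dependency labels `nsubj`, `dobj`, and `asp` are hypothetical choices made for illustration, and the chapter's actual rule sets are likely more refined than this. The first function moves the head of every constituent to the end of its phrase; the second moves a verb after the blocks of its object dependents, turning an SVO sequence into an SOV one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Approach 1 (sketch): head finalization over a constituency tree.
@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaf (pre-terminal) nodes
    head: int = 0  # index of the head child; assumed to come from head rules

    def leaves(self) -> List[str]:
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


def head_finalize(node: Node) -> Node:
    """Return a new tree in which every phrase's head child is moved to the end.

    Non-head children keep their original relative order; the input is not mutated.
    """
    if node.word is not None:
        return node
    kids = [head_finalize(child) for child in node.children]
    others = [k for i, k in enumerate(kids) if i != node.head]
    return Node(node.label, others + [kids[node.head]], head=len(others))


# Approach 2 (sketch): SVO -> SOV block reordering over a dependency tree.
# Each token is (word, index of its head or -1 for the root, relation label).
Token = Tuple[str, int, str]


def sov_reorder(tokens: List[Token], object_rels=("dobj",)) -> List[str]:
    """Linearize the tree so that each head follows the blocks of its objects.

    Left dependents stay in front of the head, object dependents are moved in
    front of it, and any other right dependents (e.g. aspect markers) stay after
    it. Assumes exactly one token has head -1.
    """
    deps: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    root = -1
    for i, (_, head, _) in enumerate(tokens):
        if head < 0:
            root = i
        else:
            deps[head].append(i)  # appended in surface order

    def block(i: int) -> List[str]:
        left = [d for d in deps[i] if d < i]
        right = [d for d in deps[i] if d > i]
        objs = [d for d in right if tokens[d][2] in object_rels]
        rest = [d for d in right if tokens[d][2] not in object_rels]
        words: List[str] = []
        for d in left + objs:
            words += block(d)
        words.append(tokens[i][0])
        for d in rest:
            words += block(d)
        return words

    return block(root)


if __name__ == "__main__":
    # 我 吃 苹果 ("I eat apples"): IP -> NP VP with VP as head; VP -> VV NP with VV as head.
    tree = Node("IP", [
        Node("NP", [Node("PN", word="我")]),
        Node("VP", [Node("VV", word="吃"), Node("NP", [Node("NN", word="苹果")])], head=0),
    ], head=1)
    print(head_finalize(tree).leaves())  # ['我', '苹果', '吃']

    # 我 吃 了 苹果 ("I ate apples") with hypothetical Stanford-style labels.
    toks = [("我", 1, "nsubj"), ("吃", -1, "root"), ("了", 1, "asp"), ("苹果", 1, "dobj")]
    print(sov_reorder(toks))  # ['我', '苹果', '吃', '了']
```

Both outputs follow Japanese SOV order (compare 私はりんごを食べた, where the verb precedes the past-tense marker). The constituency sketch needs only a head index per phrase, and the dependency sketch needs only head indices and relation labels; in both cases every decision comes straight from the parser output, which is why parse errors propagate directly into the reordering.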

Keywords

Machine Translation, Word Order, Parse Tree, Statistical Machine Translation, Language Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Dan Han (1)
  • Pascual Martínez-Gómez (1)
  • Yusuke Miyao (2, 3)
  1. National Institute of Advanced Industrial Science, Tokyo, Japan
  2. The Graduate University for Advanced Studies, Hayama, Japan
  3. National Institute of Informatics, Tokyo, Japan