Building a Bilingual Dictionary from a Japanese-Chinese Patent Corpus

  • Keiji Yasuda
  • Eiichiro Sumita
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)


In this paper, we propose an automatic method to build a bilingual dictionary from a Japanese-Chinese parallel corpus. The proposed method uses character similarity between Japanese and Chinese, and a statistical machine translation (SMT) framework, in a cascading manner. The first step extracts word translation pairs from the parallel corpus based on the similarity between Japanese kanji characters (Chinese characters used in Japanese writing) and simplified Chinese characters. The second step trains phrase tables using two different SMT training tools, then extracts the word translation pairs common to both. The third step trains an SMT system using the word translation pairs obtained in the first and second steps. According to the experimental results, the proposed method yields 59.3% to 92.1% accuracy for the extracted word translation pairs, depending on the cascading step.
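
The first two cascading steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, so a Dice coefficient over character sets is assumed here, and the mapping table `KANJI_TO_SIMPLIFIED`, the function names, and the threshold are all hypothetical. Phrase tables are assumed to be already reduced to sets of (Japanese, Chinese) word pairs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical, tiny kanji -> simplified Chinese mapping (illustration only).
KANJI_TO_SIMPLIFIED: Dict[str, str] = {
    "電": "电",
    "機": "机",
    "発": "发",
    "動": "动",
}


def normalize_kanji(word: str) -> str:
    """Map each kanji to its simplified Chinese form when the table knows it."""
    return "".join(KANJI_TO_SIMPLIFIED.get(ch, ch) for ch in word)


def char_similarity(ja_word: str, zh_word: str) -> float:
    """Assumed measure: Dice coefficient over the two words' character sets."""
    a = set(normalize_kanji(ja_word))
    b = set(zh_word)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def extract_similar_pairs(
    sentence_pairs: Iterable[Tuple[List[str], List[str]]],
    threshold: float = 1.0,
) -> Set[Pair]:
    """Step 1 (sketch): keep word pairs from aligned sentences whose
    character similarity reaches the threshold."""
    pairs: Set[Pair] = set()
    for ja_words, zh_words in sentence_pairs:
        for ja in ja_words:
            for zh in zh_words:
                if char_similarity(ja, zh) >= threshold:
                    pairs.add((ja, zh))
    return pairs


def common_pairs(table_a: Iterable[Pair], table_b: Iterable[Pair]) -> Set[Pair]:
    """Step 2 (sketch): keep only pairs found by both SMT training tools."""
    return set(table_a) & set(table_b)


if __name__ == "__main__":
    corpus = [(["発電機", "を", "動かす"], ["启动", "发电机"])]
    print(extract_similar_pairs(corpus))
    print(common_pairs({("特許", "专利"), ("装置", "设备")},
                       {("特許", "专利"), ("装置", "装置")}))
```

Requiring agreement between two independently trained phrase tables trades recall for precision, which is consistent with the accuracy varying by cascading step. The third step, training an SMT system on the collected pairs, depends on an external toolkit and is not sketched here.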


Machine Translation; Chinese Character; Dictionary Entry; Chinese Word; Statistical Machine Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Keiji Yasuda (1)
  • Eiichiro Sumita (1)
  1. National Institute of Information and Communications Technology, Keihanna Science City, Japan
