Collecting Bilingual Technical Terms from Japanese-Chinese Patent Families by SVM

  • Conference paper
Computational Linguistics (PACLING 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 593)

Abstract

This paper proposes a method for collecting bilingual technical terms from Japanese-Chinese patent families. The method uses the phrase translation table of a statistical machine translation model to estimate Chinese translations of Japanese technical terms. First, we extract Japanese technical terms from the Japanese side of parallel patent sentences. Then, for each extracted term, we collect all the sentences that contain it. Next, we generate Chinese translations of the term by referring to the phrase translation table. Finally, we apply Support Vector Machines (SVMs) to the task of identifying bilingual technical terms. Overall, we achieve over 90% precision at a recall of at least 60%.
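The middle steps of this procedure (collecting sentences that contain a term, then looking up its translations in the phrase table) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the authors' implementation: the function names are hypothetical, the toy data is invented, the phrase table is assumed to use a Moses-style `source ||| target ||| scores` layout with the direct translation probability as the third score, and term extraction and the SVM classifier are left out.

```python
from collections import defaultdict


def parse_phrase_table(lines):
    """Parse 'source ||| target ||| scores' lines into {source: [(target, prob)]}."""
    table = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 3:
            continue  # skip malformed lines
        source, target, scores = fields[0], fields[1], fields[2].split()
        # Assumption: the third score is the direct probability P(target | source).
        table[source].append((target, float(scores[2])))
    return table


def contains_phrase(sentence, phrase):
    """True if the space-tokenized phrase occurs on token boundaries in the sentence."""
    return f" {phrase} " in f" {sentence} "


def collect_sentence_pairs(term, parallel):
    """Return the parallel sentence pairs whose Japanese side contains the term."""
    return [(ja, zh) for ja, zh in parallel if contains_phrase(ja, term)]


def candidate_translations(term, table, sentence_pairs):
    """Keep phrase-table translations that also occur on the Chinese side."""
    candidates = []
    for target, prob in table.get(term, []):
        support = sum(1 for _, zh in sentence_pairs if contains_phrase(zh, target))
        if support > 0:
            candidates.append(
                {"ja": term, "zh": target, "prob": prob, "support": support}
            )
    return sorted(candidates, key=lambda c: c["prob"], reverse=True)


# Invented toy data: tokens are separated by spaces.
PHRASE_TABLE_LINES = [
    "半導体 装置 ||| 半导体 装置 ||| 0.8 0.7 0.9 0.6",
    "半導体 装置 ||| 半导体 器件 ||| 0.5 0.4 0.1 0.3",
    "基板 ||| 基板 ||| 0.9 0.9 0.95 0.9",
]
PARALLEL = [
    ("本 発明 は 半導体 装置 に 関する", "本 发明 涉及 半导体 装置"),
    ("基板 を 備える", "具备 基板"),
]

if __name__ == "__main__":
    table = parse_phrase_table(PHRASE_TABLE_LINES)
    pairs = collect_sentence_pairs("半導体 装置", PARALLEL)
    for candidate in candidate_translations("半導体 装置", table, pairs):
        print(candidate)
```

In a full pipeline, each surviving candidate would be turned into a feature vector (for example from `prob` and `support`, both hypothetical features here) and passed to a trained SVM that decides whether the pair is a valid bilingual technical term.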

Notes

  1. http://mecab.sourceforge.net/.

  2. http://sourceforge.jp/projects/ipadic/.

  3. We collect Japanese-Chinese bilingual technical term pairs that are generated from an identical Japanese term into one subset. We do not separate them into more than one subset.

  4. http://chasen.org/~taku/software/TinySVM.
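The grouping constraint in Note 3 can be sketched as follows. This is a hypothetical illustration only: the context in which the subsets are used is not shown here, so the function name, the number of subsets, and the balancing strategy (largest groups first, each assigned to the currently smallest subset) are all assumptions.

```python
from collections import defaultdict


def split_by_japanese_term(pairs, n_subsets):
    """Partition (ja, zh) pairs so that all pairs sharing a Japanese term stay together."""
    groups = defaultdict(list)
    for ja, zh in pairs:
        groups[ja].append((ja, zh))

    subsets = [[] for _ in range(n_subsets)]
    # Assumed balancing heuristic: place the largest groups first,
    # each into whichever subset currently holds the fewest pairs.
    for ja in sorted(groups, key=lambda term: (-len(groups[term]), term)):
        smallest = min(range(n_subsets), key=lambda i: len(subsets[i]))
        subsets[smallest].extend(groups[ja])
    return subsets
```

Keeping every pair derived from one Japanese term in a single subset means no Japanese term is shared between two subsets.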

Author information

Corresponding author

Correspondence to Takehito Utsuro.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Dong, L., Long, Z., Utsuro, T., Mitsuhashi, T., Yamamoto, M. (2016). Collecting Bilingual Technical Terms from Japanese-Chinese Patent Families by SVM. In: Hasida, K., Purwarianti, A. (eds) Computational Linguistics. PACLING 2015. Communications in Computer and Information Science, vol 593. Springer, Singapore. https://doi.org/10.1007/978-981-10-0515-2_18

  • DOI: https://doi.org/10.1007/978-981-10-0515-2_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0514-5

  • Online ISBN: 978-981-10-0515-2

  • eBook Packages: Computer Science, Computer Science (R0)
