Japanese-Chinese Cross-Language Entity Linking Adapting to User’s Language Ability

Conference paper

Abstract

In this chapter, we propose a method that automatically discovers valuable keyphrases in Japanese documents and links them to related Chinese Wikipedia pages. The method proceeds in seven steps. First, we extract nouns from a Japanese document using a morphological analyzer and select keyphrase candidates using a method called top consecutive nouns cohesion (TCNC) (Horita et al. [1]). Second, we estimate the difficulty of each extracted keyphrase and tag it with a linguistic level. Third, we translate the extracted Japanese keyphrases into Chinese using a combination of three translation methods. Fourth, we retrieve the Chinese Wikipedia articles that correspond to the translated keyphrases. Fifth, we translate the original Japanese document into Chinese and build a vector of its noun frequencies. Sixth, we calculate the cosine similarity between the translated document and each candidate Chinese Wikipedia article. Finally, we create links from the Japanese keyphrases to the top-ranking Chinese Wikipedia articles.
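The ranking step (noun-frequency vectors compared by cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `cosine_similarity` and `rank_candidates`, the toy noun lists, and the candidate article titles are all hypothetical, and it assumes the nouns have already been extracted from the translated document and from each candidate article.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse noun-frequency vectors."""
    # Only nouns present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_candidates(doc_nouns, candidates):
    """Rank candidate articles by similarity to the translated document.

    doc_nouns:  list of nouns from the translated document (assumed input)
    candidates: dict mapping article title -> list of nouns in that article
    Returns (title, score) pairs sorted from most to least similar.
    """
    doc_vec = Counter(doc_nouns)
    scored = [
        (title, cosine_similarity(doc_vec, Counter(nouns)))
        for title, nouns in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: an ambiguous keyphrase with two candidate articles.
doc_nouns = ["苹果", "公司", "手机", "公司"]
candidates = {
    "苹果公司": ["苹果", "公司", "手机", "电脑"],
    "苹果 (水果)": ["苹果", "水果", "树"],
}
ranked = rank_candidates(doc_nouns, candidates)
```

With this toy input, the company article shares more nouns with the document than the fruit article and is therefore ranked first; a link would be created to the top-ranked title.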

Keywords

Cross-language link discovery, Entity disambiguation, Keyphrase extraction, Linguistic difficulty level estimation, Wikification, Wikipedia, Word2vec

Notes

Acknowledgements

This work was supported in part by JSPS KAKENHI Grant Numbers 24500300, 16K00452, and the MEXT-Supported Program for the Strategic Research Foundation at Private Universities (S1511026).

References

  1. K. Horita, F. Kimura, A. Maeda, Automatic keyword extraction for wikification of East Asian language documents. Int. J. Comput. Theory Eng. 8(1), 32–35 (2016)
  2. R. Mihalcea, A. Csomai, Wikify!: linking documents to encyclopedic knowledge, in Proceedings of the 16th ACM Conference on Information and Knowledge Management, pp. 233–242 (2007)
  3. L.X. Tang, S. Geva, A. Trotman, Y. Xu, K.Y. Itakura, Overview of the NTCIR-9 crosslink task: cross-lingual link discovery, in Proceedings of the 9th NTCIR Conference, pp. 437–463 (2011)
  4. L.X. Tang, I.S. Kang, F. Kimura, Y.H. Lee, A. Trotman, S. Geva, Y. Xu, Overview of the NTCIR-10 cross-lingual link discovery task, in Proceedings of the 10th NTCIR Conference, pp. 8–38 (2013)
  5. J. Heng, J. Nothman, B. Hachey, Overview of TAC-KBP2014 entity discovery and linking tasks, in Proceedings of TAC2014 (2014)
  6. Z. Wang, J. Li, Z. Wang, J. Tang, Cross-lingual knowledge linking across wiki knowledge bases, in Proceedings of the 21st International Conference on World Wide Web, pp. 459–468 (2012)
  7. S. Chen, G.J.F. Jones, N.E. O’Connor, DCU at NTCIR-10 crosslingual link discovery (CrossLink-2) task, in Proceedings of the 10th NTCIR Conference, pp. 74–78 (2013)
  8. D. Milne, I.W. Witten, An open-source toolkit for mining Wikipedia. Artif. Intell. 194, 222–239 (2013)
  9. Y. Liu, J. Boisson, J.S. Chang, NTHU at NTCIR-10 crosslink-2: an approach toward semantic features, in Proceedings of the 10th NTCIR Conference, pp. 62–68 (2013)
  10. J. Zhou, F. Kimura, A. Maeda, Cross-language entity linking adapting to user’s language ability, in Proceedings of The International MultiConference of Engineers and Computer Scientists 2017. Lecture Notes in Engineering and Computer Science, 15–17 Mar 2017, Hong Kong, pp. 24–29
  11. MeCab: yet another part-of-speech and morphological analyzer (in Japanese), http://taku910.github.io/mecab/. Accessed 21 Aug 2017
  12. JLPT Japanese-language proficiency test, http://www.jlpt.jp/e/index.html. Accessed 21 Aug 2017
  13. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Proceedings of Advances on Neural Information Processing Systems 26 (NIPS 2013), pp. 3111–3119 (2013)
  14. Google Translate, https://translate.google.com. Accessed 21 Aug 2017
  15. Bing Translator, http://www.bing.com/translator. Accessed 21 Aug 2017
  16. Pinconv 4 (in Japanese), http://www.karak.jp/chinese/pinconv-4-00.html. Accessed 21 Aug 2017
  17. J. Zhou, X. Song, F. Kimura, A. Maeda, A cross-language entity linking method using combination of multiple translation methods, in Proceedings of the 4th ICT International Student Project Conference (ICT-ISPC2015) (2015)

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Faculty of Economics, Management and Information Science, Onomichi City University, Onomichi, Japan
  2. Graduate School of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
  3. College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
