Effective Chinese Relation Extraction by Sentence Rolling and Candidate Ranking

  • Meilun Sheng
  • Lin Qiu
  • Chenyang Wu
  • Haofen Wang
  • Yong Yu
Part of the Communications in Computer and Information Science book series (CCIS, volume 406)

Abstract

Relation extraction aims to discover relations between entities mentioned in plain text. It can be used to generate semantic data in the form of RDF triples representing facts. In this paper, we focus on relation extraction from Chinese text, which has been studied less than relation extraction from English. Chinese words and phrases are highly ambiguous in both syntax and semantics, so Chinese NLP tools can perform poorly when a sentence is too long or its structure is too complex. Unfortunately, such sentences are common in real-world data. To tackle this limitation of current Chinese NLP tools, we propose a method called sentence rolling, which generates several enhanced inputs from the original input to help produce the correct relation candidates. To rank these candidates appropriately, we apply a voting approach based on several statistics-based ranking functions. Further, a Relation KB is used to help determine the subject and the object of the selected relation candidate. We carried out comprehensive experiments on both a real-world news corpus and benchmark data combining the Chinese Treebank and the Chinese Dependency Treebank. The experimental results show that the method significantly improves relation extraction performance compared with existing methods, at a reasonable time cost.
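The abstract does not say how the votes of the statistics-based ranking functions are combined. As a minimal sketch, assuming each function scores every relation candidate and the resulting rankings are aggregated with a Borda count (our assumption, not necessarily the authors' voting rule), the candidate-ranking step could look like the following. The name `borda_vote`, the candidate phrases, and all scores are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# A scoring function maps a relation candidate to a statistic (higher = better).
# The concrete statistics below are hypothetical stand-ins, not the paper's functions.
ScoreFn = Callable[[str], float]


def borda_vote(candidates: Sequence[str], scorers: Sequence[ScoreFn]) -> List[str]:
    """Aggregate several rankings of the same candidates with a Borda count.

    Each scorer ranks every candidate; in a ranking of n candidates the first
    gets n-1 points, the second n-2, ..., the last 0. Candidates are returned
    by total points, with ties broken by their first appearance in `candidates`.
    """
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep first-seen order
    n = len(unique)
    points: Dict[str, int] = defaultdict(int)
    for score in scorers:
        # sorted() is stable even with reverse=True, so equal scores keep input order
        ranked = sorted(unique, key=score, reverse=True)
        for position, cand in enumerate(ranked):
            points[cand] += n - 1 - position
    first_seen = {c: i for i, c in enumerate(unique)}
    return sorted(unique, key=lambda c: (-points[c], first_seen[c]))


if __name__ == "__main__":
    # Hypothetical relation candidates for one entity pair:
    # 出生于 "born in", 毕业于 "graduated from", 位于 "located in".
    candidates = ["出生于", "毕业于", "位于"]
    hits_in_rolled_inputs = {"出生于": 3, "毕业于": 1, "位于": 2}  # made-up counts
    corpus_frequency = {"出生于": 120, "毕业于": 300, "位于": 80}  # made-up counts
    pattern_score = {"出生于": 0.9, "毕业于": 0.5, "位于": 0.1}  # made-up scores
    scorers = [
        lambda c: hits_in_rolled_inputs[c],
        lambda c: corpus_frequency[c],
        lambda c: pattern_score[c],
    ]
    print(borda_vote(candidates, scorers))  # ['出生于', '毕业于', '位于']
```

Because a Borda count uses only rank positions, statistics on very different scales (raw counts versus probabilities) can be combined without normalization; a weighted or score-based vote would be a reasonable alternative design.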

Keywords

Relation Extraction; Chinese Relation Extraction; Statistical Method; Dependency Tree; Relation Knowledge Base

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Meilun Sheng (1)
  • Lin Qiu (1)
  • Chenyang Wu (1)
  • Haofen Wang (1)
  • Yong Yu (1)

  1. Apex Lab, Shanghai Jiao Tong University, China