A Feature-Based Approach for Relation Extraction from Thai News Documents

  • Nattapong Tongtep
  • Thanaruk Theeramunkong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5477)

Abstract

Relation extraction among named entities is one of the most important tasks in information extraction. This paper presents a feature-based approach for extracting relations among named entities from Thai news documents. In this approach, shallow linguistic processing, including pattern-based named entity extraction, is performed to construct several sets of features. Four supervised learning schemes are applied in turn to investigate the performance of relation extraction with different feature sets. Focusing on four types of relations in crime-related news documents, the experimental results show that the proposed method achieves an accuracy of up to 95% on a data set of 1736 entity pairs. The effect of each feature set on relation extraction is also explored for further discussion.
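
As a rough illustration of what a feature-based formulation can look like, the sketch below turns each entity pair into a small set of local features and trains a classifier on them. Everything in it is an assumption made for illustration and is not taken from the paper: the feature names, the labels PER_AT_LOC and NONE, the English placeholder tokens (real Thai text would first need word segmentation), and the use of a hand-written Naive Bayes classifier as a stand-in for the four learning schemes, which the abstract does not name.

```python
import math
from collections import Counter, defaultdict


def pair_features(tokens, e1, e2):
    """Build local features for one entity pair (illustrative feature set).

    tokens: list of word tokens for one sentence, already segmented.
    e1, e2: (start, end, type) spans, end exclusive, with e1 before e2.
    """
    between = tokens[e1[1]:e2[0]]
    feats = {
        "type_pair": f"{e1[2]}-{e2[2]}",
        "distance": str(min(len(between), 5)),  # token distance, capped at 5
        "before": tokens[e1[0] - 1] if e1[0] > 0 else "<s>",
        "after": tokens[e2[1]] if e2[1] < len(tokens) else "</s>",
    }
    for word in between:  # bag of words between the two entities
        feats[f"between={word}"] = "1"
    return feats


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (stand-in learner)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for item in feats.items():
                self.feat_counts[label][item] += 1
                self.vocab.add(item)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())

        def score(label):
            s = math.log(self.label_counts[label] / total)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for item in feats.items():
                s += math.log((self.feat_counts[label][item] + 1) / denom)
            return s

        return max(self.label_counts, key=score)


# Hypothetical training pairs: (tokens, entity 1, entity 2, relation label).
TRAIN = [
    (["police", "arrested", "Somchai", "in", "Bangkok", "yesterday"],
     (2, 3, "PER"), (4, 5, "LOC"), "PER_AT_LOC"),
    (["Somsak", "was", "found", "in", "Phuket"],
     (0, 1, "PER"), (4, 5, "LOC"), "PER_AT_LOC"),
    (["Somchai", "and", "Anan", "fled"],
     (0, 1, "PER"), (2, 3, "PER"), "NONE"),
    (["Anan", "or", "Wichai", "called"],
     (0, 1, "PER"), (2, 3, "PER"), "NONE"),
]

model = NaiveBayes().fit(
    [(pair_features(toks, a, b), label) for toks, a, b, label in TRAIN]
)

query = pair_features(
    ["officers", "saw", "Malee", "in", "Chiang_Mai"],
    (2, 3, "PER"), (4, 5, "LOC"),
)
print(model.predict(query))
```

Comparing feature sets, as the abstract describes, would amount to dropping or adding keys in `pair_features` and retraining; the toy data here is far too small to say anything about accuracy.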

Keywords

Relation Extraction, Named Entity Extraction, Thai Language Processing, Supervised Learning, Local Features

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nattapong Tongtep (1)
  • Thanaruk Theeramunkong (1)
  1. Sirindhorn International Institute of Technology, Thammasat University, Thailand
