Medicinal Property Knowledge Extraction from Herbal Documents for Supporting Question Answering System

  • Chaveevan Pechsiri
  • Sumran Painuall
  • Uraiwan Janviriyasopak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)


This paper aims to automatically extract the medicinal properties of an object, especially an herb, from technical documents, so that the extracted knowledge can support health-care problem solving through a question-answering system, particularly What-questions about disease treatment. The extracted medicinal property knowledge is expressed over multiple simple sentences, or EDUs (Elementary Discourse Units). Extracting this knowledge raises three problems: identifying the herbal object, identifying the medicinal properties of each object, and determining the boundary of each medicinal property. We propose using NLP (Natural Language Processing) with a statistical approach to identify the medicinal properties, and a machine learning technique, Naïve Bayes with verb features, to solve the boundary problem. The results show that medicinal properties are extracted with a precision of 86% and a recall of 77%, and that boundaries are determined with 87% correctness.
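
As a rough illustration of the boundary idea named in the abstract, the sketch below trains a small Naïve Bayes classifier on verb features and extends a medicinal property span over consecutive EDUs until an EDU is classified as not belonging to it. It is not the authors' implementation: the labels `property` and `other`, the English verb glosses, the toy training set, the add-one smoothing, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (verbs found in an EDU, label).
# The verbs are English glosses chosen for illustration only.
TRAIN = [
    (["relieve"], "property"),
    (["cure"], "property"),
    (["reduce", "relieve"], "property"),
    (["grow"], "other"),
    (["harvest"], "other"),
    (["grow", "plant"], "other"),
]

def train_naive_bayes(examples):
    """Count classes and per-class verb occurrences."""
    class_counts = Counter()
    verb_counts = defaultdict(Counter)
    vocab = set()
    for verbs, label in examples:
        class_counts[label] += 1
        for verb in verbs:
            verb_counts[label][verb] += 1
            vocab.add(verb)
    return class_counts, verb_counts, vocab

def predict(model, verbs):
    """Return the most probable label, using add-one smoothing."""
    class_counts, verb_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(verb_counts[label].values()) + len(vocab)
        for verb in verbs:
            if verb not in vocab:
                continue  # ignore verbs never seen in training
            score += math.log((verb_counts[label][verb] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def property_boundary(model, edu_verbs):
    """Count the leading consecutive EDUs classified as 'property'."""
    length = 0
    for verbs in edu_verbs:
        if predict(model, verbs) != "property":
            break
        length += 1
    return length

model = train_naive_bayes(TRAIN)
```

Calling `property_boundary(model, edus)` with a list of per-EDU verb lists returns how many EDUs, counted from the first, fall inside the property span under this toy model. A real system would need Thai word segmentation and EDU segmentation before this step, which the sketch does not attempt.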


Medicinal Property Knowledge; Elementary Discourse Unit; Medicinal Property Boundary





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Chaveevan Pechsiri (1)
  • Sumran Painuall (1)
  • Uraiwan Janviriyasopak (2)
  1. Dept. of Information Technology, Dhurakij Pundit University, Bangkok, Thailand
  2. Eastern Industry
