Automatic Extraction of Structured Information from Drug Descriptions
This paper describes a named entity extraction model based on Conditional Random Fields (CRFs) for identifying relevant information in drug prescriptions. The model extracts the following entities: dosage, measuring unit, the intended recipient of the treatment, frequency, and total duration of treatment. A corpus of 1800 sentences drawn from drug prescription texts was compiled and annotated by two experts. Using the feature set we identified, the CRF model achieves F1-measure values of around 95% for unit, dosage and frequency detection.
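The abstract does not spell out the feature set or the evaluation procedure, so the following Python sketch is illustrative only and rests on stated assumptions: entities are encoded with BIO tags, the label names (DOSAGE, UNIT, FREQUENCY, DURATION), the unit lexicon and the token features are all hypothetical, and F1 is computed over exact entity spans. It shows the kind of per-token feature dictionaries a linear-chain CRF consumes and how an entity-level F1 score for one label can be computed.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical unit lexicon, for illustration only; not the paper's resource.
UNIT_WORDS = {"mg", "g", "ml", "tablet", "tablets", "capsule", "capsules", "drops"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Feature dict for token i (illustrative features, not the paper's feature set)."""
    tok = tokens[i]
    return {
        "bias": 1.0,
        "lower": tok.lower(),
        "is_number": re.fullmatch(r"\d+(?:[.,]\d+)?", tok) is not None,
        "in_unit_lexicon": tok.lower() in UNIT_WORDS,
        "suffix3": tok[-3:].lower(),
        # One-token context window on each side, with sentence-boundary markers.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, i.e. one feature sequence per sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Convert BIO tags to a set of (label, start, end_exclusive) entity spans."""
    spans: Set[Tuple[str, int, int]] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        # Close the open span on O, on a new B-, or on an I- with a different label.
        if label is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            spans.add((label, start, i))
            label = None
        # Open a span on B-, or leniently on an I- that has no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]], label: str) -> float:
    """Exact-match entity-level F1 for one label over a list of tagged sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs = {s for s in bio_spans(g) if s[0] == label}
        ps = {s for s in bio_spans(p) if s[0] == label}
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    tokens = "Take 2 tablets 3 times a day for 7 days".split()
    gold = ["O", "B-DOSAGE", "B-UNIT", "B-FREQUENCY", "I-FREQUENCY",
            "I-FREQUENCY", "I-FREQUENCY", "O", "B-DURATION", "I-DURATION"]
    # A hypothetical prediction that truncates the frequency span after "times".
    pred = gold[:5] + ["O", "O"] + gold[7:]
    print(sentence_features(tokens)[1])
    for name in ("DOSAGE", "UNIT", "FREQUENCY", "DURATION"):
        print(name, span_f1([gold], [pred], name))
```

Exact span matching is a strict criterion: in the demo, the truncated frequency span counts as both a false positive and a false negative, so FREQUENCY scores 0.0 even though two of its four tokens are tagged correctly. A partial-match scheme would be more lenient.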
Keywords: Conditional Random Field; Drug description; Named Entity Recognition
The work for this paper has been supported in part by the Computer Science Department of the Technical University of Cluj-Napoca, Romania.