Abstract
Automatic recognition of named entities from clinical text lightens the workload of health professionals by aiding the interpretation of such text and easing tasks such as populating databases with patient health information. In this study, we evaluated the performance of Conditional Random Fields (CRF), a sequence labelling model, for extracting entities from neurology clinical texts written in Portuguese. Besides achieving F1-scores of about 73% and 80% under strict and relaxed evaluation, respectively, we analyzed which features were the most discriminative for this task.
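The abstract reports F1 under a strict and a relaxed evaluation without defining the two regimes. The following is a minimal sketch of one common way to compute entity-level F1 under both, assuming that entities are (start, end, label) spans with exclusive end offsets, that a strict match requires identical boundaries and label, and that a relaxed match only requires the same label and overlapping boundaries. The paper's exact matching criteria may differ, and the function names and entity labels here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end_exclusive, label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    # Two half-open intervals overlap if each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Entity-level F1; strict = exact span and label, relaxed = overlap and same label."""

    def match(g: Span, p: Span) -> bool:
        if g[2] != p[2]:
            return False
        return overlaps(g, p) if relaxed else g[:2] == p[:2]

    # Greedy one-to-one matching (a simplification): each gold entity
    # can be credited to at most one predicted entity.
    used = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in used and match(g, p):
                used.add(i)
                tp += 1
                break

    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example: one exact hit, one boundary error, one label error.
    gold = [(0, 2, "Condition"), (5, 6, "Drug"), (8, 10, "Test")]
    pred = [(0, 2, "Condition"), (4, 6, "Drug"), (8, 9, "Exam")]
    print("strict F1:", f1_score(gold, pred))
    print("relaxed F1:", f1_score(gold, pred, relaxed=True))
```

In the invented example, the boundary error on the Drug entity is forgiven only by the relaxed criterion, while the label error is penalized by both. Because a relaxed match is a weaker requirement than a strict one, relaxed scores are normally the higher of the two.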
Notes
1. Distributional semantic models, or word embeddings, are typically learned from large collections of text and represent each word as a vector of real numbers derived from the word's distribution in text. This places words in a multidimensional vector space and makes several processing tasks easier, such as computing the semantic similarity between two words as the cosine of the angle between their vectors.
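Note 1 describes measuring semantic similarity as the cosine between word vectors. A minimal sketch, assuming dense vectors of equal length. The three-dimensional vectors for the Portuguese words "cefaleia" (headache), "enxaqueca" (migraine) and "hospital" are invented for illustration and do not come from any trained model.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors invented for illustration only; real embeddings
# are learned from a corpus and usually have hundreds of dimensions.
embeddings = {
    "cefaleia": [0.9, 0.1, 0.3],
    "enxaqueca": [0.8, 0.2, 0.35],
    "hospital": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for a, b in [("cefaleia", "enxaqueca"), ("cefaleia", "hospital")]:
        print(a, b, round(cosine_similarity(embeddings[a], embeddings[b]), 3))
```

With these toy vectors, the two headache-related terms point in nearly the same direction and so score much higher than the unrelated pair. Cosine is preferred over raw distance here because it ignores vector length and compares only direction.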
Acknowledgements
We acknowledge the financial support of Fundação para a Ciência e a Tecnologia through CISUC (UID/CEC/00326/2019).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lopes, F., Teixeira, C., Gonçalo Oliveira, H. (2019). Named Entity Recognition in Portuguese Neurology Text Using CRF. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11804. Springer, Cham. https://doi.org/10.1007/978-3-030-30241-2_29
DOI: https://doi.org/10.1007/978-3-030-30241-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30240-5
Online ISBN: 978-3-030-30241-2
eBook Packages: Computer Science (R0)