Automatic Annotation of Medical Records in Spanish with Disease, Drug and Substance Names

  • Maite Oronoz
  • Arantza Casillas
  • Koldo Gojenola
  • Alicia Perez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8259)


This paper presents an annotation tool that detects entities in the biomedical domain. By enriching the lexica of the Freeling analyzer with biomedical terms extracted from dictionaries and ontologies such as SNOMED CT, the system is able to automatically detect medical terms in texts. An evaluation has been performed against a manually tagged corpus, focusing on entities referring to pharmaceutical drug names, substances and diseases. The results obtained show that a good annotation tool would help to leverage subsequent processes such as data mining or pattern recognition tasks in the biomedical domain.
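The abstract describes two steps: detecting medical terms by looking them up in a lexicon enriched from resources such as SNOMED CT, and scoring the output against a manually tagged corpus. The sketch below illustrates that general idea in plain Python. It does not use Freeling or its API; the greedy longest-match lookup, the exact-span precision/recall/F1 scoring, and the helper names (`tokenize`, `build_index`, `annotate`, `evaluate`) are assumptions chosen for illustration, not the authors' documented method, and the lexicon entries and sentence are invented examples.

```python
import re
from typing import Dict, List, Tuple

# A detected entity: (start token, end token exclusive, category).
Span = Tuple[int, int, str]
Index = Dict[Tuple[str, ...], str]


def tokenize(text: str) -> List[str]:
    """Very simple tokenizer (assumption): words and single punctuation marks, lowercased."""
    return [t.lower() for t in re.findall(r"\w+|[^\w\s]", text)]


def build_index(lexicon: Dict[str, str]) -> Tuple[Index, int]:
    """Key each (possibly multiword) lexicon term by its token tuple."""
    index = {tuple(tokenize(term)): cat for term, cat in lexicon.items()}
    max_len = max((len(key) for key in index), default=0)
    return index, max_len


def annotate(tokens: List[str], index: Index, max_len: int) -> List[Span]:
    """Greedy left-to-right longest-match lookup of lexicon terms."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "insuficiencia renal aguda"
        # wins over the shorter entry "insuficiencia renal".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cat = index.get(tuple(tokens[i:i + n]))
            if cat is not None:
                spans.append((i, i + n, cat))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return spans


def evaluate(predicted: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over (start, end, category) spans."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical lexicon entries; real ones would come from dictionaries
    # and ontologies such as SNOMED CT.
    lexicon = {
        "insuficiencia renal": "disease",
        "insuficiencia renal aguda": "disease",
        "paracetamol": "drug",
        "alcohol": "substance",
    }
    index, max_len = build_index(lexicon)
    tokens = tokenize("Paciente con insuficiencia renal aguda; toma paracetamol "
                      "y niega consumo de alcohol.")
    spans = annotate(tokens, index, max_len)
    print(spans)  # [(2, 5, 'disease'), (7, 8, 'drug'), (12, 13, 'substance')]
    # Invented gold annotation in which "alcohol" was not tagged.
    print(evaluate(spans, [(2, 5, "disease"), (7, 8, "drug")]))
```

Preferring the longest match is one common way to resolve overlapping multiword entries; an analyzer-based system may instead rely on its own multiword and tagging modules.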

Index Terms

development of linguistic tools, annotation, medical domain



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Maite Oronoz (1)
  • Arantza Casillas (2)
  • Koldo Gojenola (1)
  • Alicia Perez (1)

  1. Departamento de Lenguajes y Sistemas Informáticos, IXA taldea, UPV-EHU, Spain
  2. Departamento de Electricidad y Electrónica, IXA taldea, UPV-EHU, Spain
