Evaluating and Improving Annotation Tools for Medical Forms

  • Ying-Chi Lin
  • Victor Christen
  • Anika Groß
  • Silvio Domingos Cardoso
  • Cédric Pruski
  • Marcos Da Silveira
  • Erhard Rahm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10649)


The annotation of entities with concepts from standardized terminologies and ontologies is of high importance in the life sciences to enhance semantic interoperability, information retrieval and meta-analysis. Unfortunately, medical documents such as clinical forms or electronic health records are still rarely annotated despite the availability of some tools to automatically determine possible annotations. In this study, we comparatively evaluate the quality of two such tools, cTAKES and MetaMap, as well as of a recently proposed annotation approach from our group for annotating medical forms. We also investigate how to improve the match quality of the tools by post-filtering computed annotations as well as by combining several annotation approaches.
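The abstract does not say how annotations from several tools are combined or how match quality is measured. As a minimal sketch, assume each tool returns a set of (form item, concept ID) pairs and that quality is scored with precision, recall and F-measure against a manually curated reference. One simple combination scheme under those assumptions is voting: keep an annotation if at least k tools propose it, so that k = 1 gives the union and k = number of tools gives the intersection. The function names, identifiers and toy data below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine(annotation_sets, min_votes):
    """Keep (item, concept) pairs proposed by at least `min_votes` tools."""
    votes = Counter(pair for annotations in annotation_sets for pair in set(annotations))
    return {pair for pair, count in votes.items() if count >= min_votes}


def evaluate(predicted, reference):
    """Return (precision, recall, F-measure) of predicted vs. reference pairs."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical outputs of three annotation tools (placeholder IDs, not real UMLS concepts).
tool_a = {("item1", "concept_A"), ("item1", "concept_B"), ("item2", "concept_C")}
tool_b = {("item1", "concept_A"), ("item2", "concept_D")}
tool_c = {("item1", "concept_A"), ("item2", "concept_C"), ("item3", "concept_E")}

# Hypothetical manually curated reference mapping.
reference = {("item1", "concept_A"), ("item2", "concept_C"), ("item3", "concept_E")}

tools = [tool_a, tool_b, tool_c]
for label, k in [("union", 1), ("majority", 2), ("intersection", 3)]:
    p, r, f = evaluate(combine(tools, k), reference)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

In this toy setting a higher vote threshold trades recall for precision, which is the usual reason to compare several combination strategies rather than fix one in advance.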


Annotation, Medical documents, Ontology, UMLS



This work is funded by the German Research Foundation (DFG) (grant RA 497/22-1, “ELISA - Evolution of Semantic Annotations”), the German Federal Ministry of Education and Research (BMBF) (grant 031L0026, “Leipzig Health Atlas”) and the National Research Fund Luxembourg (FNR) (grant C13/IS/5809134).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ying-Chi Lin (1)
  • Victor Christen (1)
  • Anika Groß (1)
  • Silvio Domingos Cardoso (2, 3)
  • Cédric Pruski (2)
  • Marcos Da Silveira (2)
  • Erhard Rahm (1)

  1. Department of Computer Science, Universität Leipzig, Leipzig, Germany
  2. LIST, Luxembourg Institute of Science and Technology, Esch-sur-Alzette, Luxembourg
  3. LRI, University of Paris-Sud XI, Orsay, France
