Information Extraction from Clinical Documents: Towards Disease/Disorder Template Filling

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9283)

Abstract

In recent years, the generation of electronic health records (EHRs) has increased, which has widened the scope for research on biomedical literature. Many studies have applied NLP, information retrieval, and machine learning techniques to extract information from these records. In this paper, we provide a methodology for extracting the information needed to understand the status of a disease/disorder. That status depends on attributes such as temporal information, severity, and progression of the disease. We consider ten attributes that together capture most of the details regarding the status of the disease/disorder: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. We present rule-based and machine learning approaches to identify each of these attributes, and we evaluate our system on attribute-level and system-level accuracy. This project was done as part of the ShARe/CLEF eHealth Evaluation Lab 2014. We achieved state-of-the-art accuracy (0.868) in identifying the normalized values of the attributes.
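As a rough illustration of what template filling means here, the sketch below defines a template with one slot per attribute named in the abstract and fills a single slot (Negation Indicator) with a simple trigger-word rule. It is not the authors' system: the default slot values, the trigger list, the five-token window, and the names `DisorderTemplate` and `fill_negation` are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger phrases (assumption); a real system would use a curated lexicon.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")


@dataclass
class DisorderTemplate:
    """One slot per attribute listed in the abstract; default values are placeholders."""
    mention: str
    negation_indicator: str = "no"
    subject_class: str = "patient"
    uncertainty_indicator: str = "no"
    course_class: str = "unmarked"
    severity_class: str = "unmarked"
    conditional_class: str = "false"
    generic_class: str = "false"
    body_location: str = "NULL"
    doctime_class: str = "unknown"
    temporal_expression: str = "none"


def fill_negation(sentence: str, mention: str, window: int = 5) -> DisorderTemplate:
    """Set negation_indicator to "yes" if a trigger appears in the
    `window` word tokens immediately before the disorder mention."""
    template = DisorderTemplate(mention=mention)
    lowered = sentence.lower()
    start = lowered.find(mention.lower())
    if start == -1:
        # Mention not found in the sentence: leave every slot at its default.
        return template
    # Keep only the last `window` alphabetic tokens preceding the mention.
    preceding = " ".join(re.findall(r"[a-z]+", lowered[:start])[-window:])
    for trigger in NEGATION_TRIGGERS:
        if re.search(r"\b" + re.escape(trigger) + r"\b", preceding):
            template.negation_indicator = "yes"
            break
    return template
```

Calling `fill_negation("The patient denies chest pain.", "chest pain")` yields a template whose `negation_indicator` is `"yes"`, while the other nine slots keep their placeholder defaults. A fuller system would pair each slot with its own rule set or classifier.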

Author information

Corresponding author

Correspondence to Veera Raghavendra Chikka.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chikka, V.R., Mariyasagayam, N., Niwa, Y., Karlapalem, K. (2015). Information Extraction from Clinical Documents: Towards Disease/Disorder Template Filling. In: Mothe, J., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2015. Lecture Notes in Computer Science, vol 9283. Springer, Cham. https://doi.org/10.1007/978-3-319-24027-5_41

  • DOI: https://doi.org/10.1007/978-3-319-24027-5_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24026-8

  • Online ISBN: 978-3-319-24027-5

  • eBook Packages: Computer Science, Computer Science (R0)
