Information Extraction from Clinical Documents: Towards Disease/Disorder Template Filling

Chikka, Veera Raghavendra; Mariyasagayam, Nestor; Niwa, Yoshiki; Karlapalem, Kamalakar

doi:10.1007/978-3-319-24027-5_41


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9283)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

In recent years, the generation of electronic health records (EHRs) has increased, which has widened the scope for research on biomedical literature. Many studies have applied NLP, information retrieval, and machine learning techniques to extract information from these records. In this paper, we provide a methodology to extract information for understanding the status of a disease/disorder. This status depends on attributes such as temporal information, severity, and progression of the disease. We consider ten attributes that together capture most of the details regarding the status of the disease/disorder: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. We present rule-based and machine learning approaches to identify each of these attributes and evaluate our system in terms of attribute-level and system-level accuracy. This work was done as part of the ShARe/CLEF eHealth Evaluation Lab 2014. We achieved state-of-the-art accuracy (0.868) in identifying the normalized values of the attributes.
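To make the idea of a ten-attribute template concrete, the following is a minimal Python sketch of how one disease/disorder mention might be represented and how a single attribute could be set by a simple rule. It is not the authors' system: the class name DisorderTemplate, the function fill_negation, the default values, the trigger list, and the five-token window are all assumptions chosen for illustration, loosely in the spirit of trigger-based negation detection.

```python
import re
from dataclasses import dataclass, asdict

@dataclass
class DisorderTemplate:
    """One disease/disorder mention plus the ten attributes named in the abstract.

    The default values below are illustrative assumptions, not values
    taken from the paper.
    """
    mention: str
    negation_indicator: str = "no"
    subject_class: str = "patient"
    uncertainty_indicator: str = "no"
    course_class: str = "unmarked"
    severity_class: str = "unmarked"
    conditional_class: str = "false"
    generic_class: str = "false"
    body_location: str = "null"
    doctime_class: str = "unknown"
    temporal_expression: str = "none"

# Hypothetical trigger phrases; a real system would use a much larger lexicon.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")

def fill_negation(sentence: str, mention: str, window: int = 5) -> DisorderTemplate:
    """Set negation_indicator to "yes" if a trigger phrase occurs within
    `window` word tokens before the mention (a deliberately simple rule)."""
    template = DisorderTemplate(mention=mention)
    start = sentence.lower().find(mention.lower())
    if start < 0:
        return template  # mention not found: leave all defaults untouched
    # Keep only the last `window` alphabetic tokens preceding the mention.
    tokens = re.findall(r"[a-z]+", sentence[:start].lower())[-window:]
    context = " ".join(tokens)
    for trigger in NEGATION_TRIGGERS:
        if re.search(r"\b" + re.escape(trigger) + r"\b", context):
            template.negation_indicator = "yes"
            break
    return template

if __name__ == "__main__":
    filled = fill_negation("The patient denies chest pain.", "chest pain")
    print(asdict(filled))
```

Each remaining attribute would need its own rule set or classifier; the sketch only shows the shape of the template and one rule-based slot.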



Author information

Authors and Affiliations

International Institute of Information Technology, Hyderabad, India
Veera Raghavendra Chikka & Kamalakar Karlapalem
Research and Development Centre, Hitachi India Pvt Ltd, Bangalore, India
Nestor Mariyasagayam
Central Research Laboratory, Hitachi, Ltd., Kokubunji, Japan
Yoshiki Niwa


Corresponding author

Correspondence to Veera Raghavendra Chikka.

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse, Toulouse, France
Josiane Mothe
Department of Computer Science, University of Neuchatel, Neuchâtel, Switzerland
Jacques Savoy
Faculteit der Geesteswetenschappen, Universiteit Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Institut de Recherche en Informatique de Toulouse, Toulouse, France
Karen Pinel-Sauvagnat
School of Computing, Dublin City University, Dublin, Ireland
Gareth Jones
LIA - CERI, Université d'Avignon et des Pays de Vaucluse, Avignon, France
Eric SanJuan
Department of Information Engineering, University of Padua, Padua, Italy
Linda Cappellato
Department of Information Engineering (DEI), University of Padua, Padova, Italy
Nicola Ferro


About this paper

Cite this paper

Chikka, V.R., Mariyasagayam, N., Niwa, Y., Karlapalem, K. (2015). Information Extraction from Clinical Documents: Towards Disease/Disorder Template Filling. In: Mothe, J., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2015. Lecture Notes in Computer Science, vol 9283. Springer, Cham. https://doi.org/10.1007/978-3-319-24027-5_41


DOI: https://doi.org/10.1007/978-3-319-24027-5_41
Published: 20 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24026-8
Online ISBN: 978-3-319-24027-5
eBook Packages: Computer Science, Computer Science (R0)
