CASCADENET: An LSTM Based Deep Learning Model for Automated ICD-10 Coding

Azam, Sheikh Shams; Raju, Manoj; Pagidimarri, Venkatesh; Kasivajjala, Vamsi Chandra

doi:10.1007/978-3-030-12385-7_6


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 70)

Included in the following conference series:

Future of Information and Communication Conference


Abstract

This paper proposes a cascading hierarchical architecture based on LSTM for automatically mapping clinical documents to ICD-10 codes. Training a robust classifier becomes increasingly difficult as the number of classes grows (there are over 93k ICD-10 codes). Together with the variance in length, structure and context of the text data and the lack of training data, this places the task among the hardest in Machine Learning (ML) and Natural Language Processing (NLP). This work evaluates several methods on the task: basic techniques such as TF-IDF and inverted indexing using concept aggregation based on exhaustive Unified Medical Language System (UMLS) knowledge sources, and more advanced methods such as an SVM trained on a bag-of-words model and a CNN and an LSTM trained on distributed word embeddings. The effect of breaking the problem down into a hierarchy is also explored. The data is an aggregate of ICD-10 long descriptions and anonymised, annotated training data provided by a few private hospitals in India. The study finds that the hierarchical LSTM network outperforms the other methods in accuracy and in micro- and macro-averaged precision and recall on the held-out test data.
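The abstract reports both micro- and macro-averaged precision and recall, and the difference matters when classes are as imbalanced as ICD-10 codes tend to be. The following is a minimal sketch of the two averaging schemes, assuming each document carries exactly one gold code and one predicted code (the paper's actual evaluation setup is not described here). The function name `precision_recall` and the toy labels are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Micro- and macro-averaged precision/recall for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative

    def ratio(num, den):
        # Convention assumed here: an undefined ratio (0/0) counts as 0.0.
        return num / den if den else 0.0

    total_tp = sum(tp.values())
    return {
        # Micro: pool the counts over all classes, then take one ratio.
        "micro_precision": ratio(total_tp, total_tp + sum(fp.values())),
        "micro_recall": ratio(total_tp, total_tp + sum(fn.values())),
        # Macro: one ratio per class, then an unweighted mean over classes.
        "macro_precision": sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels),
        "macro_recall": sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels),
    }


# Toy labels (illustrative strings only): one frequent code and two rare ones.
y_true = ["I10", "I10", "I10", "I10", "E11.9", "J45.909"]
y_pred = ["I10", "I10", "I10", "I10", "I10", "J45.909"]
scores = precision_recall(y_true, y_pred)
print(scores)
```

In this toy case the rare code that is never predicted barely moves the micro averages but pulls the macro averages down sharply, which is why reporting both is informative for a label space with many infrequent codes.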

Author information

Authors and Affiliations

Foundation Inc., Marina Del Rey, San Francisco, CA, 90292, USA
Sheikh Shams Azam, Manoj Raju & Venkatesh Pagidimarri
Healthcare Information and Management Systems Society, Bengaluru, India
Vamsi Chandra Kasivajjala

Corresponding author

Correspondence to Sheikh Shams Azam.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, UK
Rahul Bhatia

About this paper

Cite this paper

Azam, S.S., Raju, M., Pagidimarri, V., Kasivajjala, V.C. (2020). CASCADENET: An LSTM Based Deep Learning Model for Automated ICD-10 Coding. In: Arai, K., Bhatia, R. (eds) Advances in Information and Communication. FICC 2019. Lecture Notes in Networks and Systems, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-030-12385-7_6

DOI: https://doi.org/10.1007/978-3-030-12385-7_6
Published: 02 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12384-0
Online ISBN: 978-3-030-12385-7
eBook Packages: Intelligent Technologies and Robotics (R0)
