CASCADENET: An LSTM Based Deep Learning Model for Automated ICD-10 Coding

  • Sheikh Shams Azam
  • Manoj Raju
  • Venkatesh Pagidimarri
  • Vamsi Chandra Kasivajjala
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 70)


This paper proposes a cascading hierarchical architecture based on LSTM for automatically assigning ICD-10 codes to clinical documents. Training a robust classifier becomes increasingly difficult as the number of classes grows (there are over 93k ICD-10 codes). Together with the variance in length, structure and context of the text and the scarcity of training data, this places the task among the hardest in Machine Learning (ML) and Natural Language Processing (NLP). This work evaluates several methods on the task: basic techniques such as TF-IDF and inverted indexing using concept aggregation based on exhaustive Unified Medical Language System (UMLS) knowledge sources, and more advanced methods such as an SVM trained on a bag-of-words model and a CNN and an LSTM trained on distributed word embeddings. The effect of breaking the problem down into a hierarchy is also explored. The data is an aggregate of ICD-10 long descriptions and anonymised, annotated training data provided by a few private hospitals in India. The study finds that the hierarchical LSTM network outperforms the other methods in accuracy and in micro- and macro-averaged precision and recall on the held-out test data.
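The idea of breaking the problem into a hierarchy can be illustrated with a minimal sketch. It is not the authors' model: each stage here is a TF-IDF nearest-description matcher rather than an LSTM, the two levels (three-character category, then full code) are an assumption because the abstract does not say where the hierarchy is split, and the names `TRAIN`, `TfidfStage`, `build_cascade` and `predict` are hypothetical. The five code descriptions are paraphrased toy entries and should be checked against an official ICD-10 release before any real use.

```python
import math
from collections import Counter, defaultdict

# Toy (code, description) pairs; illustrative only, not the paper's data.
TRAIN = [
    ("E11.9", "type 2 diabetes mellitus without complications"),
    ("E11.65", "type 2 diabetes mellitus with hyperglycemia"),
    ("J45.901", "unspecified asthma with acute exacerbation"),
    ("J45.909", "unspecified asthma uncomplicated"),
    ("I10", "essential primary hypertension"),
]


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(tokens, idf):
    """Unit-length TF-IDF vector as a dict; unseen tokens are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def dot(a, b):
    """Dot product of sparse vectors; cosine similarity for unit vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class TfidfStage:
    """Stand-in classifier (assumption): returns the label of the nearest text."""

    def __init__(self, labelled_texts):
        docs = [tokenize(text) for _, text in labelled_texts]
        self.idf = fit_idf(docs)
        self.items = [
            (label, tfidf(doc, self.idf))
            for (label, _), doc in zip(labelled_texts, docs)
        ]

    def predict(self, text):
        query = tfidf(tokenize(text), self.idf)
        return max(self.items, key=lambda item: dot(query, item[1]))[0]


def build_cascade(train):
    """Level 1 picks a category; level 2 holds one classifier per category."""
    by_category = defaultdict(list)
    for code, description in train:
        by_category[code.split(".")[0]].append((code, description))
    level1 = TfidfStage(
        [(cat, " ".join(d for _, d in rows)) for cat, rows in by_category.items()]
    )
    level2 = {cat: TfidfStage(rows) for cat, rows in by_category.items()}
    return level1, level2


def predict(cascade, text):
    """Route the text through level 1, then the matching level-2 classifier."""
    level1, level2 = cascade
    category = level1.predict(text)
    return category, level2[category].predict(text)


cascade = build_cascade(TRAIN)
print(predict(cascade, "asthma with acute exacerbation"))  # ('J45', 'J45.901')
```

Each second-stage classifier only has to separate the codes inside one category, so it faces far fewer classes than a flat classifier over the whole code set. The trade-off is that a wrong first-stage decision cannot be recovered at the second stage.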


Keywords: Deep learning; Natural language processing; Medical coding



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Sheikh Shams Azam (1, corresponding author)
  • Manoj Raju (1)
  • Venkatesh Pagidimarri (1)
  • Vamsi Chandra Kasivajjala (2)
  1. Foundation Inc., San Francisco, USA
  2. Healthcare Information and Management Systems Society, Bengaluru, India
