Abstract
Current deep-learning-based disease diagnosis systems usually suffer from catastrophic forgetting: directly fine-tuning the diagnosis model on a new task leads to an abrupt drop in performance on previous tasks. Worse, a deployed diagnosis system is typically fixed, while collecting training data that covers a sufficiently broad set of diseases up front is infeasible, which motivates us to develop a lifelong learning diagnosis system. In this work, we propose to adopt attention to combine medical entities with their context, and to embed episodic memory and consolidation to retain knowledge, so that the learned model can adapt to sequential disease-diagnosis tasks. Moreover, we establish a new benchmark, named Jarvis-40, which contains clinical notes collected from various hospitals. Experiments show that the proposed method achieves state-of-the-art performance on this benchmark. Code is available at https://github.com/yifyang/LifelongLearningDiseaseDiagnosis.
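The episodic-memory component mentioned in the abstract can be illustrated with a minimal replay sketch: a fixed-size buffer of past examples, filled by reservoir sampling, whose samples are mixed into each new task's batches so gradient steps also rehearse earlier tasks. This is a hypothetical illustration of the general technique, not the authors' implementation; the class and function names below are our own.

```python
import random


class EpisodicMemory:
    """Fixed-size buffer of past (clinical_note, diagnosis) pairs.

    Filled by reservoir sampling, so every example seen so far is
    equally likely to be retained (a generic sketch, not the paper's code).
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def write(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Keep the new example with probability capacity / seen,
            # overwriting a uniformly random slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        k = min(k, len(self.buffer))
        return self.rng.sample(self.buffer, k)


def replay_batch(memory, current_batch, replay_size):
    """Mix the current task's batch with replayed past examples, so a
    single gradient step also rehearses earlier diagnosis tasks."""
    return current_batch + memory.sample(replay_size)
```

In a full system the replayed examples would feed the same loss as the current batch, possibly with an extra consolidation penalty on parameter drift; the sketch only captures the memory mechanics.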
Y. Yang—Equal contribution
Notes
- 1.
In Chinese, subarachnoid hemorrhage is “ ”, which an NER model relying on general semantics may split into “ ”, “ ”, “ ”, “ ”.
- 2.
For privacy reasons, the hospitals only permit us to release Jarvis-40\(_{\text {small}}\). All released data has been manually de-identified.
- 3.
Different diseases can be classified in various ways (e.g., by specialty or severity). Therefore, it is natural to split the whole set into disjoint subsets (i.e., tasks).
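The disjoint task split described in the footnote above can be sketched as a simple grouping of disease labels by an attribute such as specialty. The mapping below is a made-up example; the benchmark's actual split may differ.

```python
from collections import defaultdict


def split_into_tasks(disease_to_specialty):
    """Group diseases into disjoint tasks keyed by specialty.

    Each disease appears in exactly one task, so the tasks partition
    the full disease set (hypothetical sketch).
    """
    tasks = defaultdict(list)
    for disease, specialty in disease_to_specialty.items():
        tasks[specialty].append(disease)
    return dict(tasks)
```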
© 2021 Springer Nature Switzerland AG
Cite this paper
Wang, Z., Yang, Y., Wen, R., Chen, X., Huang, SL., Zheng, Y. (2021). Lifelong Learning Based Disease Diagnosis on Clinical Notes. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science(), vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_18
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5