Abstract
Extracting meaningful features from unstructured text is one of the most challenging tasks in medical document classification. Domain-specific expressions and synonyms in clinical discharge notes make them harder to analyse, and the problem is worse for short texts such as abstracts. These challenges can lead to poor classification accuracy. Because real-world medical input data is often insufficient, this work proposes a novel ontology-guided data augmentation method to enrich the input data. Three different deep learning methods are then employed to evaluate the suggested approach on classification. The experimental results show that the suggested approach achieved substantial improvement in classifying the targeted medical documents.
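The abstract does not describe the augmentation procedure itself, so the following is only a minimal sketch of what ontology-guided augmentation could look like in its simplest form: known terms are replaced with synonyms drawn from an ontology to create extra labelled samples. The dictionary `TOY_ONTOLOGY` and the functions `augment` and `augment_dataset` are hypothetical names introduced for illustration and are not taken from the paper; a real system would query a biomedical ontology rather than a hand-written dictionary.

```python
import random

# Hypothetical toy "ontology": maps a term to interchangeable surface forms.
# These entries are illustrative only and stand in for a real biomedical resource.
TOY_ONTOLOGY = {
    "heart attack": ["myocardial infarction", "MI"],
    "high blood pressure": ["hypertension"],
}


def augment(text, ontology, rng):
    """Return a copy of text with each known term swapped for a random synonym."""
    out = text
    for term, synonyms in ontology.items():
        if term in out:
            out = out.replace(term, rng.choice(synonyms))
    return out


def augment_dataset(samples, ontology, seed=0):
    """Keep the original (text, label) pairs and append one variant per changed text."""
    rng = random.Random(seed)
    augmented = list(samples)
    for text, label in samples:
        variant = augment(text, ontology, rng)
        if variant != text:  # skip texts that contain no ontology term
            augmented.append((variant, label))
    return augmented


samples = [("patient had a heart attack", "cad"), ("no complaints", "none")]
result = augment_dataset(samples, TOY_ONTOLOGY)
```

In this sketch the augmented sample inherits the label of its source document, which is the assumption that makes synonym substitution usable for enlarging a small training set. Matching here is plain case-sensitive substring search; concept recognition in clinical text would need something considerably more robust.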
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Abdollahi, M., Gao, X., Mei, Y., Ghosh, S., Li, J. (2020). Ontology-Guided Data Augmentation for Medical Document Classification. In: Michalowski, M., Moskovitch, R. (eds) Artificial Intelligence in Medicine. AIME 2020. Lecture Notes in Computer Science, vol 12299. Springer, Cham. https://doi.org/10.1007/978-3-030-59137-3_8
DOI: https://doi.org/10.1007/978-3-030-59137-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59136-6
Online ISBN: 978-3-030-59137-3
eBook Packages: Computer Science, Computer Science (R0)