Ontology-Guided Data Augmentation for Medical Document Classification

Abdollahi, Mahdi; Gao, Xiaoying; Mei, Yi; Ghosh, Shameek; Li, Jinyan

doi:10.1007/978-3-030-59137-3_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12299)

Included in the following conference series:

International Conference on Artificial Intelligence in Medicine

Abstract

Extracting meaningful features from unstructured text is one of the most challenging tasks in medical document classification. The many domain-specific expressions and synonyms in clinical discharge notes make them harder to analyse, and the problem is worse for short texts such as abstracts. These challenges can lead to poor classification accuracy. Because the available medical input data is often insufficient in real-world settings, this work proposes a novel ontology-guided data augmentation method to enrich the input data. Three different deep learning methods are then employed to analyse the performance of the proposed approach for classification. The experimental results show that the proposed approach achieved substantial improvement in the targeted medical document classification tasks.

Author information

Authors and Affiliations

Victoria University of Wellington, Wellington, New Zealand
Mahdi Abdollahi, Xiaoying Gao & Yi Mei
Medius Health, Sydney, Australia
Shameek Ghosh
University of Technology Sydney, Sydney, Australia
Jinyan Li

Corresponding author

Correspondence to Mahdi Abdollahi.

Editor information

Editors and Affiliations

School of Nursing, University of Minnesota, Minneapolis, MN, USA
Martin Michalowski
Ben-Gurion University of the Negev, Tonawanda, NY, USA
Robert Moskovitch

About this paper

Cite this paper

Abdollahi, M., Gao, X., Mei, Y., Ghosh, S., Li, J. (2020). Ontology-Guided Data Augmentation for Medical Document Classification. In: Michalowski, M., Moskovitch, R. (eds) Artificial Intelligence in Medicine. AIME 2020. Lecture Notes in Computer Science, vol 12299. Springer, Cham. https://doi.org/10.1007/978-3-030-59137-3_8

DOI: https://doi.org/10.1007/978-3-030-59137-3_8
Published: 26 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59136-6
Online ISBN: 978-3-030-59137-3
eBook Packages: Computer Science, Computer Science (R0)