Ontology-Guided Data Augmentation for Medical Document Classification

  • Conference paper
Artificial Intelligence in Medicine (AIME 2020)

Abstract

Extracting meaningful features from unstructured text is one of the most challenging tasks in medical document classification. Clinical discharge notes contain many domain-specific expressions and synonyms, which makes them difficult to analyse. The problem is worse for short texts such as abstracts. These challenges can lead to poor classification accuracy. Because medical input data is often scarce in practice, this work proposes a novel ontology-guided data augmentation method to enrich the input data. Three different deep learning methods are then employed to analyse the effect of the proposed approach on classification. The experimental results show that the proposed approach achieves a substantial improvement in classifying the targeted medical documents.
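
The abstract does not spell out how the ontology guides augmentation, so the following is only a minimal sketch of one common reading: generating extra training texts by swapping recognised medical terms for synonyms drawn from an ontology. The `TOY_ONTOLOGY` dictionary, the `augment` function and its `max_variants` parameter are all hypothetical names introduced for illustration; a real system would draw synonyms from a resource such as UMLS rather than a hand-written dictionary, and the paper's actual method may differ.

```python
import itertools

# Hypothetical toy "ontology": maps a term to a list of synonyms.
# Illustrative entries only; not taken from the paper.
TOY_ONTOLOGY = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}


def augment(text, ontology, max_variants=10):
    """Return lower-cased variants of `text` with ontology terms
    replaced by their synonyms (assumed augmentation strategy)."""
    lowered = text.lower()
    # Terms from the ontology that occur in the text.
    found = [term for term in ontology if term in lowered]
    # For each found term: keep it as-is, or use one of its synonyms.
    options = [[term] + ontology[term] for term in found]
    variants = []
    for combo in itertools.product(*options):
        variant = lowered
        for term, replacement in zip(found, combo):
            variant = variant.replace(term, replacement)
        # Skip the unchanged text and any duplicates.
        if variant != lowered and variant not in variants:
            variants.append(variant)
        if len(variants) >= max_variants:
            break
    return variants


if __name__ == "__main__":
    note = "Patient had a heart attack and high blood pressure."
    for v in augment(note, TOY_ONTOLOGY):
        print(v)
```

Each variant keeps the label of the original document, so a small labelled set can be expanded before training a classifier. Plain substring matching is used here for brevity; a concept mapper would be needed to handle inflections, abbreviations and overlapping terms.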

Author information

Corresponding author

Correspondence to Mahdi Abdollahi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Abdollahi, M., Gao, X., Mei, Y., Ghosh, S., Li, J. (2020). Ontology-Guided Data Augmentation for Medical Document Classification. In: Michalowski, M., Moskovitch, R. (eds) Artificial Intelligence in Medicine. AIME 2020. Lecture Notes in Computer Science, vol 12299. Springer, Cham. https://doi.org/10.1007/978-3-030-59137-3_8

  • DOI: https://doi.org/10.1007/978-3-030-59137-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59136-6

  • Online ISBN: 978-3-030-59137-3

  • eBook Packages: Computer Science (R0)
