Hierarchical Text Classification of Autopsy Reports to Determine MoD and CoD Through Term-Based and Concepts-Based Features

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10357)

Abstract

Text classification is now widely used in the medical domain to classify free-text clinical reports. In this study, text classification techniques were used to determine the manner of death (MoD) and cause of death (CoD) from free-text forensic autopsy reports, using proposed term-based and SNOMED CT concept-based features. Detailed term-based and concept-based features were extracted from 1500 forensic autopsy reports covering four manners of death and 16 causes of death, and these features were used to train a text classifier. The classifier was deployed in a two-level cascade architecture: the first level predicts the MoD, and the second level predicts the CoD. To show the significance of our proposed approach, we compared its results with those of four state-of-the-art feature extraction approaches. Finally, we compared one-level classification with two-level classification. The experimental results showed that our proposed approach achieved an 8% improvement in accuracy over the four baselines, and that two-level classification determined CoD more accurately than one-level classification.
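The abstract does not specify the classifier or the exact feature definitions, so the following is only a minimal sketch of the general two-level cascade idea, written with the Python standard library under several assumptions. Term-based features are approximated as lowercase word counts. The SNOMED CT concept-based features are mimicked by a hypothetical `CONCEPT_LOOKUP` table whose keys and IDs are invented placeholders, not real SNOMED CT codes. Multinomial Naive Bayes is swapped in as the classifier at both levels. The names `extract_features`, `NaiveBayes`, and `CascadeClassifier`, as well as the toy reports and labels, are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a SNOMED CT concept mapper: phrases and
# placeholder concept IDs are invented for illustration only.
CONCEPT_LOOKUP = {
    "gunshot wound": "CONCEPT:GSW",
    "hanging": "CONCEPT:HANGING",
    "coronary artery": "CONCEPT:CORONARY",
}


def extract_features(text):
    """Bag of term-based tokens plus (toy) concept-based tokens."""
    lowered = text.lower()
    terms = re.findall(r"[a-z]+", lowered)  # term-based features
    concepts = [cid for phrase, cid in CONCEPT_LOOKUP.items() if phrase in lowered]
    return Counter(terms + concepts)


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (a swapped-in classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for f, tf in feats.items():
                if f in self.vocab:  # skip features never seen in training
                    p = (self.feature_counts[label][f] + 1) / (self.totals[label] + v)
                    score += tf * math.log(p)
            if score > best_score:
                best, best_score = label, score
        return best


class CascadeClassifier:
    """Level 1 predicts MoD; level 2 predicts CoD using a model trained per MoD."""

    def fit(self, texts, mods, cods):
        feats = [extract_features(t) for t in texts]
        self.mod_clf = NaiveBayes().fit(feats, mods)
        self.cod_clfs = {}
        for mod in set(mods):
            idx = [i for i, m in enumerate(mods) if m == mod]
            self.cod_clfs[mod] = NaiveBayes().fit(
                [feats[i] for i in idx], [cods[i] for i in idx]
            )
        return self

    def predict(self, text):
        feats = extract_features(text)
        mod = self.mod_clf.predict(feats)  # first level: manner of death
        return mod, self.cod_clfs[mod].predict(feats)  # second level: cause of death


# Toy, invented reports used only to exercise the code (not data from the paper).
reports = [
    "multiple gunshot wound to the chest",
    "gunshot wound to the head",
    "stab wound to the abdomen with knife",
    "multiple stab wound to the neck",
    "severe narrowing of coronary artery",
    "coronary artery occlusion with old scarring",
]
mods = ["Homicide", "Homicide", "Homicide", "Homicide", "Natural", "Natural"]
cods = ["Firearm injury", "Firearm injury", "Sharp force injury",
        "Sharp force injury", "Ischemic heart disease", "Ischemic heart disease"]

model = CascadeClassifier().fit(reports, mods, cods)
print(model.predict("gunshot wound to the abdomen"))  # ('Homicide', 'Firearm injury')
```

The design choice the cascade illustrates is that each second-level model only has to separate the causes of death that occur under a single predicted manner of death. A one-level baseline for comparison would be a single `NaiveBayes` fitted directly on the CoD labels. A drawback of the cascade is that an error at the first level cannot be corrected at the second.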

Author information

Correspondence to Ram Gopal Raj.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mujtaba, G., Shuib, L., Raj, R.G., Al-Garadi, M.A., Rajandram, R., Shaikh, K. (2017). Hierarchical Text Classification of Autopsy Reports to Determine MoD and CoD Through Term-Based and Concepts-Based Features. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_16

  • DOI: https://doi.org/10.1007/978-3-319-62701-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62700-7

  • Online ISBN: 978-3-319-62701-4

  • eBook Packages: Computer Science, Computer Science (R0)
