Discharge Summaries Classifier

  • Shusaku Tsumoto
  • Tomohiro Kimura
  • Haruko Iwata
  • Shoji Hirano
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 82)


This paper proposes a method for constructing classifiers for discharge summaries. First, morphological analysis is applied to a set of summaries and a term matrix is generated. Second, correspondence analysis is applied to the classification labels and the term matrix, yielding two-dimensional coordinates for both categories and terms. The distance between each category and each term point is then measured to produce a ranking of keywords. Keywords are selected as attributes according to this rank, and training examples for the classifiers are generated. Finally, learning methods are applied to the training examples. Experimental validation shows that random forest achieved the best performance, with the deep learner a close second; decision tree methods with many keywords performed only slightly worse than neural network or deep learning methods.
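The steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the summaries are assumed to be already tokenized (the morphological analysis step is not shown), the toy documents, labels, and all function names are hypothetical, correspondence analysis is computed by an SVD of the standardized residuals of the label-by-term table, and Euclidean distance in the two-dimensional map is assumed as the distance measure.

```python
import numpy as np


def label_term_matrix(docs, labels):
    """Build a label-by-term contingency table from tokenized documents."""
    classes = sorted(set(labels))
    vocab = sorted({t for doc in docs for t in doc})
    row = {c: i for i, c in enumerate(classes)}
    col = {t: j for j, t in enumerate(vocab)}
    table = np.zeros((len(classes), len(vocab)))
    for doc, lab in zip(docs, labels):
        for t in doc:
            table[row[lab], col[t]] += 1
    return classes, vocab, table


def correspondence_analysis(table, n_dims=2):
    """Return principal coordinates of rows (labels) and columns (terms)."""
    P = table / table.sum()          # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_dims] * s[:n_dims]) / np.sqrt(c)[:, None]
    return rows, cols


def rank_keywords(classes, vocab, rows, cols):
    """For each label, order terms by distance to the label's point (assumed Euclidean)."""
    ranking = {}
    for i, label in enumerate(classes):
        dist = np.linalg.norm(cols - rows[i], axis=1)
        ranking[label] = [vocab[j] for j in np.argsort(dist, kind="stable")]
    return ranking


def training_examples(docs, ranking, k):
    """Use the top-k terms per label as attributes; values are term counts per document."""
    keywords = sorted({t for terms in ranking.values() for t in terms[:k]})
    X = np.array([[doc.count(t) for t in keywords] for doc in docs])
    return keywords, X


# Hypothetical, already-tokenized toy summaries (not data from the paper).
docs = [
    ["admission", "fever", "cough", "pneumonia"],
    ["admission", "cough", "pneumonia", "antibiotics"],
    ["admission", "chest", "pain", "infarction"],
    ["admission", "infarction", "stent", "chest"],
    ["admission", "fracture", "fall", "surgery"],
    ["admission", "fracture", "cast", "fall"],
]
labels = ["respiratory", "respiratory", "cardiac", "cardiac",
          "orthopedic", "orthopedic"]

classes, vocab, table = label_term_matrix(docs, labels)
rows, cols = correspondence_analysis(table)
ranking = rank_keywords(classes, vocab, rows, cols)
keywords, X = training_examples(docs, ranking, k=2)
```

The matrix `X` together with `labels` forms a set of training examples that could then be passed to any learner, such as a random forest or a decision tree. The parameter `k` controls how many top-ranked keywords per label become attributes.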


Keywords: Discharge summaries, Classifier, Deep learning, Random forest, Decision tree, SVM



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Shusaku Tsumoto (1)
  • Tomohiro Kimura (2)
  • Haruko Iwata (3)
  • Shoji Hirano (1)
  1. Department of Medical Informatics, Faculty of Medicine, Shimane University, Matsue, Japan
  2. General Coordination Division, Faculty of Medicine, Shimane University, Matsue, Japan
  3. Center for Bed-Control, Shimane University Hospital, Izumo, Japan
