Classification of stroke disease using machine learning algorithms

  • Priya Govindarajan
  • Ravichandran Kattur Soundarapandian
  • Amir H. Gandomi (corresponding author)
  • Rizwan Patan
  • Premaladha Jayaraman
  • Ramachandran Manikandan
Intelligent Biomedical Data Analysis and Processing


This paper presents a prototype that classifies stroke by combining text mining tools with machine learning algorithms. Suitably trained machine learning algorithms are significant tools in areas such as surveillance, medicine and data management. The data mining techniques applied in this work track information from both semantic and syntactic perspectives. The proposed idea is to mine patients’ symptoms from case sheets and train the system with the acquired data. In the data collection phase, the case sheets of 507 patients were collected from Sugam Multispecialty Hospital, Kumbakonam, Tamil Nadu, India. Next, the case sheets were mined using tagging and maximum entropy methodologies, and the proposed stemmer extracted the common and unique sets of attributes needed to classify the strokes. The processed data were then fed into several machine learning algorithms: artificial neural networks, support vector machines, boosting, bagging and random forests. Among these, artificial neural networks trained with a stochastic gradient descent algorithm outperformed the others, with a higher classification accuracy of 95% and a smaller standard deviation of 14.69.
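The pipeline summarized above (mine symptoms from case-sheet text, reduce them to attributes, train a classifier with stochastic gradient descent) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the vocabulary, the suffix list, the case-sheet snippets and the placeholder labels are invented, the naive suffix stripper is not the stemmer proposed in the paper, and a single sigmoid unit stands in for the artificial neural network.

```python
import math

# Hypothetical stemmed symptom vocabulary; not taken from the paper.
VOCAB = ["headach", "vomit", "seizur", "weak", "numb", "dizzi"]
SUFFIXES = ("ing", "ness", "ed", "es", "e", "s")  # naive suffix list (assumption)


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def featurize(text):
    """Map free text to a binary attribute vector over VOCAB."""
    stems = {stem(token.strip(".,;:").lower()) for token in text.split()}
    return [1.0 if term in stems else 0.0 for term in VOCAB]


def predict(weights, bias, x):
    """Sigmoid output of a single unit: estimated probability of label 1."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(samples, epochs=200, lr=0.5):
    """Stochastic gradient descent on log loss, one sample per update."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = predict(weights, bias, x) - y  # d(log loss)/dz
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Invented case-sheet snippets with placeholder labels (1 = type A, 0 = type B).
notes = [
    ("Weakness and numbness noted.", 1),
    ("Headaches, vomiting since morning.", 0),
    ("Numbness with dizziness.", 1),
    ("Vomited twice; seizures reported.", 0),
]
samples = [(featurize(text), label) for text, label in notes]
weights, bias = train_sgd(samples)
```

Calling `predict(weights, bias, featurize(...))` on a new snippet returns a value between 0 and 1. The real system would use many more attributes, a multi-layer network and held-out evaluation, none of which this sketch attempts.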


Stroke, Tagging, Maximum entropy, Data pre-processing, Classification, Machine learning



We are grateful to Dr. Sundarrajan S, Neurologist, Sugam Multispecialty Hospital, for permitting us to access the real-time data of the patients and for his valuable suggestions in classifying the types of stroke. We also thank the management of Sugam Multispecialty Hospital, Kumbakonam, for their assistance in collecting the case sheets. We acknowledge the Department of Science and Technology, India, for providing financial support through an INSPIRE fellowship (No. IF120649) to carry out this research work. The second author also thanks the Department of Science and Technology for financial aid from grant No. SR/FST/ETI-349/2013.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest in publishing this article.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, SASTRA Deemed University, Kumbakonam, India
  2. Department of Information and Communication Technology, SASTRA Deemed University, Thanjavur, India
  3. School of Business, Stevens Institute of Technology, Hoboken, USA
  4. School of Computing Science and Engineering, Galgotias University, Greater Noida, India
