Heart Disease Prediction Using Classification (Naive Bayes)

  • Akansh Gupta
  • Lokesh Kumar
  • Rachna Jain
  • Preeti Nagrath
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 121)


This paper aims to promote a better understanding and wider use of machine learning in the medical sector. It presents the comparative performance of six classification models applied to the University of California Irvine (UCI) Cleveland Heart Disease records to predict coronary artery disease (CAD). At first, all 13 independent features provided in the dataset were used to build the models. A comparison of model accuracies showed that K-nearest neighbors (KNN), support vector machine (SVM), and Naive Bayes performed as expected and outperformed the other models. Feature selection was then applied to improve prediction accuracy: the backward elimination method and a filter method based on the Pearson correlation coefficient were used to choose the major predictive features. Comparing the accuracy of the models built on all features with that of the models built on the selected features showed that feature selection significantly enhanced the performance of Naive Bayes and random forest, while the other models did not perform as expected. After feature selection, Naive Bayes achieved an accuracy of 88.16% on the test set.
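The filter step and the Naive Bayes classifier described above can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic (two informative and two noise features, not the Cleveland records), the correlation threshold of 0.3 is an arbitrary assumption, the Gaussian variant of Naive Bayes is assumed because the abstract does not name one, and the helper names `pearson`, `select_features`, and `GaussianNaiveBayes` are hypothetical. Backward elimination is not shown.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(X, y, threshold):
    """Filter method: keep column indices with |correlation to y| >= threshold."""
    return [j for j, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]


class GaussianNaiveBayes:
    """Naive Bayes with one independent normal density per feature and class."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [r for r, t in zip(X, y) if t == label]
            prior = len(rows) / len(X)
            params = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                params.append((mean, var))
            self.stats[label] = (prior, params)
        return self

    def _log_posterior(self, row, label):
        prior, params = self.stats[label]
        total = math.log(prior)
        for v, (mean, var) in zip(row, params):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(r, c)) for r in X]


# Synthetic stand-in data (assumption): features 0 and 1 depend on the label,
# features 2 and 3 are pure noise.
random.seed(0)


def make_row(label):
    shift = 2.0 * label
    return [shift + random.gauss(0, 1), -shift + random.gauss(0, 1),
            random.gauss(0, 1), random.gauss(0, 1)]


y = [i % 2 for i in range(400)]
X = [make_row(t) for t in y]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

# Select features on the training split only, then project both splits.
selected = select_features(X_train, y_train, threshold=0.3)


def project(rows):
    return [[r[j] for j in selected] for r in rows]


model = GaussianNaiveBayes().fit(project(X_train), y_train)
predictions = model.predict(project(X_test))
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("selected feature indices:", selected)
print("test accuracy:", accuracy)
```

Computing the correlations on the training split alone keeps information from the test set out of the feature choice, which matters when a test-set accuracy is the headline figure.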


Naive Bayes, Random forest, Classification model, Coronary artery disease, Cleveland dataset



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Akansh Gupta (1) (corresponding author)
  • Lokesh Kumar (1)
  • Rachna Jain (1)
  • Preeti Nagrath (1)

  1. Bharati Vidyapeeth’s College of Engineering, New Delhi, India
