Performance Analysis of Various Missing Value Imputation Methods on Heart Failure Dataset

  • Mohammad Al Khaldy
  • Chandrasekhar Kambhampati
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 16)

Abstract

Missing data is a fundamental challenge in the analysis and classification of data. A classifier applied to incomplete data may perform differently, and report different accuracy, than one applied to complete data. In this work we compare six scalable imputation methods applied to a heart failure dataset. The comparison uses the performance metrics of three classification methods: J48, REPTree, and Random Forest. The aim is to find the classifier that achieves the best performance after the missing data have been imputed with the different imputation methods. The results show that, in general, Random Forest achieves the best results compared with the decision trees J48 and REPTree. Furthermore, classification performance improved when the missing values were imputed with concept most common (CMC) and support vector machine (SVM) imputation.
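As an illustration of one of the methods named above, the following is a minimal sketch of concept most common (CMC) imputation. It assumes the usual reading of CMC: a missing cell is filled from records of the same class, with the most common value for nominal attributes and the class mean for numeric ones. The function name `cmc_impute`, the `None` encoding of missing cells, and the toy records are hypothetical. They are not taken from the paper, whose own implementation is not shown here.

```python
from collections import Counter, defaultdict
from statistics import mean

MISSING = None  # assumption: missing cells are encoded as None


def cmc_impute(rows, labels):
    """Return a copy of rows with each missing cell filled from same-class rows."""
    # Collect the observed (non-missing) values per (class label, column index).
    observed = defaultdict(list)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not MISSING:
                observed[(label, j)].append(value)

    def fill_value(label, j):
        values = observed[(label, j)]
        if not values:
            raise ValueError(f"no observed values for class {label!r}, column {j}")
        # Numeric attribute: class mean. Nominal attribute: class mode.
        if all(isinstance(v, (int, float)) for v in values):
            return mean(values)
        return Counter(values).most_common(1)[0][0]

    # Build new rows so the caller's data is left unchanged.
    return [
        [fill_value(label, j) if value is MISSING else value
         for j, value in enumerate(row)]
        for row, label in zip(rows, labels)
    ]


# Toy records (hypothetical): one numeric and one nominal attribute.
rows = [[70, "yes"], [None, "yes"], [80, None],
        [50, "no"], [None, "no"], [60, "no"]]
labels = ["A", "A", "A", "B", "B", "B"]
imputed = cmc_impute(rows, labels)
```

Because the fill values depend on the class label, an evaluation built on this sketch would need to compute them from training records only. Otherwise label information from the test records leaks into the imputed data.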

Keywords

Heart failure, Decision tree, J48, REPTree, Random forest, EM, Most common, CMC, KNN, K-mean, SVM

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Computer Science, University of Hull, Hull, UK
