Predicting kidney transplantation outcome based on hybrid feature selection and KNN classifier


Predicting the outcome of kidney transplantation is clinically important: it supports selection of the best available donor kidney and the most suitable immunosuppressive treatment for each patient. Predicting survival before treatment can also simplify patients' decision making and improve survival by changing clinical practice. This paper proposes a prediction method based on data mining techniques to predict five-year graft survival after transplantation. The method consists of three stages: a data preparation stage (DPS), a feature selection stage (FSS), and a prediction stage (PS). It combines information gain with naïve Bayes and k-nearest neighbor. First, information gain selects the essential features; naïve Bayes then selects the most essential among them. Together, these two steps form a new hybrid feature selection method that chooses the minimum number of features yielding the highest accuracy. Finally, k-nearest neighbor classifies graft survival. The proposed method was evaluated against recent techniques, and the experimental results show that it outperforms them, attaining the highest accuracy and F-measure with the lowest error. The method can also be applied to other transplant datasets.
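The three-stage idea described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the information gain threshold, the evaluation protocol, the distance metric, or the value of k, so the leave-one-out evaluation, Hamming distance, add-one smoothing, `min_gain`, `k=3`, all function names, and the toy data are hypothetical choices made only for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, j):
    """Entropy reduction obtained by splitting on categorical feature j."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[j]].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def nb_predict(train_rows, train_labels, row, feats):
    """Categorical naive Bayes with add-one smoothing, restricted to feats."""
    best, best_score = None, -math.inf
    for c, n_c in Counter(train_labels).items():
        score = math.log(n_c / len(train_labels))
        for j in feats:
            n_values = len({r[j] for r in train_rows})
            match = sum(1 for r, y in zip(train_rows, train_labels)
                        if y == c and r[j] == row[j])
            score += math.log((match + 1) / (n_c + n_values))
        if score > best_score:
            best, best_score = c, score
    return best

def knn_predict(train_rows, train_labels, row, feats, k=3):
    """Majority vote among the k nearest rows (Hamming distance, assumed)."""
    def dist(r):
        return sum(r[j] != row[j] for j in feats)
    nearest = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i]))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

def loo_accuracy(predict, rows, labels, feats):
    """Leave-one-out accuracy of predict on the given feature subset."""
    hits = 0
    for i in range(len(rows)):
        tr, tl = rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:]
        hits += predict(tr, tl, rows[i], feats) == labels[i]
    return hits / len(rows)

def hybrid_select(rows, labels, min_gain=0.0):
    """Filter by information gain, then keep the shortest ranked prefix
    that gives the best naive Bayes accuracy (wrapper step)."""
    gains = {j: information_gain(rows, labels, j) for j in range(len(rows[0]))}
    ranked = [j for j in sorted(gains, key=gains.get, reverse=True)
              if gains[j] > min_gain]
    best_feats, best_acc = ranked[:1], -1.0
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(nb_predict, rows, labels, ranked[:k])
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_feats, best_acc = ranked[:k], acc
    return best_feats

# Hypothetical toy data: feature 0 determines the label, feature 1 is
# uninformative, feature 2 is weakly informative.
rows = [("a", "x", "p"), ("a", "y", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "x", "q"), ("b", "y", "q"), ("b", "x", "p"), ("b", "y", "q")]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

selected = hybrid_select(rows, labels)                         # feature selection stage
accuracy = loo_accuracy(knn_predict, rows, labels, selected)   # prediction stage
print(selected, accuracy)
```

The strict comparison in the wrapper loop is one simple way to express "the minimum number of features that produce the highest accuracy": a longer feature prefix replaces a shorter one only if it strictly improves accuracy.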




Author information



Corresponding author

Correspondence to Dalia M. Atallah.



Cite this article

Atallah, D.M., Badawy, M., El-Sayed, A. et al. Predicting kidney transplantation outcome based on hybrid feature selection and KNN classifier. Multimed Tools Appl 78, 20383–20407 (2019).

Keywords

  • Kidney transplantation
  • Feature selection
  • Information gain
  • Naïve Bayes
  • K-nearest neighbor