Multimedia Tools and Applications, Volume 78, Issue 14, pp 20383–20407

Predicting kidney transplantation outcome based on hybrid feature selection and KNN classifier

  • Dalia M. Atallah
  • Mohammed Badawy
  • Ayman El-Sayed
  • Mohamed A. Ghoneim


Predicting the outcome of kidney transplantation is clinically important: it supports selection of the best available kidney donor and the most suitable immunosuppressive treatment for each patient. Survival prediction before treatment can simplify patients' decision making and improve survival by changing clinical practice. This paper proposes a new prediction method, based on data mining techniques, to predict five-year graft survival after transplantation. The method consists of three stages: a data preparation stage (DPS), a feature selection stage (FSS), and a prediction stage (PS). It combines information gain with naïve Bayes and k-nearest neighbor. First, information gain selects the essential features; naïve Bayes then selects the most essential among them. Together, these two steps form a new hybrid feature selection method that chooses the minimum number of features yielding the highest accuracy. Finally, k-nearest neighbor classifies graft survival. The proposed method was evaluated against recent techniques. Experimental results show that it outperforms them, attaining the highest accuracy and F-measure with the lowest error. The method can also be applied to other transplant datasets.
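The filter-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it ranks categorical features by information gain and then classifies with a k-nearest-neighbor majority vote, and it leaves out the naïve Bayes selection step because the abstract does not say how that step is applied. The toy records, the function names, and the use of Hamming distance are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, j):
    """Entropy reduction obtained by splitting on categorical feature column j."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[j], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_features(rows, labels, m):
    """Indices of the m features with the highest information gain."""
    ranked = sorted(
        range(len(rows[0])),
        key=lambda j: information_gain(rows, labels, j),
        reverse=True,
    )
    return ranked[:m]


def knn_predict(rows, labels, query, features, k=3):
    """Majority vote among the k nearest rows.

    Distance is the Hamming distance over the selected features
    (an assumption; the abstract does not name a distance metric).
    """
    def dist(row):
        return sum(row[j] != query[j] for j in features)

    nearest = sorted(range(len(rows)), key=lambda i: dist(rows[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Made-up toy records, NOT data from the paper: three categorical features.
rows = [
    ("low", "yes", "x"),
    ("low", "no", "y"),
    ("high", "yes", "x"),
    ("high", "no", "y"),
    ("low", "yes", "y"),
    ("high", "no", "x"),
]
labels = ["survived", "survived", "failed", "failed", "survived", "failed"]

selected = top_features(rows, labels, m=1)
prediction = knn_predict(rows, labels, ("low", "no", "x"), selected, k=3)
print(selected, prediction)
```

In this toy data the first feature separates the two classes perfectly, so it receives the highest information gain and is the only feature passed to the classifier. A wrapper step, such as the naïve Bayes stage the abstract mentions, would typically refine the ranked list further by checking which subset gives the best accuracy.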


Kidney transplantation, Feature selection, Information gain, Naïve Bayes, K-nearest neighbor




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Dalia M. Atallah (1, corresponding author)
  • Mohammed Badawy (2)
  • Ayman El-Sayed (2)
  • Mohamed A. Ghoneim (3)

  1. Urology & Nephrology Center, Mansoura University, Mansoura, Egypt
  2. Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
  3. Urology Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
