Abstract
Diabetes is a lifestyle-driven disease that has become a critical health issue worldwide. In this paper, we conduct an exploratory study of early detection of diabetes mellitus using various ensemble learning techniques. Eight tree-based machine learning algorithms, i.e., classification and regression tree, decision tree (C4.5), reduced error pruning tree, random tree, naive Bayes tree, functional tree, best-first decision tree, and logistic model tree, are employed as base classifiers in five different ensembles, i.e., bagging, boosting, random subspace, DECORATE, and rotation forest. The performance of the ensembles and base classifiers is thoroughly benchmarked on three real-world datasets in terms of the area under the receiver operating characteristic curve (AUC). Finally, we assess the performance differences among the classifiers using several statistical significance tests. Our contribution to the existing literature is an extensive benchmark of tree-based classifier ensembles for the early detection of diabetes.
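As a rough illustration of the evaluation pipeline outlined above, the following sketch computes the area under the ROC curve from predicted scores using the rank-sum formulation, then computes a Friedman chi-square statistic over a table of per-dataset AUC values. The abstract does not name the specific significance tests, so the Friedman test is an assumption here, chosen as one common option for comparing classifiers across datasets. The function names and all numbers are hypothetical and are not results from the paper.

```python
def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; labels are 1 (positive) or 0."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def friedman_statistic(auc_table):
    """Friedman chi-square; auc_table[d][c] is the AUC of classifier c on dataset d."""
    n, k = len(auc_table), len(auc_table[0])
    rank_totals = [0.0] * k
    for row in auc_table:
        # Negate so that the highest AUC on each dataset receives rank 1.
        for c, r in enumerate(average_ranks([-v for v in row])):
            rank_totals[c] += r
    mean_ranks = [t / n for t in rank_totals]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


# Illustrative data only: four samples scored by one hypothetical classifier.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))

# Illustrative AUC table: 3 datasets (rows) x 3 hypothetical classifiers (columns).
example_table = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.90, 0.70],
]
print(friedman_statistic(example_table))
```

Under the null hypothesis that all classifiers perform equally, the statistic is approximately chi-square distributed with k − 1 degrees of freedom, where k is the number of classifiers. With only three datasets, as in the benchmark described, that approximation is coarse.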
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Tama, B.A., Rhee, KH. Tree-based classifier ensembles for early detection method of diabetes: an exploratory study. Artif Intell Rev 51, 355–370 (2019). https://doi.org/10.1007/s10462-017-9565-3
DOI: https://doi.org/10.1007/s10462-017-9565-3