Tree-based classifier ensembles for early detection method of diabetes: an exploratory study

Artificial Intelligence Review

Abstract

Diabetes is a lifestyle-driven disease that has become a critical health issue worldwide. In this paper, we conduct an exploratory study of early detection of diabetes mellitus using various ensemble learning techniques. Eight tree-based machine learning algorithms, i.e. classification and regression tree, decision tree (C4.5), reduced error pruning tree, random tree, naive Bayes tree, functional tree, best-first decision tree, and logistic model tree, are employed as base classifiers in five different ensembles, i.e. bagging, boosting, random subspace, DECORATE, and rotation forest. The performance of the ensembles and base classifiers is thoroughly benchmarked on three real-world datasets in terms of the area under the receiver operating characteristic curve (AUC). Finally, we assess the performance differences among the classifiers using several statistical significance tests. We contribute to the existing literature an extensive benchmark of tree-based classifier ensembles for the early detection of diabetes.
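To make the evaluation protocol in the abstract concrete, the following is a minimal sketch of its two building blocks: the area under the receiver operating characteristic curve for a single classifier, and a rank-based significance test across datasets. The abstract does not name the tests used, so the Friedman test is assumed here as one common choice. The function names and all numbers are hypothetical toy values, not results from the study.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Over all positive/negative pairs, count how often the positive example
    gets the higher score; a tie counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def average_ranks(row):
    """Rank one row of AUCs, 1 = best (highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(auc_table):
    """Friedman chi-square for an N-datasets x k-classifiers table of AUCs.

    Uses the standard formula without a tie correction (an assumption of
    this sketch). Returns the statistic and the mean rank per classifier.
    """
    n, k = len(auc_table), len(auc_table[0])
    rank_rows = [average_ranks(row) for row in auc_table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Hypothetical toy data: one classifier's scores on six patients ...
toy_auc = auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.2])

# ... and AUCs of three classifiers (columns) on three datasets (rows).
toy_table = [
    [0.80, 0.85, 0.83],
    [0.75, 0.82, 0.78],
    [0.90, 0.93, 0.91],
]
toy_chi2, toy_mean_ranks = friedman_statistic(toy_table)
print(toy_auc, toy_chi2, toy_mean_ranks)
```

In a benchmark of this kind, each row of the table would hold the AUCs of all classifiers on one dataset. A large statistic suggests that at least one classifier ranks consistently differently from the others, which would normally be followed by a post-hoc pairwise comparison.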


Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion).

Author information

Corresponding author

Correspondence to Bayu Adhi Tama.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Tama, B.A., Rhee, KH. Tree-based classifier ensembles for early detection method of diabetes: an exploratory study. Artif Intell Rev 51, 355–370 (2019). https://doi.org/10.1007/s10462-017-9565-3

  • DOI: https://doi.org/10.1007/s10462-017-9565-3
