Tree-based classifier ensembles for early detection method of diabetes: an exploratory study

Tama, Bayu Adhi; Rhee, Kyung-Hyune

doi:10.1007/s10462-017-9565-3

Published: 29 May 2017

Artificial Intelligence Review, volume 51, pages 355–370 (2019)

Bayu Adhi Tama¹,² & Kyung-Hyune Rhee¹

Abstract

Diabetes is a lifestyle-driven disease that has become a critical health issue worldwide. In this paper, we conduct an exploratory study of early detection of diabetes mellitus using various ensemble learning techniques. Eight tree-based machine learning algorithms, i.e. classification and regression tree, decision tree (C4.5), reduced error pruning tree, random tree, naive Bayes tree, functional tree, best-first decision tree, and logistic model tree, are employed as base classifiers in five different ensembles, i.e. bagging, boosting, random subspace, DECORATE, and rotation forest. The performance of the ensembles and base classifiers is thoroughly benchmarked on three real-world datasets in terms of the area under the receiver operating characteristic curve (AUC). Finally, we assess the performance differences among the classifiers using several statistical significance tests. We contribute to the existing literature an extensive benchmark of tree-based classifier ensembles for early detection of diabetes.
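
The evaluation idea summarized in the abstract (wrap a tree-based base classifier in an ensemble, then compare both by area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' experimental setup: the base learner here is a depth-1 decision stump standing in for the eight tree algorithms, bagging is the only ensemble shown, and the data are synthetic. All function names (`auc`, `fit_stump`, `fit_bagging`, `score_bagging`, `make_synthetic`) are hypothetical and chosen for illustration only.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_stump(X, y):
    """Depth-1 tree: choose the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, 0):
                # polarity 1 predicts the positive class when x[j] > t; polarity 0 flips it
                errors = sum(
                    (polarity if row[j] > t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    j, t, polarity = stump
    return polarity if row[j] > t else 1 - polarity


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Bagging: fit one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def score_bagging(ensemble, row):
    """Fraction of stumps voting for the positive class, used as a ranking score."""
    return sum(predict_stump(s, row) for s in ensemble) / len(ensemble)


def make_synthetic(n, rng):
    """Synthetic two-feature data (an assumption, not a diabetes dataset)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(1.0 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = make_synthetic(200, rng)
    X_test, y_test = make_synthetic(200, rng)
    single = fit_stump(X_train, y_train)
    ensemble = fit_bagging(X_train, y_train, n_estimators=25, seed=0)
    print("single stump AUC:", auc(y_test, [predict_stump(single, r) for r in X_test]))
    print("bagged stumps AUC:", auc(y_test, [score_bagging(ensemble, r) for r in X_test]))
```

A single stump outputs only two distinct scores, so its ROC curve has one operating point, whereas the averaged votes of the bagged stumps give a graded score. That difference is one reason AUC is a natural metric for comparing base classifiers with their ensembles.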

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion).

Author information

Authors and Affiliations

IT Convergence and Application Engineering, Pukyong National University, (48513) Daeyon Campus, 45, Yongso-ro, Nam-Gu, Busan, Korea
Bayu Adhi Tama & Kyung-Hyune Rhee
Faculty of Computer Science, University of Sriwijaya, Jln Raya Palembang-Prabumulih Km. 32, Ogan Ilir, Sumatera Selatan, Indonesia
Bayu Adhi Tama

Corresponding author

Correspondence to Bayu Adhi Tama.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Tama, B.A., Rhee, KH. Tree-based classifier ensembles for early detection method of diabetes: an exploratory study. Artif Intell Rev 51, 355–370 (2019). https://doi.org/10.1007/s10462-017-9565-3

Published: 29 May 2017
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10462-017-9565-3
