Enhancing the Performance of Classification Using Super Learning

Kabir, Md Faisal; Ludwig, Simone A.

doi:10.1007/s41688-019-0030-0

ORIGINAL ARTICLE
Published: 17 January 2019

Volume 3, article number 5, (2019)
Data-Enabled Discovery and Applications

Abstract

Classification is a supervised learning task, and improving the performance of classification models remains a challenging research problem in machine learning (ML) and data mining. Here, the goal is to build a model that can be used to perform classification, and better performance matters in almost every application field, including healthcare. Researchers have applied a variety of ML techniques to improve their models, including ensemble techniques that combine multiple base learners. Super learning, also called stacked ensemble, is an ensemble method that finds the optimal weighted average of diverse learning models. In this paper, we apply super learning to four benchmark data sets related to healthcare. Experimental results show that super learning performs better than both the individual base learners and the baseline ensemble.
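To make the idea of an "optimal weighted average" of base learners concrete, here is a minimal sketch of a super learner for binary classification. It is an illustration under stated assumptions and not the authors' implementation: it uses scikit-learn and SciPy, three arbitrarily chosen base learners, scikit-learn's bundled breast cancer data set as a stand-in healthcare benchmark, and non-negative least squares as the metalearner that sets the weights.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a stand-in binary healthcare data set bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: an illustrative, deliberately diverse set of base learners.
base_learners = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB(),
}

# Level-one data: out-of-fold predicted probabilities of the positive class,
# one column per base learner, so the weights are not fit on in-sample output.
Z_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_learners.values()
])

# Metalearner: non-negative least squares, rescaled so the weights sum to one.
raw_weights, _ = nnls(Z_train, y_train.astype(float))
weights = raw_weights / raw_weights.sum()

# Refit every base learner on the full training set and combine on test data.
Z_test = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_learners.values()
])
ensemble_prob = Z_test @ weights
ensemble_acc = float(np.mean((ensemble_prob >= 0.5) == y_test))

for name, w in zip(base_learners, weights):
    print(f"{name}: weight {w:.3f}")
print(f"super learner test accuracy: {ensemble_acc:.3f}")
```

The weights are learned from cross-validated predictions rather than from predictions on the training data itself, which is what keeps the combination from simply favoring the base learner that overfits most.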

Author information

Authors and Affiliations

Department of Computer Science, North Dakota State University, Fargo, ND, USA
Md Faisal Kabir & Simone A. Ludwig

Corresponding author

Correspondence to Md Faisal Kabir.

About this article

Cite this article

Kabir, M.F., Ludwig, S.A. Enhancing the Performance of Classification Using Super Learning. Data-Enabled Discov. Appl. 3, 5 (2019). https://doi.org/10.1007/s41688-019-0030-0

Received: 04 May 2018
Revised: 20 November 2018
Accepted: 07 January 2019
Published: 17 January 2019
DOI: https://doi.org/10.1007/s41688-019-0030-0
