Enhancing the Performance of Classification Using Super Learning

  • ORIGINAL ARTICLE
Data-Enabled Discovery and Applications

Abstract

Classification is a supervised learning task, and improving the performance of classification models remains a challenging research problem in machine learning (ML) and data mining. A common goal in ML is to build a model that can perform classification, and strong classification performance matters in almost every field, including healthcare. Researchers have applied a range of ML techniques to improve their models, among them ensemble techniques that combine multiple base learners. Super learning, also called stacked ensembling, is an ensemble method that finds the optimal weighted average of diverse learning models. In this paper, we apply super learning to four healthcare-related benchmark data sets. Experimental results show that super learning performs better than the individual base learners and the baseline ensemble.
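The weighted-average idea behind super learning can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: it assumes the cross-validated (out-of-fold) predicted probabilities of three base learners are already available, uses made-up numbers for them, and replaces a proper optimizer with a coarse grid search over non-negative weights that sum to one. The function names `simplex_grid`, `mse`, and `fit_super_learner_weights` are hypothetical.

```python
from itertools import product


def simplex_grid(n_models, steps):
    """Yield weight vectors with non-negative entries that sum to 1."""
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield [c / steps for c in combo]


def mse(weights, cv_preds, labels):
    """Mean squared error of the weighted average of base-learner predictions."""
    total = 0.0
    for row, y in zip(cv_preds, labels):
        blended = sum(w * p for w, p in zip(weights, row))
        total += (blended - y) ** 2
    return total / len(labels)


def fit_super_learner_weights(cv_preds, labels, steps=20):
    """Pick the grid point on the simplex with the lowest cross-validated error."""
    n_models = len(cv_preds[0])
    return min(simplex_grid(n_models, steps),
               key=lambda w: mse(w, cv_preds, labels))


# Hypothetical out-of-fold probabilities: one row per sample,
# one column per base learner (illustrative values, not real results).
cv_preds = [
    [0.9, 0.6, 0.2],
    [0.8, 0.7, 0.9],
    [0.3, 0.4, 0.1],
    [0.2, 0.5, 0.6],
    [0.7, 0.4, 0.8],
    [0.1, 0.6, 0.3],
]
labels = [1, 1, 0, 0, 1, 0]

weights = fit_super_learner_weights(cv_preds, labels)
ensemble_mse = mse(weights, cv_preds, labels)

# Error of each base learner on its own (all weight on one column).
base_mses = [
    mse([1.0 if j == k else 0.0 for j in range(3)], cv_preds, labels)
    for k in range(3)
]

print("weights:", weights)
print("ensemble MSE:", ensemble_mse)
print("base learner MSEs:", base_mses)
```

Because the grid contains the weight vectors that put all mass on a single learner, the selected combination can never have a higher cross-validated error than the best individual base learner on these predictions. A practical super learner would fit the weights with a metalearner trained on real out-of-fold predictions rather than a grid search.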

Author information

Corresponding author

Correspondence to Md Faisal Kabir.

About this article

Cite this article

Kabir, M.F., Ludwig, S.A. Enhancing the Performance of Classification Using Super Learning. Data-Enabled Discov. Appl. 3, 5 (2019). https://doi.org/10.1007/s41688-019-0030-0

  • DOI: https://doi.org/10.1007/s41688-019-0030-0
