RETRACTED ARTICLE: Impact of the learning set’s size

This article was retracted on 05 January 2021

This article has been updated


Learning techniques have proven their capacity to handle large amounts of data. Most statistical learning approaches use learning sets of a fixed size and produce static models. However, in certain situations, such as incremental or active learning, the learning process can work with only a small amount of data. In such cases, it becomes necessary to look for algorithms capable of producing models from only a few examples. In the literature, classifiers are generally evaluated according to criteria such as their classification performance and their ability to sort data. But this taxonomy of classifiers can change markedly when one is interested in their capabilities in the presence of only a few examples. From our point of view, few studies have been carried out on this issue. It is in this sense that this paper studies a wider range of learning algorithms and data sets in order to show the power of each chosen algorithm. The study also raises the problem of choosing an algorithm to process small or large amounts of data. To resolve it, we will show that there are algorithms capable of generating models from little data; in this case, we look for the smallest amount of data that allows the best learning to be achieved. We also want to show that some algorithms can make good predictions with little data, which is necessary to keep the labeling procedure as inexpensive as possible. To make this concrete, we will first discuss learning speed and the typology of the tested algorithms, in order to assess the ability of a classifier to obtain an “interesting” solution to a classification problem using a minimum of training examples, and we will present various families of classification models based on parameter learning. After that, we will test all the classifiers mentioned, both linear and non-linear.

Then, we will study the behavior of these algorithms as a function of the learning set’s size through an experimental protocol in which various datasets from the classification field are split, manipulated and evaluated, and we will report the results that emerge from this protocol. Finally, we will discuss the results obtained in the global analysis section and conclude with recommendations.
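
As a rough illustration of what studying a classifier "as a function of the learning set's size" can look like, the sketch below traces a learning curve. It is not the protocol of the paper, whose datasets and algorithms are not listed in this abstract. Everything in it is an assumption made for illustration: the synthetic two-class Gaussian data, the nearest-centroid classifier, the chosen training sizes, and the helper names `make_data`, `fit_centroids`, `predict`, `accuracy` and `learning_curve`.

```python
import random


def make_data(n, rng):
    """Draw n labeled 2-D points from two Gaussian blobs (synthetic, hypothetical data)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def fit_centroids(train):
    """Nearest-centroid 'model': the mean point of each class seen in training."""
    sums = {}
    for (x, y), label in train:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}


def predict(centroids, point):
    """Return the label of the closest class centroid (squared Euclidean distance)."""
    x, y = point
    return min(
        centroids,
        key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2,
    )


def accuracy(centroids, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(1 for point, label in test if predict(centroids, point) == label)
    return hits / len(test)


def learning_curve(sizes, n_test=500, repeats=20, seed=0):
    """Mean test accuracy for each training-set size, averaged over fresh draws."""
    rng = random.Random(seed)
    test = make_data(n_test, rng)  # one fixed test set shared by all sizes
    curve = []
    for size in sizes:
        scores = []
        for _ in range(repeats):
            train = make_data(size, rng)  # new small training set each repeat
            scores.append(accuracy(fit_centroids(train), test))
        curve.append((size, sum(scores) / len(scores)))
    return curve


SIZES = [2, 4, 8, 16, 32, 64, 128]
CURVE = learning_curve(SIZES)
for size, acc in CURVE:
    print(f"{size:4d} examples -> mean test accuracy {acc:.3f}")
```

Averaging over several independent training draws per size is a deliberate choice in this sketch: with only a handful of examples, a single draw can contain one class only, so individual scores vary widely and a mean gives a steadier picture of how quickly a classifier learns.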

Change history

  • 05 January 2021

    An Erratum to this paper has been published: https://doi.org/10.1007/s10639-020-10422-x


Author information



Corresponding author

Correspondence to Adil Korchi.

Additional information

This article has been retracted. Please see the retraction notice for more detail:

About this article

Cite this article

Korchi, A., Dardor, M. & Mabrouk, E.H. RETRACTED ARTICLE: Impact of the learning set’s size. Educ Inf Technol 25, 4637–4657 (2020).


  • Learning
  • Algorithms
  • Dataset
  • Classifiers
  • Regression
  • Curve