Automated Machine Learning Algorithm Mining for Classification Problem

  • Meng-Sung Wu
  • Jun-Yi Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10934)


Optimizing hyper-parameters and identifying the learning algorithm best suited to a particular dataset can be exceedingly difficult. Researchers have developed automated methods for selecting an algorithm and its associated hyper-parameters; however, such methods do not necessarily carry over to other datasets. In this paper, we present a method that selects a learning algorithm and sets its hyper-parameters simultaneously in a two-stage process: (1) identification of the important hyper-parameters, to streamline the optimization process, and (2) a heuristic formulation based on sequence analysis, to limit tuning time and identify the optimal algorithm/hyper-parameter combination. The proposed method greatly reduces training time without a significant loss of performance in classification tasks.
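The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how importance is measured or what the sequence-analysis heuristic is, so the sketch assumes a main-effect variance decomposition over a full grid for stage 1 and a simple patience-based stopping rule for stage 2. The search space `GRID`, the `DEFAULTS`, the 0.01 importance threshold, and the scoring function `toy_score` are all hypothetical stand-ins for a real cross-validated accuracy.

```python
import itertools
import statistics

# Hypothetical search space and defaults; names and values are illustrative only.
GRID = {
    "depth": [2, 4, 6, 8, 10, 12],
    "lr": [0.01, 0.1, 0.5],
    "seed": [0, 1, 2],
}
DEFAULTS = {"depth": 6, "lr": 0.1, "seed": 0}


def toy_score(config):
    """Stand-in for cross-validated accuracy (an assumption, not real data)."""
    return 0.9 - 0.02 * (config["depth"] - 6) ** 2 - 0.05 * abs(config["lr"] - 0.1)


def main_effect_importance(grid, score_fn):
    """Stage 1: fraction of score variance explained by each hyper-parameter alone."""
    names = list(grid)
    configs = [dict(zip(names, vals)) for vals in itertools.product(*grid.values())]
    scores = [score_fn(c) for c in configs]
    total = statistics.pvariance(scores)
    importance = {}
    for name in names:
        # Mean score for each value of this hyper-parameter, averaged over the rest.
        group_means = [
            statistics.fmean([s for c, s in zip(configs, scores) if c[name] == value])
            for value in grid[name]
        ]
        importance[name] = statistics.pvariance(group_means) / total
    return importance


def early_stopped_search(grid, important, defaults, score_fn, patience):
    """Stage 2: scan only the important hyper-parameters; stop after
    `patience` consecutive evaluations without improvement."""
    best_config, best_score, stale, evaluated = None, float("-inf"), 0, 0
    for vals in itertools.product(*(grid[name] for name in important)):
        config = {**defaults, **dict(zip(important, vals))}
        score = score_fn(config)
        evaluated += 1
        if score > best_score:
            best_config, best_score, stale = config, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_config, best_score, evaluated


importance = main_effect_importance(GRID, toy_score)
# Keep hyper-parameters explaining at least 1% of the variance (arbitrary threshold).
important = [name for name, share in importance.items() if share >= 0.01]
best_config, best_score, n_evaluated = early_stopped_search(
    GRID, important, DEFAULTS, toy_score, patience=2
)
print(important, best_config, n_evaluated)
```

In this toy setting only `depth` passes the threshold, so stage 2 scans a single axis and stops before exhausting it. A real application would replace `toy_score` with a cross-validated evaluation and would repeat the search for each candidate learning algorithm.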


Machine learning, Hyper-parameter, Algorithm selection, Optimization



We are grateful to the anonymous reviewers and the editor for their detailed comments and suggestions, which helped to improve the paper. This work was sponsored by the Ministry of Economic Affairs, Taiwan, R.O.C., through project No. G301ARY910, conducted by ITRI.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Industrial Technology Research Institute, Hsinchu, Taiwan
  2. National Chiao Tung University, Hsinchu, Taiwan
