
Self Hyper-Parameter Tuning for Data Streams

  • Bruno Veloso
  • João Gama
  • Benedita Malheiro (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)

Abstract

The widespread use of smart devices and sensors, together with the ubiquity of Internet access, is behind the exponential growth of data streams. Nowadays, hundreds of machine learning algorithms are able to process high-speed data streams. However, these algorithms rely on human expertise to perform complex processing tasks such as hyper-parameter tuning. This paper addresses the problem of data variability modelling in data streams. Specifically, we propose and evaluate a new parameter tuning algorithm called Self Parameter Tuning (SPT). SPT is an online adaptation of the Nelder-Mead optimisation algorithm for hyper-parameter tuning: it evaluates the current solution on dynamically sized samples of the stream and applies the Nelder-Mead operators to update the current set of hyper-parameters. The main contribution is the adaptation of the Nelder-Mead algorithm to automatically tune regression hyper-parameters for data streams. Additionally, whenever a concept drift occurs in the data stream, the method re-initiates the search for new hyper-parameters. The proposed method has been evaluated in a regression scenario. Experiments with well-known time-evolving data streams show that the proposed SPT hyper-parameter optimisation outperforms previous expert hyper-parameter tuning efforts.
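To make the recipe concrete, the sketch below shows, in Python, how the three Nelder-Mead operators (reflection, expansion, contraction, and shrink) can drive an online hyper-parameter search, with the search re-initiated when a concept drift is detected. This is a minimal sketch under our own assumptions, not the paper's implementation: the function names are hypothetical, and the synthetic drifting loss merely stands in for the prequential error that SPT computes on dynamically sized samples of the stream.

import numpy as np

def nelder_mead_step(simplex, scores, evaluate):
    # One Nelder-Mead iteration over a simplex of hyper-parameter vectors.
    # simplex: (n+1, n) array; scores: (n+1,) losses (lower is better);
    # evaluate: maps a parameter vector to a loss, e.g. prequential error.
    order = np.argsort(scores)
    simplex, scores = simplex[order], scores[order]
    centroid = simplex[:-1].mean(axis=0)             # centroid of all but the worst vertex

    reflected = centroid + (centroid - simplex[-1])  # reflect the worst vertex
    r_score = evaluate(reflected)
    if r_score < scores[0]:                          # beats the best: try expanding further
        expanded = centroid + 2.0 * (centroid - simplex[-1])
        e_score = evaluate(expanded)
        if e_score < r_score:
            simplex[-1], scores[-1] = expanded, e_score
        else:
            simplex[-1], scores[-1] = reflected, r_score
    elif r_score < scores[-2]:                       # beats the second worst: accept it
        simplex[-1], scores[-1] = reflected, r_score
    else:                                            # contract towards the centroid
        contracted = centroid + 0.5 * (simplex[-1] - centroid)
        c_score = evaluate(contracted)
        if c_score < scores[-1]:
            simplex[-1], scores[-1] = contracted, c_score
        else:                                        # last resort: shrink towards the best
            simplex[1:] = simplex[0] + 0.5 * (simplex[1:] - simplex[0])
            scores[1:] = [evaluate(v) for v in simplex[1:]]
    return simplex, scores

# Toy usage: a noisy quadratic loss whose optimum moves mid-stream, standing
# in for a regressor's error on a data stream that exhibits concept drift.
rng = np.random.default_rng(0)
optimum = np.array([0.3, 0.7])

def evaluate(params):
    return float(np.sum((params - optimum) ** 2) + 0.01 * abs(rng.standard_normal()))

simplex = rng.uniform(0.0, 1.0, size=(3, 2))         # n + 1 = 3 vertices for n = 2 parameters
scores = np.array([evaluate(v) for v in simplex])

for step in range(200):
    simplex, scores = nelder_mead_step(simplex, scores, evaluate)
    if step == 100:                                  # simulated drift: the loss surface moved,
        optimum = np.array([0.9, 0.1])               # so re-initiate the search, as SPT does
        simplex = rng.uniform(0.0, 1.0, size=(3, 2))
        scores = np.array([evaluate(v) for v in simplex])

print("best hyper-parameters found:", simplex[np.argmin(scores)])

In the paper's setting, evaluate would score the underlying stream regressor on a dynamically sized sample rather than a synthetic bowl, and a change detector (e.g. a Page-Hinkley test) rather than a fixed step index would trigger the restart.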

Keywords

Parameter tuning · Hyper-parameters · Optimisation · Nelder-Mead · Regression

Acknowledgements

This work is partially funded by the ERDF through the COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961, and by National Funds through the FCT as part of project UID/EEA/50014/2013.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Bruno Veloso (1, 2)
  • João Gama (1, 3)
  • Benedita Malheiro (4, 5), corresponding author

  1. LIAAD - INESC TEC, Porto, Portugal
  2. UPT - University Portucalense, Porto, Portugal
  3. FEP - University of Porto, Porto, Portugal
  4. ISEP - Polytechnic of Porto, Porto, Portugal
  5. CRAS - INESC TEC, Porto, Portugal