SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-fitting
In this paper we describe our submission to the IJCRS’15 Data Mining Competition, which is concerned with predicting dangerous concentrations of methane in longwalls of a Polish coal mine. We address the challenge of building robust classification models with support vector machines (SVMs) from time series data. Moreover, we investigate how tuning SVM parameters with grid search affects classification performance and whether it helps prevent over-fitting. Our results show that proper parameter tuning improves predictive performance. It also improves the stability of the classification models, even when the test data come from a different time period and have a different class distribution. By applying the proposed method, we built a classification model that performs even better on the unseen test data than on the training data, which highlights that the model does not over-fit. The submitted solution was about 2% behind the winning solution.
Keywords: Support vector machines · SVM · Grid search · Over-fitting · Parameter tuning · Time series · Coal mining
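The abstract does not spell out the tuning procedure, so the following is only a minimal sketch of grid-search tuning for an SVM and of the train-versus-test comparison used to judge over-fitting. It assumes scikit-learn, an RBF kernel, ROC AUC as the score, and synthetic data standing in for the competition's sensor features. The grid values, the 70/30 "chronological" split and the generated data are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch (assumptions: scikit-learn, RBF kernel, ROC AUC scoring,
# synthetic data standing in for time-series features; grid values illustrative).
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for features extracted from sensor time series:
# an imbalanced binary problem (about 10% "dangerous" readings).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)

# Pretend the rows are time-ordered: train on the first 70%, test on the last 30%,
# as a rough analogue of test data drawn from a later time period.
split = int(0.7 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Exponentially spaced grid over the penalty C and the RBF width gamma.
param_grid = {
    "svc__C": [2.0 ** k for k in range(-3, 8, 2)],       # 2^-3 ... 2^7
    "svc__gamma": [2.0 ** k for k in range(-11, 2, 2)],  # 2^-11 ... 2^1
}

# Scaling inside the pipeline, so each CV fold is standardized on its own training part.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    return_train_score=True,
)
search.fit(X_train, y_train)

best = search.best_estimator_
i = search.best_index_
fold_train_auc = search.cv_results_["mean_train_score"][i]  # fit on CV training folds
cv_auc = search.best_score_                                  # mean AUC on held-out CV folds
test_auc = roc_auc_score(y_test, best.decision_function(X_test))  # later "period"

print("best parameters:", search.best_params_)
print(f"CV train AUC {fold_train_auc:.3f}  CV AUC {cv_auc:.3f}  test AUC {test_auc:.3f}")
```

In this sketch, a large gap between the cross-validation training score and the held-out scores would indicate over-fitting. A held-out test score close to (or, as the abstract reports for the submitted model, above) the training-side score is the kind of stability the paper attributes to proper parameter tuning.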
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.