SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-fitting

  • Petre Lameski
  • Eftim Zdravevski
  • Riste Mingov
  • Andrea Kulakov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9437)

Abstract

In this paper we describe our submission to the IJCRS’15 Data Mining Competition, which concerned the prediction of dangerous methane concentrations in the longwalls of a Polish coal mine. We address the challenge of building robust support vector machine (SVM) classification models from time series data. Moreover, we investigate how tuning SVM parameters with grid search affects classification performance and helps prevent over-fitting. Our results show that proper parameter tuning improves predictive performance and also makes the classification models more stable, even when the test data comes from a different time period and has a different class distribution. With the proposed method we built a classification model that performs even better on unseen test data than on the training data, which indicates that the model does not over-fit. The submitted solution was about 2% behind the winning solution.
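
The grid search described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual pipeline: the synthetic data, the RBF kernel, the parameter grid, the AUC scoring metric, the random hold-out split, and all variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (assumption: NOT the competition data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid over the two RBF-SVM parameters; the values are illustrative.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",  # assumed metric for this sketch
    cv=StratifiedKFold(n_splits=5),
)
search.fit(X_train, y_train)

# Compare training and held-out scores; a large gap would suggest over-fitting.
train_auc = roc_auc_score(y_train, search.decision_function(X_train))
test_auc = roc_auc_score(y_test, search.decision_function(X_test))

print("best parameters:", search.best_params_)
print(f"cross-validated AUC: {search.best_score_:.3f}")
print(f"train AUC: {train_auc:.3f}, test AUC: {test_auc:.3f}")
```

Selecting parameters by cross-validated score rather than by training score is what ties the tuning step to over-fitting control: a parameter pair that only memorizes the training folds scores poorly on the validation folds and is not chosen.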

Keywords

Support Vector Machines, SVM, Grid search, Over-fitting, Parameter tuning, Time series, Coal mining

Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Petre Lameski (1)
  • Eftim Zdravevski (1, email author)
  • Riste Mingov (2)
  • Andrea Kulakov (1)
  1. Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, Skopje, Macedonia
  2. NI TEKNA - Intelligent Technologies, Negotino, Macedonia
