Prediction of Methane Outbreak in Coal Mines from Historical Sensor Data under Distribution Drift
We describe our submission to the IJCRS’15 Data Mining Competition, whose objective is to predict methane outbreaks from multiple sensor readings. Our solution relies on a selective naive Bayes classifier, with optimal preprocessing, variable selection and model averaging, together with an automatic variable construction method that builds many variables from time series records. One difficulty of the challenge is that the input variables are not independent and identically distributed (i.i.d.) between the train and test datasets, since the train and test data come from different time periods. We suggest a methodology to alleviate this problem, which enabled us to reach a final score of 0.9439 (team marcb), second among the 50 challenge competitors.
Keywords: Multi-Relational Data Mining · Supervised classification · Feature selection · Drift detection
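The abstract does not spell out how drifting variables are identified, so the following is only an illustrative sketch of one common approach, not the authors' method. It scores each variable by how well its values alone separate train rows from test rows, using a rank-based AUC: a value near 0.5 suggests no visible drift, while a value near 0 or 1 suggests the variable behaves differently in the two periods. The function names (`drift_auc`, `drift_score`, `select_stable`) and the threshold of 0.2 are assumptions made for this example.

```python
from typing import Dict, List, Sequence


def drift_auc(train_values: Sequence[float], test_values: Sequence[float]) -> float:
    """Rank-based AUC of one variable for telling test rows from train rows.

    0.5 means the variable cannot separate the two periods (no visible drift).
    """
    if not train_values or not test_values:
        raise ValueError("both samples must be non-empty")
    # Label train rows 0 and test rows 1, then sort by value.
    labelled = sorted([(v, 0) for v in train_values] + [(v, 1) for v in test_values])
    n = len(labelled)
    rank_sum_test = 0.0
    i = 0
    while i < n:
        # Find the group of tied values starting at position i.
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each gets the average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_test += avg_rank * sum(label for _, label in labelled[i:j])
        i = j
    n_train, n_test = len(train_values), len(test_values)
    # Mann-Whitney U statistic for the test sample, normalised to [0, 1].
    u = rank_sum_test - n_test * (n_test + 1) / 2.0
    return u / (n_train * n_test)


def drift_score(train_values: Sequence[float], test_values: Sequence[float]) -> float:
    """Map the AUC to [0, 1]: 0 = no separation, 1 = train and test fully separable."""
    return abs(2.0 * drift_auc(train_values, test_values) - 1.0)


def select_stable(train: Dict[str, Sequence[float]],
                  test: Dict[str, Sequence[float]],
                  threshold: float = 0.2) -> List[str]:
    """Keep variables whose drift score stays at or below a hypothetical threshold."""
    return [name for name in train if drift_score(train[name], test[name]) <= threshold]


if __name__ == "__main__":
    # Toy data: variable "a" is stable, variable "b" shifts between periods.
    train = {"a": [1, 2, 3, 4], "b": [1, 2, 3]}
    test = {"a": [1, 2, 3, 4], "b": [10, 11, 12]}
    print(select_stable(train, test))
```

Such a filter would be applied before training the classifier, so that variables whose distribution changes strongly between the two time periods do not dominate the model. In practice the threshold would need to be tuned, since discarding too many variables also discards predictive signal.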
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.