Prediction of Methane Outbreak in Coal Mines from Historical Sensor Data under Distribution Drift

  • Marc Boullé
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9437)


We describe our submission to the IJCRS’15 Data Mining Competition, where the objective is to predict methane outbreaks from multiple sensor readings. Our solution exploits a selective naive Bayes classifier, with optimal preprocessing, variable selection and model averaging, together with an automatic variable construction method that builds many variables from time series records. One difficulty of the challenge is that the input variables are not independent and identically distributed (i.i.d.) between the train and test datasets, since the train and test data come from different time periods. We suggest a methodology to alleviate this problem, which enabled us to reach a final score of 0.9439 (team marcb), second among the 50 challenge competitors.
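The drift issue raised above can be illustrated with a minimal sketch. This is not the methodology of the paper, only one common screening idea under stated assumptions: each numeric variable is scored by how well it alone separates train rows from test rows (an AUC computed from ranks), and variables whose AUC departs from 0.5 by more than a tolerance are flagged. The function names, the column-dictionary layout and the `tolerance` default are all hypothetical.

```python
from typing import Dict, List, Sequence


def auc_train_vs_test(train: Sequence[float], test: Sequence[float]) -> float:
    """AUC of one variable used as a score to tell test rows from train rows.

    Derived from the Mann-Whitney U statistic, with average ranks for ties.
    A value near 0.5 means the variable says nothing about the time period.
    """
    # Label train values 0 and test values 1, then sort by value.
    labelled = sorted([(v, 0) for v in train] + [(v, 1) for v in test])
    n = len(labelled)
    rank_sum_test = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_test += avg_rank * sum(label for _, label in labelled[i:j])
        i = j
    n_test, n_train = len(test), len(train)
    u = rank_sum_test - n_test * (n_test + 1) / 2.0
    return u / (n_test * n_train)


def drifting_variables(
    train_cols: Dict[str, Sequence[float]],
    test_cols: Dict[str, Sequence[float]],
    tolerance: float = 0.1,  # hypothetical threshold, to be tuned
) -> List[str]:
    """Names of variables whose train/test AUC is farther than `tolerance` from 0.5."""
    flagged = []
    for name, train_values in train_cols.items():
        auc = auc_train_vs_test(train_values, test_cols[name])
        if abs(auc - 0.5) > tolerance:
            flagged.append(name)
    return flagged


train = {"stable": [1, 2, 3, 4], "shifted": [1, 2, 3, 4]}
test = {"stable": [1, 2, 3, 4], "shifted": [11, 12, 13, 14]}
flagged = drifting_variables(train, test)
```

In this toy data the `shifted` variable takes strictly larger values in the test period, so it is the only one flagged. A flagged variable could then be dropped or transformed before training. Whether that helps on real sensor data is an empirical question.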


Multi-Relational Data Mining · Supervised classification · Feature selection · Drift detection



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Orange Labs, Lannion, France
