Prediction of Methane Outbreak in Coal Mines from Historical Sensor Data under Distribution Drift

  • Conference paper
Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9437)

Abstract

We describe our submission to the IJCRS’15 Data Mining Competition, where the objective is to predict methane outbreaks from multiple sensor readings. Our solution exploits a selective naive Bayes classifier, with optimal preprocessing, variable selection and model averaging, together with an automatic variable construction method that builds many variables from time series records. One difficulty of the challenge is that the input variables are not independent and identically distributed (i.i.d.) between the train and test datasets, since the train and test data come from different time periods. We suggest a methodology to alleviate this problem, which yielded a final score of 0.9439 (team marcb), second among the 50 challenge competitors.
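The abstract does not spell out how the train/test drift is handled. As an illustration only, one common way to screen for drifting variables is to measure how well each variable alone separates train records from test records, and to discard variables that separate them too well. The sketch below is an assumption, not the author's method: the function names, the AUC-based drift score and the `max_drift` threshold are all hypothetical.

```python
def auc_train_vs_test(train_values, test_values):
    """AUC obtained when the raw value is used to tell test (positive)
    from train (negative). 0.5 means the two samples are not separable
    by rank; values near 0 or 1 suggest a shift between periods."""
    labeled = [(v, 0) for v in train_values] + [(v, 1) for v in test_values]
    labeled.sort(key=lambda pair: pair[0])
    n = len(labeled)
    rank_sum_test = 0.0
    i = 0
    while i < n:
        # Group tied values and give each the average of ranks i+1..j.
        j = i
        while j < n and labeled[j][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_test += avg_rank * sum(label for _, label in labeled[i:j])
        i = j
    n_test, n_train = len(test_values), len(train_values)
    # Mann-Whitney U statistic for the test sample, normalised to [0, 1].
    u = rank_sum_test - n_test * (n_test + 1) / 2.0
    return u / (n_test * n_train)


def drift_score(train_values, test_values):
    """Hypothetical drift score in [0, 1]: 0 = no rank shift, 1 = full separation."""
    return abs(auc_train_vs_test(train_values, test_values) - 0.5) * 2.0


def select_stable_variables(train, test, max_drift=0.2):
    """Keep variables whose drift score is at most max_drift (assumed threshold).
    `train` and `test` map variable names to lists of numeric values."""
    return [name for name in train
            if drift_score(train[name], test[name]) <= max_drift]
```

A rank-based score like this only detects shifts in location; a change in variance with an unchanged median would go unnoticed, so it should be read as a first filter rather than a complete drift test.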

Notes

  1. https://knowledgepit.fedcsis.org/contest/view.php?id=109.

  2. Available as shareware at http://www.khiops.com.

Author information

Corresponding author

Correspondence to Marc Boullé.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Boullé, M. (2015). Prediction of Methane Outbreak in Coal Mines from Historical Sensor Data under Distribution Drift. In: Yao, Y., Hu, Q., Yu, H., Grzymala-Busse, J.W. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. Lecture Notes in Computer Science, vol 9437. Springer, Cham. https://doi.org/10.1007/978-3-319-25783-9_39

  • DOI: https://doi.org/10.1007/978-3-319-25783-9_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25782-2

  • Online ISBN: 978-3-319-25783-9

  • eBook Packages: Computer Science, Computer Science (R0)
