Abstract
We describe our submission to the IJCRS’15 Data Mining Competition, where the objective is to predict methane outbreaks from multiple sensor readings. Our solution uses a selective naive Bayes classifier with optimal preprocessing, variable selection and model averaging, together with an automatic variable construction method that builds many variables from time series records. A difficult aspect of the challenge is that the input variables are not independent and identically distributed (i.i.d.) between the train and test datasets, since the two datasets cover different time periods. We suggest a methodology to alleviate this problem, which yielded a final score of 0.9439 (team marcb), ranking second among the 50 challenge competitors.
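To illustrate what "building many variables from time series records" can mean, here is a minimal sketch. It assumes that each record maps a sensor name to a non-empty list of readings, and it applies a fixed set of aggregates (mean, min, max, last value, net change) over the whole series and over its last k readings. The sensor names, the window lengths, and the aggregate set are hypothetical choices for illustration. This is not the construction language used in the paper or in Khiops.

```python
from statistics import mean

# Hypothetical aggregate functions; each takes a non-empty list of readings.
AGGREGATES = {
    "Mean": mean,
    "Min": min,
    "Max": max,
    "Last": lambda xs: xs[-1],
    "Trend": lambda xs: xs[-1] - xs[0],  # net change over the window
}


def construct_variables(record, suffix_lengths=(None, 10)):
    """Flatten one record (sensor name -> list of readings) into named variables.

    Each aggregate is applied to the full series (k=None) and to the last k
    readings for every other k in suffix_lengths. Window choices are illustrative.
    """
    features = {}
    for sensor, series in record.items():
        for k in suffix_lengths:
            window = series if k is None else series[-k:]
            scope = "All" if k is None else f"Last{k}"
            for name, fn in AGGREGATES.items():
                features[f"{name}({sensor}.{scope})"] = fn(window)
    return features


# Usage with two hypothetical sensors and a 2-reading suffix window.
example = construct_variables(
    {"methane_1": [1.0, 2.0, 3.0], "airflow_1": [0.5, 0.5, 1.5]},
    suffix_lengths=(None, 2),
)
```

The number of constructed variables grows as sensors × windows × aggregates, so many of them are redundant or uninformative. In a pipeline like the one described above, the variable selection step of the selective naive Bayes classifier would be responsible for keeping only the useful ones.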
Notes
- 1.
- 2. Available as shareware at http://www.khiops.com.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Boullé, M. (2015). Prediction of Methane Outbreak in Coal Mines from Historical Sensor Data under Distribution Drift. In: Yao, Y., Hu, Q., Yu, H., Grzymala-Busse, J.W. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. Lecture Notes in Computer Science, vol 9437. Springer, Cham. https://doi.org/10.1007/978-3-319-25783-9_39
DOI: https://doi.org/10.1007/978-3-319-25783-9_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25782-2
Online ISBN: 978-3-319-25783-9
eBook Packages: Computer Science, Computer Science (R0)