Abstract
Drinking water is one of the fundamental human needs. During delivery through the distribution network, drinking water is susceptible to contamination. Early recognition of changes in water quality is essential to the provision of clean and safe drinking water. For this purpose, contamination warning systems (CWS), composed of sensors, a central database, and an event detection system (EDS), have been developed. Conventionally, an EDS employs time series analysis and domain knowledge for automated detection. This paper proposes a general data-driven approach to constructing an automated online event detection system for drinking water. Various tree ensemble models are investigated in application to real-world water quality data. In particular, gradient boosting methods are shown to overcome the challenges of time series data, class imbalance, and collinearity, and to yield satisfactory predictive performance.
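As a rough illustration of the idea in the abstract, the following is a minimal sketch of gradient boosting with decision stumps applied to a rare-event sensor series. It is not the authors' implementation: the data are synthetic (about 5% event readings), the function names (`make_data`, `fit_stump`, `fit_boosting`, `predict`) are hypothetical, and each leaf uses the mean residual rather than the Newton step found in library implementations.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n=400, seed=0):
    """Synthetic sensor series (assumption): noisy baseline with rare level shifts."""
    rng = random.Random(seed)
    values, labels = [], []
    for t in range(n):
        event = t % 40 in (20, 21)  # two event readings in every 40
        values.append(rng.gauss(7.0, 0.1) + (1.0 if event else 0.0))
        labels.append(1 if event else 0)
    # Features: current reading and absolute change from the previous reading.
    X = [[v, abs(v - values[i - 1]) if i else 0.0] for i, v in enumerate(values)]
    return X, labels


def fit_stump(X, residuals):
    """Find the one-split regression tree that minimises squared error."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal values
            right_sum = total - left_sum
            # Maximising this quantity is equivalent to minimising squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]


def fit_boosting(X, y, rounds=30, lr=0.5):
    """Boost stumps on the gradient of the logistic loss."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the rare class
    F = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, lr * left, lr * right))
        F = [fi + (lr * left if x[j] <= thr else lr * right) for fi, x in zip(F, X)]
    return f0, stumps


def predict(model, x):
    f0, stumps = model
    score = f0 + sum(left if x[j] <= thr else right for j, thr, left, right in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


X, y = make_data()
split = 300  # chronological split: train on the past, test on the future
model = fit_boosting(X[:split], y[:split])
truth = y[split:]
pred = [predict(model, x) for x in X[split:]]

tp = sum(1 for p_, t_ in zip(pred, truth) if p_ == 1 and t_ == 1)
fp = sum(1 for p_, t_ in zip(pred, truth) if p_ == 1 and t_ == 0)
fn = sum(1 for p_, t_ in zip(pred, truth) if p_ == 0 and t_ == 1)
recall = tp / max(tp + fn, 1)
precision = tp / max(tp + fp, 1)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Precision and recall are reported instead of accuracy because, with so few event readings, a model that never raises an alarm would still score about 95% accuracy. The chronological split avoids training on readings that come after the ones being predicted.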
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, M., Logofătu, D. (2018). Applying Tree Ensemble to Detect Anomalies in Real-World Water Composition Dataset. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. Lecture Notes in Computer Science, vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_45
DOI: https://doi.org/10.1007/978-3-030-03493-1_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03492-4
Online ISBN: 978-3-030-03493-1
eBook Packages: Computer Science, Computer Science (R0)