Applying Tree Ensemble to Detect Anomalies in Real-World Water Composition Dataset

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2018 (IDEAL 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11314)

Abstract

Drinking water is one of the fundamental human needs. During delivery through the distribution network, drinking water is susceptible to contamination. Early recognition of changes in water quality is essential to the provision of clean and safe drinking water. For this purpose, contamination warning systems (CWS), composed of sensors, a central database, and an event detection system (EDS), have been developed. Conventionally, an EDS employs time series analysis and domain knowledge for automated detection. This paper proposes a general data-driven approach to constructing an automated online event detection system for drinking water. Various tree ensemble models are investigated in application to real-world water quality data. In particular, gradient boosting methods are shown to overcome the challenges of time series data, class imbalance, and collinearity, and to yield satisfactory predictive performance.
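
As a rough illustration of the idea summarized in the abstract, the sketch below boosts decision stumps on a small synthetic sensor series in which only 5% of readings are events. It is not the authors' implementation and does not reproduce their models or data: the series, the two features, the positive-class weighting used to counter imbalance, and all function names (make_series, fit_stump, fit_boosted_stumps, score) are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_series(n=120, start=60, length=6, seed=0):
    """Synthetic readings (hypothetical data): a short event window shifts the signal."""
    rng = random.Random(seed)
    readings, labels = [], []
    for t in range(n):
        is_event = start <= t < start + length
        readings.append(7.0 + rng.gauss(0, 0.05) + (1.0 if is_event else 0.0))
        labels.append(1 if is_event else 0)
    # Features: current reading and absolute change from the previous reading.
    X = [[readings[t], abs(readings[t] - readings[t - 1]) if t else 0.0]
         for t in range(n)]
    return X, labels


def fit_stump(X, targets):
    """Least-squares stump: returns (feature, threshold, left_mean, right_mean)."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, targets) if row[j] <= thr]
            right = [t for row, t in zip(X, targets) if row[j] > thr]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = (sum((t - lmean) ** 2 for t in left)
                   + sum((t - rmean) ** 2 for t in right))
            if err < best_err:
                best, best_err = (j, thr, lmean, rmean), err
    return best


def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.1):
    """Gradient boosting on weighted log-loss; rare positives get a larger weight."""
    n_pos = sum(y)
    pos_weight = (len(y) - n_pos) / n_pos  # assumption: balance the two classes
    weights = [pos_weight if label == 1 else 1.0 for label in y]
    scores = [0.0] * len(y)  # log-odds 0 matches the balanced weighted prior
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the weighted log-loss with respect to the score.
        grads = [w * (label - sigmoid(s))
                 for w, label, s in zip(weights, y, scores)]
        j, thr, lval, rval = fit_stump(X, grads)
        stump = (j, thr, learning_rate * lval, learning_rate * rval)
        stumps.append(stump)
        scores = [s + (stump[2] if row[j] <= thr else stump[3])
                  for s, row in zip(scores, X)]
    return stumps


def score(stumps, row):
    """Summed stump outputs; a positive score means 'event'."""
    return sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)


X, y = make_series()
model = fit_boosted_stumps(X, y)
predictions = [1 if score(model, row) > 0 else 0 for row in X]
```

Because the synthetic event shifts the reading far outside the noise band, the stumps separate the classes on the training series; real water quality data would need held-out evaluation and a metric suited to rare events.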



Author information

Corresponding author

Correspondence to Minh Nguyen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, M., Logofătu, D. (2018). Applying Tree Ensemble to Detect Anomalies in Real-World Water Composition Dataset. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_45

  • DOI: https://doi.org/10.1007/978-3-030-03493-1_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03492-4

  • Online ISBN: 978-3-030-03493-1

  • eBook Packages: Computer Science, Computer Science (R0)
