Data Mining Application in Assessment of Weather-Based Influent Scenarios for a WWTP: Getting the Most Out of Plant Historical Data

  • Sina Borzooei (corresponding author)
  • Ramesh Teegavarapu
  • Soroush Abolfathi
  • Youri Amerlinck
  • Ingmar Nopens
  • Maria Chiara Zanetti


Since the introduction of environmental legislation and directives, the impact of combined sewer overflows (CSO) on receiving water bodies has become a priority concern in the water and wastewater treatment industry. Time-consuming and expensive local sampling and monitoring campaigns are usually carried out to estimate the characteristic flow and pollutant concentrations of CSO water. This study focuses on estimating the frequency and duration of wet-weather events and their impacts on the influent flow and wastewater characteristics of the largest Italian wastewater treatment plant (WWTP), located in Castiglione Torinese. Eight years (2009–2016) of historical plant data, together with arithmetic mean daily precipitation rates (PI) for the plant catchment area, are analysed. Relationships between PI and volumetric influent flow rate (Qin), chemical oxygen demand (COD), ammonium (N-NH4), and total suspended solids (TSS) are investigated. A time series data mining (TSDM) method is implemented in the MATLAB computing package, using a sliding window algorithm (SWA) to segment the time series and partition the available records into those associated with wet-weather and dry-weather events. Based on the TSDM results, a case-specific wet-weather definition is proposed for the Castiglione Torinese WWTP. Two significant weather-based influent scenarios are assessed by kernel density estimation. The results confirm that the proposed method, based on routinely collected plant data, can be used for planning emergency response and long-term preparedness for extreme climate conditions at a WWTP. Implementing the results in dynamic process simulation models can improve the plant's operational efficiency in managing fluctuating loads.
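
As an illustration of the segmentation idea mentioned in the abstract, the following is a minimal Python sketch of a generic sliding window algorithm applied to a daily precipitation series. It is not the authors' MATLAB implementation: the piecewise-constant error criterion, the function names, and the `max_error` and `wet_threshold` values are all assumptions chosen for the example.

```python
from statistics import fmean


def _sse(window):
    """Sum of squared deviations of a window from its own mean."""
    m = fmean(window)
    return sum((x - m) ** 2 for x in window)


def sliding_window_segments(series, max_error):
    """Split a series into (start, end) index pairs, end exclusive.

    A segment grows one point at a time for as long as a constant (mean)
    fit of the enlarged window keeps its squared error within max_error.
    max_error is a hypothetical tuning parameter.
    """
    segments = []
    start = 0
    n = len(series)
    while start < n:
        end = start + 1
        while end < n and _sse(series[start:end + 1]) <= max_error:
            end += 1
        segments.append((start, end))
        start = end
    return segments


def label_segments(series, segments, wet_threshold):
    """Tag each segment 'wet' or 'dry' by its mean precipitation.

    wet_threshold (same units as the series) is an assumed cut-off,
    not the case-specific definition proposed in the study.
    """
    return [
        (s, e, "wet" if fmean(series[s:e]) >= wet_threshold else "dry")
        for s, e in segments
    ]


# Illustrative daily precipitation values (made up for the example)
precipitation = [0.0, 0.0, 0.0, 12.0, 15.0, 14.0, 0.0, 0.0, 0.0]
segments = sliding_window_segments(precipitation, max_error=10.0)
print(label_segments(precipitation, segments, wet_threshold=1.0))
```

In this sketch the records falling in each labelled segment could then be pooled by weather type, for example to compare the distributions of Qin, COD, N-NH4, and TSS under wet and dry conditions.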


Waste water treatment plant; Combined sewer system; Data mining; Wet-weather; Historical data


Funding Information

This project was financially supported by SMAT (Società Metropolitana Acque Torino).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Environment, Land and Infrastructure Engineering (DIATI), Politecnico di Torino, Torino, Italy
  2. Department of Civil, Environmental and Geomatics Engineering, Florida Atlantic University, Boca Raton, USA
  3. Warwick Water Research Group, School of Engineering, The University of Warwick, Coventry, UK
  4. Department of Data Analysis and Mathematical Modelling, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
