A Hybrid Surrogate Modelling Strategy for Simplification of Detailed Urban Drainage Simulators
Urban drainage modelling typically requires highly detailed simulators because of the variety of underlying surface and drainage processes, which makes such simulators computationally expensive. Applying them therefore remains challenging in activities such as real-time control (RTC), uncertainty quantification or model calibration, in which numerous simulations are required. This paper presents a simple hybrid surrogate modelling (or emulation) strategy to simplify and accelerate a detailed urban drainage simulator (UDS). The proposed strategy includes: a) identification of the variables to be emulated; b) development of a simplified conceptual model in which every component contributing to the variables identified in step (a) is replaced by a function; c) definition of these functions, based either on knowledge of the mechanisms of the simulator or on the data produced by the simulator; and finally, d) validation of the results produced by the surrogate model against the original detailed simulator. Herein, a detailed InfoWorks ICM simulator was selected for surrogate modelling. The case study area was a small urban drainage network in Luxembourg. An emulator was developed to map the rainfall time series, as input, to a storage tank volume and combined sewer overflow (CSO) in the case study network. The results showed that the introduced strategy provides a reliable method to simplify the simulator and reduce its run time significantly. For the specific case study, the emulator was approximately 1300 times faster than the original detailed simulator. To quantify the emulation error, an ensemble of 500 rainfall scenarios, each 1 month long, was generated with a multivariate autoregressive model for conditional simulation of rainfall time series. The results produced by the emulator were compared to those produced by the simulator. Finally, as an indicator of the emulation error, distributions of the Nash-Sutcliffe efficiency (NSE) between the emulator and simulator results for the storage tank volume and CSO flow time series were presented.
Keywords: Surrogate model; Model simplification; Emulator; Urban drainage; Combined sewer overflow (CSO)
During a Combined Sewer Overflow (CSO) event, untreated wastewater spills into natural water bodies, which may cause serious negative impacts on the receiving waters and their ecosystems (see e.g. De Toffol 2006). Most of the urban drainage systems built during the 19th and 20th centuries are combined systems and cause CSOs during intense or long rainfall events (Burian et al. 1999). Preventing and limiting pollution of receiving waters due to CSO events was one of the main objectives of the Urban Wastewater Treatment Directive (EEC Council 1991). In 2000, the same issue was highlighted in Article 16 of the European Union’s Water Framework Directive (WFD) (EEC Council 2000).
One way of reducing the frequency and volume of CSO events is to manage the urban drainage system dynamically using Real-Time Control (RTC) (Schilling 1989), in particular model-based RTC (e.g. Fiorelli et al. 2013; Joseph-Duran et al. 2014a, b). In model-based RTC, the simulator is run frequently to predict the outcomes of an extensive set of reasonable actions. Computationally expensive simulators therefore limit the application of RTC, making it necessary to replace them with faster alternatives.
Data-driven approach, in which the detailed or complex simulator is approximated through an empirical (or statistical) model which captures the input-output mapping of the original simulator. This category covers a rather broad range of methods. Some common methods, with example applications in the field of water engineering and management, are: Artificial Neural Networks (ANN) (Sreekanth and Datta 2011) and Deep Learning (DL) (Li et al. 2016); Radial Basis Functions (RBF) (Christelis and Mantoglou 2016); Kriging (Zhao et al. 2016); and Gaussian Process Emulators (GPE) (Carbajal et al. 2016). The main advantage of data-driven methods is their generic and non-mechanistic nature: one only needs to deal with the data generated by the simulator, rather than with the mathematical descriptions behind the simulator. In addition, they yield considerably lower run times than other surrogate modelling approaches. However, these methods are normally preferred when a small number of parameters, varying in limited ranges, are involved in the surrogate modelling process. Despite their popularity, the main disadvantage of data-driven methods is their subjective (researcher-dependent) structure. Moreover, their applicability is normally limited to the ranges of the training dataset used.
Projection-based approach, in which the dimensionality of the parameter space is reduced by projecting the governing equations of the simulator onto a basis of orthonormal vectors. For application in the field of water engineering and management, Balanced Truncation (BT) (e.g. Sahlan et al. 2013) and Proper Orthogonal Decomposition (POD) (e.g. Volkwein 2013; Xu et al. 2013) are among the most popular methods in this category. The main advantages of projection-based approaches are their computational efficiency once constructed and, for most of these techniques, the availability of an error bound after Model Order Reduction (MOR) (Willcox and Megretski 2005). The major disadvantage is that they are highly mechanistic, meaning that one should first define a clear mathematical description (e.g. a state-space model) of the simulator subject to MOR. These approaches are rather difficult to implement in practice, especially if the commercial modelling software does not provide access to the source code or to a description of the implemented algorithms (which is normally the case, except for open-source software).
Hierarchical or multi-fidelity approach, where the surrogate is developed, for instance, by ignoring some of the processes which are less relevant in a given case, or by reducing the numerical resolution of the model (e.g. Meirlaen and Vanrolleghem 2002; Leitão et al. 2010). Here, the principal advantage is that these methods are sometimes able to maintain the detail and accuracy of the original detailed or complex simulator. The dominant disadvantage is that these methods are also highly mechanistic and difficult to implement in practice. In addition, they are case-specific, and it is more challenging to generalise and automate them for application to other simulators of interest.
Hybrid approach, in which different combinations of any of the above-mentioned approaches can be applied to develop the surrogate model. For instance, a data-driven approach can be mixed with a projection-based or multi-fidelity approach.
The purpose of surrogate modelling in this study is to reduce the computational cost of a detailed urban drainage simulator (UDS) and make it available for future applications such as RTC. Even though the importance of surrogate modelling based on already existing, well-established, detailed UDSs (Bach et al. 2014) has been emphasised repeatedly (Meirlaen et al. 2002; Schütze et al. 2004), developing new simple and fast simulators for specific applications, such as RTC, is still common (e.g. Joseph-Duran et al. 2014b; Mahmoodian et al. 2016). Nevertheless, in the urban drainage modelling domain, there are few studies in which the potential of developing surrogate models based on existing detailed simulators has been shown.
Focusing on RTC application, Meirlaen and Vanrolleghem (2002), Langeveld et al. (2013) and van Daal-Rombouts et al. (2016) preferred the multi-fidelity approach. For example, Langeveld et al. (2013) simplified parts of a detailed integrated UDS and successfully applied the surrogate model in RTC with a focus on receiving water quality control. A few other researchers found hybrid surrogate modelling approaches more practical for accelerating computationally expensive UDSs. With a focus on urban pluvial flood simulation, Bermúdez et al. (2018) developed a hybrid surrogate model, which applies ANN as the data-driven part, to accelerate a detailed 1D-2D UDS. For the specific case study in that research, a simulation speed-up factor of more than 10⁴ was achieved at a low accuracy cost. Keupers et al. (2015) developed a hybrid surrogate model for a computationally demanding integrated river-sewer simulator in order to quantify the impact of CSO events on the quality of the receiving water. In that study, the highly detailed quantity and quality modelling components of the integrated simulator were substituted by surrogate models which were mostly data-driven in nature. A speed-up factor of 1 × 10⁴ was achieved for the specific case study.
Application of data-driven approaches in various aspects of the urban water management domain has been growing rapidly during the past decade (Eggimann et al. 2017). Due to the advantages addressed above, data-driven surrogate modelling approaches are no exception in this regard (Fu et al. 2010; Gradano and Le Roux 2012; Nadiri et al. 2018). However, in most data-driven approaches the input-output mapping is performed in a black (or grey) box manner, neglecting most of the mechanisms inside the simulator and focusing solely on the input-output data.
In the current study, we argue that, if it is possible to identify some of the modelling components directly from studying the mechanisms of the case study simulator, these components can be excluded from the data-driven analysis. Hence, in this article, we propose a novel hybrid surrogate modelling strategy, which is partly based on the ad-hoc information obtained from the detailed simulator under study and partly data-driven. The focus in this study is on wastewater quantity modelling. Based on the introduced hybrid surrogate modelling strategy, we developed an emulator for storage tank volume and CSO flow time series prediction based on upcoming rainfall time series in the case study catchment.
In the following sections, first, the detailed UDS subject to surrogate modelling and a small urban drainage network are introduced as the case study; second, the surrogate modelling strategy is explained briefly together with its step-by-step application to the case study at hand; third, the surrogate model is validated against the original UDS and the emulation error is quantified; and finally, conclusions are drawn from the results and potential future studies are highlighted. Throughout the paper, the detailed or complex UDS is referred to simply as the “simulator” and the surrogate model as the “emulator”.
2 Case Study
2.1 Case Study Simulator
For the runoff modelling in this simulator it is possible to select among 15 types of runoff volume models and 13 types of runoff routing models (for the case study of this research, the Wallingford procedure fixed percentage runoff model and the Wallingford model were selected, respectively). Each of these models requires its own specific parameters and inputs. The hydraulic model is based on the De Saint-Venant equations for conservation of mass and momentum (Innovyze 2017). The rainfall, which is the main input of the runoff sub-model, can be either observed (recorded) or design rainfall. It should be noted that the focus in this research is only on wastewater quantity modelling; wastewater quality modelling is neglected. In this study, it is assumed that the simulator represents reality as a “virtual reality” and the goal is to emulate it by focusing on the inputs and outputs of interest. This assumption is common practice in surrogate modelling (Kroll et al. 2017). It is also assumed that a detailed simulator is at hand which has already been calibrated against observed measurements. However, this simulator is too computationally expensive to be used directly in applications such as model-based RTC or uncertainty propagation analysis. Hence, the focus is on developing a surrogate model based on this simulator to facilitate those applications.
2.2 Case Study Area
The structure of CSO location 1 in the case study is described next (see Fig. 2b). The inflow from the upstream sub-catchment flows into the main storage tank through a conduit which is connected to a rectangular weir structure for depleting the excess water in case of CSO events. The wastewater level in the main storage tank is controlled automatically by a fixed pump with a maximum capacity of 6 × 10⁻³ m³/s. The pump operates based on user-defined switch on/off water levels inside the tank.
3.1 Identification of Variables of Interest to be Emulated
3.2 Development of a Simplified Conceptual Model
3.3 Identification of Simplified Model Components
In this step, the components of Eq. (1) should be identified either based on knowledge gained from studying the mechanisms of the simulator at hand (simulator-based components) or based on the data generated by the simulator (data-based components). For the case study at hand, the flow components D, P and C of Eq. (1) are considered simulator-based components, while R is a data-based component that is identified (learned) from synthetic data generated by the simulator.
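As a rough illustration of how such a component-wise model could be stepped in time, the sketch below assumes that Eq. (1), which is not reproduced in this section, is a storage tank mass balance of the form dV/dt = R + D − P − C. The signs, the explicit Euler scheme, the function name `tank_step` and the numbers in the usage line are assumptions made for illustration, not the authors' implementation.

```python
def tank_step(volume, r, d, p, c, dt):
    """One explicit Euler step of the ASSUMED balance dV/dt = R + D - P - C.

    volume : current tank volume [m^3]
    r, d   : rainfall-induced inflow R and dry weather inflow D [m^3/s]
    p, c   : pump flow P and CSO flow C leaving the tank [m^3/s]
    dt     : time step [s]
    """
    new_volume = volume + dt * (r + d - p - c)
    # The tank cannot hold a negative volume, so clip at zero.
    return max(0.0, new_volume)


# Illustrative values only: 10 min step, dry weather inflow, pump off, no CSO.
v_next = tank_step(volume=10.0, r=0.0, d=1e-3, p=0.0, c=0.0, dt=600.0)
```

Separating the four flows as plain arguments mirrors the idea that each component can be supplied by a different kind of function, whether simulator-based or data-based.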
3.3.1 Simulator-Based Components
Accordingly, the P component is the pump flow, which depletes water from the tank at an assumed constant discharge determined by the manufacturer. Therefore, P takes the value 0 (if the pump is off) or pc (if the pump is on), where pc is the pump flow rate. In this study, pc has a value of 6 × 10⁻³ m³/s. A similar approach can be considered for other types of system actuators such as orifices or controllable valves.
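A minimal sketch of such an on/off pump component is given below. The pump flow rate is the value stated in the text; the hysteresis logic and the switch levels `level_on` and `level_off` are hypothetical, since the actual switching levels of the case study are not given in this section.

```python
PC = 6e-3  # pump flow rate pc [m^3/s], as stated in the text


def pump_flow(level, pump_on, level_on, level_off):
    """Return (P, new_state) for an on/off pump with hysteresis.

    level     : current water level in the tank [m]
    pump_on   : True if the pump was running in the previous step
    level_on  : hypothetical switch-on level [m]
    level_off : hypothetical switch-off level [m], below level_on
    """
    if level >= level_on:
        pump_on = True       # level high enough: start (or keep) pumping
    elif level <= level_off:
        pump_on = False      # level low enough: stop the pump
    # Between the two levels the previous state is kept.
    flow = PC if pump_on else 0.0
    return flow, pump_on


# Illustrative call: pump idle, level rises above the switch-on level.
flow, state = pump_flow(level=2.0, pump_on=False, level_on=1.5, level_off=0.5)
```

Keeping the pump state as an explicit argument makes the function free of hidden state, which is convenient when the component is called repeatedly inside a time-stepping loop.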
3.3.2 Data-Based Components
The inflow to the storage tank due to rainfall (R) implements a short-cut for all the transformations that the upstream network applies on the runoff flowing through the sewer network. Two major transformations are the delay introduced by physical properties of the upstream network (e.g. lengths, slopes, etc.) and the scaling of the rainfall-runoff process. These processes are simulated via detailed rainfall-runoff and routing models in the original simulator and have the largest contribution to simulation computational cost.
where rmin is the minimum value of rainfall intensity in the training set. The structure of α is given by a cubic polynomial fit on the logarithm of the training data, i.e. (rainfall intensity, filling slopes) pairs.
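The fitting step might look roughly like the following sketch. It assumes that the logarithm is taken of both the rainfall intensity and the filling slope, which the text does not state explicitly, and it uses synthetic pairs rather than simulator output. The function names are hypothetical; `numpy.polyfit` and `numpy.polyval` are the only library calls.

```python
import numpy as np


def fit_log_cubic(rain_intensity, filling_slope):
    """Fit a cubic polynomial to log(slope) as a function of log(intensity).

    Returns the four coefficients, highest power first.
    """
    x = np.log(np.asarray(rain_intensity, dtype=float))
    y = np.log(np.asarray(filling_slope, dtype=float))
    return np.polyfit(x, y, deg=3)


def predict_slope(coeffs, rain_intensity):
    """Evaluate the fitted model and map back from log space."""
    x = np.log(np.asarray(rain_intensity, dtype=float))
    return np.exp(np.polyval(coeffs, x))


# Synthetic (rainfall intensity, filling slope) pairs, for illustration only.
r = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
s = 0.3 * r ** 0.9
coeffs = fit_log_cubic(r, s)
```

Fitting in log space keeps the predicted slopes positive after exponentiation, but it also means that intensities at or below zero have to be handled separately, for example with a lower bound such as rmin.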
The last step of the surrogate modelling strategy is to validate the results produced by the emulator against those generated by the original simulator. Hence, in this section, the emulator is applied to predict the storage tank volume and CSO flow rate time series using a real observed rainfall time series recorded by a rain gauge located in the catchment (Fig. 2a). The prediction results are compared to the corresponding results derived by the simulator.
As can be observed from Fig. 9, the emulator predicts the CSO flow less accurately than the storage tank volume. However, Fig. 9 covers only three CSO events, which is not enough data to evaluate the accuracy of the emulator. Hence, in the next step, the emulator is validated using an ensemble of rainfall scenarios that triggered more CSO events.
5 Emulation Error
The results shown in Fig. 10 indicate the high accuracy of the emulator compared to the simulator. The predictions of CSO flows are not as precise as those of the storage tank volume. The main reason is that the emulator was developed based only on the storage tank filling data. In fact, the CSO flow is a side-product of the storage tank volume emulator, since it is calculated once the maximum capacity of the storage tank is exceeded. This led to a delay of the CSO events by about 20 min (the time resolution of the simulation inputs and outputs was 10 min). The right panel of Fig. 10 shows the improvement in the NSE distribution obtained when the emulated CSO signals were shifted by this amount (20 min).
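The effect of such a shift on the NSE can be illustrated with a small sketch. It uses the standard NSE definition with the simulator output in the role of the reference series, and it assumes that the emulated CSO signal lags the simulated one, so that the correction moves it two time steps (20 min at a 10 min resolution) earlier. The series below are made up for illustration.

```python
def nse(simulated, emulated):
    """Nash-Sutcliffe efficiency of the emulator output, with the
    simulator output used as the reference series."""
    mean_sim = sum(simulated) / len(simulated)
    num = sum((s - e) ** 2 for s, e in zip(simulated, emulated))
    den = sum((s - mean_sim) ** 2 for s in simulated)
    return 1.0 - num / den


def shift_earlier(series, steps):
    """Move a signal `steps` samples earlier, padding the end with zeros."""
    if steps <= 0:
        return list(series)
    return list(series[steps:]) + [0.0] * steps


# Made-up CSO flow series at 10 min resolution; the emulated event is
# identical in shape but arrives two steps (20 min) late.
sim = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0]
emu = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 2.0, 0.0]

nse_raw = nse(sim, emu)                        # penalised by the timing error
nse_shifted = nse(sim, shift_earlier(emu, 2))  # timing error removed
```

Because CSO events are short pulses, even a small timing offset is penalised heavily by the NSE, which is consistent with the lower scores reported for CSO flow than for tank volume.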
6 Discussion and Conclusions
The aim of the present research was to introduce a hybrid surrogate modelling strategy for accelerating a computationally expensive UDS. A “hybrid” strategy was followed, since the component functions of the emulator were learned partly by studying the mechanisms of the case study simulator at hand (simulator-based) and partly from synthetic input-output data generated by the simulator (data-based). Based on this strategy, an emulator was developed and validated for wastewater volume and CSO flow time series prediction for a small case study in Luxembourg. The novelty and added value of this research lie in two main aspects. The first and most important aspect is the simplicity of the introduced method and its hybrid nature: most of the component functions of the emulator are quantified directly, and rather simply, using the knowledge obtained from studying the mechanisms of the simulator at hand. If one can quantify these components, with high certainty, directly from the simulator, there is no need to treat them as data-driven components. This is not the case in purely data-driven (black-box) surrogate modelling approaches. The second novelty of this research concerns the lag or delay model for the R component of the emulator. In this research, time warping was applied instead of the traditional cross-correlation technique. Time warping was useful to account for the deformation of the emulator’s output signal in time as well as its delay.
In line with previous studies applying hybrid surrogate modelling approaches which are partly data-driven (e.g. Bermúdez et al. 2018; Keupers et al. 2015), the emulator introduced in this research also provides satisfactory results in terms of speeding up the simulations at a low accuracy cost (Fig. 10). It should be noted that the speed-up factor depends on the case study at hand. As an example, for a 1-year-long time series simulation of observed values, the emulator herein provided a speed-up factor of approximately 1300 (i.e. the emulator was 1300 times faster than the simulator). This speed-up was achieved mainly by: 1) making a shortcut that replaces the rainfall-runoff and routing models inside the original simulator, via the R component of the emulator; and 2) avoiding the computation of unnecessary details (e.g. volumes and flows in all intermediate nodes and links of the network). This considerable speed-up is a major benefit for applications such as RTC, uncertainty analysis or calibration, in which numerous simulations are required.
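To illustrate why a warping-based alignment can capture more than a single lag, the sketch below implements classic dynamic time warping (DTW) between two short signals. The text does not specify which time warping variant was used, so DTW is chosen here only as a well-known example, and the function name and test signals are hypothetical.

```python
def dtw_path(a, b):
    """Classic dynamic time warping between sequences a and b.

    Returns (total_cost, path), where path is a list of index pairs (i, j)
    aligning a[i] with b[j], from the start to the end of both signals.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# A pulse and a delayed copy of it preceded by a longer quiet period.
a = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0]
b = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 2.0, 0.0]
total, path = dtw_path(a, b)
```

A cross-correlation estimate would summarise the mismatch between the two signals as one constant lag, whereas the warping path gives a lag per sample and can therefore also describe stretching or compression of the signal in time.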
In contrast with some previous research, in which the simulation input was the inflow to the storage tank or WWTP (e.g. Mahmoodian et al. 2016; Vanrolleghem et al. 2005), the emulator herein uses rainfall measurements (or forecasts) as inputs, and predicts the storage tank volume and CSO flow in advance. Hence, considering such an emulator in applications such as model-based RTC would provide a longer reaction time (e.g. to avoid potential upcoming CSO events).
Another advantage of the hybrid emulator introduced in this article can be highlighted in comparison with previous works in which rainfall event characteristics (e.g. volume, depth, duration, maximum intensity) were mapped directly to the detection of CSO events, either in the form of binary detection of CSO occurrence and duration (Schroeder et al. 2011; Thorndahl and Willems 2008) or in the form of analog/digital detection of the CSO volume (Yu et al. 2013). In contrast, the hybrid emulator introduced in this article predicts the storage tank volume as well as the CSO flow time series, and can also be used in dry weather situations.
Finally, it should be emphasized that emulators or surrogate models are mainly tailored to the specific cases and applications at hand. In surrogate modelling, the intention is not to completely substitute a detailed simulator by an emulator. Besides, there is no universal and unique technique which can deal with all surrogate modelling challenges (Asher et al. 2015). Hence, in this study, we tried to introduce a simple and generic surrogate modelling strategy (see Fig. 3) which can be adapted to specific case studies or emulation purposes. For example, the hybrid emulator here was developed to predict the storage tank volume and CSO flow at a CSO location. Such an emulator can be useful in CSO management or model-based RTC. Development of the emulator would become more complex for more detailed case studies with several inputs and outputs of interest, or when taking into account the spatial variability of rain within the urban drainage network. In such cases, one would need to estimate the data-based component functions (R) via other techniques such as non-linear regression, Artificial Neural Networks (ANNs) or Gaussian Process Emulators (GPEs).
Future steps of this research could include improving the emulator with regard to the aforementioned aspects, as well as considering wastewater quality emulation so that it can be applied in RTC practice in an integrated way. Another significant aspect to consider in future studies is uncertainty quantification and propagation for the emulator inputs and outputs.
6.1 Parameter Summary
Summary of parameter values used to develop the emulator for CSO location 1

Parameter | Value/range and unit
Effective inflow rain gain | 2.94 × 10⁻¹ m³/(s mm)
Rain gain model (aα, bα, cα, dα) | 4.364, 0.9880, 0.02118, −0.0090822
aτ, bτ, cτ, dτ | 228.13, −0.5687, 4904.5, −0.7947
Discharge coefficient of the weir |
Dry weather flow scaling | 6.6 × 10⁻⁴ m³/s
Pump flow rate (pc) | 6.0 × 10⁻³ m³/s
Minimum rainfall intensity [mm/h] |
Maximum tank volume at weir height (CSO threshold volume) |
Rainfall inflow lag time |
This research was done as part of the Marie Curie ITN – Quantifying Uncertainty in Integrated Catchment Studies (QUICS) project. This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 607000. MM and VB were part of the QUICS project. JPC received funding from the EmuMore Project (kakila.bitbucket.io/emumore). We acknowledge J.A. Torres-Matallana for providing the rainfall data and for help in automating the InfoWorks ICM software. A previous, shorter version of this paper was presented at the 10th World Congress of EWRA “Panta Rei”, Athens, Greece, 5–9 July 2017 (Mahmoodian et al. 2017). The emulator was built using GNU Octave software and the SUNDIALS ODE suite.
MM provided the case study and ran the simulations, contributed to data analysis and emulator design, and led the writing and revision of this article. JPC and VB performed the data analysis and surrogate design, and contributed to the writing of the article. UL, GS, and FC supervised MM and revised the drafts of the article.
Compliance with Ethical Standards
Conflict of Interest
- Asher MJ, Croke BFW, Jakeman AJ, Peeters LJM (2015) A review of surrogate models and their application to groundwater modeling. Water Resour Res 51:5957–5973. https://doi.org/10.1002/2015WR016967
- Bieker HP, Slupphaug O, Johansen TA (2007) Real-time production optimization of oil and gas production systems: a technology survey. SPE Prod Oper 22(04):382–391
- Blanning RV (1975) The construction and implementation of metamodels. Simulation 24(6):177–184
- Burian SJ, Nix SJ, Durrans SR, Pitt RE, Fan CY, Field R (1999) Historical development of wet-weather flow management. J Water Resour Plan Manag 125(1):3–13
- Carbajal JP, Leitão JP, Albert C (2016) Appraisal of data-driven and mechanistic emulators of nonlinear hydrodynamic urban drainage simulators. Environ Model Softw 92:17–27. https://doi.org/10.1016/j.envsoft.2017.02.006
- Christelis V, Mantoglou A (2016) Pumping optimization of coastal aquifers assisted by adaptive metamodelling methods and radial basis functions. Water Resour Manage 30:5845. https://doi.org/10.1007/s11269-016-1337-3
- De Toffol S (2006) Sewer system performance assessment – an indicators based methodology. Universität Innsbruck
- EEC Council (1991) Urban waste-water treatment directive. EEC Council Directive, (L), p 10. http://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:31991L0271
- Fu G, Makropoulos C, Butler D (2010) Simulation of urban wastewater systems using artificial neural networks. J Hydroinf. https://doi.org/10.2166/hydro.2009.151
- Gradano JEA, Le Roux GAC (2012) Comparison of surrogate models for wastewater process synthesis. Computer Aided Chemical Engineering 30(June):1322–1326. https://doi.org/10.1016/B978-0-444-59520-1.50123-8
- Innovyze (2017) InfoWorks ICM. Innovyze
- Joseph-Duran B, Jung MN, Ocampo-Martinez C et al (2014a) Minimization of sewage network overflow. Water Resour Manage 28:41. https://doi.org/10.1007/s11269-013-0468-z
- Keupers I, Kroll S, Willems P (2015) Impact analysis of CSOs on the receiving river water quality using an integrated conceptual model. In: 10th international urban drainage modelling conference. Quebec, Canada, pp 205–218
- Li C, Bai Y, Zeng B (2016) Deep feature learning architectures for daily reservoir inflow forecasting. Water Resour Manage 30:5145. https://doi.org/10.1007/s11269-016-1474-8
- Mahmoodian M, Delmont O, Schutz G (2016) Pollution-based model predictive control of combined sewer networks, considering uncertainty propagation. Int J Sustain Dev Plan. https://doi.org/10.2495/SDP-V0-N0-1-14
- Mahmoodian M, Carbajal JP, Bellos V, Leopold U, Schutz G, Clemens F (2017) Surrogate modelling for simplification of a complex urban drainage model. European Water 57:293–297
- Sahlan S, Wahab NA, Darus IZM (2013) Results on frequency weighted model reduction techniques of activated sludge process. Proceedings - UKSim 15th International Conference on Computer Modelling and Simulation, UKSim 2013, pp 172–176. https://doi.org/10.1109/UKSim.2013.137
- Schilling W (1989) Real time control of urban drainage systems-the state of the art. IAWPRC Task Group on Real-Time Control of Urban Drainage Systems, London
- Torres-Matallana JA, Leopold U, Heuvelink GBM (2017) Multivariate autoregressive modelling and conditional simulation of precipitation time series for urban water models. European Water 57:299–306
- Volkwein S (2013) Proper orthogonal decomposition: theory and reduced-order modelling. University of Konstanz, Department of Mathematics and Statistics
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.