
Surrogate-Based Design Optimisation Tool for Dual-Phase Fluid Driving Jet Pump Apparatus

  • D. Mifsud
  • P. G. Verdin
Open Access
Original Paper

Abstract

A comparative study of four well-established surrogate models used to predict the non-linear entrainment performance of a dual-phase fluid driving jet pump (JP) apparatus is performed. A JP design flow configuration comprising a dual-phase (air and water) flow driving a secondary gas-air flow, for which no unique set of design solutions has previously been provided, is described. For the construction of the global approximations (GA), the response surface methodology (RSM), Kriging and the radial basis function artificial neural network (RBFANN) were primarily used. The stacked/ensemble models methodology was integrated into this study to improve the predictive model results, thus providing accurate GA that facilitate the multi-variable non-linear response design optimisation. An error analysis of all four models, along with a multiple model accuracy analysis of each case study, was performed. The RSM, Kriging, RBFANN and stacked models formed part of the surrogate-based optimisation, with the entrainment ratio as the main objective function. Optimisation problems were solved by the interior-point algorithm, the genetic algorithm and a hybrid formulation of both algorithms. A total of 60 optimisation problems were formulated and solved with all four approximation models. Results showed that the hybrid formulation having the level-2 ensemble Kriging model performed best, predicting the experimental performance results for all JP models within an error margin of less than 10 % in 90 % of the cases.

Keywords

Dual-phase jet pump · Surrogate modelling · Global approximations · Global optimisation · Ensemble modelling · Genetic algorithm · Gaussian process · Radial basis function

1 Introduction

Methodologies applied to build adequate learning models are crucial in performing model-based optimisation (MBO). With the rapid advances in computer science, MBO is becoming increasingly applicable to modelling, simulation, experimental and optimisation processes. It has proved to be one of the most efficient techniques for expensive and time-demanding real-world optimisation problems [1]. Several studies have applied MBO in the context of global optimisation (GO) to solve design problems closely related to the one considered in this study.

Here, MBO is used to predict the non-linear entrainment performance of a dual-phase fluid driving jet pump (JP) apparatus, a technology well known as an artificial lifting method in the oil and gas industry.

Among all pumping equipment, one of the simplest and most effective ways to revive low-pressure oil and gas wells, or to boost production from such wells, is the use of JPs. A JP, also known as an ejector, eductor, thermo-compressor or injector, is a well-established piece of equipment for pumping and mixing fluids within a wide range of applications in various engineering industrial segments, such as water, nuclear and aviation technologies [2]. In contrast to other Improved Oil Recovery (IOR) and Enhanced Oil Recovery (EOR) solutions, the surface jet pump (SJP) technology is classified as the cheapest and most effective solution, which highlights its importance in the current oil and gas situation [3, 4, 5]. Moreover, despite all SJP technological advancements, no unique design solution has been established to date for a JP which operates effectively under a multiphase motive fluid source. Although a variety of single-phase and multiphase JPs/ejectors is commercially available, multiphase JPs (for applications in the oil and gas industry, where the motive fluid contains more than a single phase, such as liquid and gas mixtures) suffer several drawbacks. Such drawbacks are mainly attributed to the degradation of entrainment and boost performance, and therefore a reduction in the overall operating efficiency. This limits their applicability due to the lack of flexibility, rangeability and versatility. These drawbacks, either individually or collectively, restrict onshore and remote offshore use and rule out subsea JP applications entirely. A feasible and practical means to deal with a typical complex (non-linear) multi-variable design problem is presented here, and a surrogate-based design optimisation is considered.

Relevant work in the literature can be categorised into two types. The first type includes studies which consider one surrogate model, while the second involves multiple learning algorithms (either comparisons or stacked models for ensemble modelling techniques), and thus more than one surrogate model, to perform surrogate-based optimisation. Studies based on a single surrogate methodology include the work of Kajero et al. [6], who used the Kriging meta-model approach. These authors coupled Kriging with the expected improvement (EI) criterion to assist the calibration of computational fluid dynamics (CFD) models. They successfully calibrated a single-model CFD parameter with experimental data and considered a case of single-phase flow in a straight-pipe and a convergent-divergent-type annular JP. This study involved fixed design parameters based on the experimental data of Shimizu et al. [7], and considered HP and LP static pressures to compute the pressure coefficient. Yang et al. [8] made use of the RSM and the desirability approach to investigate and optimise a JP in a thermoacoustic Stirling heat engine (TASHE). That study considered four design parameters: position, length, diameter and tapered angle of the nozzle. Taking an approach slightly different from global optimisation (GO), Lyu et al. [9] combined the design of experiments (DOE) methodology with CFD to structurally optimise the design of annular liquid-liquid JPs. This work considered the volumetric flow ratio, angle of suction chamber, throat length and diffuser diverging angle as the input design parameters. For a different engineering design optimisation problem, Di Piazza et al. [10] used the Kriging estimation method to investigate partial shading in photovoltaic fields. It was shown that this learning method provided a cheaper and simpler characterisation of the photovoltaic plant output power, and as a result allowed energy forecasting.

It is difficult to establish whether one of these surrogate modelling methods is superior to the others. A comprehensive comparison of multiple surrogate models (built with different methods) has therefore been conducted here to try to answer this question. Simpson et al. [11] compared up to three models, contrasting polynomial-based response surfaces with Kriging surrogates for the aerodynamic design optimisation of hypersonic spiked blunt bodies. Similarly, Shyy et al. [12] compared the relative performance of polynomial and neural network surrogate models for aerodynamics and rocket propulsion problems. Other methodologically related studies include the work of Simpson et al. [13], who emphasised the robustness of Kriging over the RSM surrogate model. These authors used Kriging for global approximation in simulation-based multidisciplinary (multiple input and multiple response) design optimisation problems, applying it to a real aerospace engineering application: the design of an aerospike nozzle. The most comprehensive work found in the literature is that of Luo and Lu [14], who compared the RSM, Kriging and RBFANN surrogate models for building surrogates of a dual-phase flow simulation model in a simplified nitrobenzene-contaminated aquifer remediation problem. Their surrogate-based optimisation methodology identified the most cost-effective remediation strategy.

However, among all the referenced work related to ejectors/jet pumps, none considered a dual-phase driving fluid JP configuration. In addition, to the best knowledge of the authors, none of the referenced work applied ensemble modelling, an approach which, in many cases, has proved to improve global approximations (GA) relative to GA based on single-methodology approaches when applied to non-linear systems. The potential of using a unique experimental data-set (an unpublished data-set which forms part of ongoing investigatory work on the performance of JPs under dual-phase driving fluids) motivated the authors to use this data to build surrogate models and perform surrogate-based optimisation. The objective thus entails the following tasks:
  • Develop surrogate models using the RSM, Kriging, RBFANN and ensemble methodologies for constructing global approximations (GA) for use in a real dual-phase JP design application, in particular the design of the injection body and parts of the ejection portions of a JP apparatus.

  • Estimate the accuracy of the different surrogate models.

  • Perform a comparison between level-1 and level-2 (ensemble) models and select the surrogate models showing acceptable accuracy in the non-linear optimisation model (considering multiple optimisation algorithms), in order to identify the best design parameters which optimise entrainment performance under various motive fluid gas volume fractions (GVFs).

This research is novel in that (1) different surrogate modelling methodologies were applied and compared in the design of a dual-phase (water and air) driving gas jet pump apparatus, and (2) it is shown that the ensemble surrogate modelling (stacked-generalisation) approach having Kriging as a level-2 learning model is suitable for predicting the optimised design parameters of a highly non-linear design problem attributed to the complex design of a dual-phase JP apparatus.

2 Background: Surrogate Models

2.1 Response Surface Models

The response surface methodology is the simplest and most common method applied to analyse the results of physical experiments and to generate empirically based models for response values [15, 16].

The RSM is defined by:
$$\begin{aligned} y\left( x\right) =f\left( x\right) +\, \varepsilon _{\mathrm {i}} \end{aligned}$$
(1)
where \(y\left( x\right)\) is the unknown function of interest, \(f\left( x\right)\) is the polynomial approximation of \(y\left( x\right)\), and \({\varepsilon }_i\) is the normally distributed error (with mean 0 and variance \({\sigma }^2\)). The errors \({\varepsilon }_i\) are independent and identically distributed.
The polynomial function \(f\left( x\right)\), typically comprises a low-order degree polynomial, which in most cases is assumed to be either linear or quadratic. A quadratic polynomial is expressed as follows:
$$\begin{aligned} {\hat{y}}={\beta }_0+\, \sum \limits ^k_{i=1}{{\beta }_ix_i}+ \, \sum \limits ^k_{i=1}{{\beta }_{ii}x^2_i}+ \, \sum \limits ^k_{i=1}{\sum \limits _{j<i}{{\beta }_{ij}x_ix_j}} \end{aligned}$$
(2)
where \({\hat{y}}\) is the quadratic polynomial, \({\beta }_0\), \({\beta }_i\), \({\beta }_{ii}\) and \({\beta }_{ij}\) are the polynomial regression coefficients determined through the least-squares method (LSM), k is the number of variables, and \(x_i\) and \(x_j\) are the input variables.
The coefficients of Eq. (2) can be found using the following equation:
$$\begin{aligned} \beta ={\left[ \mathbf{X }^T\mathbf{X }\right] }^{-1}\mathbf X^Ty \end{aligned}$$
(3)
where \(\mathbf{X}\) is the design matrix of sample data points, \(\mathbf{X }^T\) is the transpose, while \(\mathbf{y}\) denotes a column vector that includes the response values for each corresponding sample point. Further details about response surface modelling can be found in: Myers et al. [17], Myers et al. [18], Lin and Tu [19], Myers [20] and Myers et al. [21].
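As an illustration of Eqs. (2) and (3), a minimal MATLAB sketch for k = 2 input variables is given below. The sample values and variable names are hypothetical placeholders, not data from this study.

% Least-squares fit of a quadratic response surface (Eqs. (2) and (3))
% for k = 2 hypothetical input variables x1 and x2.
Xs = [0.1 13; 0.6 30; 1.4 50; 2.8 13; 4.0 30; 0.9 50; 1.8 13; 0.25 30]; % sample points (hypothetical)
ys = [0.42; 0.55; 0.61; 0.38; 0.33; 0.58; 0.47; 0.50];                  % observed responses (hypothetical)

x1 = Xs(:,1);  x2 = Xs(:,2);
% Design matrix: intercept, linear, quadratic and interaction terms
X = [ones(size(x1)) x1 x2 x1.^2 x2.^2 x1.*x2];
beta = (X' * X) \ (X' * ys);   % Eq. (3): beta = (X^T X)^{-1} X^T y
yhat = X * beta;               % fitted responses at the sample points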

2.2 Kriging Models

The Kriging method is a probabilistic approach that involves a statistical estimation technique for spatial interpolation of random quantities. The initial mathematical formulation of the Kriging method was developed on the basis of work by the mining engineer Danie Krige, who established the distribution of minerals in the subsoil by performing punctual surveys. Later, Sacks et al. [22] developed the method into a surrogate model, giving it the form mostly known today. This model is also known as the design and analysis of computer experiments (DACE) [23, 24]. The ooDACE toolbox, developed by Couckuyt et al. [25], is a versatile Matlab toolbox which implements the popular Gaussian-process-based Kriging surrogate models.

Unlike the RSM, Kriging models were originally developed for mining, geostatistical and spatial applications, which involve spatially and temporally correlated data [26, 27]. More recently, the method has gained popularity and has been used for other engineering applications, such as real aerospace design engineering and the build-up of meta-models to assist the calibration of computational fluid dynamics models [6, 13].

The Kriging model combines two main components: a deterministic function (global model) and localised departures, as given in Eq. (4) [28, 29].
$$\begin{aligned} y\left( x\right) =\sum \limits ^k_{i=1}{f_i\left( x\right) {\beta }_i}+\, Z\left( x\right) \end{aligned}$$
(4)
where \(\left[ \sum \nolimits ^k_{i=1}{f_i\left( x\right) {\beta }_i}\right]\) comprises the deterministic function, \({\beta }_i\) denotes the coefficients of the deterministic function, \(f_i\left( x\right)\) are k known regression functions (typically polynomial functions) providing a “global” model of the design space, and \(Z\left( x\right)\) is the realisation of a stochastic stationary process with mean zero, process variance \({\sigma }^2\) and covariance given by the covariance matrix in Eq. (5). In contrast to \(f_i\left( x\right)\), \(Z\left( x\right)\) models “localised” deviations which interpolate the \(n_s\) sampled data points.
$$\begin{aligned} Cov \,\, \left[ z\left( x_i\right) , \, z\left( x_j \right) \right] = {\sigma }^{2} \mathbf{R }\left( x_i,x_j\right) \end{aligned}$$
(5)
where \(\mathbf{R }\left( x_i,x_j\right)\) is the spatial correlation function (SCF) between any two of the sampled data points \(x_i\) and \(x_j\). This function controls the smoothness of the response surface model, the influence of nearby points and the differentiability of the surface.
Equation (4) amalgamates a first part which models the drift of the process mean over the domain, and a second part which models the systematic deviations from the linear model, pulling the response surface through the data by weighing the correlation of nearby points. The spatial correlation matrix \(\mathbf{R }\left( x_i,x_j\right)\) is an (\(n_s\, \times \, n_s\)) symmetric matrix with unity along its diagonal, where \(n_s\) is the number of sampled points. The correlation function is selected by the user and a variety of functions exists; the most applicable are (a) the Gaussian, (b) exponential, (c) linear and (d) spline functions [22, 30]. In this study, the Gaussian correlation function, given by Eq. (6), was used.
$$\begin{aligned} \mathbf{R }\left( x_i,x_j\right) =exp\left[ -\sum \limits ^{n_{dv}}_{k=1}{{\theta }_k{\left| \mathbf{X }_{ki}-\mathbf{X }_{kj}\right| }^2}\right] \end{aligned}$$
(6)
where \({\theta }_k\) are the unknown correlation parameters, \({\left| X_{ki}-X_{kj}\right| }\) are the kth components of the sample points \({x}_{i}\) and \({x}_j\), and \(n_{dv}\) is the number of design variables. In many cases, such as in McKay et al. [31], Sacks et al. [22], and Osio and Amon [32], a single value of \(\theta\) provided good results; in this study, however, a different value of \(\theta\) is used for each design variable.
The predicted estimates \({\hat{y}}\) of the response \(y\left( x\right)\) at untried values of x for a universal Kriging model are then found by Eq. (7).
$$\begin{aligned} {\hat{y}}\left( x_{{*}}\right) =f^T\left( x_{{*}}\right) {\widehat{\beta }} +\, r^T\left( x_{{*}}\right) \mathbf{R }^{{-1}}\left( y{-}\mathbf{F }\widehat{\beta }\right) \end{aligned}$$
(7)
where \(\mathbf{y}\) is a column vector of length \(n_s\) which contains the sample values of the response, and \(\mathbf{f }\left( x_*\right)\) is a column vector of regression functions; in the case of ordinary Kriging (not universal Kriging), \(\mathbf{f }\left( x_*\right) =1\) reduces to a scalar fixed at unity. Ordinary Kriging models are applied in the work of Deutsch and Journel [33], Cassie [27], Simpson et al. [13], Emery [34] and Bayraktar and Turalioglu [35], while universal Kriging models are applied by Zimmerman et al. [36], Brus and Heuvelink [37] and Sampson et al. [38].

If a first-order polynomial is involved, then \(f\left( x_*\right) ={\left[ 1,x_{*1},x_{*2},\, \ldots ,x_{*n}\right] }^T\), and so on. In this study the 0th, 1st and 2nd order polynomials were considered. \(\mathbf{F }\) is a matrix of the form \({\mathbf{F }}={\left[ f\left( x_1\right) ,\, f\left( x_2\right) ,\, \ldots ,f\left( x_m\right) \right] }^T\) containing the regression functions evaluated at all m training data points.

The term \(\mathbf{r }\left( x\right)\) denotes the correlation vector of length \(n_s\) between an untried x and the sampled data points \(\left\{ x_1,\, \ldots ,x_{ns}\right\}\), expressed as:
$$\begin{aligned} \mathbf{r }\left( x\right) =\, {\left[ {R(x,x}_1),\, R(x,x_2),\ldots ,{R(x,x}_{ns})\right] }^T \end{aligned}$$
(8)
Three parameters are included in the Kriging model: (a) \(\theta\), which is included in the correlation function (\(\theta =\, {\left[ {\theta }_1,\, {\theta }_2,\, \ldots ,\, {\theta }_n\right] }^T\)), (b) the process variance \({\sigma }^2\) and (c) the regression coefficients \(\beta\).
First, \({\widehat{\beta }}\) is estimated by the maximum likelihood method (MLE) as given in Eq. (9) [39].
$$\begin{aligned} {\widehat{\beta }}\, ={\left( \mathbf{F }^T\mathbf{R }^{-1} \mathbf{F }\right) }^{-1}\mathbf{F }^T\mathbf{R }^{-1}\mathbf{y } \end{aligned}$$
(9)
The process variance \({\sigma }^2\) between the global model \({\widehat{\beta }}\) and y is estimated by Eq. (10):
$$\begin{aligned} {\sigma }^2\, =\frac{1}{n_s}{\left( \mathbf{y }-\mathbf{F }\widehat{\beta }\right) }^T\mathbf{R }^{-1}\left( \mathbf{y }-\mathbf{F }\widehat{\beta }\right) \end{aligned}$$
(10)
Because \(\mathbf{R }(\cdot)\) is parameterised by \(\theta =({\theta }_1, {\theta }_2,\, \ldots ,\, {\theta }_d)\), setting the partial derivative of the likelihood function to zero does not always yield an analytical solution for \(\theta\). A constrained iterative search is therefore used, in which an optimisation algorithm evaluates the optimal parameter values. The correlation parameter \(\theta\) is estimated by solving the optimisation problem given in Eq. (11):
$$\begin{aligned} {{\mathrm {min}}_{\theta } \left( {\left| \mathbf{R }\right| }^{{1}/{m}}{\sigma }^2\right) } \end{aligned}$$
(11)
All estimated parameters are first used as inputs in Eq. (7) to obtain the prediction mean. The corresponding prediction variance is then obtained as the estimated mean square error (MSE) of the predictor, given in Eq. (12).
$$\begin{aligned}&MSE\left[ {\hat{y}}\left( x_*\right) \right] \nonumber \\&\quad =\,\sigma ^2\left( 1-\left[ \begin{array}{cc} f^T\left( x_*\right)&r^T\left( x_*\right) \end{array} \right] \left[ \begin{array}{cc} 0 &{} \mathbf{F }^T \\ \mathbf{F } &{} \mathbf{R } \end{array} \right] ^{-1}\left[ \begin{array}{c} f\left( x_*\right) \\ r\left( x_*\right) \end{array} \right] \right) \end{aligned}$$
(12)
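To make the preceding equations concrete, the following minimal MATLAB sketch implements an ordinary Kriging predictor (constant regression, \(f\left( x\right) =1\)) with the Gaussian correlation of Eq. (6). The sample data and \(\theta\) values are hypothetical placeholders; in the study itself the ooDACE toolbox was used instead.

% Ordinary Kriging predictor (Eqs. (6)-(9)) with constant regression f(x) = 1.
S  = [0.1 0 13; 0.6 20 30; 1.4 50 50; 2.8 10 13; 4.0 40 30]; % sample points (hypothetical)
y  = [0.42; 0.55; 0.61; 0.38; 0.33];                         % responses (hypothetical)
th = [8.75 4.35 1.25];                                       % one theta per design variable (illustrative)
ns = size(S,1);

R = ones(ns);                       % Gaussian correlation matrix, Eq. (6)
for i = 1:ns
  for j = 1:ns
    R(i,j) = exp(-sum(th .* (S(i,:) - S(j,:)).^2));
  end
end

F    = ones(ns,1);                          % constant regression function
beta = (F' * (R \ F)) \ (F' * (R \ y));     % generalised least squares, Eq. (9)

xstar = [1.0 30 30];                        % untried point (hypothetical)
r = exp(-sum(th .* (S - xstar).^2, 2));     % correlation vector r(x*), Eq. (8)
yhat = beta + r' * (R \ (y - F*beta));      % Kriging prediction, Eq. (7)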

2.3 Radial Basis Function Neural Network (RBFANN)

The radial basis function neural network, initially developed by Broomhead and Lowe [40], is a three-layer feed-forward artificial neural network which uses radial basis functions as activation functions. As illustrated in Fig. 1, the network comprises input, hidden and output layers. The hidden layer involves a non-linear RBF activation function, while the output layer produces a linear combination of the radial basis functions of the inputs and the neuron parameters.
Fig. 1

Architecture of a decomposed RBFANN with hidden neurons

Considering an input vector \(X {\varvec{\in }} {{\mathbb {R}}}^n\), the output of the neurons in the RBFANN hidden layer is given by:
$$\begin{aligned} q_i={\varPhi }\left( \left\| X-\left. c_i\right\| \right. \right) \end{aligned}$$
(13)
where \(c_i\) is the centre vector for the ith neuron in the RBF hidden layer, \(i=1,\, 2,\, \ldots , N\), with N the number of neurons in the hidden layer; \(\left\| X- c_i\right\|\) is the norm of \(X-c_i\), either the Euclidean distance or the Mahalanobis distance; and \({\varPhi }\, (\cdot)\) is the radial basis function, commonly taken to be the Gaussian given in Eq. (14), although the ‘Cauchy’ or ‘Multiquadric’ functions can be used instead [41].
$$\begin{aligned} q_i={\varPhi }\left( \left\| X-\left. c_i\right\| \right. \right) =\, exp\left[ -\beta {\left\| X-\left. c_i\right\| \right. }^2\right] \end{aligned}$$
(14)
The Gaussian basis functions are local to the centre vector in the sense that:
$$\begin{aligned} {{\mathrm {lim}}_{\left\| X\right\| \rightarrow \infty } {\varPhi }\left( \left\| X-\left. c_i\right\| \right. \right) }=0 \end{aligned}$$
(15)
Ultimately, the output of the network, being a scalar function of the input vector is given by Eq. (16):
$$\begin{aligned} y_k=\sum \limits ^N_{i=1}{w_{ki}(q_i-\, {{\uptheta }}_k)}\, \, \, \left( k=1,\, 2,\, \ldots ,\, M\right) \end{aligned}$$
(16)
where \(w_{ki}\) is the connecting weight from the ith hidden-layer neuron to the kth output-layer neuron and \({\theta }_k\) is the threshold value of the kth output-layer neuron.

This model is widely applied to approximation, classification, prediction and system control. Park and Sandberg [42] added that RBF networks perform well as universal approximators for compact subsets of \({{\mathbb {R}}}^n\), implying that an RBF network with enough hidden neurons can approximate a continuous function on a closed, bounded set with arbitrary precision.
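The forward pass of Eqs. (13), (14) and (16) can be sketched in a few lines of MATLAB. The centres, weights, threshold and spread below are hypothetical placeholders; in this study the network was instead trained with the MATLAB neural network tooling, as described later.

% Forward pass of a Gaussian RBF network (Eqs. (13), (14) and (16)).
C     = [0 0; 1 1; 2 0];    % centre vectors c_i, one per hidden neuron (hypothetical)
w     = [0.5; -0.2; 0.8];   % connecting weights w_ki for a single output (hypothetical)
thk   = 0.1;                % output-layer threshold theta_k (hypothetical)
betaS = 2.0;                % spread parameter beta of the Gaussian basis (hypothetical)

x  = [0.5 0.5];                          % input vector X (hypothetical)
q  = exp(-betaS * sum((C - x).^2, 2));   % hidden-layer outputs q_i, Eq. (14)
yk = w' * (q - thk);                     % network output y_k, Eq. (16)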

2.4 Model Ensembling

Ensemble methods are algorithms which amalgamate several machine learning methodologies into one predictive model, either to decrease the variance or to improve predictions. This methodology is illustrated in Fig. 2 and is divided into level-1 and level-2 processes. Ensemble models are typically used to provide global approximations for solving optimisation problems in highly nonlinear situations. Typical applications cover diverse topics such as weather forecasting [43], climate change [44], ecological models [45], wind speed [46] and the chemical flooding process in an oil and gas application [47].
Fig. 2

Concept diagram of ensemble modelling

Four of the most common types of ensemble methods are: (a) model averaging, (b) bagging, (c) stacking, and (d) boosting [48, 49].

2.4.1 Model Averaging

Model averaging is the most commonly used of the listed methods. It involves averaging N predictions via Eq. (17), each prediction being the output of one of N trained models, to produce the scalar ensemble prediction.
$$\begin{aligned} {{\overline{y}}}_k=\frac{1}{N}\sum \limits ^N_{n=1}{{{\hat{y}}}_n}\, \, \, \left( n=1,\, 2,\, \ldots ,\, N\right) \end{aligned}$$
(17)
where \({{\overline{y}}}_k\) is the averaging scalar ensemble output for a sample K, and \({\hat{y}}_n\) is the predictive output from each learning-model.

2.4.2 Bagging Model/Bootstrap Aggregating

Bagging is very similar to model averaging, but comprises a slightly modified training procedure: each base learner is trained on a subset of samples, a technique known as bootstrap sampling. Thus, if the RBFANN method is used, multiple models are generated so that all subsets of sample data are covered. The corresponding responses from the different models are then averaged to obtain a scalar ensemble output.

2.4.3 Stacking Model

Stacking, better known as meta-ensembling, is a model ensemble technique which combines response data from multiple predictive models to generate a new, improved model. It is therefore an extension (a second procedure) applied after responses have been generated by each single predictive model [50].

Stacking requires a modified training procedure relative to the other described methods. In this case, N trained models are used to predict the output of a new sample, and the outputs of this 1st-level learning (the separate models) are used as inputs for another model that is ‘stacked’ upon them, leading to a layer-chain of models. The 2nd-level model is then used to predict the actual output for the new samples. In most cases, the stacked model is expected to outperform each of the individual models, owing to its smoothing nature and its ability to favour each base model in the regions where it performs best while avoiding the regions where it performs poorly. Stacking is therefore most effective when the base model predictions differ significantly; a minimal sketch follows.
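The sketch below illustrates the stacking idea in MATLAB, assuming three already-trained level-1 predictors represented as function handles. The handles and data are hypothetical placeholders, and the level-2 learner here is plain linear least squares rather than the Kriging model used later in this work.

% Stacking: level-1 predictions become the inputs of a level-2 model.
m1 = @(X) X(:,1) + 0.1*X(:,2);   % placeholder level-1 model (hypothetical)
m2 = @(X) 0.8*X(:,1).^2;         % placeholder level-1 model (hypothetical)
m3 = @(X) sin(X(:,2));           % placeholder level-1 model (hypothetical)

Xtr = rand(20,2);  ytr = Xtr(:,1) + 0.5*Xtr(:,2);  % training data (hypothetical)

Z = [m1(Xtr) m2(Xtr) m3(Xtr)];    % level-1 outputs, stacked column-wise
wStack = [ones(20,1) Z] \ ytr;    % level-2 learner: linear least squares

Xnew  = rand(5,2);                % new samples
Znew  = [m1(Xnew) m2(Xnew) m3(Xnew)];
ypred = [ones(5,1) Znew] * wStack;  % stacked ensemble prediction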

2.4.4 Boosting Model

Boosting involves a family of algorithms which transform weak learners into effective learners. The method deals with weak learner models such as decision trees, and combines their predictions via a weighted majority vote (classification) or a weighted sum (regression) to generate the final prediction. It differs from bagging in that the base learners are trained sequentially, on weighted versions of the data.

3 Applied Methodology: JP Device Apparatus

As described in the introduction, a JP is a passive apparatus and thus cannot adapt to changes in operating conditions. Investigative work by Mifsud et al. [51] clearly demonstrates that the performance of a JP apparatus is dictated by its internal geometric features, mainly the injection, entraining and mixing bodies.

These three bodies include unique geometrical features, which vary in both shape and clearance under different flow conditions, namely the type of fluid and, more specifically, fluid properties such as density, viscosity, compressibility and diffusivity, which mutually dictate the hydrodynamic behaviour. Thus, a gas-driving-gas JP varies in design from a liquid-driving-liquid JP or a liquid-driving-gas JP. This work focuses specifically on a JP application having a dual-phase (water and air) HP fluid driving relatively low-pressure air. An analysis of experimental results from unpublished work showed that under dual-phase operating conditions, no linear behaviour correlates the design parameters with entrainment performance.

A total of five different nozzle bodies were used in this study. A brief description of each injection body is provided in Table 1, accompanied by schematics of four injection bodies (M-01 to M-04) in Fig. 3; the designs of models M-05 and M-10 cannot be shown, to protect their proprietary nature. Figure 3 highlights the differences in the number of orifice holes and their positioning.

Table 1 includes models M-01 to M-05, which describe the JP injection bodies without any swirl-body mechanism, while M-06 to M-10 involve the use of a swirl-body mechanism. The swirl-body mechanism (illustrated in Fig. 3a–d) is fixed within the motive fluid conduit, close to the converging portion of the respective nozzle. Note that for cases with no swirl-body mechanism, a void cylindrical transition piece is used instead.
Table 1

Details about the considered test-setup models

Model | Injection body description                                                                        | Flow
M-01  | Standard converging-nozzle (single-orifice)                                                       | No swirl-induced flow
M-02  | Converging-diverging nozzle (single-orifice)                                                      | No swirl-induced flow
M-03  | Converging-diverging nozzle (multiple-orifice), horizontally drilled                              | No swirl-induced flow
M-04  | Converging-diverging nozzle (multiple-orifice), drilled at an inclination angle of \(5^{\circ }\) | No swirl-induced flow
M-05* | A bi-nozzle design configuration (multiple-orifices)                                              | No swirl-induced flow
M-06  | Standard converging-nozzle (single-orifice)                                                       | Swirl-induced flow
M-07  | Converging-diverging nozzle (single-orifice)                                                      | Swirl-induced flow
M-08  | Converging-diverging nozzle (multiple-orifice), horizontally drilled                              | Swirl-induced flow
M-09  | Converging-diverging nozzle (multiple-orifice), drilled at an inclination angle of \(5^{\circ }\) | Swirl-induced flow
M-10* | A bi-nozzle design configuration (multiple-orifices)                                              | Swirl-induced flow

*Model not illustrated in Fig. 3 to protect the proprietary nature of the design

Fig. 3

Schematics of the swirl-body mechanism and injection body for: a M-01 and M-06, b M-02 and M-07, c M-03 and M-08 and d M-04 and M-09

A total of five design variables were considered, see Fig. 4. These are: (1) nozzle-to-throat clearance X, (2) throat-inlet angle At, (3) two-phase mixture composition in terms of gas-volume fraction GVF, (4) nozzle body design N, and (5) spinning body mechanism S.
Fig. 4

Schematic of a typical JP device configuration denoting the selected design variables

3.1 JP Device Design Analysis

The selected variables are classified into (a) discrete design variables, X, At and GVF, and (b) non-discrete variables, N and S. Figure 5 shows a data flow diagram with a black box that bridges the inputs and outputs with a learning algorithm.
Fig. 5

Problem formulation for a multi-variable input and multiple non-linear response global approximations for the dual-phase driving fluid JP device

Fig. 6

Simplified global approximations problem for the dual-phase driving fluid JP device

As the non-discrete variables N and S are neither continuous nor discrete, all learning models (both level-1 and level-2) were developed separately for each JP model. This reduced the number of input variables from five to three, as illustrated in Fig. 6.

For further simplification, the bi-response methodology of Fig. 5 was reduced to a single output. This step was justified on the basis that the entrainment ratio always tends to increase as the magnitude of pressure vibration decreases. The entrainment ratio was selected over the magnitude of pressure vibration because this study focuses on the design parameters which significantly affect the LP pressure and/or the LP/secondary flowrate.

3.1.1 Approximations for the JP Device Design Problem

The sample data for all learning models was obtained from a unique experimental data-set comprising tests performed on a dual-phase (water and air) facility located in the Process Systems Engineering Lab at Cranfield University, UK. This data-set includes a total of 1440 test-setup combinations, comprising both 100 % liquid-water and dual-phase (\(0 \, \le \, \hbox {GVF} \, \le 50\) %) motive fluid flows driving a secondary gas-air flow. All experiments were performed at pressures below 8 bara.

The whole data-set was divided into 10 sub-sets, discretised according to the type of injector body, for JP configurations equipped with or without the spin-body mechanism. This categorised the data into 10 unique models (M-01...M-10), as listed in Table 2. The nozzle-to-throat clearance X comprised eight values (0.1, 0.25, 0.6, 0.9, 1.4, 1.8, 2.8 and 4), three throat-inlet angles At were considered (\(13^{\circ }\), \(30^{\circ }\) and \(50^{\circ }\)), and the motive fluid GVF included the values 0, 10, 20, 30, 40 and 50 %.

Furthermore, each of the 10 sub-sets was divided into two mutually exclusive subsets, the training and validation sets. The training set comprised three quarters of the sub-set samples and was used to build and train the level-1 models (RSM, Kriging, RBFANN), while the validation set comprised the remaining quarter and was used for validation and verification of the same level-1 models; a sketch of this split follows.
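A simple MATLAB sketch of this three-quarters/one-quarter split is given below, under the assumption that a sub-set is stored as a 144-row sample matrix; all values and names are hypothetical.

% Random 3/4 training, 1/4 validation split of one 144-sample sub-set.
samples   = rand(144,3);      % [X, GVF, At] combinations (hypothetical values)
responses = rand(144,1);      % measured entrainment ratios (hypothetical)

idx   = randperm(144);        % random permutation of sample indices
nTr   = round(0.75 * 144);    % three quarters for training
trIdx = idx(1:nTr);  vaIdx = idx(nTr+1:end);

Xtrain = samples(trIdx,:);  ytrain = responses(trIdx);
Xvalid = samples(vaIdx,:);  yvalid = responses(vaIdx);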
Table 2

Test matrix for this work's unique data set (injection body and swirl flow are the non-discrete variables; X, At and GVF are the discrete design variables)

Model | Swirl flow | Nozzle to throat-inlet ratio [X] | Throat-inlet converging angle [At] [\(^\circ\)] | HP fluid gas volume fraction [GVF] [%]
M-01  | No         | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-02  | No         | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-03  | No         | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-04  | No         | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-05  | No         | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-06  | Yes        | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-07  | Yes        | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-08  | Yes        | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-09  | Yes        | 0.1–4                            | 13, 30 & 50                                     | 0–50
M-10  | Yes        | 0.1–4                            | 13, 30 & 50                                     | 0–50

Specific details on the division of sample and response data in each learning model are provided and discussed in Sects. 3.1.2 to 3.1.5. In each case study, involving a single JP model M, a sub-set comprising an array of 144 samples and responses was used for processing each learning model. The 144 samples and responses were selected according to the combinations illustrated in Fig. 7.

The arrangement shows that each At is first linked to every value of X, the joint combination is then linked to every value of GVF, and ultimately the full joint combinations (each comprising a value of GVF, X and At) are linked with every JP model, as sketched below.
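The 8 x 3 x 6 = 144 combinations described above can be generated by a full-factorial expansion; a minimal MATLAB sketch using ndgrid follows, with the variable names being illustrative only.

% Build the 144-row test array from the discrete design-variable levels.
Xlev   = [0.1 0.25 0.6 0.9 1.4 1.8 2.8 4];  % nozzle-to-throat clearance levels
AtLev  = [13 30 50];                        % throat-inlet angles [deg]
GVFlev = [0 10 20 30 40 50];                % motive fluid GVF [%]

[Xg, Ag, Gg] = ndgrid(Xlev, AtLev, GVFlev); % full factorial grid
combos = [Xg(:) Ag(:) Gg(:)];               % 8*3*6 = 144 rows of [X, At, GVF]
size(combos)                                % returns [144 3]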
Fig. 7

Schematic illustrating the build-up of this study's orthogonal array

The range of each discrete design variable was set according to specific justifications. The range of the secondary-nozzle/throat-inlet angle At includes an upper bound of \(50^{\circ }\), an optimal design angle for liquid-driving-liquid JP applications. A lower bound (as low as reasonably possible) of \(13^{\circ }\) and an intermediate angle of \(30^{\circ }\) were then considered.

The nozzle-to-throat clearance X varied between ratios of 0.1 and 4. This covered the well-known design ratios (as applicable for gas-gas, liquid-liquid and liquid-gas JP applications) and beyond. Because a swirl-induced flow was included in half the test setups, lower values of X were considered than those used for gas-gas applications; in the latter cases, optimal values of X can even fall below 0.4. In addition, closely spaced values of X were used to avoid blind spots arising from the high sensitivity of the response to even small increments of X.

Lastly, the range of GVF values covered a range of HP fluid compositions, including 100 % liquid motive flow and liquid-dominant two-phase (water-air) motive fluid compositions. Some practical intuition based on real field applications was also applied (mainly in selecting the range of GVF), and limitations due to the difficulty of manufacturing the designed components were considered (mainly in selecting the range of At).

3.1.2 RSM for the JP Device Study

Having three variables denoting a performance parameter makes it extremely difficult to illustrate the responses on 3-D surface plots; it proved complex enough to illustrate the non-linear behaviour between the gas volume fraction GVF and the nozzle-to-throat clearance X. A clear example demonstrating the complex correlation between the JP design variables under dual-phase flow conditions is given in Fig. 8. The three sub-figures present three sets of surface plots of the same model (M-01), each with a different throat-inlet angle At: Fig. 8a–c denote cases with At of \(13^\circ\), \(30^\circ\) and \(50^{\circ }\) respectively. Each set comprises two plots, one for the case without swirl-induced flow and one with swirl-induced flow. Plotting the third variable, however, resulted in an obscure surface plot. It was concluded that such complex behaviour could only be captured via polynomial equations.

These complex relationships led to the development of a series of third-order response surface models for the entrainment ratio ER. The models were fitted to the 118 sample points using ordinary least-squares regression. The polynomial expressions of the developed response surface models are given in Eqs. (27) to (36) in “Appendix 1”.
Fig. 8

3D-surface plots, for models M-01 and M-06, having: a\(\hbox {At}= {13}^{\circ }\), b\(\hbox {At}= \, {30}^{\circ }\) and c\(\hbox {At}= {{50}^\circ }\)

A pair of polynomial expressions was generated for each JP model design: a first expression for a device body without the swirl-body mechanism, and a second for a device body having the same injection body but amalgamated with the swirl-body mechanism. A total of 10 expressions therefore covers all 5 JP models.

The resulting \({R}^2\), \(R^2_{\mathrm {adjusted}}\) and root mean square error values for each response surface model, given in Table 6 of “Appendix 2”, show high \({R}^2\) and \(R^2_{\mathrm {adjusted}}\) values (ideally close to 1) and low RMSE values (ideally close to 0). The response surface models therefore capture a large portion of the observed variance, resulting in an acceptably good fit.

3.1.3 Kriging Models for the Jet Pump Device Study

For the Kriging models, a Gaussian correlation function with either a 0th, 1st or 2nd order regression function was applied, while Eq. (10) was used for the local deviations. It was also noted that a single \(\theta\) parameter was insufficient to model the data accurately; a simulated annealing algorithm was therefore used to determine the maximum likelihood estimates (MLEs) of the three \(\theta\) parameters (one for each variable) needed to generate the best Kriging model.

The optimal \(\theta\) parameter values for each case study are given in Table 3. They were obtained via Eq. (11), with simulations executed via a dedicated script written in MATLAB. The Kriging models were fully identified once all parameters of the Gaussian correlation function and the 118 sample data points were obtained.

For each Kriging model, testing included a total of 26 data points, i.e. the samples not used for training during the build-up of the models. The regression R values for the Kriging models are given in Table 6 of “Appendix 2”.
Table 3

Optimal \(\theta\) parameters for the Kriging models for all case studies

\(\theta\)*         | M-01 | M-02 | M-03 | M-04 | M-05 | M-06  | M-07 | M-08 | M-09 | M-10
\({\theta }_X\)     | 8.75 | 3.53 | 20   | 2.77 | 3.42 | 11.49 | 10   | 20   | 20   | 6.59
\({\theta }_{GVF}\) | 4.35 | 2.33 | 3.24 | 1.13 | 1.39 | 1.44  | 1.34 | 3.99 | 2.18 | 1.65
\({\theta }_{At}\)  | 1.25 | 1.44 | 2.10 | 0.29 | 0.31 | 0.24  | 1.25 | 1.71 | 0.95 | 1.09

\(\theta\)* The optimal \(\theta\) parameter values for each case study

3.1.4 Neural Network Models for the Jet Pump Device Study

For the neural network models, a script was generated with the Neural Fitting application provided in MATLAB 2018. The input and response sample data sets were randomly divided by the ‘dividerand’ function into three categories, training, validation and testing, with 118 samples for training, 7 for validation and 19 for testing. The training samples are presented to the network during training, and the network is adjusted according to its error. The validation samples are used to measure network generalisation and to stop training when generalisation stops improving, while the testing samples offer an independent measure of network performance during and after training.

The Bayesian regularisation backpropagation training function was used via the ‘trainbr’ syntax. This algorithm typically requires more time than the Levenberg-Marquardt and scaled conjugate-gradient algorithms, but it can provide good generalisation for difficult, small or noisy datasets. With the Bayesian regularisation algorithm, training stops according to adaptive weight minimisation (regularisation).

Furthermore, this function performs backpropagation to calculate the Jacobian jX of the performance function ‘perf’ with respect to the weight and bias variables X. Each variable is adjusted according to the Levenberg-Marquardt method. Further details about Bayesian regularisation can be found in MacKay [52] and Foresee and Hagan [53]. The mean squared error function ‘mse’, the average squared difference between outputs and targets (lower values are better), was applied as the model performance function. Additionally, the number of hidden neurons, denoted by the letter ‘N’ in Fig. 9, was varied according to the results illustrated in the model accuracy analysis section; the first trial consisted of N = 15. Regression R values are presented and discussed in Sect. 4.2.
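A minimal sketch of this training set-up is given below, assuming the MATLAB neural network (Deep Learning) toolbox is available. The division ratios and the 15-neuron first trial mirror the description above, but the sample values themselves are hypothetical.

% Feed-forward fitting network trained with Bayesian regularisation ('trainbr').
inputs  = rand(3,144);        % 3 design variables x 144 samples (hypothetical)
targets = rand(1,144);        % entrainment ratios (hypothetical)

net = fitnet(15, 'trainbr');  % first trial: N = 15 hidden neurons
net.divideFcn = 'dividerand'; % random division of the data
net.divideParam.trainRatio = 118/144;
net.divideParam.valRatio   = 7/144;
net.divideParam.testRatio  = 19/144;
net.performFcn = 'mse';       % mean squared error performance function

[net, tr] = train(net, inputs, targets);
yhat = net(inputs);           % network predictions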
Fig. 9

Neural network diagram for this problem

3.1.5 Model Averaging and Stacking for the Jet Pump Device Study

For the ensemble-stacking model, a more advanced command script (more complex than for the single-model build-up) was formulated and executed in MATLAB 2018.

The ensemble-modelling approach denotes a 2nd-level algorithm which ultimately produces the final scalar predictions. In brief, the applied methodology followed the build-up path presented in Fig. 10. The original training data X was used to train each of the level-1 models, generating respective predicted outputs which were then used as inputs for training the stacking model. In this work, the corresponding outputs of all three algorithms, say \(\mathbf{Y }_1\), \(\mathbf{Y }_2\) and \(\mathbf{Y }_3\), were first stacked and then averaged, generating an averaged scalar output matrix \({{\hat{\mathbf{X }}}}^{l2}\). The \({{\hat{\mathbf{X }}}}^{l2}\) matrix, now considered the training sample, was then used for cross-validation (CV): in each fold, a new training and validation data set was generated and used for training and testing the level-2 Kriging model.
Fig. 10

This work's ensemble predictive model

The method applied for segregating the sample data differed from that used for the three previously discussed models. To formulate the out-of-sample predictions, the data were divided as in the well-known ‘K-fold’ cross-validation method.

The out-of-sample method involved dividing the training sample into N folds. In each fold, some of the sample data were held out for validation and/or testing (the holdout fold), while the remaining folds were used to obtain predictions for all 26 samples. This is demonstrated in Fig. 11a, b: each sub-figure represents one fold, and the highlighted cells indicate the holdout sample.
Fig. 11

An illustration of the out-of-sample/‘k-fold’ cross-validation method

The cross-validation method was selected because out-of-sample predictions have a higher chance of capturing the distinct regions where each model performs best; the fold generation is sketched below.
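The sketch below generates K-fold holdout indices in MATLAB; 'cvpartition' requires the Statistics and Machine Learning Toolbox, and the fold count is an illustrative choice, not the one reported in the study.

% K-fold cross-validation indices for the out-of-sample predictions.
n  = 26;                          % number of level-2 samples
K  = 5;                           % number of folds (illustrative)
cv = cvpartition(n, 'KFold', K);  % requires Statistics and Machine Learning Toolbox

for k = 1:K
  trainIdx = training(cv, k);     % logical index of training samples in fold k
  testIdx  = test(cv, k);         % logical index of holdout samples in fold k
  % ... train the level-2 model on trainIdx, predict on testIdx ...
end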

4 Results and Discussions for Global Approximations

4.1 Learning Models' Accuracy Analysis

In this study, the absolute error (AE) was selected as the main loss function to estimate the accuracy of the learning models. Figures 23 to 37 in “Appendix 3” illustrate the boxplots of absolute error for the RSM, Kriging and RBFANN models for all JP models, both with and without the swirl-body mechanism. Table 4 details the parameters varied in each learning model.

Table 5 includes the optimal parameter values, resulting from the presented box-plots for each JP models (M-01–M-10).

The results demonstrated that for the RSM models, the approximation accuracy of the 3rd order polynomial was higher than that of the 1st and 2nd order in all cases. For the Kriging models, the 2nd order polynomial regression function obtained the highest approximation accuracy, except for models M-02, M-04 and M-06, where the 1st, 0th and 1st order respectively showed the highest accuracy. For the RBFANN models, 45 hidden neurons obtained the highest approximation accuracy, with the exception of M-03, M-07 and M-10, which obtained values of 60, 30 and 30 respectively.
Table 4

Varied model settings used to estimate the accuracy of the surrogate models

Model            | Varied setting                | Values
Response surface | Degree of polynomial          | 1st, 2nd, 3rd
Kriging          | Degree of regression function | 0th, 1st, 2nd
Neural network   | Number of hidden neurons      | 15, 30, 50, 80

Table 5

Optimal model settings based on absolute error (AE) as a loss function, to estimate the accuracy of each model

Model type | M-01 | M-02 | M-03 | M-04 | M-05 | M-06 | M-07 | M-08 | M-09 | M-10
RSM        | 3rd  | 3rd  | 3rd  | 3rd  | 3rd  | 3rd  | 3rd  | 3rd  | 3rd  | 3rd
Kriging    | 2nd  | 1st  | 2nd  | 0th  | 2nd  | 1st  | 2nd  | 2nd  | 2nd  | 2nd
RBFANN     | 45   | 45   | 60   | 45   | 45   | 45   | 30   | 45   | 45   | 30

RSM (optimal order of polynomial); Kriging (optimal order of polynomial as regression function); RBFANN (optimal number of hidden neurons)

4.2 Error Analysis of the RSM, Kriging and RBFANN Models

Finally, an error comparison was performed to determine the capability of producing accurate global approximations for a dual-phase fluid-driving SJP. The error is defined between the actual response samples y (26 samples used in each model) and the predicted values \({\hat{y}}\) from either the RSM, Kriging, RBFANN or ensemble models. The accuracy of the 26 validation points for all models was estimated via the numerical error analysis measures given in Eqs. (18) to (22); a computational sketch of these measures is given after the list.
  1. MaxAPE—Maximum Absolute Percent Error
    $$\begin{aligned} MaxAPE={\mathrm {max}}_{i}\, 100\left| \frac{y_i-{{\hat{y}}}_i}{y_i}\right| \end{aligned}$$
    (18)
    Lower values of MaxAPE indicate smaller deviations from the actual responses; the variance tends to decrease as \(MaxAPE \rightarrow 0\).
     
  2. MAPE—Mean Absolute Percentage Error
    $$\begin{aligned} MAPE=\frac{100}{n}\sum \limits ^n_{i=1}{\left| \frac{y_i-{{\hat{y}}}_i}{y_i}\right| } \end{aligned}$$
    (19)
    The absolute value is summed over every predicted value and divided by the number of testing points. Note that in this work the mean absolute error was calculated, i.e. Eq. (19) divided by 100. Similarly to MaxAPE, smaller MAPE values indicate smaller deviations, and the variance reduces as \(MAPE \rightarrow 0\).
     
  3. RMSE—Root Mean Square Error

    The RMSE quantifies the residuals (prediction errors) between predicted and observed values. This error estimator, given in Eq. (20), aggregates the individual error magnitudes into a single measure of predictive power.
    $$\begin{aligned} RMSE=\sqrt{\frac{\sum \nolimits ^n_{i=1}{{\left( {{\hat{y}}}_i-y_i\right) }^2}}{n}} \end{aligned}$$
    (20)
    Lower values of RMSE indicate lower variance; the variance tends to decrease as \(RMSE \rightarrow 0\).
     
  4. R Squared (\({{R}}^{{2}}\))

    This is known as the coefficient of determination (here, of multiple determination, since the models involve multiple regression). It measures the correlation between outputs and targets via:
    $$\begin{aligned} R^2= & {} 1-\frac{Sum\, of\, Squares\, of\, Residuals}{Total\, Sum\, of\, Squares} \nonumber \\= & {} \, 1-\frac{\sum \nolimits _i{e^2_i}}{\sum \nolimits _i {{\left( y_i-{\overline{y}}\right) }^2}} \end{aligned}$$
    (21)
    where \(e_i\) denotes the residuals for each \({y}_i\), while \({\overline{y}}\) is the mean of the observed data.
     
  5. Adjusted R Squared (\({{R}}^{{2}}_{{adj}}\))

    Adjusted \(R^2\), developed by Henri Theil (1961), includes a modification that accounts for the number of explanatory terms in a model relative to the number of data points.
    $$\begin{aligned} R^2_{adj}=1-\left( 1-R^2\right) \frac{n-1}{n-p-1} \end{aligned}$$
    (22)
    where p is the total number of variables, and n denotes the sample size.

    For both measures, values close to 1 indicate a close relationship. The \({R}^2_{adj}\) result is always less than or equal to that of \({R}^2\).

     
The results of all performed quality criterion methods and for all considered JP models (M-01 to M-10) are presented in Table 6, of “Appendix 2”.
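As referenced above, a compact MATLAB sketch of Eqs. (18) to (22) is given below; y and yhat stand for the 26 actual and predicted validation responses and are filled with hypothetical values here.

% Error measures of Eqs. (18)-(22) for actual y and predicted yhat.
y    = rand(26,1) + 0.5;       % actual responses (hypothetical)
yhat = y + 0.05*randn(26,1);   % predicted responses (hypothetical)
n = numel(y);   p = 3;         % sample size and number of variables

MaxAPE = max(100 * abs((y - yhat) ./ y));                % Eq. (18)
MAPE   = (100/n) * sum(abs((y - yhat) ./ y));            % Eq. (19)
RMSE   = sqrt(sum((yhat - y).^2) / n);                   % Eq. (20)
R2     = 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2); % Eq. (21)
R2adj  = 1 - (1 - R2) * (n - 1) / (n - p - 1);           % Eq. (22)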

Figure 12 shows a collective set of three plots: Fig. 12a includes \(R^2\) values, Fig. 12b RMSE errors, and Fig. 12c MaxAPE errors. The error comparison exhibits consistent behaviour across all three error measures.

In general, the stacked-ensemble model performed best, followed by the RBFANN, then Kriging and lastly the RSM. It is worth noting that even after finding the optimal parameter values for the Gaussian correlation function, the Kriging model was in most cases only slightly more accurate than the 3rd order RSM. The low RMSE values of the RBFANN model show it performed slightly better than the ensemble model in predicting the response for several JP models. As shown in the three subfigures, all models predicted values within reasonable prediction errors.
Fig. 12

A comparison of the error analysis: a\({{R}}^{{2}}\), b RMSE [%] and c MaxAPE [%], for the RSM, Kriging, RBFANN and Ensemble models and for each JP model

4.3 Results Comparison of Actual Against Predicted via Scatter Plots for RSM, Kriging, RBFANN and Ensemble Models

This section illustrates a comparative analysis of the behaviour between actual and predicted entrainment ratio results. The scatter plots in Figs. 13, 14, 15, 16 and 17 provide a more comprehensive way to illustrate and analyse the fit of all four learning models. Each figure comprises two sub-figures, comparing results for models without and with swirl-induced flow.
Fig. 13

Actual versus predicted values for: a M-01 (no swirl) and b M-06 (swirl)

Fig. 14

Actual versus predicted values for: a M-02 (no swirl) and b M-07 (swirl)

Fig. 15

Actual versus predicted values for: a M-03 (no swirl) and b M-08 (swirl)

Fig. 16

Actual versus predicted values for: a M-04 (no swirl) and b M-09 (swirl)

Fig. 17

Actual versus predicted values for: a M-05 (no swirl) and b M-10 (swirl)

Generally, it can be noted that in most cases the points are scattered symmetrically around the \(45^{\circ }\) diagonal line and fall within the (± 10 %) error band margin. However, the scattering tends to increase in cases involving swirl-induced flow. As expected, this can be attributed to the complex hydrodynamic behaviour of dual-phase flow inside the JP, and it further emphasises the non-linear relationship between design parameters and JP performance under dual-phase flow conditions. In all cases the RSM registered the highest scattering, while the RBFANN and ensemble models showed minimal scattering and high similarity to one another. Another well-demonstrated point of interest is the behaviour of the ensemble model: its overall results drastically reduced the scattering and smoothed predictions in regions where the other models failed to stay within the (± 10 %) error bandwidth.

4.4 Optimisation Heuristics

Several types of optimisation algorithms exist. Some can handle both constrained and unconstrained optimisation, while others can perform only one of the two.

In this work, constrained optimisation was applied: an objective function is optimised with respect to some variables in the presence of constraints on those variables. Since the (hard-type) objective function is to be maximised, the negative of the process function \(f\left( x\right)\) is minimised in the constrained minimisation problems. The hard-type constraints set conditions that the variables are required to satisfy [54].

Three optimisation algorithms widely applicable to engineering-related problems are: (1) the multiple-response desirability approach, (2) the interior-point algorithm (IPA), and (3) the augmented Lagrangian genetic algorithm (ALGA). A combination of the latter two algorithms can form a hybrid function, which in most cases is more robust and accurate than a single optimisation algorithm. A brief overview of the interior-point and augmented Lagrangian genetic algorithms (both applied in this study) is given hereunder.

4.4.1 Interior-Point Algorithm

The interior-point algorithm comprises a variety of solvers capable of solving both linear and nonlinear convex optimisation problems with inequality constraints.

The interior-point algorithms found in the MATLAB Optimisation \(\hbox {Toolbox}^{TM}\) include the ‘fmincon’, ‘quadprog’, ‘lsqlin’ and ‘linprog’ solvers. All have good characteristics, such as low memory usage and the ability to solve large problems quickly. However, they are considered slightly less accurate than other algorithms; this inaccuracy may result from the internally calculated barrier function, which tends to keep iterates away from the inequality constraint boundaries. The constrained minimisation involves finding a vector x that is a local minimum of a scalar function \(f\left( x\right)\), subject to the set constraints on the allowable x.
$$\begin{aligned} {{\mathrm {min}}_{x} \,\,f\left( x\right) } \end{aligned}$$
(23)
such that one or more of the following constraints hold: \(c\left( x\right) \le 0\), \(ceq\left( x\right) =0\), \(A{{\cdot}}x\le b\), \(Aeq {{\cdot}}x=beq\), \(lb\le x\le ub\).
Moreover, ‘fmincon’, as referred to in the MATLAB Optimisation \({\hbox {Toolbox}}^{TM}\) solvers (the algorithm used in this work), is based on a trust-region method for nonlinear minimisation. The trust-region method approximates the function f with a simpler function q that reflects the behaviour of f in a neighbourhood N around the point x; this neighbourhood is referred to as the trust region [55, 56]. As given in Eq. (24), this is computed as a sub-problem alongside the main minimisation problem.
$$\begin{aligned} {{\mathrm {min}}_{s} \left\{ q\left( s\right) ,s \in N\right\} } \end{aligned}$$
(24)
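A minimal sketch of a bound-constrained maximisation with 'fmincon' and the interior-point algorithm, in the spirit of this work, is given below: the entrainment ratio is maximised by minimising its negative. The predictor 'erPredict' is a hypothetical stand-in for one of the four predictive functions described in Sect. 4.5.

% Bound-constrained maximisation via fmincon (interior-point algorithm).
erPredict = @(x) -(x(1)-1).^2 - 0.01*(x(2)-30).^2 + 1;  % hypothetical ER predictor

obj = @(x) -erPredict(x);       % maximise ER by minimising its negative
lb  = [0.1 13];  ub = [4 50];   % bounds on [X, At]
x0  = (lb + ub)/2;              % middle-bound starting point

opts = optimoptions('fmincon', 'Algorithm', 'interior-point');
[xOpt, fOpt] = fmincon(obj, x0, [], [], [], [], lb, ub, [], opts);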

4.4.2 ALGA: Augmented Lagrangian Genetic Algorithm

The genetic algorithm (GA) can solve both constrained and unconstrained problems and involves a natural selection process that imitates biological evolution. It handles problems that differ from those suited to ‘standard’ optimisation algorithms, namely stochastic, nonlinear and discontinuous ones.

The solving procedure is iterative: the algorithm repeatedly modifies the population of individual solutions by picking individuals at random (via random number generators) from the current population to serve as ‘parents’, which produce the ‘children’ of the next generation. This procedure is repeated until the population evolves towards the optimal solution [57].

By default, the genetic algorithm uses the Augmented Lagrangian Genetic Algorithm (ALGA) to solve nonlinear problems without integer constraints.

The optimisation problem solved by the ALGA algorithm is given by Eq. (25):
$$\begin{aligned} {{\mathrm {min}}_{x} \,\, f\left( x\right) } \end{aligned}$$
(25)
where \({c}_i\left( x\right) \le 0\), \(i=1\ldots m\), \({ceq}_i\left( x\right) =0\), \(i=m+1\ldots mt\), \(A{{\cdot }}x\le b\), \(Aeq{{\cdot }}x=beq\) and \(lb\le x\le ub\); \(c\left( x\right)\) and \(ceq\left( x\right)\) denote the nonlinear inequality and equality constraints respectively, while m and mt denote the number of nonlinear inequality constraints and the total number of nonlinear constraints respectively.
In this study, however, bound-constrained optimisation problems were solved. Bounds and linear constraints are handled separately from nonlinear constraints, so the sub-problem formulation for the ALGA, given in Eq. (26), reduces to the fitness function alone and excludes the nonlinear constraint terms.
$$\begin{aligned} {\Theta }\left( x,\lambda ,s,\rho \right)= & {} f\left( x\right) -\, \sum \limits ^m_{i=1}{{\lambda }_i}s_i{\mathrm {log} \left( s_i-c_i\left( x\right) \right) } \nonumber \\&+\sum \limits ^{mt}_{i=m+1}{{\lambda }_i}ceq_i\left( x\right) \nonumber \\&+\frac{\rho }{2}\sum \limits ^{mt}_{i=m+1}{ceq_i{\left( x\right) }^2} \end{aligned}$$
(26)
where the components \(\lambda _i\) of the vector \(\lambda\) are non-negative (Lagrange multiplier estimates). The elements \({s_i}\) of the vector s are non-negative shifts, while \(\rho\) is a positive penalty parameter.
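A corresponding MATLAB sketch of a bound-constrained ga run (reusing fun, lb and ub from the previous sketch; the option values are assumptions) is:

    % Minimal sketch: bound-constrained optimisation with ga. With bounds
    % only (no nonlinear constraints), the ALGA penalty terms of Eq. (26)
    % reduce to the fitness function alone, as noted above.
    nvars = 3;                          % design variables [X, GVF, At]
    opts  = optimoptions('ga', 'PopulationSize', 50, 'MaxGenerations', 100);
    [xOpt, fOpt] = ga(fun, nvars, [], [], [], [], lb, ub, [], opts);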

4.4.3 Hybrid (Fmincon and Genetic Algorithm)

The hybrid optimisation method combines two or more single-optimisation algorithms. To solve a hybrid optimisation problem, a hybrid function must first be specified; it defines the order of execution of each algorithm. When the ALGA stops, the hybrid function starts from the final point returned by the genetic algorithm.

In this work, 'fminunc' (the hybrid function) was set to be called automatically, initiating its execution from the optimised point found by the genetic algorithm. Since 'fminunc' has its own options structure, an additional argument has to be provided when specifying the hybrid function. The hybrid function can improve the accuracy of the solution.
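A minimal sketch of this coupling is given below; note that the section heading names 'fmincon' while the text names 'fminunc', and the sketch follows the text. The option values are assumptions.

    % Minimal sketch: ga followed by fminunc as the hybrid function. The
    % fminunc options structure is passed as the second element of the
    % HybridFcn cell array. Note that fminunc performs an unconstrained
    % polish of the ga solution; fmincon would instead respect the bounds.
    hybridopts = optimoptions('fminunc', 'Display', 'iter');
    opts = optimoptions('ga', 'HybridFcn', {@fminunc, hybridopts});
    [xOpt, fOpt] = ga(fun, nvars, [], [], [], [], lb, ub, [], opts);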

4.5 Optimisation Results using the RSM, Kriging and RBFANN Models

As a final comparison of the accuracy of the individual and ensemble learning models, six pairs of optimisation problems were formulated and then solved for each JP model; the term 'pair' indicates that a total of 12 optimisation problems were formulated and solved per JP model, one of each pair with swirl and one without. In all the optimisation problems, the entrainment ratio was maximised at specific GVFs, leading to a total of 60 optimisation problems.

As provided in Tables 7, 8, 9, 10, and 11 in "Appendix 4", the objective functions set denote traditional, single-objective/single-discipline optimisation problems. Note that each of the six boxes, labelled 'Prob. #1 M-X & M-X', includes a common objective function for (a) swirl and (b) no-swirl flow respectively. In each optimisation problem, constraints are placed on the maximum and minimum allowable values of the responses not forming part of the objective function. Each problem is formulated and solved using a set of optimisation algorithms, all adopted to solve the developed nonlinear optimisation models within the MATLAB platform. The three optimisation algorithms used were: (1) the 'interior-point' algorithm, (2) the augmented Lagrangian genetic algorithm (ALGA) and (3) a hybrid formulation combining the former two.

Each optimisation problem is solved four times: for the RSM, Kriging, RBFANN and ensemble models. In each case, three different starting points (the lower, middle and upper bounds) are used for each objective function to assess the number of analysis and gradient calls necessary to obtain the optimum design. To drive the three types of optimisation algorithms, four separate predictive functions (one for each learning model) were created. For the RSM, a dedicated script was generated to formulate a predictive function, which was later called within the respective optimisation algorithms. For Kriging, the predictor function (based on the developed model referred to as 'dmodel') was generated by the ooDACE toolbox, while for the RBFANN, a dedicated function called 'MyNeuralNetworkFunction' was set to be automatically generated during the execution of the neural network model when using the MATLAB neural network toolbox application. For the ensemble model, the same procedure as for Kriging was followed, but this time a new model, 'dmodel2', was generated, containing samples derived from the combination of the model-averaging and stacking procedures.

During the solution of all optimisation problems, the negative of the predicted response function was minimised in order to maximise the objective function. This approach was adopted because the MATLAB toolbox software always seeks the minimum of the fitness function.
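In MATLAB terms this amounts to a one-line wrapper (a sketch; 'predictEr' is a hypothetical surrogate predictor of the entrainment ratio, and x0, lb, ub and opts are as in the earlier sketches):

    % Maximise Er by minimising its negation (MATLAB solvers minimise):
    negEr = @(x) -predictEr(x);
    [xOpt, fNeg] = fmincon(negEr, x0, [], [], [], [], lb, ub, [], opts);
    ErMax = -fNeg;                      % recover the maximised entrainment ratio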

The results of all 60 optimisation problems using all discussed learning models are summarised in Tables 12, 13, 14, 15, and 16 of "Appendix 4". Note that each table includes results for two JP bodies, i.e. a single nozzle body with and without swirl.

From the numerical results (Tables 12, 13, 14, 15, and 16), it can be noticed that, in general, the optimisation requires fewer iterations and/or generations for the RSM than for the Kriging and RBFANN models. Note that iterations for the ensemble level-2 model are not included in the tables, but the results were identical to those of the level-1 Kriging model. The variance in computational time and iterations is attributed to the complexity of the respective model: fewer iterations were needed for the simple 3rd-order RSM polynomial equations, while more iterations and generations were required for the Kriging models [comprising the non-linear equations given in Eqs. (4) to (12)] and the RBFANN models. Nevertheless, the computational expense for all sets of approximations remains of the order of seconds per evaluation. The optimum designs obtained from the RSM, Kriging, RBFANN and ensemble models are mostly identical for each objective function. However, some drastic variations in both X and At are noted when maximising the entrainment ratio Er using the interior-point algorithm based on predicted results from the RSM.

Furthermore, to check the accuracy of the predicted optima and the prediction errors, the optimum design values X, GVF and At for the level-1 Kriging model and the level-2 ensemble-Kriging model were fed back into the predictor function of their respective model. The predicted responses of each model were then used to calculate the percentage difference between the actual and predicted values, the actual values being taken from the corresponding experimental results. Ultimately, to illustrate the benefits of the ensemble model, error residual plots were generated for each of the 10 JP models. Each pair of plots in Figs. 18, 19, 20, 21, and 22 involves the same type of nozzle body but a different setup configuration: all figures labelled 'a' describe results for JP bodies with no swirl-body mechanism, whereas those labelled 'b' involve the use of the swirl-body mechanism.
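A sketch of this check, assuming the DACE-style 'predictor'/'dmodel' interface referred to in Sect. 4.5 and a hypothetical experimental value 'ErActual', is:

    % Feed the optimum design back into the Kriging predictor and compare
    % the prediction against the experimental value.
    ErPred  = predictor([X, GVF, At], dmodel);        % surrogate prediction
    pctDiff = 100*abs(ErActual - ErPred)/ErActual;    % percentage difference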
Fig. 18 Error residual plots for: a M-01 (no swirl) and b M-06 (swirl flow)

Fig. 19 Error residual plots for: a M-02 (no swirl) and b M-07 (swirl flow)

Fig. 20 Error residual plots for: a M-03 (no swirl) and b M-08 (swirl flow)

Fig. 21 Error residual plots for: a M-04 (no swirl) and b M-09 (swirl flow)

Fig. 22 Error residual plots for: a M-05 (no swirl) and b M-10 (swirl flow)

A comparison between the level-1 Kriging models and the level-2 ensemble-Kriging models in each error residual plot demonstrated that the ensemble-Kriging models predicted optima values with an error of less than 10 % in all cases, and less than 4 % in 90 % of the results.

4.6 Conclusions

This study has demonstrated the use of four learning models for constructing global approximations that facilitate single-discipline nonlinear design optimisation.

The accuracy of each set of approximations was compared via numerical error analysis, graphical analysis and the tested ability to generate accurate solutions for 60 different optimisation problems. The response surfaces registered the highest model-fit error of all models: neither 1st- nor 2nd-order polynomial models proved capable of modelling the nonlinear performance behaviour of the dual-phase-fluid driving JP. The 3rd-order response surface models, however, could approximate the nonlinear design space within a reasonable margin of error, although instabilities arose when considering higher-order polynomials. These may have resulted from a lack of sample points to estimate all coefficients of the polynomial equation.

Kriging models in conjunction with a Gaussian correlation function (comprising either a 0th-, 1st- or 2nd-order regression function) yielded global approximations slightly more accurate than those of the RSM.

The RBFANN models (with a case-dependent optimum number of hidden neurons) showed a drastic decrease in prediction error. However, this improvement was not registered throughout all prediction ranges for all JP model cases.

Ultimately, the ensemble models (combining model-averaging and stacking methodologies) gave the best overall performance throughout all prediction ranges. Such results illustrate the benefits of the stacked generalisation approach, whereby model information is combined, including information from level-1 models with poor approximation capabilities. The performance increase attributed to the ensemble approach is, however, not computationally cheap: the added complexity and additional evaluations slow down the modelling process.

Furthermore, the three optimisation algorithms were compared over the 60 optimisation problems. The three algorithms, namely (a) the 'interior-point' algorithm, (b) the augmented Lagrangian genetic algorithm (ALGA) and (c) the hybrid formulation, produced similar results for a given approximation model, but the results varied across the four global approximation methods.

As expected, higher accuracy and consistency were obtained for the optimisation runs based on the predicted data from both the RBFANN and ensemble models.

The ensemble model (with Kriging as the level-2 learning model) estimated optimised design parameters that closely matched the actual data, proving it a powerful model-based optimiser. Thus, for the dual-phase driving-fluid JP, where the data or the structure of the fitness function is highly nonlinear, the stacked generalisation approach may be an adequate first approach.

This comprehensive study should serve as a model-based optimiser tool to assist the design of dual-phase surface JPs, in particular for cases involving dual-phase fluid compositions as driving fluids. Moreover, owing to the nature of the input model variables, mainly the nozzle-to-throat clearance X (an adjustable parameter), such a model-based optimisation tool has the potential to be implemented for on-line control purposes.

References

  1. Jones DR (2001) A taxonomy of global optimization methods based on response surfaces. J Glob Optim 21:345–383
  2. ESDU (1985) Ejectors and jet pumps. Design and performance for incompressible liquid flows
  3. Mali PV, Singh R, De S, Bhatta M (1999) Downhole ESP & surface multiphase pump—cost effective lift technology for isolated and marginal offshore field development. In: SPE Asia Pacific oil and gas conference and exhibition, Society of Petroleum Engineers
  4. Lastra R, Johnson I (2005) Feasibility study on application of multiphase pumping towards zero gas flaring in Nigeria. In: Nigeria annual international conference and exhibition, Society of Petroleum Engineers
  5. Peeran SM, Beg DN, Sarshar S (2013) Novel examples of the use of surface jet pumps (SJPs) to enhance production & processing. Case studies and lessons learned
  6. Kajero OT, Thorpe RB, Chen T, Wang B, Yao Y (2016) Kriging meta-model assisted calibration of computational fluid dynamics models. AIChE J 62:4308–4320
  7. Shimizu Y, Nakamura S, Kuzuhara S, Kurata S (1987) Studies of the configuration and performance of annular type jet pumps. J Fluids Eng 109:205–212
  8. Yang P, Chen H, Liu Y-W (2017) Application of response surface methodology and desirability approach to investigate and optimize the jet pump in a thermoacoustic Stirling heat engine. Appl Therm Eng 127:1005–1014
  9. Lyu Q, Xiao Z, Zeng Q, Xiao L, Long X (2016) Implementation of design of experiment for structural optimization of annular jet pumps. J Mech Sci Technol 30:585–592
  10. Di Piazza A, Di Piazza MC, Vitale G (2009) A kriging-based partial shading analysis in a large photovoltaic field for energy forecast. In: International conference on renewable energies and power quality (ICREPQ'09), Valencia, Spain
  11. Simpson T, Mistree F, Korte J, Mauery T (1998) Comparison of response surface and kriging models for multidisciplinary design optimization. In: 7th AIAA/USAF/NASA/ISSMO symposium on multidisciplinary analysis and optimization, p 4755
  12. Shyy W, Papila N, Vaidyanathan R, Tucker K (2001) Global design optimization for aerodynamics and rocket propulsion components. Prog Aerosp Sci 37:59–118
  13. Simpson TW, Mauery TM, Korte JJ, Mistree F (2001) Kriging models for global approximation in simulation-based multidisciplinary design optimization. AIAA J 39:2233–2241
  14. Luo J, Lu W (2014) Comparison of surrogate models with different methods in groundwater remediation process. J Earth Syst Sci 123:1579–1589
  15. Box GE, Draper NR (1987) Empirical model-building and response surfaces. Wiley, New York
  16. Morris MD, Mitchell TJ (1995) Exploratory designs for computational experiments. J Stat Plan Inference 43:381–402
  17. Myers RH, Montgomery DC, Anderson-Cook CM (2016) Response surface methodology: process and product optimization using designed experiments. Wiley
  18. Myers RH, Montgomery DC et al (1995) Response surface methodology: process and product optimization using designed experiments, vol 3. Wiley, New York
  19. Lin DK, Tu W (1995) Dual response surface optimization. J Qual Technol 27:34–39
  20. Myers RH (1999) Response surface methodology—current status and future directions. J Qual Technol 31:30–44
  21. Myers RH, Montgomery DC, Vining GG, Borror CM, Kowalski SM (2004) Response surface methodology: a retrospective and literature survey. J Qual Technol 36:53–77
  22. Sacks J, Schiller SB, Welch WJ (1989) Designs for computer experiments. Technometrics 31:41–47
  23. Lophaven SN, Nielsen HB, Søndergaard J (2002) DACE: a Matlab kriging toolbox, vol 2. Citeseer, Princeton
  24. Lophaven SN, Nielsen HB, Søndergaard J (2002) A Matlab kriging toolbox, version 2.0. Technical University of Denmark, Kgs. Lyngby
  25. Couckuyt I, Dhaene T, Demeester P (2012) ooDACE toolbox. Adv Eng Softw 49:1–13
  26. Matheron G (1963) Principles of geostatistics. Econ Geol 58:1246–1266
  27. Cressie NA (1993) Statistics for spatial data, revised edn. Wiley, New York
  28. Queipo NV, Haftka RT, Shyy W, Goel T, Vaidyanathan R, Tucker PK (2005) Surrogate-based analysis and optimization. Prog Aerosp Sci 41:1–28
  29. Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43:853–863
  30. Koehler J, Owen A (1996) Computer experiments. Handb Stat 13:261–308
  31. McKay MD, Beckman RJ, Conover WJ (1979) Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245
  32. Osio IG, Amon CH (1996) An engineering design methodology with multistage Bayesian surrogates and optimal sampling. Res Eng Des 8:189–206
  33. Deutsch CV, Journel AG (1992) GSLIB: geostatistical software library and user's guide. Oxford University Press, Oxford
  34. Emery X (2005) Simple and ordinary multigaussian kriging for estimating recoverable reserves. Math Geol 37:295–319
  35. Bayraktar H, Turalioglu FS (2005) A kriging-based approach for locating a sampling site in the assessment of air quality. Stoch Env Res Risk Assess 19:301–305
  36. Zimmerman D, Pavlik C, Ruggles A, Armstrong MP (1999) An experimental comparison of ordinary and universal kriging and inverse distance weighting. Math Geol 31:375–390
  37. Brus DJ, Heuvelink GB (2007) Optimization of sample patterns for universal kriging of environmental variables. Geoderma 138:86–95
  38. Sampson PD, Richards M, Szpiro AA, Bergen S, Sheppard L, Larson TV, Kaufman JD (2013) A regionalized national universal kriging model using partial least squares regression for estimating annual PM2.5 concentrations in epidemiology. Atmos Environ 75:383–392
  39. Myung IJ (2003) Tutorial on maximum likelihood estimation. J Math Psychol 47:90–100
  40. Broomhead DS, Lowe D (1988) Radial basis functions, multi-variable functional interpolation and adaptive networks. Technical report, Royal Signals and Radar Establishment, Malvern, UK
  41. Tinós R, Júnior LOM (2009) Use of the q-Gaussian function in radial basis function networks. In: Foundations of computational intelligence, vol 5. Springer, pp 127–145
  42. Park J, Sandberg IW (1991) Universal approximation using radial-basis-function networks. Neural Comput 3:246–257
  43. Gneiting T, Raftery AE (2005) Weather forecasting with ensemble methods. Science 310:248–249
  44. Giorgi F, Mearns LO (2002) Calculation of average, uncertainty range, and reliability of regional climate changes from AOGCM simulations via the "reliability ensemble averaging" (REA) method. J Clim 15:1141–1158
  45. Wintle BA, McCarthy MA, Volinsky CT, Kavanagh RP (2003) The use of Bayesian model averaging to better represent uncertainty in ecological models. Conserv Biol 17:1579–1590
  46. Sloughter JM, Gneiting T, Raftery AE (2010) Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J Am Stat Assoc 105:25–35
  47. Zerpa LE, Queipo NV, Pintos S, Salager J-L (2005) An optimization methodology of alkaline-surfactant-polymer flooding processes using field scale numerical simulation and multiple surrogates. J Petrol Sci Eng 47:197–208
  48. Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. J Artif Intell Res 11:169–198
  49. Caruana R, Niculescu-Mizil A, Crew G, Ksikes A (2004) Ensemble selection from libraries of models. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 18
  50. Bartz-Beielstein T (2016) Stacked generalization of surrogate models—a practical approach. Bibliothek der Technischen Hochschule Köln
  51. Mifsud D, Cao Y, Verdin P, Lao L (2018) The hydrodynamics of two-phase flows in the injection part of a conventional ejector. Int J Multiph Flow. https://doi.org/10.1016/j.ijmultiphaseflow.2018.10.007
  52. MacKay DJ (1992) Bayesian interpolation. Neural Comput 4:415–447
  53. Foresee FD, Hagan MT (1997) Gauss-Newton approximation to Bayesian learning. In: Proceedings international conference on neural networks (ICNN'97), vol 3. IEEE, pp 1930–1935
  54. Nocedal J, Wright SJ (1999) Numerical optimization. Springer Series in Operations Research. Springer, New York
  55. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
  56. Koh K, Kim S-J, Boyd S (2007) An interior-point method for large-scale l1-regularized logistic regression. J Mach Learn Res 8:1519–1555
  57. Conn AR, Gould NI, Toint P (1991) A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J Numer Anal 28:545–572

Copyright information

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Energy and Power, Cranfield University, Cranfield, UK
  2. Institute of Engineering and Transport, Mechanical, Malta College for Arts, Science and Technology, Paola, Malta
