1 Introduction

Fiber reinforced polymer (FRP) sheets and laminates have been applied successfully as externally bonded reinforcement (EBR) systems in civil engineering construction for some time. In particular, the use of FRP EBR systems to strengthen and retrofit existing masonry buildings, such as monumental heritage structures, has increased significantly. This growth is due to the favorable properties of FRP EBR techniques, including high strength, resistance to corrosion, high durability, non-magnetic behavior, high fatigue resistance, and no significant increase in the mass and stiffness of the structure [1]. In common practice, FRP EBR systems are designed using several empirical and semi-empirical equations, and the design procedure, covering installation, loads, safety requirements and acceptable material properties, has been codified in several guidelines. Different in-plane and out-of-plane failure modes of retrofitted masonry units have also been investigated in recent years [2, 3].

The bond interface between FRP and masonry units is often one of the weakest links in masonry structures strengthened in tension, and debonding at this interface is one of the critical failure modes of FRP EBR systems. Because debonding failure is sudden and brittle, the bond strength at the interface must be taken into account in design. The debonding process is very complex because several influential parameters are involved, including the mechanical properties of the masonry blocks, mortar joints, adhesive and FRP reinforcement. The mechanism involves cracks that form a few millimeters beneath the bond line inside the masonry units [4]. Since debonding initiates within the substrate, the bond strength depends directly on the tensile strength of the masonry units.

Various experimental studies have investigated the effect of different influential parameters on debonding failure, and a number of empirical and analytical prediction models have been developed; see, for example, [5, 6]. However, owing to the complexity and brittleness of the debonding failure mechanism, considerable scatter has been observed between the predicted and measured maximum bond strengths [7]. Many of these models are highly empirical: their predictive ability is limited by the data sets from which they were derived, and they do not provide a reliable prediction of the maximum transferable load. Recently, soft computing methods have been used successfully to develop predictive models for different problems in civil engineering [8,9,10,11]. In particular, Mansouri and Kisi [7] evaluated neuro-fuzzy and neural network approaches for estimating the debonding strength of masonry elements retrofitted with FRP composites, using eight experimental datasets comprising 109 data points in total. The results showed that both approaches can successfully predict the bond strength. However, these methods give little insight into the generated models and are not as easy to use as empirical formulas. Mansouri et al. [12] used gene expression programming to predict the debonding strength of retrofitted masonry members using ten experimental databases [5] comprising 134 data points. Although this approach predicted the bond strength successfully and yields an explicit formula for the maximum load, a more comprehensive database is required to develop accurate predictive models.

The main purpose of this study is to employ the M5′ and multivariate adaptive regression splines (MARS) algorithms to develop transparent models that predict the maximum bond strength and identify the most influential parameters. To this end, a new comprehensive database of 230 data points from 23 experimental studies was compiled from the literature. The M5′ model tree, a robust data-driven method, provides understandable formulas that give users more insight into the physics of the phenomenon [13, 14]. The MARS algorithm is a self-organizing predictive approach that can discover complex relationships between input and output parameters and identify the most effective parameters for predicting the maximum bond strength [15]. Five predictor variables characterizing the mechanical and geometrical properties of the FRP reinforcement and the substrate are considered as inputs: (1) the reinforcement width, (2) the ratio between the widths of the FRP reinforcement and the masonry unit, (3) the tensile strength of the substrate, (4) the axial rigidity of the reinforcement and (5) the bond length. A comparative study evaluates the performance of the developed models against the most common design equations in the literature. In addition, a safety analysis based on the Demerit Points Classification (DPC) scale is performed to measure the reliability of the proposed formulations. It is demonstrated that the M5′ and MARS algorithms can be used as reliable alternative approaches to predict the bond strength of FRP EBR systems.

2 Background

Failure of masonry or concrete elements externally bonded with fiber reinforced polymer (FRP) under shear stresses corresponds to debonding of the reinforcement, and the separation usually occurs a few millimeters below the adhesive. The bond strength between externally bonded FRP and masonry substrates has generally been tested and analyzed by means of single pull-off tests, in which a direct tensile force Papp is applied to the FRP plate bonded to a masonry substrate to determine the maximum transferable load (Fmax). A sketch of a single pull-off test for FRP bonded to a masonry unit is given in Fig. 1. The shear behavior of reinforcement externally bonded to a masonry substrate is often modeled by a shear stress-tangential slip law [16], which can be further refined by including a normal stress-displacement law [7].

Fig. 1

Schematic diagram of single pull-off test for FRP to masonry substrate

It is well known that the transmissible force increases asymptotically up to a maximum value as the bonded length increases [17]. The maximum transmissible load is related to the geometrical and mechanical properties of the plate and to the fracture energy of the interface law. In general, the maximum force transferable by an anchorage of infinite length, for substrates made of brittle materials with externally bonded FRP, is written as:

$$F_{max} = b_{p} \int_{0}^{\infty} \tau \left( x \right) \, dx$$
(1)

where x is the longitudinal axis, \(\tau \left( x \right)\) is the bond shear stress distribution along the interface and \(b_{p}\) is the reinforcement width. The energy required to bring a local bond element to complete shear debonding is called the fracture energy; it equals the area below the interface bond shear stress–slip law, \(\left( {\tau - s} \right)\), and can be defined as:

$$\varGamma_{p} = \int_{0}^{\infty} \tau \left( s \right) \, ds$$
(2)

The relationship between the fracture energy of the interface law, \(\varGamma_{p}\), and the maximum transferable force \(F_{max}\) (debonding load), for a bond–slip model based on a general interface bond shear stress–slip law \(\tau \left( s \right)\), is written as [17]:

$$F_{max} = b_{p} \sqrt {2E_{p} t_{p} \varGamma_{p} }$$
(3)

where \(E_{p}\) is the Young’s modulus of the FRP reinforcement and \(t_{p}\) is the thickness of the FRP.

Depending on the shape of the interface law, the fracture energy \(\varGamma_{p}\) can be expressed as a function of the strength of the substrate subject to cracking, in the following form:

$$\varGamma_{p} = k_{\tau } k_{b} \tau_{max}$$
(4)

in which \(k_{b}\) is a width factor coefficient that represents the effect of the reinforcement width on \(F_{max}\), and \(k_{\tau }\) is a coefficient with dimension \(\left( L \right)\) that expresses the shape of the \(\left( {\tau - s} \right)\) law. Complete debonding \(\left( {\tau = 0} \right)\) occurs at the ultimate slip \(S_{ult}\) of the bond law \(\tau \left( s \right)\). When the bond–slip law is bilinear, \(k_{\tau } = S_{ult} /2\) [17], since the area under a triangle of height \(\tau_{max}\) and base \(S_{ult}\) is \(\tau_{max} S_{ult}/2\).
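As a numerical check of Eqs. (2) to (4), the Python sketch below integrates an assumed bilinear bond-slip law, compares the area with \(k_{\tau} \tau_{max}\) for \(k_{\tau} = S_{ult}/2\) and \(k_{b} = 1\), and then evaluates Eq. (3). The helper names, the elastic slip `s_el` and all numerical values are hypothetical and chosen only for illustration (units assumed: N, mm, MPa); they are not taken from the database.

```python
import numpy as np

# Hypothetical helpers for illustration; units assumed: N, mm, MPa.

def bilinear_tau(s, tau_max, s_el, s_ult):
    """Bilinear bond-slip law: linear rise to tau_max at slip s_el,
    linear softening to zero at s_ult, zero beyond s_ult."""
    s = np.asarray(s, dtype=float)
    rising = tau_max * s / s_el
    softening = tau_max * (s_ult - s) / (s_ult - s_el)
    return np.where(s <= s_el, rising, np.where(s <= s_ult, softening, 0.0))


def fracture_energy_numeric(tau_max, s_el, s_ult, n=200_001):
    """Eq. (2): area under tau(s) by the trapezoidal rule.
    tau(s) is zero beyond s_ult, so the upper limit is truncated there."""
    s = np.linspace(0.0, s_ult, n)
    tau = bilinear_tau(s, tau_max, s_el, s_ult)
    return float(np.sum(0.5 * (tau[1:] + tau[:-1]) * np.diff(s)))


def fracture_energy_eq4(tau_max, s_ult, k_b=1.0):
    """Eq. (4) with k_tau = s_ult / 2 (bilinear law)."""
    return (s_ult / 2.0) * k_b * tau_max


def max_transferable_force(b_p, Ep_tp, gamma_p):
    """Eq. (3): F_max = b_p * sqrt(2 * E_p * t_p * Gamma_p)."""
    return float(b_p * np.sqrt(2.0 * Ep_tp * gamma_p))


# Illustrative (not measured) values
g_num = fracture_energy_numeric(tau_max=1.2, s_el=0.05, s_ult=0.4)
g_eq4 = fracture_energy_eq4(tau_max=1.2, s_ult=0.4)
F = max_transferable_force(b_p=50.0, Ep_tp=230_000 * 0.165, gamma_p=g_eq4)
print(g_num, g_eq4, F)  # Gamma_p in N/mm, F_max in N
```

The numerical area does not depend on `s_el`, which is why Eq. (4) needs only \(S_{ult}\) and \(\tau_{max}\) for a bilinear law.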

3 Materials and methods

The methodology adopted in this study is based on two well-known and practical data-driven regression algorithms, namely the M5′ model tree and the multivariate adaptive regression splines (MARS) approach, whose distinctive features are exploited to investigate the shear behavior of FRP EBR systems. The details of the algorithms are presented below.

3.1 M5′ algorithm

The M5′ algorithm is an efficient technique for analyzing complex systems with very high dimensionality (up to hundreds of attributes). Quinlan [13] presented the M5 algorithm for regression problems, and Wang and Witten [14] later improved it into the so-called M5′ algorithm. The M5′ algorithm divides a complex problem into a number of simple sub-problems and provides the response as a combination of their solutions. It generally involves three processes: (i) building the initial tree, (ii) pruning the tree and (iii) smoothing. The initial tree is constructed by dividing the data space into smaller subspaces using the divide-and-conquer method [18]. To illustrate, the structure of a decision tree with two input parameters, derived by dividing the sample space, is depicted in Fig. 2. The model tree looks like an inverted tree, with the root at the top and the leaves at the bottom. The leaves (i.e., subspaces) are identified using the divide-and-conquer method (see Fig. 2a), and a multivariate linear regression (MLR) model is then fitted at each leaf (see Fig. 2b).

Fig. 2

The M5′ algorithm: a splitting of the input space X1 × X2 into 4 subsets, b final model developed by the algorithm

As shown in Fig. 2a, splitting values divide the whole dataset into several subsets. At each node, the splitting attribute and value are chosen to maximize the expected error reduction, which is measured by the standard deviation reduction (SDR):

$$SDR = sd\left( T \right) - \mathop \sum \limits_{i} \frac{{\left| {T_{i} } \right|}}{\left| T \right|} \times sd\left( {T_{i} } \right)$$
(5)

in which T is the set of instances that reach the node, Ti is the subset resulting from splitting the node according to a given attribute and split value, and sd is the standard deviation. Splitting stops when the output values of all instances that reach a node vary by less than 5% of the standard deviation of the original instance set, or when only a few instances remain. After the tree is built, an MLR model is fitted in each terminal subspace. Generating the model tree and the MLR model at each leaf often leads to over-fitting, which is usually mitigated by pruning. To detect over-fitting, the algorithm estimates the expected error on unseen data. To this end, the average absolute error between the responses estimated by the unpruned tree and the actual values is calculated for the training instances that reach each node. However, this average underestimates the expected error on new data, because the tree was built from the training data itself. The average error is therefore multiplied by the factor (n + v)/(n − v), where n is the number of training instances that reach the node and v is the number of parameters in the linear model at that node. Leaves whose compensated error exceeds that of their parent node are then removed by pruning.
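The split selection of Eq. (5) and the error compensation used for pruning can be sketched as follows. This is a minimal illustration assuming binary splits and an exhaustive search over midpoint thresholds, not the exact implementation of Wang and Witten [14]; the function names are hypothetical.

```python
import numpy as np

def sdr(y, left_mask):
    """Standard deviation reduction, Eq. (5), for a binary split of y."""
    y = np.asarray(y, dtype=float)
    subsets = (y[left_mask], y[~left_mask])
    return np.std(y) - sum(len(t) / len(y) * np.std(t) for t in subsets)


def best_split(X, y):
    """Exhaustive search over (attribute, threshold) pairs for the largest SDR.
    Candidate thresholds are midpoints between consecutive distinct values."""
    X = np.asarray(X, dtype=float)
    best_attr, best_thr, best_gain = None, None, -np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for lo, hi in zip(values[:-1], values[1:]):
            thr = float(0.5 * (lo + hi))
            gain = float(sdr(y, X[:, j] <= thr))
            if gain > best_gain:
                best_attr, best_thr, best_gain = j, thr, gain
    return best_attr, best_thr, best_gain


def compensated_error(abs_errors, v):
    """Mean absolute training error inflated by (n + v) / (n - v)."""
    n = len(abs_errors)
    return float(np.mean(abs_errors)) * (n + v) / (n - v)


# Toy data: the first attribute separates the two response levels at 2.5
print(best_split([[1, 0], [2, 0], [3, 1], [4, 1]], [10, 10, 20, 20]))  # (0, 2.5, 5.0)
print(compensated_error([1.0] * 6, v=2))  # 2.0
```

The compensation factor grows quickly as v approaches n, so linear models with many parameters fitted to few instances are penalized most.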

To mitigate sharp discontinuities between adjacent leaves (classes), the M5 algorithm applies a smoothing procedure to the leaves of the pruned tree [19]. In this procedure, the value predicted at each leaf is filtered along the path back to the root: at each node, it is combined with the value predicted by that node's linear model as follows:

$$P^{\prime} = \frac{np + kq}{n + k}$$
(6)

where P′ is the prediction passed up to the next higher node, p is the prediction passed to the current node from the node below, q is the value predicted by the linear model at the current node, n is the number of training instances that reach the node below, and k is a smoothing constant [14]. Finally, the M5′ algorithm yields a set of multivariate linear equations (rules) to estimate the target values, as shown in Fig. 2b.
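Eq. (6) can be applied recursively from a leaf back to the root. The sketch below assumes the path is supplied as a list of (n, q) pairs; k = 15 is an assumed value for illustration, and the function name is hypothetical.

```python
def smooth_prediction(leaf_value, path, k=15.0):
    """Apply Eq. (6) repeatedly from a leaf back to the root.

    path: sequence of (n, q) pairs ordered from the leaf's parent up to the root,
          where n = number of training instances reaching the node just below,
          and q = value predicted by the linear model at the current node.
    k:    smoothing constant (15 is an assumed value).
    """
    p = leaf_value
    for n, q in path:
        p = (n * p + k * q) / (n + k)  # blend the passed-up value with this node's model
    return p


print(smooth_prediction(10.0, [(15, 20.0)]))             # 15.0
print(smooth_prediction(10.0, [(15, 20.0), (45, 0.0)]))  # 11.25
```

When few training instances reach the node below (small n), the prediction is pulled strongly toward the ancestors' linear models; with many instances, the leaf model dominates.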

4 Multivariate adaptive regression splines algorithm

Multivariate adaptive regression splines (MARS) was first proposed by Friedman [15] as a nonlinear and nonparametric regression algorithm. MARS models the nonlinear relationship between input and output variables using a series of piecewise linear or cubic segments (splines). The final MARS model is a linear combination of these piecewise functions, which are known as basis functions (BFs). The slope and curvature of a BF can change from one segment to the next, and the segments are connected at points called knots; a knot marks the end of one region of the data and the beginning of another. Unlike the well-known parametric linear regression methods, MARS can model nonlinear relationships between predictor and response variables with more flexibility, because it makes no assumption about the functional form of this relationship. To achieve this flexibility, MARS searches over the possible interactions between input variables up to a specified degree. In this way, it can discover complex structures hidden in high-dimensional problems by considering different functional forms and interactions. The general MARS function is:

$$\tilde{f}\left( x \right) = \beta_{0} + \sum_{m = 1}^{M} \beta_{m} \lambda_{m} \left( x \right)$$
(7)

where \(\tilde{f}(x)\) is the predicted response, \(\beta_{0}\) and \(\beta_{m}\) are constant coefficients determined by solving an optimization (least-squares) problem, and M is the number of basis functions included in the model. Note that a basis function can also be the product of two or more spline functions of different predictor variables. In general, the spline basis function \(\lambda_{m}(x)\) is calculated as:

$$\lambda_{m} \left( x \right) = \prod_{k = 1}^{K_{m}} \left[ s_{k,m} \left( x_{v\left( k,m \right)} - t_{k,m} \right) \right]_{+}$$
(8)

where \(K_{m}\) is the number of truncated linear factors (the degree of interaction) in the m-th basis function, \(s_{k,m}\) takes the value 1 or −1 and indicates the right or left region of the associated step function, \(v\left( k,m \right)\) is the label of the predictor variable, \(t_{k,m}\) is the knot location, and \(\left[ \cdot \right]_{+} = \max \left( 0, \cdot \right)\).
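Eqs. (7) and (8) can be evaluated directly once the knots, signs and coefficients are known. In this minimal sketch, each basis function is described by a list of (v, t, s) triples (predictor index, knot, sign), and each factor is the truncated linear (hinge) function max(0, s(x_v − t)) of Friedman's formulation [15]; the data layout and function names are assumptions for illustration, and the adaptive knot search of MARS itself is not reproduced.

```python
import numpy as np

def hinge(x, knot, sign):
    """Truncated linear function max(0, sign * (x - knot)), sign = +1 or -1."""
    return np.maximum(0.0, sign * (np.asarray(x, dtype=float) - knot))


def basis_function(X, factors):
    """Eq. (8): product of hinge factors.
    factors: list of (v, t, s) = (predictor column, knot location, sign)."""
    X = np.asarray(X, dtype=float)
    out = np.ones(X.shape[0])
    for v, t, s in factors:
        out *= hinge(X[:, v], t, s)
    return out


def mars_predict(X, beta0, terms):
    """Eq. (7): beta0 + sum of beta_m * lambda_m(x).
    terms: list of (beta_m, factors_m) pairs."""
    X = np.asarray(X, dtype=float)
    y = np.full(X.shape[0], float(beta0))
    for beta, factors in terms:
        y += beta * basis_function(X, factors)
    return y


# One interaction term: 2 * max(0, x0 - 4) * max(0, x1 - 1)
X = [[5.0, 2.0], [3.0, 4.0]]
print(mars_predict(X, beta0=1.0, terms=[(2.0, [(0, 4.0, +1), (1, 1.0, +1)])]))  # [3. 1.]
```

A basis function with a single factor is an additive spline term; two or more factors on different predictors give an interaction term, which is zero wherever any of its hinges is zero.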

To illustrate how MARS detects data patterns using piecewise linear spline functions, an example is given in Fig. 3. For this example, the MARS equation is:

$$y = 4.32 + 2 \times BF_{1} + 1.5 \times BF_{2} - 3.2 \times BF_{3}$$
(9)

in which BF1 = max(0, x − 4), BF2 = max(0, 4 − x) and BF3 = max(0, x − 10), where max(x1, x2) equals x1 if x1 > x2 and x2 otherwise. The knots are located at x = 4 and x = 10; they divide the x range into three intervals in which different relationships are detected.
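The example of Eq. (9) can be reproduced in a few lines. The sketch below evaluates the three hinge basis functions; from the coefficients, the slope of the fitted line is −1.5 for x < 4, +2 for 4 < x < 10 and 2 − 3.2 = −1.2 for x > 10, which the finite differences between the sample points confirm. The function name is hypothetical.

```python
import numpy as np

def example_mars(x):
    """Eq. (9): MARS example with knots at x = 4 and x = 10."""
    x = np.asarray(x, dtype=float)
    bf1 = np.maximum(0.0, x - 4.0)
    bf2 = np.maximum(0.0, 4.0 - x)
    bf3 = np.maximum(0.0, x - 10.0)
    return 4.32 + 2.0 * bf1 + 1.5 * bf2 - 3.2 * bf3


xs = np.array([0.0, 4.0, 10.0, 12.0])
print(example_mars(xs))
# Slopes between consecutive sample points recover the three linear pieces
print(np.diff(example_mars(xs)) / np.diff(xs))
```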

Fig. 3

Knots and linear splines for a simple MARS example

MARS generates the basis functions through a stepwise search, with the knot locations determined adaptively. The final model is obtained through a two-stage procedure consisting of a forward pass and a backward pass. In the forward pass, MARS deliberately overfits the training data by including a large number of basis functions. To avoid overfitting, redundant basis functions are removed from Eq. (7) in the backward pass, using Generalized Cross-Validation (GCV) as the removal criterion. GCV is defined as [20]:

$$GCV = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left[ {y_{i} - \hat{f}\left( {x_{i} } \right)} \right]^{2} /\left[ {1 - \frac{C\left( B \right)}{N}} \right]^{2}$$
(10)

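in which N is the number of data points, \(y_{i}\) and \(\hat{f}\left( x_{i} \right)\) are the observed and predicted responses for the i-th data point, and C(B) is a complexity penalty that increases with the number of basis functions (BFs) in the model. The complexity penalty is given as: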

$$C\left( B \right) = \left( {B + 1} \right) + dB$$
(11)

where d is the penalty for each BF included in the model and B is the number of basis functions [15].
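Eqs. (10) and (11) can be sketched as below. The penalty d = 3 is an assumed default for illustration only, and the function names are hypothetical; the example keeps the residuals fixed and increases B to show how the penalty alone raises the GCV score.

```python
import numpy as np

def complexity_penalty(B, d):
    """Eq. (11): C(B) = (B + 1) + d * B."""
    return (B + 1) + d * B


def gcv(y, y_hat, B, d=3.0):
    """Eq. (10): mean squared residual divided by (1 - C(B)/N)**2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    N = len(y)
    c = complexity_penalty(B, d)
    if c >= N:
        raise ValueError("C(B) must be smaller than N")
    return float(np.mean((y - y_hat) ** 2)) / (1.0 - c / N) ** 2


y = np.arange(10.0)
y_hat = y + 0.5  # same residuals for both candidate models
print(gcv(y, y_hat, B=1), gcv(y, y_hat, B=2))  # about 1.0 and 25.0
```

In the backward pass, a basis function is worth keeping only if removing it increases the residual error by more than the reduction in the complexity penalty, so that the GCV score would rise.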

5 Model development

5.1 Influential parameters

According to previous studies, several influential parameters are involved in the debonding process of FRP EBR systems. The most important factors used in previous models and codes, namely the reinforcement width, the ratio between the widths of the FRP reinforcement and the masonry unit, the tensile strength of the substrate, the axial rigidity of the reinforcement and the bond length, are employed as predictor variables in the present study. The maximum bond strength is therefore formulated as:

$$F_{max} = f\left( f_{mt}, E_{p} t_{p}, b_{p}, L_{b}, k_{b} \right)$$
(12)

where fmt is the tensile strength of the substrate, Eptp is the axial rigidity of the FRP reinforcement, bp is the width of the FRP reinforcement, Lb is the bond length, and kb is the width factor, commonly expressed as [21]:

$$k_{b} = \sqrt {\left( {3 - b_{p} /b_{m} } \right)/\left( {1 + b_{p} /b_{m} } \right)}$$
(13)

As Eq. (13) shows, the width factor depends only on the ratio between the FRP plate width and the masonry substrate width (bp/bm). Recent studies confirmed that the failure mechanisms leading to debonding are three-dimensional when the FRP reinforcement is narrower than the substrate [16, 22]. In particular, when the bond length is comparable to the width, the bond stresses can spread laterally beyond the bonded width, so that a volume of material is involved in the debonding mechanism. To account for these three-dimensional effects in the final models, kb is considered as an input variable.
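Eq. (13) can be evaluated with the short sketch below. The function name is hypothetical, and the check 0 < bp/bm ≤ 1 (reinforcement no wider than the substrate) is an assumption added for illustration.

```python
import math

def width_factor(b_p, b_m):
    """Eq. (13): k_b = sqrt((3 - r) / (1 + r)) with r = b_p / b_m.
    Assumes 0 < b_p <= b_m (reinforcement no wider than the substrate)."""
    r = b_p / b_m
    if not 0.0 < r <= 1.0:
        raise ValueError("expected 0 < b_p / b_m <= 1")
    return math.sqrt((3.0 - r) / (1.0 + r))


for b_p in (25.0, 50.0, 100.0):
    print(b_p, round(width_factor(b_p, 100.0), 3))
```

The factor decreases monotonically from \(\sqrt{3} \approx 1.73\) as bp/bm tends to 0 to exactly 1 at bp/bm = 1, reflecting the reduced lateral spreading of bond stresses for wide reinforcement.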

5.2 Dataset description

In this study, a comprehensive collection of 575 test series obtained from the published literature [4, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42] is used, for the first time, to model the maximum bond strength between FRP reinforcement and masonry units. Some tests were repeated several times under the same conditions; in such cases, the average of the maximum bond strengths recorded in the identical tests is used to develop the models. In this way, 230 distinct data points are derived from the 575 test series. Details of the datasets used and the ranges of the input and output variables are presented in Table 1. To ensure more homogeneous results, only results from single- or double-lap shear tests were employed. The database covers different reinforcement materials, including carbon (C), glass (G), basalt (B) and steel (S) fibers, and different types of masonry units, including clay brick units (CB); ancient and recent brick units (B-old and B-new, respectively); tuff natural stones (NS-tuff); yellow and gray tuff (YT and GT); lime stones (LS); lime natural stones (NS-Limes); and calcareous stones (CS). The models are developed for all of these FRP materials and masonry units. For visualization, a matrix plot of the input and output parameters is shown in Fig. 4. This figure contains all possible scatter plots between pairs of input and output parameters, so that the relationships between all pairs of variables can be assessed at once; this type of plot is useful when many variables are involved. For example, the first column of the matrix plot shows all scatter plots between the kb parameter and the other input and output variables.
The histograms of the input and output parameters for the whole dataset are shown on the diagonal of the matrix. The results of the developed models are more reliable in the ranges of input variables in which the data points are more concentrated.

Table 1 Details of database and range of input and output variables
Fig. 4

Box-plots of input and output variables

To develop the M5′ and MARS models, the whole dataset was randomly divided into two parts: 70% (161 data vectors) was used for training and 30% (69 data vectors) for testing the developed models. The training and testing databases are presented in Tables 8 and 9, respectively, in the “Appendix” section.
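A random 70/30 partition of the 230 data points can be sketched as follows. The seed and function name are hypothetical, so this sketch will not reproduce the particular partition listed in Tables 8 and 9.

```python
import numpy as np

def random_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = random_split(230)
print(len(train_idx), len(test_idx))  # 161 69
```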

5.3 The M5′ model

The M5′ algorithm divides the data space into smaller subspaces and builds a local multivariate linear regression (MLR) model in each subspace. Restricting the prediction in each subspace to a linear model is one of the limitations of the M5′ algorithm. To mitigate this limitation, all input and output parameters were transformed into logarithmic space and the M5′ model was developed in that space. The local MLR model in each subspace can then be rewritten as:

$$F_{max} = a_{1} f_{mt}^{a_{2}} \left( E_{p} t_{p} \right)^{a_{3}} b_{p}^{a_{4}} L_{b}^{a_{5}} \left( \frac{b_{p}}{b_{m}} \right)^{a_{6}}$$
(14)

where a1, a2, …, a6 are constants. The M5′ algorithm presents its predictive model as a set of rules, which are simple and user-friendly and can easily be used as a practical model for predicting the bond strength. The model tree is shown in Fig. 5, and the resulting rules are as follows:

$$LM_{1} :F_{\hbox{max} } = 0.0166f_{mt}^{0.2072} \left( {E_{p} t_{p} } \right)^{0.2785} b_{p}^{0.088} L_{b}^{0.9004} \left( {\frac{{b_{p} }}{{b_{m} }}} \right)^{ - 0.0407}$$
(15a)
$$LM_{2} :F_{\hbox{max} } = 0.9844f_{mt}^{0.0881} \left( {E_{p} t_{p} } \right)^{0.0566} b_{p}^{0.088} L_{b}^{0.2015} \left( {\frac{{b_{p} }}{{b_{m} }}} \right)^{ - 0.0407}$$
(15b)
$$LM_{3} :F_{\hbox{max} } = 1.0873f_{mt}^{0.1149} \left( {E_{p} t_{p} } \right)^{0.0566} b_{p}^{0.088} L_{b}^{0.2015} \left( {\frac{{b_{p} }}{{b_{m} }}} \right)^{ - 0.0407}$$
(15c)
$$LM_{4} :F_{\hbox{max} } = 0.0094f_{mt}^{0.1303} \left( {E_{p} t_{p} } \right)^{0.0485} b_{p}^{0.2211} L_{b}^{1.1385} \left( {\frac{{b_{p} }}{{b_{m} }}} \right)^{ - 0.1022}$$
(15d)
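Because each leaf model of the form of Eq. (14) is linear after taking logarithms (ln Fmax = ln a1 + a2 ln fmt + …), its coefficients can be estimated by ordinary least squares on log-transformed data. The sketch below fits one such leaf model with `numpy.linalg.lstsq`; it does not reproduce the M5′ tree growth, pruning or smoothing, and the synthetic data are generated from the LM1 coefficients purely as a self-check, not taken from the database. The function names and column order are assumptions.

```python
import numpy as np

def fit_power_law(X, y):
    """Fit y = a1 * prod_j X[:, j]**e_j by least squares on log-transformed data.
    X: (n, p) array of strictly positive inputs; y: (n,) positive responses."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.log(X)])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return float(np.exp(coef[0])), coef[1:]


def predict_power_law(a1, exponents, X):
    """Evaluate a power law of the form of Eq. (14) row by row."""
    X = np.asarray(X, dtype=float)
    return a1 * np.prod(X ** np.asarray(exponents), axis=1)


# Self-check on synthetic, noise-free data generated from the LM1 coefficients
# (column order assumed: f_mt, E_p t_p, b_p, L_b, b_p/b_m)
rng = np.random.default_rng(1)
X_syn = rng.uniform(0.5, 5.0, size=(50, 5))
lm1_exponents = np.array([0.2072, 0.2785, 0.088, 0.9004, -0.0407])
y_syn = predict_power_law(0.0166, lm1_exponents, X_syn)
a1_hat, exps_hat = fit_power_law(X_syn, y_syn)
print(round(a1_hat, 4), np.round(exps_hat, 4))
```

With real, noisy data the fit minimizes squared errors in log space, which corresponds to minimizing relative rather than absolute errors in Fmax.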

Here bp and bp/bm were selected as the main splitting parameters, with splits at bp = 50.5 mm and bp/bm = 0.47. This choice suggests that the width of the reinforcement and its ratio to the substrate width are very important parameters for predicting the maximum bond strength. This is in line with the results of the previous subsection, where the correlation coefficient (R) between bp and Fmax was among the highest of all parameters (see Fig. 4). Note that the splitting values do not necessarily have a specific physical interpretation, because they are determined by minimizing the prediction error [43]. Nevertheless, most of the physical trends implied by the derived equations are consistent with structural engineering judgment.

Fig. 5

The M5′ data splitting diagrams

For example, the developed equations show that the maximum bond strength decreases as the ratio bp/bm increases, since the exponent of bp/bm is negative in all four rules. This can be explained by the fact that the effect of three-dimensional shear transfer diminishes as bp/bm increases. When bp/bm < 1, a larger failure surface is involved in the bond mechanism, and the bond stresses can spread laterally beyond the bond width (bp). By contrast, when bp/bm tends to 1 (corresponding to plane strain conditions), the fracture energy per unit FRP width is smaller than when bp/bm < 1. As Eq. (15) shows, the M5′ algorithm correctly captured this underlying physical behavior. Furthermore, according to the tree in Fig. 5, predicting the maximum bond strength for narrower FRP reinforcement (bp < 50.5 mm) is more complex, and the algorithm divided this part of the problem space into three subspaces. This is because accounting for three-dimensional shear transfer in real problems is a complex task, as also noted in [44].

Furthermore, the developed equations show that in all rules the bond strength increases with the bond length (Lb), the mean tensile strength of the substrate (fmt) and the axial rigidity of the reinforcement (Eptp). Hereinafter, the maximum bond strength refers to the maximum transmissible force (Fmax). These trends are all expected from the mechanics of the debonding process. However, the nonlinear relationships between these parameters and the maximum bond strength differ from class to class. To better illustrate this behavior, scatter diagrams of the observed maximum bond strength against Lb, Eptp and fmt are shown for each class in Fig. 6. According to Eq. (15), the maximum bond strength increases approximately linearly with Lb in classes 1 (LM1) and 4 (LM4), with exponents of 0.9004 and 1.1385, respectively. The experimental observations confirm this behavior, which the M5′ algorithm captured correctly (see Fig. 6). The figure also shows that the maximum bond strength is only slightly affected by Eptp, especially in classes LM2, LM3 and LM4. The strongest effect of Eptp on bond strength was observed in class 1, in which the bond width and the ratio between bond width and substrate width were the smallest; consistently, the exponent of Eptp in Eq. (15) is largest in LM1 (0.2785) and much smaller in the other classes (0.0485 to 0.0566). The fmt parameter shows broadly similar trends in each class. As seen in Fig. 6 and Eq. (15), the strongest effect of fmt on bond strength was observed in classes 1 and 4 (exponents 0.2072 and 0.1303), while its influence was weaker in classes 2 and 3. The influence of fmt is nevertheless greater in class 3 than in class 2 (exponents 0.1149 and 0.0881), and fmt acts as the splitting criterion that separates these two classes.
As a result, the maximum bond strength behavior is more complicated for bp ≤ 50.5 mm and bp/bm > 0.47 (LM2 and LM3) than in other ranges, so further experimental investigations in these ranges could be of interest for future work.
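Since each rule in Eq. (15) is a power law, doubling one input while holding the others fixed multiplies Fmax by 2 raised to that input's exponent, regardless of the units used. The sketch below tabulates these factors from the published exponents; it assumes the point stays within the same leaf and ignores smoothing, so it is only a local reading of each rule. Note that doubling bp at constant bm changes both bp and bp/bm, giving a combined factor of \(2^{a_4 + a_6}\). The dictionary and function names are hypothetical.

```python
# Exponents of Eq. (15), copied from the four leaf models
EXPONENTS = {
    "LM1": {"fmt": 0.2072, "Eptp": 0.2785, "bp": 0.088, "Lb": 0.9004, "bp/bm": -0.0407},
    "LM2": {"fmt": 0.0881, "Eptp": 0.0566, "bp": 0.088, "Lb": 0.2015, "bp/bm": -0.0407},
    "LM3": {"fmt": 0.1149, "Eptp": 0.0566, "bp": 0.088, "Lb": 0.2015, "bp/bm": -0.0407},
    "LM4": {"fmt": 0.1303, "Eptp": 0.0485, "bp": 0.2211, "Lb": 1.1385, "bp/bm": -0.1022},
}


def doubling_factor(rule, variable):
    """Factor by which F_max changes when `variable` is doubled and the other
    inputs are held fixed, assuming the point stays in the same leaf."""
    return 2.0 ** EXPONENTS[rule][variable]


for rule in EXPONENTS:
    print(rule, {v: round(doubling_factor(rule, v), 3) for v in EXPONENTS[rule]})
```

Doubling Lb gives a factor of about 1.87 in LM1 and 2.20 in LM4 (close to proportional), but only about 1.15 in LM2 and LM3.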

Fig. 6

The variations of Fmax with input variables for different classes

In general, the results derived from the M5′ algorithm rely on the information inherent in the collected database. As stated, the results are physically sound, but some of the nonlinear relationships between input and output parameters differ from those in existing equations. For example, most equations in the literature relate Fmax linearly to Lb (or to the effective length Le), whereas the M5′ algorithm found this linear relationship in only two classes. However, the M5′ results are shown to be more consistent with the experimental observations.

5.4 The MARS model

As stated in Section 4, MARS can be built with either piecewise linear or piecewise cubic splines. Both types were used in this study, but only the model based on piecewise linear segments, which performed better, is presented. Presenting the training data set to the MARS algorithm yields the following equation for the maximum bond strength:

$$F_{\hbox{max} } (kN) = \, 17 - 0.71 \times BF_{1} - 1.2 \times BF_{2} + 0.25 \times BF_{3} - 0.076 \times BF_{4} + 3.8 \times 10^{3} \times BF_{5} \quad \, - 0.019 \times BF_{6} - 0.94 \times BF_{7} + 0.11BF_{8} + 0.067BF_{9} - 7.5 \times 10^{4} BF_{10} \quad + 3.7 \times 10^{3} BF_{11} - 3.8 \times 10^{3} BF_{12} - 3BF_{13}$$
(16)

The basis functions (BFs) and their corresponding equations are listed in Table 2. Table 2 shows that, of the 14 BFs, 12 BFs with interaction terms are integrated into the model. This confirms that the MARS model is not simply based on additive splines: interactions between different splines also play a significant role. According to Eq. (16) and Table 2, the MARS algorithm can capture the nonlinear relationship between input and output variables without any a priori assumption about the form of that relationship. This feature becomes more valuable as the dimensionality of the problem and the number of parameters involved increase.

Table 2 The basis functions (BFs) of the developed MARS model

6 Results and discussions

To measure the performance of the developed models statistically, four error measures were employed: the mean absolute error (MAE), root mean square error (RMSE), correlation coefficient (R) and coefficient of determination (R2).

$$MAE = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left| {P_{i} - O_{i} } \right|}}{N}$$
(17)
$$RMSE = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {P_{i} - O_{i} } \right)^{2} }$$
(18)
$$R = \frac{\sum_{i=1}^{N} \left( P_{i} - P_{m} \right)\left( O_{i} - O_{m} \right)}{\sqrt{\sum_{i=1}^{N} \left( P_{i} - P_{m} \right)^{2}} \sqrt{\sum_{i=1}^{N} \left( O_{i} - O_{m} \right)^{2}}}$$
(19)
$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {O_{i} - P_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{N} \left( {O_{i} - O_{m} } \right)^{2} }}$$
(20)

where Oi is the measured value; Pi stands for prediction values; N is the number of data points; Om is the mean value of observations; and Pm is the mean value of predictions.
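The four measures of Eqs. (17) to (20) can be computed as in the sketch below, with R implemented as the standard Pearson correlation. The function name is hypothetical. In the usage line, predictions shifted by a constant +1 give MAE = RMSE = 1 and R = 1, but R² = 1 − 4/5 = 0.2, which shows that R alone is insensitive to systematic offsets.

```python
import numpy as np

def error_metrics(observed, predicted):
    """MAE, RMSE, Pearson R and R^2 following Eqs. (17)-(20)."""
    O = np.asarray(observed, dtype=float)
    P = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(P - O))
    rmse = np.sqrt(np.mean((P - O) ** 2))
    dP, dO = P - P.mean(), O - O.mean()
    r = np.sum(dP * dO) / (np.sqrt(np.sum(dP ** 2)) * np.sqrt(np.sum(dO ** 2)))
    r2 = 1.0 - np.sum((O - P) ** 2) / np.sum(dO ** 2)
    return {"MAE": float(mae), "RMSE": float(rmse), "R": float(r), "R2": float(r2)}


# Predictions shifted by a constant +1: perfect correlation but modest R^2
print(error_metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```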

6.1 Performance analysis

The amount of data used to develop a reliable model with data mining approaches plays a crucial role in the modeling process. Frank and Todeschini [45] suggested that the ratio between the number of data points and the number of variables involved should be at least 3, while a value of 5 is more conservative. In the present study, this ratio is considerably larger: 230/7 ≈ 32.86. To evaluate the developed models, scatter diagrams of the measured versus predicted maximum bond strength from the M5′ and MARS algorithms are shown in Fig. 7 for the training and testing data points. As seen, there is little scatter around the optimal (1:1) line in both the training and testing sets, and the data points are mostly concentrated around this line. For further verification, the statistical error parameters for the training and testing data sets are presented in Table 3.

Fig. 7

Comparisons between measured and predicted values of Fmax (kN). a M5′ and b MARS

Table 3 The performances of the developed models

Most previous studies used the correlation coefficient (R) to measure the agreement between observed and predicted values. Smith [46] suggested that |R| > 0.8 indicates a strong correlation between measured and predicted values. However, R is not necessarily a good indicator of agreement between observed and predicted values, because it is insensitive to systematic offsets and scaling of the predictions; this is particularly problematic when the data range is very wide. Therefore, the present study uses R2, which penalizes any departure from the 1:1 line, as a better measure of the agreement between observed and predicted values. The MAE and RMSE are also used to measure the absolute difference between predicted and measured values; these should be close to zero for a close match. As shown in Table 3, the MARS model with piecewise linear segments outperformed both the MARS model with cubic segments and the M5′ model in terms of accuracy, for both the training and testing data sets. Relative to the MARS model with cubic segments and the M5′ model, it decreases the RMSE by 11% and 35% and increases R2 by 3.4% and 23.2%, respectively. The performance of the MARS model with cubic segments and of the M5′ model is nevertheless acceptable, and the MARS model with cubic segments also outperforms the M5′ model.

The performances of the developed MARS (with linear segments) and M5′ models are also compared with the most common existing equations, including Tanaka [47], Sato (from [48]), Iso (from [48]), Yang et al. [49], Neubauer and Rostasy [50], Willis et al. [6], Kashyap et al. [5], Maeda et al. [51], Khalifa et al. [52], De Lorenzis et al. [53], the ACI guideline [54], and the CNR guideline [55]. The statistical error parameters of these equations are presented in Table 4. As shown, their accuracy is rather limited; among them, the CNR model performs best. The proposed M5′ and MARS models outperform the CNR model, increasing R2 by 51% and 86%, respectively, and reducing RMSE by 42% and 55%, respectively. Therefore, the proposed models predict the maximum bond strength with an acceptable level of accuracy and are considerably more accurate than the available empirical models over a wide range of input parameters.
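The percentages quoted above are presumably relative changes with respect to the reference (CNR) model; this is an assumption, since the formula is not stated in this section. The sketch below shows that arithmetic with hypothetical input values chosen only to reproduce round numbers, not values taken from Table 4.

```python
def relative_increase_pct(reference: float, new: float) -> float:
    """Percentage increase of a 'higher is better' metric such as R2."""
    return (new - reference) / reference * 100.0


def relative_reduction_pct(reference: float, new: float) -> float:
    """Percentage reduction of a 'lower is better' metric such as RMSE."""
    return (reference - new) / reference * 100.0


# Hypothetical values, used only to illustrate the arithmetic (not Table 4).
r2_ref, r2_new = 0.50, 0.755
rmse_ref, rmse_new = 10.0, 5.8
print(round(relative_increase_pct(r2_ref, r2_new), 1))       # 51.0
print(round(relative_reduction_pct(rmse_ref, rmse_new), 1))  # 42.0
```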

Table 4 Comparison of the developed models with the most common equations

The histograms of the discrepancy ratio between measured and predicted maximum bond strength (DR = Fmax,meas/Fmax,pre) for the equations of Tanaka [47], Yang et al. [49], Neubauer and Rostasy [50], and Khalifa et al. [52] and for the proposed M5′ and MARS models are illustrated in Fig. 8. For a good predictive equation, the DR values should be close to one and symmetrically distributed around their mean. A wider distribution generally indicates greater uncertainty. As seen in Fig. 8, the error distributions of these empirical equations are considerably wide, and they generally underestimate the maximum bond strength (DR > 1). In contrast, the error distributions of the proposed M5′ and MARS models are concentrated around 1, and these models only slightly underestimate the bond strength.
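The DR histograms can be reproduced with a few lines of Python. This is a sketch under our own choices: the bin range, bin count, summary statistics, and function names are hypothetical, and only the definition DR = Fmax,meas/Fmax,pre comes from the text.

```python
from statistics import fmean, pstdev


def discrepancy_ratios(measured, predicted):
    """DR = F_max,meas / F_max,pre for each specimen (DR > 1 means underestimation)."""
    return [m / p for m, p in zip(measured, predicted)]


def dr_summary(dr):
    """Mean, population standard deviation and share of underestimated cases."""
    return {
        "mean": fmean(dr),
        "std": pstdev(dr),
        "share_underestimated": sum(1 for v in dr if v > 1.0) / len(dr),
    }


def histogram(values, lo, hi, n_bins):
    """Counts in n_bins equal-width bins on [lo, hi); values outside the range are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) // width), n_bins - 1)] += 1
    return counts


# Toy data (kN), not from the study.
dr = discrepancy_ratios([10.0, 12.0, 9.0], [10.0, 10.0, 10.0])
print(dr)                          # [1.0, 1.2, 0.9]
print(histogram(dr, 0.0, 2.0, 4))  # [0, 1, 2, 0]
```

A narrow histogram centred on 1 corresponds to a small `std` and a `mean` close to 1 in `dr_summary`.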

Fig. 8

The histograms of discrepancy ratio between measured and predicted maximum bond strength by different models

Figure 9 shows the variation of the DR values of the empirical equations and the proposed models as a function of the axial rigidity (Eptp). A good model has errors that are independent of the input parameters [56]. The errors of the Tanaka [47], Yang et al. [49], Neubauer and Rostasy [50], and Khalifa et al. [52] formulas are very sensitive to variations in Eptp. In contrast, Fig. 9 shows that the proposed models perform better in this respect, indicating that Eptp is better accounted for in the developed models. A similar trend (not shown) was also observed for the other input parameters, such as the bond length, the width of the reinforcement, the ratio between the widths of the reinforcement and the substrate, and the tensile strength.
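The study assesses this independence visually. One way to put a number on it, sketched here under our own assumptions, is to correlate DR with the input parameter and to compare the mean DR across groups of specimens sorted by that parameter. A model whose errors do not depend on Eptp should give a correlation near zero and similar group means. Function names and data are hypothetical.

```python
import math
from statistics import fmean


def correlation(x, y):
    """Pearson correlation; a value near 0 suggests DR does not trend with x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binned_mean_dr(x, dr, n_groups):
    """Sort specimens by x, split into n_groups (nearly) equal groups, return mean DR per group.

    Requires n_groups <= len(x) so that no group is empty.
    """
    pairs = sorted(zip(x, dr))
    n = len(pairs)
    means = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        means.append(fmean(d for _, d in chunk))
    return means


# Toy data: DR that grows with Eptp, i.e. a model that does NOT meet the criterion.
eptp = [1.0, 2.0, 3.0, 4.0]
dr = [0.5, 1.0, 1.5, 2.0]
print(binned_mean_dr(eptp, dr, 2))  # [0.75, 1.75]
```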

Fig. 9

The variations of DR values of different models as a function of axial rigidity (Eptp)

6.2 Sensitivity analysis

In this study, a sensitivity analysis exploiting a distinctive feature of the MARS algorithm is performed to determine the importance of each input parameter in the final predictive model. To this end, the analysis of variance (ANOVA) decomposition of the developed MARS models, with both piecewise linear and cubic segments, is presented in Table 5. The first column gives the number of each ANOVA function, and the second column gives its standard deviation, which can be interpreted as an indicator of the relative contribution of that ANOVA function to the final predictive model. The most important indicator is the GCV score, given in the third column of Table 5. The GCV score of each ANOVA function is calculated assuming that this function is removed from the final model; a higher GCV score indicates that the function is more important than the others for the final predictive model. According to Eq. (10), the GCV score accounts for both the accuracy and the complexity of the model. In other words, the GCV scores in Table 5 indicate how much the accuracy and complexity of the developed model change when the corresponding ANOVA function, and hence its input variables, is removed from the model. The input variables associated with each ANOVA function are listed in the last column of Table 5 [15]. This capability of MARS can be used to identify the most influential parameters in the prediction of the bond strength of EBR FRP systems.
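Eq. (10) is not reproduced in this section, so the sketch below assumes a common Friedman-type form of GCV, in which the mean squared residual is inflated by a complexity penalty: GCV = MSE / (1 - C/N)^2, with an effective number of parameters C = M + d(M - 1)/2 for M model terms (intercept included) and penalty d, as used in some MARS implementations. The default d = 3 and the function name are assumptions; the study's exact penalty may differ. The example shows why GCV rewards accuracy and penalizes complexity at the same time.

```python
from statistics import fmean


def gcv(measured, predicted, n_terms, penalty=3.0):
    """Generalized cross-validation score (assumed Friedman-type form)."""
    n = len(measured)
    mse = fmean((m - p) ** 2 for m, p in zip(measured, predicted))
    c_eff = n_terms + penalty * (n_terms - 1) / 2.0  # effective number of parameters
    if c_eff >= n:
        raise ValueError("model too complex for the number of data points")
    return mse / (1.0 - c_eff / n) ** 2


# Same residuals, different complexity: the larger model gets the worse (higher) GCV.
meas = [float(v) for v in range(1, 11)]
pred = [v + 1.0 for v in meas]
print(round(gcv(meas, pred, n_terms=1), 4))               # 1.2346
print(round(gcv(meas, pred, n_terms=3, penalty=2.0), 4))  # 4.0
```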

Table 5 The analysis of variances (ANOVA) decomposition of the developed MARS models

Figure 10 depicts the relative importance of the input parameters in the developed MARS models based on the ANOVA decomposition. The relative importance of each input parameter was determined from the increase in GCV caused by removing that variable from the developed MARS model. For example, according to Table 5, bp is involved in ANOVA functions 4, 6, and 8. To calculate the importance of this variable, the GCV scores of these ANOVA functions are summed and then divided by the global GCV value, i.e., the sum of all GCV scores in Table 5. According to Table 5, the ANOVA functions involving bp and bp/bm have the largest GCV scores, so these parameters can be expected to contribute considerably to predicting the bond strength. As shown graphically in Fig. 10, bp/bm and bp were the most important parameters in the developed MARS models, followed by Lb, fmt, and Eptp in decreasing order of importance.
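The importance calculation described above (sum of the GCV scores of the ANOVA functions that involve a variable, divided by the sum of all GCV scores) can be sketched as follows. The function name and the three-function decomposition are hypothetical and are not the values of Table 5.

```python
def relative_importance(anova_functions):
    """anova_functions: list of (gcv_score, set_of_variables), one entry per ANOVA function.

    Importance of a variable = sum of GCV scores of the functions involving it,
    divided by the sum of all GCV scores.
    """
    total = sum(score for score, _ in anova_functions)
    importance = {}
    for score, variables in anova_functions:
        for var in variables:
            importance[var] = importance.get(var, 0.0) + score / total
    # Most important variable first.
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical decomposition (NOT Table 5): three functions, one of them an interaction.
table = [(2.0, {"bp"}), (1.0, {"Lb"}), (1.0, {"bp", "Lb"})]
print(relative_importance(table))  # {'bp': 0.75, 'Lb': 0.5}
```

Because an interaction function is credited to every variable it contains, the importances computed this way need not sum to 1.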

Fig. 10

The relative importance of input variables in developing MARS models

6.3 Safety analysis

For a safe and economical design of EBR FRP systems, the reliability of the models for predicting the maximum bond strength between FRP reinforcement and masonry units must be investigated. To this end, box plots of the discrepancy ratio between observed and predicted maximum bond strength (DR) are used to assess the reliability and uncertainty of the existing and developed models. A box plot is a convenient graphical way of depicting data through their quartiles; it displays the variation in samples of a statistical population without any prior assumption about the underlying distribution. The spacing between the different parts of the box indicates the degree of dispersion (spread) and skewness of the data. Based on the statistical analysis in the previous sections, the developed M5′ and MARS models can be expected to be more reliable than the existing equations. However, this analysis alone does not determine the uncertainty or safety factor of the existing and developed models. To address this limitation, a safety factor can be assigned to each model based on an acceptable level of risk.
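The quantities a box plot displays can be computed with Python's standard `statistics.quantiles`. This sketch assumes the common Tukey convention (whiskers at the most extreme points within 1.5 × IQR of the box); the whisker rule used for Fig. 11 is not stated here, and the sample is a toy one.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Quartiles, IQR, Tukey whiskers (1.5 x IQR) and outliers of a sample."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Toy DR sample (not from the study).
stats = box_plot_stats([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
print(stats["q1"], stats["median"], stats["q3"], stats["outliers"])  # 2.25 3.5 4.75 [20.0]
```

A narrower box (smaller `iqr`) corresponds to the lower uncertainty discussed for the proposed models.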

Figure 11 illustrates the box plots of the proposed models and of the most accurate empirical equations identified in the previous results. The Yang et al. [49] and Khalifa et al. [52] equations generally overestimate the bond strength, while the Tanaka [47] and Neubauer and Rostasy [50] equations underestimate it. Among these equations, the Khalifa et al. [52] formula is the most conservative and has the highest uncertainty (widest box), whereas the Tanaka [47] formula has the lowest uncertainty. Figure 11 also shows that the box plots of the M5′ and MARS models developed in this study are narrower than those of the other equations, indicating a higher level of confidence. Furthermore, for a precise and accurate formula, the DR values should be distributed symmetrically around their mean, which should be close to 1. As shown, the error distributions of the developed M5′ and MARS models are nearly symmetrical and their averages are very close to 1. In addition, the safety factors of the M5′ and MARS models are generally smaller than those of the other equations. For example, according to Fig. 11, if a 10% risk is acceptable, the predictions of the MARS and M5′ models should be divided by 1.40 and 1.45, respectively, whereas the corresponding factors for the Tanaka [47], Yang et al. [49], Neubauer and Rostasy [50], and Khalifa et al. [52] equations are 1.8, 2.7, 2.1, and 3.1, respectively.
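The section does not spell out how the safety factors are read from Fig. 11. One plausible reading, adopted here purely as an assumption, is that the factor is the smallest divisor such that, after each prediction is divided by it, no more than the accepted fraction of specimens (10% in the example above) is still overestimated. That divisor is an empirical upper quantile of Fmax,pre/Fmax,meas, i.e. of 1/DR. The function name and data are hypothetical.

```python
import math


def safety_factor(measured, predicted, risk=0.10):
    """Smallest divisor SF such that at most `risk` of the predictions, divided by SF,
    still exceed the measured strength (assumed definition)."""
    ratios = sorted(p / m for m, p in zip(measured, predicted))  # = 1 / DR
    n = len(ratios)
    # Index of the (1 - risk) empirical quantile; round() guards against float noise.
    k = math.ceil(round((1.0 - risk) * n, 9)) - 1
    return ratios[max(k, 0)]


# Toy data (kN), not from the study.
meas = [10.0] * 10
pred = [8.0, 9.0, 10.0, 10.0, 11.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(safety_factor(meas, pred, risk=0.10))  # 1.4
```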

Fig. 11

Box plot of different equations

To quantitatively evaluate the safety of the proposed models and the existing design equations, the scale introduced by Collins [57], known as the Demerit Points Classification (DPC), is employed in this study. The DPC accounts for the safety, accuracy, and scatter of design codes as a function of the ratio between the ultimate resistance measured experimentally and that estimated by theoretical analysis. In this study, these are, respectively, the measured maximum bond strength (Fmax,exp) and the bond strength predicted by the existing formulae and the proposed models (Fmax,predicted). Table 6 presents the adaptation of the original values proposed by Collins [57] made in the present study. For each of the 230 data points, the prediction of each formula is assigned a demerit penalty according to Table 6. The total demerit of each formula is then calculated as the sum, over all intervals, of the number of specimens in each interval multiplied by the corresponding demerit penalty. The lower the total, the more reliable the formula.
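The demerit-point tally can be sketched in Python as follows. The five-interval layout and the 0.85 and 1.15 boundaries follow the discussion of Table 7; the outer boundaries (0.5 and 2.0) and all penalty values below are hypothetical placeholders, not the values of Table 6.

```python
from bisect import bisect_right

# HYPOTHETICAL bounds and penalties for illustration only; the values actually
# adapted from Collins [57] are those of Table 6. Intervals:
# [0, 0.5), [0.5, 0.85), [0.85, 1.15), [1.15, 2.0), [2.0, inf)
BOUNDS = [0.5, 0.85, 1.15, 2.0]
PENALTIES = [10, 2, 0, 1, 2]


def demerit_points(measured, predicted, bounds=BOUNDS, penalties=PENALTIES):
    """Return (total demerit, specimens per interval) with DR = F_exp / F_pred."""
    counts = [0] * len(penalties)
    for m, p in zip(measured, predicted):
        counts[bisect_right(bounds, m / p)] += 1  # interval index of this DR
    total = sum(c * pen for c, pen in zip(counts, penalties))
    return total, counts


# Toy data (kN), not from the study.
total, counts = demerit_points([4.0, 9.0, 10.0, 12.0, 25.0], [10.0] * 5)
print(total, counts)  # 13 [1, 0, 2, 1, 1]
```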

Table 6 Classification by demerit points

Table 7 presents the evaluation of the M5′, MARS, and selected existing equations according to the criteria adapted from Collins [57]. According to these criteria, the ACI [54] model has the highest total demerit points (1614) of all models, with 85% of its values in the first and second classification ranges (smaller than 0.85), which is unfavorable in terms of safety. The Khalifa et al. [52] model has the lowest total demerit points among the existing design equations. However, 57% of its predictions fall in the fourth and fifth classification ranges (larger than 1.15), which are classified as conservative and extremely conservative, respectively. In general, the existing equations suffer from limited accuracy, excessive conservatism, or both. In contrast, the M5′ and MARS models developed in this study, with the lowest total demerit points (281 and 282, respectively) and 60% and 59% of their predictions in the third classification range (appropriate and safe), performed best among all equations in terms of safety.

Table 7 Classification of the developed models and design equations according to the criteria of Collins

7 Conclusions

In this study, a comprehensive database of 575 measurements of the bond strength between FRP reinforcement externally bonded to masonry units was compiled for the first time from datasets published in the literature. New equations for predicting the maximum bond strength based on the M5′ and MARS algorithms were proposed. The final models use the reinforcement width (bp), the ratio of FRP reinforcement width to masonry width (bp/bm), the tensile strength of the substrate (fmt), the axial rigidity of the reinforcement (Eptp), and the bond length (Lb) as input variables. The M5′ model, a rule-based method, was employed to provide understandable formulas that give users more insight into the physics of the phenomenon. The MARS algorithm was also used as a reliable predictive model and to determine the most important parameters in predicting the bond strength.

The performances of several common empirical models and the proposed ones were evaluated in terms of the size of their errors and their uncertainty. The prediction errors and uncertainties of the developed M5′ and MARS models were considerably smaller than those of the most common existing models. The proposed M5′ and MARS models outperformed the CNR model, the best empirical model, increasing R2 by 51% and 86%, respectively, and reducing RMSE by 42% and 55%, respectively. The sensitivity analysis based on the MARS models showed that the width ratio between the FRP reinforcement and the masonry substrate (bp/bm) was the most important parameter, and the axial rigidity of the FRP reinforcement (Eptp) the least important, in predicting the bond strength. Furthermore, the safety analysis based on the Collins criteria indicated that the developed MARS and M5′ models also considerably outperformed the existing equations in terms of safety.