1 Introduction

Optimization is an important decision-making tool in many fields, including, but not limited to, operations research, engineering design and data mining. Without loss of generality, a global unconstrained single objective optimization problem, as considered in this paper, can be stated as finding the values of a decision vector \(\overrightarrow{x}=(x_{1},x_{2},\ldots ,x_{D})\in \mathbb {R}^{D}\) that satisfies the variable bounds \(x^{min}\le x\le x^{max}\) and minimizes or maximizes an objective function \(f(\overrightarrow{x})\), where \(x^{min}\) and \(x^{max}\) are the lower and upper boundaries, respectively. In these problems, the decision variables may be integer, real, discrete, or mixed [10], and the objective function can be linear or nonlinear, convex or non-convex, continuous or discontinuous, and uni-modal or multi-modal [9].

As gradient-based methods usually encounter many difficulties when solving such complex problems [16], evolutionary algorithms (EAs) have received much interest over the last few decades. EAs are population-based search strategies that have demonstrated promising results in solving complex optimization problems [29]. The reasons for this popularity are that (1) they do not require the satisfaction of specific mathematical properties; (2) they are flexible to dynamic changes; and (3) they have the capability for self-organization [12]. However, as EAs are stochastic algorithms, there is no guarantee that they will reach an optimal solution in every run. In addition, the performance of EAs depends on their parameter settings.

The family of EAs contains various algorithms, such as differential evolution (DE) [30], genetic algorithm (GA) [13] and evolution strategy (ES) [27]. The major difference between these algorithms is the way they produce new solutions. Among these algorithms, DE has gained popularity in solving continuous optimization problems [7, 28]. However, there is no guarantee that a DE algorithm that performs well on one problem, or on a certain class of problems, will work well on another problem or on a range of problems. One reason for this is the variability of the underlying mathematical properties of optimization problems.

As a consequence, researchers have proposed multi-operator and multi-method based algorithms to solve complex optimization problems [9, 11]. However, combining these operators and/or methods in the best way is still a challenging task. In evolutionary algorithms, the selection of operators for use in a search process is made based on different criteria, such as the improvement in the quality of solutions, constraint violations and/or the feasibility rate [9], reinforcement learning mechanisms [1, 17], and convergence differences and progress ratios [14]. However, the use of landscape information in the selection process is rare, even though it may boost the performance of an algorithm if it is carefully incorporated [2, 6]. The few methods that do exist have some limitations: (1) the landscape analysis was performed in an off-line mode, i.e., initial experiments were conducted to calculate landscape statistics independently of the evolutionary process used for solving the problem [22, 23]; (2) the calculation of the landscape measures was computationally expensive [23]; and (3) a training and testing mechanism was used, which may bias the algorithm towards the considered test problems, so that its performance can deteriorate when solving another set of problems.

In this paper, a new DE framework is proposed, in which a function’s landscape information is considered, in addition to the usual performance history of the operators, in selecting the best-performing DE operator during the evolutionary process. We also consider linear population size reduction, in which the population size is reduced continuously according to a linear function. In standard linear population size reduction, the worst individual is deleted to resize the population. In this paper, before the worst-ranking individuals are deleted, a modified technique is used: the two worst solutions and the centroid of the entire population are used to generate a new individual. If the new individual is better than the second worst one, it replaces it. To speed up the convergence of the proposed algorithm, the sequential quadratic programming (SQP) technique is applied periodically, once every predefined number of generations. This DE algorithm with landscape-based operator selection is named DE-LOS.

To judge the performance of the proposed framework, a total of 30 test functions were solved from the CEC2014 competition [18]. These benchmark sets have different mathematical properties, and are of 10, 30, 50 and 100 dimensions. The computational results show that the performance of DE-LOS is much better than the top two algorithms from the CEC2014 competition.

The rest of this paper is organized as follows: in Sect. 2, DE algorithms and operators are reviewed, along with some landscape measures. Section 3 presents the proposed framework. The simulation results on benchmark problems and the parameter values are provided in Sect. 4. Finally, Sect. 5 provides conclusions and possible future research directions.

2 Related Work

In this section, the literature on DE is reviewed and the concept of landscape analysis is discussed.

2.1 Differential Evolution Algorithm

DE was proposed by Storn and Price [30]. It is a popular EA because it usually converges fast, is simple to implement, and the same settings can be used for many different optimization problems. According to the literature, DE has shown good performance in comparison to several other EAs on a wide variety of problems [8]. The DE algorithm uses three operators (mutation, crossover and selection) to evolve a population of individuals during the search process.
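To make the roles of the three operators concrete, the following is a minimal sketch of one generation of the classic DE/rand/1/bin scheme for a minimization problem. It is an illustration only and not the variant used later in this paper; the function name `de_rand_1_bin`, the default values of F and CR, and the omission of bound handling are assumptions made for brevity.

```python
import random


def de_rand_1_bin(pop, fitness, f_obj, F=0.5, CR=0.9, rng=random):
    """One generation of classic DE/rand/1/bin (minimization).

    pop     : list of NP decision vectors (lists of floats), NP >= 4
    fitness : list of objective values, fitness[i] = f_obj(pop[i])
    F, CR   : illustrative defaults, not values taken from the paper
    """
    NP, D = len(pop), len(pop[0])
    new_pop, new_fit = [], []
    for i in range(NP):
        # Mutation: combine three distinct individuals, all different from i.
        r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
        mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]

        # Binomial crossover: j_rand guarantees at least one mutant component.
        j_rand = rng.randrange(D)
        trial = [mutant[j] if (rng.random() < CR or j == j_rand) else pop[i][j]
                 for j in range(D)]

        # Selection: the trial replaces its parent only if it is not worse.
        f_trial = f_obj(trial)
        if f_trial <= fitness[i]:
            new_pop.append(trial)
            new_fit.append(f_trial)
        else:
            new_pop.append(pop[i])
            new_fit.append(fitness[i])
    return new_pop, new_fit
```

Because of the greedy one-to-one selection, no individual's objective value can get worse from one generation to the next in this sketch.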

2.2 Improved DE Algorithms

In this section, some of the improved variants of DE are discussed.

2.2.1 Single Operator DE Variants

An adaptive DE algorithm with an optional external memory (JADE) was proposed by Zhang et al. [35], in which the crossover rate \(CR_{i}\) of each individual \(x_{i}\) at each generation was independently generated according to a normal distribution with mean \(\mu CR\) and standard deviation 0.1; when the value of \(CR_{i}\) falls outside [0,1], it is repaired to a value in [0,1]. Similarly, the scaling factor \(F_{i}\) of each individual \(x_{i}\) was independently generated according to a Cauchy distribution with location parameter \(\mu F\) and scale parameter 0.1. If its value is greater than 1, it is truncated to 1, and it is regenerated if \(F_{i}<0\).
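The sampling rules described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `sample_cr` and `sample_f` are hypothetical, the repair of \(CR_{i}\) is implemented as clipping to [0,1], and the Cauchy variate is drawn by inverse-transform sampling.

```python
import math
import random


def sample_cr(mu_cr, rng=random):
    """Draw CR_i from N(mu_cr, 0.1) and repair it to [0, 1].

    Clipping is assumed here as the repair rule.
    """
    cr = rng.gauss(mu_cr, 0.1)
    return min(1.0, max(0.0, cr))


def sample_f(mu_f, rng=random):
    """Draw F_i from a Cauchy distribution (location mu_f, scale 0.1).

    Values above 1 are truncated to 1; non-positive values are redrawn
    (an exact zero is also redrawn here, since it would disable mutation).
    """
    while True:
        # Inverse-transform sampling of a Cauchy variate.
        f = mu_f + 0.1 * math.tan(math.pi * (rng.random() - 0.5))
        if f > 1.0:
            return 1.0
        if f > 0.0:
            return f
```

The heavy tails of the Cauchy distribution occasionally produce large scaling factors, which is why the truncation and regeneration rules are needed.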

Success-history based parameter adaptation for differential evolution (SHADE), which is an improved version of JADE, uses a history-based parameter adaptation method. In SHADE, instead of using a single pair \((\mu CR,\,\mu F)\) to guide parameter adaptation, the mean values of the successful CR and F values of each generation (SCR and SF) are stored in historical memories MCR and MF.

The LSHADE [31] algorithm is a SHADE algorithm that uses linear population size reduction (LPSR) to dynamically re-size its population during a run. LPSR reduces the population linearly as the number of fitness evaluations increases. LSHADE showed good performance in comparison with other algorithms over a set of unconstrained optimization problems.

Sallam et al. [28] proposed a neurodynamic differential evolution algorithm for solving the CEC2015 single objective optimization problems. An adaptive mechanism was proposed for the appropriate use of LSHADE and neuro-dynamic during the search process.

2.2.2 Multi-operator DE Variants

In this section, a brief review of multi-operator based DE and self-adaptive DE is provided.

Self-adaptive multi-operator differential evolution (SAMO-DE) was proposed by Elsayed et al. [9] for solving constrained optimization problems. In their proposed algorithm, each operator has its own sub-population, so that the sub-populations are evolved by different DE operators. Based on an improvement measure, in which the solution quality, constraint violation and feasibility ratio were used to calculate the success of each operator, the number of individuals in each sub-population was adaptively updated, and more emphasis was given to the operator with the highest success. The results showed that SAMO-DE performed better than other state-of-the-art algorithms.

Composite DE (CoDE) was proposed by Wang et al. [33] for solving optimization problems. In CoDE, three mutation strategies were randomly combined with three fixed control parameter settings for generating a new trial vector at each generation. To generate a new solution, three vectors were generated, then the best one among them was selected to enter the next generation. From the experimental results, it was concluded that CoDE is a promising DE algorithm for solving optimization problems.

A self-adaptive DE (SaDE) was proposed by Qin et al. [26] for solving unconstrained real-parameter optimization problems. In SaDE, both the trial vector generation strategy and its associated control parameter values were gradually self-adapted according to a success rate that was calculated based on previous learning experience. At the beginning, all mutation strategies had an equal probability of generating a new solution, and the probabilities were updated after an initial LP generations as follows: at the end of each generation, after evaluating all the generated trial vectors, the number of trial vectors generated by each strategy that successfully entered the next generation was recorded in its success memory, and the number of trial vectors generated by each strategy that failed to enter the next generation was recorded in its failure memory. This algorithm performed much better than both the traditional DE algorithm and several state-of-the-art adaptive parameter DE variants.
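The text above does not state how the success and failure memories are turned into selection probabilities. The following sketch assumes one common form: the success rate of each strategy over the stored generations, plus a small constant, normalized over all strategies. The function name `strategy_probabilities` and the constant `eps` are assumptions for illustration, not details confirmed by this paper.

```python
def strategy_probabilities(success_memory, failure_memory, eps=0.01):
    """Turn success/failure memories into strategy selection probabilities.

    success_memory[g][k] : trials of strategy k that entered the next
                           generation in stored generation g
    failure_memory[g][k] : trials of strategy k that were discarded
    eps                  : assumed small constant that keeps every
                           strategy selectable even with zero successes
    """
    n_strategies = len(success_memory[0])
    rates = []
    for k in range(n_strategies):
        ns = sum(gen[k] for gen in success_memory)
        nf = sum(gen[k] for gen in failure_memory)
        total = ns + nf
        rates.append((ns / total if total > 0 else 0.0) + eps)
    norm = sum(rates)
    return [r / norm for r in rates]
```

With two strategies whose stored successes are 14 and 6 out of 20 trials each, the first strategy receives the larger probability and the two probabilities sum to one.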

None of the above-mentioned methods incorporated any landscape information in the selection phase.

2.3 Landscape Analysis

Generally, a fitness landscape consists of: (1) a set of solutions (populations of individuals), (2) fitness values (objective function values) of individuals, and (3) a neighborhood operator which can be used as a distance measure [19, 22]. Measuring the fitness landscape of a problem helps researchers to classify a problem as easy or hard to solve [25]. Many landscape measures have been proposed to understand and analyze different characteristics of a problem [19, 24], and this section reviews some of them.

Auto-correlation is often used to measure the ruggedness of a fitness landscape [5, 24]. Fitness distance correlation (FDC), proposed by Jones and Forrest [15], is another method used to measure problem difficulty [32]. It measures the correlation between the objective value and the distance to the nearest optimum in the search domain. Another landscape measure is the searchability of a problem, which is the ability of the search operator to move to a region of the search space with better fitness values. To measure it, an information landscape metric exists, which is computed based on the difference between the information landscape vector of the problem to be solved and a reference landscape vector. The reference landscape is the landscape of a function that is easy to optimize by any optimization algorithm in any dimension [3].

An information matrix \(M=[a_{i,j}]\) for a minimization problem, is constructed using Eq. 1

$$\begin{aligned} a_{i,j}={\left\{ \begin{array}{ll} 1 &{} \text {if}\,\,\,f(x_{i})<f(x_{j})\\ 0.5 &{} \text {if}\,\,\,f(x_{i})=f(x_{j})\\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

Not all of the entries in the information matrix are necessary for defining the information landscape [3, 4]. The entries are duplicated due to symmetry (so the lower triangle is omitted), the entries on the diagonal are always 0.5 (and are also omitted), and the row and column of the optimum solution are omitted as well. The information matrix can therefore be reduced to a vector \(LS=(ls_{1},ls_{2},\ldots ,ls_{|LS|})\), where the number of elements in LS is \(|LS|=\frac{(NP\,-\,1)\,\times \,(NP\,-\,2)}{2}\) and NP is the number of solutions. The difference, LD, between the information landscape vector of the problem and that of a reference function is then computed using Eq. 2:

$$\begin{aligned} LD=\frac{1}{|LS|}\times \sum _{i=1}^{|LS|}|(ls_{i})_{f}-(ls_{i})_{p}| \end{aligned}$$
(2)

where \((ls_{i})_{p}\) is the i-th element of the information landscape vector of the problem to be solved, and \((ls_{i})_{f}\) is the i-th element of the information landscape vector of the reference function. When LD is near 0, the problem is considered easy, while \(LD=1\) means the problem is difficult.
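As a minimal sketch of Eqs. 1 and 2, the code below builds the reduced information landscape vector directly from a list of objective values and then averages the absolute differences between two such vectors. The function names are hypothetical, and it is assumed that both fitness lists refer to the same sample of points in the same order, and that the point removed as the "optimum" is the best point of the problem sample.

```python
def info_vector(fitness, best):
    """Reduced information landscape vector (upper triangle of Eq. 1).

    The row and column of index `best` are dropped, along with the
    diagonal and the lower triangle, leaving (NP-1)(NP-2)/2 entries.
    """
    idx = [i for i in range(len(fitness)) if i != best]
    ls = []
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            fi, fj = fitness[idx[a]], fitness[idx[b]]
            ls.append(1.0 if fi < fj else 0.5 if fi == fj else 0.0)
    return ls


def landscape_distance(fit_problem, fit_reference):
    """LD of Eq. 2 for one sample of at least three points."""
    # Assumption: the best sampled point of the problem plays the role
    # of the optimum whose row and column are omitted.
    best = min(range(len(fit_problem)), key=fit_problem.__getitem__)
    ls_p = info_vector(fit_problem, best)
    ls_f = info_vector(fit_reference, best)
    return sum(abs(f - p) for f, p in zip(ls_f, ls_p)) / len(ls_p)
```

If the problem ranks every pair of points in the same order as the reference function, LD is 0; if every pairwise ranking is reversed, LD is 1.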

In the recent past, researchers and practitioners have used fitness landscape analysis to determine and select an appropriate algorithm or operator for solving optimization problems. In [20], a prediction model was developed to predict when a particle swarm optimization (PSO) algorithm would fail to solve a particular optimization problem. Decision trees were employed to predict the failure of seven different PSO algorithms, by using a number of different fitness landscape metrics. In [6], an adaptive operator selection mechanism, based on a set of four fitness landscape analysis techniques, was used to train an online regression learning model (dynamic weighted majority), which was used to predict the weight of each operator in each generation. Their proposed mechanism was used to determine the most suitable crossover operator, among four crossover operators, to solve a set of Capacitated Arc Routing Problem (CARP) instances. The authors used an instantaneous reward, in which the reward was the value computed at the last evaluation. In comparison with some state-of-the-art algorithms, the algorithm did not show a significant benefit.

3 Landscape-Based Adaptive Operator Selection DE

In this section, our novel DE-LOS algorithm is presented.

3.1 DE-LOS

The existing multi-operator algorithms use an adaptive operator selection mechanism, which is usually based on the success of generating new offspring. In this section, a DE-LOS algorithm is proposed, which uses problem landscape information, as well as the performance of operators, to adaptively place emphasis on the most suitable DE operator. The general steps in DE-LOS are given in Algorithm 1.

To begin with, three mutation strategies (DE/\(\varphi \)best/1, DE/current-to-\(\varphi \)best/1/archive and DE/current-to-\(\varphi \)best/1/without archive) are used. Initially, NP random individuals are generated within the variable bounds using a Latin Hypercube design. Then, each operator is randomly assigned to the same number of individuals, and each individual generates a new solution using its assigned mutation strategy. At the same time, the information landscape negative searchability metric and the performance history are calculated for each operator, using Eqs. 2 and 4, respectively. This process continues for a certain number of generations, say CS generations. After CS generations, the mean values of the performance history and the landscape metric are computed for every operator and normalized using Eqs. 5 and 6, respectively. Subsequently, the average normalized value of both measures is computed using Eq. 7. Based on this value, the best two operators are selected to be used in the subsequent cycle. Throughout the next cycle, at each generation, offspring are generated using one of those two operators, while the performance measure and landscape value are calculated for each operator. Then, the normalized values are calculated for the two mutation strategies. Based on the average normalized value (Eq. 7), the worse operator (the one with the minimum value) is discarded. Subsequently, the remaining operator is used to evolve the entire population for the subsequent CS generations. Note that after every CS generations, the success and landscape metrics are reset to zero. The above process is repeated every 3CS generations. However, after a predefined number of fitness evaluations is reached, the best-performing operator so far is used to evolve the population until a stopping criterion is met. Furthermore, during this stage, SQP is periodically applied to the best individual in the whole population.
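The narrowing of the operator pool over one cycle of 3CS generations (three operators, then the best two, then the best one, then a restart) can be sketched as below. This is only an illustration of the bookkeeping: the function name `next_active_operators` and the operator labels in the usage example are hypothetical, the scores are assumed to be the Eq. 7 values measured over the last CS generations, and the final single-operator stage with SQP is not modeled.

```python
def next_active_operators(active, anv, all_ops):
    """Operators to use in the next stage of CS generations.

    active  : operators used during the stage that just ended
    anv     : dict mapping each active operator to its score (higher is better)
    all_ops : the full operator pool, restored when a cycle ends
    """
    if len(active) == 1:
        # The single-operator stage has ended, so a new 3*CS cycle starts.
        return list(all_ops)
    # Otherwise discard the operator with the minimum score.
    ranked = sorted(active, key=lambda op: anv[op], reverse=True)
    return ranked[:len(active) - 1]
```

For example, with scores of 0.2, 0.5 and 0.3 for three operators, the second and third are kept for the next stage, then only the second, and after that stage the full pool is restored.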

Algorithm 1 General steps of DE-LOS

3.2 The Selection Phase

3.2.1 Information Landscape Negative Searchability Measure

The information landscape negative searchability measure, which is based on the difference between the information landscape vector of the problem to be solved and a well-known spherical function as a reference landscape, is considered in this research, due to its simplicity and scalability [21].

The reference function \(f_{ref}(\overrightarrow{x})\) is constructed using Eq. 3.

$$\begin{aligned} f_{ref}(\overrightarrow{x})=\sum _{j=1}^{D}(x_{j}-x_{j}^{*})^{2} \end{aligned}$$
(3)

where \(\overrightarrow{x}^{*}=(x_{1}^{*},x_{2}^{*},\ldots ,x_{D}^{*})\) is the best individual in the sample.

In this paper, Latin Hypercube Design is used to generate an initial population [34] that properly covers the search space of the problem. After constructing the vector landscape of the problem to be optimized (\(LS_{f}\)) and the vector landscape of the reference function (\(LS_{ref}\)), the information landscape negative searchability measure is computed using Eq. 2. This is done as part of Algorithm 2.
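The two ingredients mentioned above, a Latin hypercube sample and the reference function of Eq. 3, can be sketched as follows. The function names are hypothetical, and the sampler is a basic Latin hypercube (one point per stratum in every dimension, with random pairing across dimensions); the paper does not state which variant of the design was used.

```python
import random


def latin_hypercube(NP, lower, upper, rng=random):
    """Basic Latin hypercube sample of NP points within [lower, upper].

    Each dimension is cut into NP equal strata, and every stratum
    receives exactly one point.
    """
    D = len(lower)
    pop = [[0.0] * D for _ in range(NP)]
    for j in range(D):
        strata = list(range(NP))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        width = (upper[j] - lower[j]) / NP
        for i in range(NP):
            pop[i][j] = lower[j] + (strata[i] + rng.random()) * width
    return pop


def reference_fitness(pop, x_star):
    """Eq. 3: sphere function centred at x_star, evaluated on the sample."""
    return [sum((xj - sj) ** 2 for xj, sj in zip(x, x_star)) for x in pop]
```

In use, `x_star` would be the best individual of the sample, so that the reference landscape has its optimum at the most promising sampled point.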

Algorithm 2 Computation of the information landscape negative searchability measure

3.2.2 Average Normalized Value (ANV)

After the information landscape negative searchability value for each operator has been computed, the success rate (SR) of each operator is computed. The success rate of each operator (\(SR_{op}\)) is defined as the number of successful offspring generated by a search operator (op), divided by the number of individuals assigned to op, as shown in Eq. 4:

$$\begin{aligned} SR_{op}=\frac{\text {Number of improved offspring}}{\text {Number of all individuals evolved by operator}} \end{aligned}$$
(4)

The normalized values of the SR and landscape metrics are calculated using Eqs. 5 and 6, respectively.

$$\begin{aligned} NM_{SR_{OP}}=\frac{M_{SR_{OP}}}{\sum _{k=1}^{m}M_{SR_{k}}} \end{aligned}$$
(5)
$$\begin{aligned} NM_{LD_{OP}}=\frac{(1-M_{LD_{OP}})}{\sum _{k=1}^{m}(1-M_{LD_{k}})} \end{aligned}$$
(6)

where \(M_{SR_{OP}}\) and \(M_{LD_{OP}}\) are the mean values of the success rate and landscape value of operator OP, respectively, and m is the number of operators.

Subsequently, the normalized performance of each operator is computed using Eq. 7:

$$\begin{aligned} ANV_{OP}=(NM_{SR_{OP}}+NM_{LD_{OP}})/2 \end{aligned}$$
(7)
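Eqs. 5 to 7 can be illustrated with a short sketch that takes the per-operator mean success rates and mean landscape values and returns the average normalized value of each operator. The function name is hypothetical, and the fallback to equal shares when a denominator is zero is an assumption, since the text does not cover that case.

```python
def average_normalized_value(mean_sr, mean_ld):
    """ANV of every operator from mean success rates and mean LD values.

    mean_sr[k] : mean success rate of operator k over the last CS generations
    mean_ld[k] : mean landscape value (LD) of operator k over the same span
    """
    m = len(mean_sr)
    sr_total = sum(mean_sr)
    ld_total = sum(1.0 - ld for ld in mean_ld)
    anv = []
    for sr, ld in zip(mean_sr, mean_ld):
        # Eq. 5 (assumed fallback: equal shares if no operator succeeded).
        nm_sr = sr / sr_total if sr_total > 0 else 1.0 / m
        # Eq. 6: a smaller LD (easier landscape) gives a larger share.
        nm_ld = (1.0 - ld) / ld_total if ld_total > 0 else 1.0 / m
        # Eq. 7: average of the two normalized measures.
        anv.append((nm_sr + nm_ld) / 2.0)
    return anv
```

Since each normalized measure sums to one over the operators, the ANV values also sum to one, which makes them directly comparable across operators.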

3.3 Population Updating Method

A linear population size reduction scheme is used to adaptively re-size NP during the evolutionary process [31], as follows:

$$\begin{aligned} NP_{iter}=round[(\frac{NP^{min}-NP^{max}}{FES_{max}})\times cfe+NP^{max}] \end{aligned}$$
(8)

where \(NP^{min}\) is the smallest number of individuals that the proposed algorithm can use, cfe is the current number of fitness evaluations, and \(FES_{max}\) is the maximum number of fitness evaluations. The default value of \(NP^{max}\) is set to 18D, and \(NP^{min}\) is set to 7.
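As a worked illustration of Eq. 8, the sketch below computes the target population size for a given number of consumed fitness evaluations. The function name is hypothetical, and rounding half up is assumed for the `round` operation in Eq. 8.

```python
import math


def lpsr_size(cfe, fes_max, np_max, np_min):
    """Target population size from Eq. 8 (linear population size reduction)."""
    # Same expression as Eq. 8, with the multiplication done first.
    size = np_max + (np_min - np_max) * cfe / fes_max
    # Assumed rounding rule: halves are rounded up.
    return int(math.floor(size + 0.5))
```

For D = 10 with the defaults above (\(NP^{max}=180\), \(NP^{min}=7\), \(FES_{max}=100{,}000\)), the size starts at 180, is 94 halfway through the run, and reaches 7 when the evaluation budget is exhausted.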

To get some benefit from the worst individuals before deleting them, a new solution is generated using information from the worst two individuals and the centroid of the population (\(X_{cent}=\frac{1}{NP}\sum _{i=1}^{NP}X_{i}\), computed component-wise over the D decision variables), as

$$\begin{aligned} X_{new}=X_{cent}+rand\times (X_{NP}-X_{NP-1}) \end{aligned}$$
(9)

Then the worst individual is deleted, and \(X_{NP-1}\) is replaced by \(X_{new}\) if \(X_{new}\) has a better objective value.
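The removal step built around Eq. 9 can be sketched for a single deleted individual as follows. The function name is hypothetical, `rand` is assumed to be one uniform random number in [0, 1) shared by all dimensions, the problem is assumed to be a minimization problem, and bound handling of the new point is omitted.

```python
import random


def shrink_population(pop, fitness, f_obj, rng=random):
    """Remove the worst individual after trying to improve the second worst.

    Returns the reduced population and its objective values, ordered from
    best to worst before the optional replacement.
    """
    order = sorted(range(len(pop)), key=lambda i: fitness[i])
    pop = [pop[i] for i in order]
    fitness = [fitness[i] for i in order]
    NP, D = len(pop), len(pop[0])

    # Centroid of the whole population, one coordinate per dimension.
    centroid = [sum(x[j] for x in pop) / NP for j in range(D)]

    # Eq. 9: X_new = X_cent + rand * (X_NP - X_{NP-1}).
    r = rng.random()
    worst, second_worst = pop[-1], pop[-2]
    x_new = [centroid[j] + r * (worst[j] - second_worst[j]) for j in range(D)]
    f_new = f_obj(x_new)  # costs one extra fitness evaluation

    # Delete the worst individual, then compare with the second worst.
    pop, fitness = pop[:-1], fitness[:-1]
    if f_new < fitness[-1]:
        pop[-1], fitness[-1] = x_new, f_new
    return pop, fitness
```

When several individuals must be removed in one generation, the step could simply be repeated, although the text does not say whether the new individual is generated once or once per deletion.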

4 Experimental Results

In this section, the performance of the proposed algorithm is tested by solving a set of problems taken from the CEC2014 competition on learning-based real-parameter single objective optimization [18]. The CEC2014 benchmark test set contains 30 test problems. The search space for all the problems is \([-100,100]^{D}\). The proposed algorithm was run following the guidelines of the competition, which require 51 independent runs for each test problem with up to \(FES_{MAX}=10,000\text {D}\) fitness evaluations. In the experiments, if the deviation of the best fitness value from the optimal solution was less than or equal to \(1.0e-8\), it was considered as zero. The algorithm was coded in MATLAB R2014a, and was run on a PC with a 3.4 GHz Core i7 processor, 16 GB RAM, and Windows 7.

4.1 Algorithm Parameters and Operators

The default values of \(NP^{init}\) and \(NP^{min}\) were set based on our experimental analysis: \(NP^{init}=18D\) and \(NP^{min}=7\). \(\varphi \) was set to 0.6 for DE/\(\varphi \)best/1 to maintain diversity, and to 0.1 for the other two variants to speed up the convergence rate. The archive rate A was set to 1.4, and the memory size H was set to 5. The parameter limit, which is the maximum number of fitness evaluations for the multi-operator phase, after which the best-performing operator evolves the population until the end of the run, was set to \(\frac{2}{3}\times FES_{MAX}\), and CS was 100. The scaling factor F and the crossover probability CR were set as in [31].

4.2 Detailed Results for 10, 30, and 50D

The computational results of DE-LOS for 10, 30, and 50D are shown in Table 1. For 10D, the proposed algorithm obtained the optimal solutions for all unimodal functions (\(F01-F03\)). For the multimodal functions (\(F04- F16\)), DE-LOS was able to obtain the optimal solutions on six problems, while it was very close for the rest. For the hybrid functions (\(F17- F22\)), DE-LOS was able to obtain the optimal solution only for F17, and was very close for the rest of the test problems. However, it became stuck in local solutions for all the composition test problems, \(F23- F30\).

Table 1 Detailed Results for 10D

For \(30\text {D}\), from the results, DE-LOS was able to obtain the optimal solution on all the unimodal problems. For multimodal problems, DE-LOS was able to obtain the optimal solution for F04, F06, F07, F08 and F10, while it was very close to the optimal solution for the rest. For hybrid functions, the best solutions obtained were close to the optimal. Again, for the composition problems, DE-LOS got stuck in local solutions.

For 50D, DE-LOS was able to obtain the optimal solutions for F02 and F03, while for F01 it obtained solutions very close to the optimal one. For multimodal problems, DE-LOS was robust in solving F08, efficient in solving F04, F07, F08 and F10, while it got stuck in local solutions for the rest of the test problems. This was also the situation for the hybrid and composition problems, although its performance in solving the hybrid problems was slightly better than its performance in solving the composition problems.

Table 2 A comparison summary between DE-LOS and other state-of-the-art algorithms

4.3 DE-LOS Versus State-of-the-art Algorithms on CEC2014

DE-LOS was compared with the top two algorithms in the literature, LSHADE [31] and UMOEAs [11]. The MATLAB source codes for LSHADE and UMOEAs were downloaded online. We ran these algorithms using the parameters suggested by the authors in their papers, and the other conditions were the same as in the competition guidelines. To make a fair comparison, all the algorithms were run using the same seeds.

Table 2 shows a comparison summary of the results obtained from DE-LOS and the other two algorithms for the 10D, 30D, and 50D problems. A non-parametric test, the Wilcoxon rank-sum test, was chosen to judge the difference between each pair of algorithms. The results regarding the best and average fitness values are presented in Table 2. The significance level was set at \(10\,\%\). Based on the test results/rankings, one of three signs (\(+\), −, and \(\approx \)) was assigned for the comparison of any two algorithms (shown in the last column), where the “\(+\)” sign means that the first algorithm is significantly better than the second, the “−” sign means that the first algorithm is significantly worse, and the “\(\approx \)” sign means that there is no significant difference between the two algorithms. Considering the quality of solutions, the results in Table 2 show that DE-LOS is better than the other algorithms based on the best and average results obtained, and this is most evident for 30D and 50D.

Based on the statistical test, DE-LOS is better than UMOEAs in 10D, 30D, and 50D in regard to best and average results, except for the best results in 10D and 50D, where there is no significant difference between DE-LOS and UMOEAs. Considering the comparison between DE-LOS and LSHADE, DE-LOS is significantly better than LSHADE in 10D, 30D, and 50D.

In addition, based on the average results obtained, the average ranking of DE-LOS, LSHADE and UMOEAs, as produced by the Friedman test, is summarized in Table 3. The results in Table 3 are consistent with the results in Table 2, in which DE-LOS had the best rank.

Table 3 Friedman’s test results

5 Conclusion and Future Work

During the last few decades, DE algorithms have shown superior performance to many other state-of-the-art algorithms in solving both unconstrained and constrained optimization problems. It is known that no single algorithm or operator is able to solve all kinds of optimization problems. Even within a single run, an algorithm or operator may perform well in the earlier generations, while its performance often decreases during later generations. The selection of an appropriate algorithm or operator is therefore not an easy task. In this paper, the DE-LOS algorithm has been presented. It uses landscape and normalized performance measures to dynamically place more emphasis on the best-performing DE mutation strategy.

The algorithm has been tested on 30 bound constrained numerical optimization problems from the CEC2014 competition. The results obtained were better than those obtained from the best two algorithms in the literature.

In future work, we will investigate the use of more than one landscape measure, and will incorporate some of them with multi-method-based algorithms.