1 Introduction

Many practical real-world optimisation problems within engineering can be considered global optimisation problems. Such problems are also often complex and large-scale (i.e. they have a large number of parameters), which makes them difficult to solve (Ali et al. 2005; Lozano et al. 2011). Heuristics and metaheuristics are among the existing methods to solve continuous optimisation problems.

Multi-start methods are one type of metaheuristic. These methods usually embed other optimisation algorithms, such as local search and neighbourhood search, which lack the diversification required to explore the search space globally (Gendreau and Potvin 2010). The multi-start method starts the embedded algorithm from multiple different initial solutions, often obtained by random sampling. A well-known multi-start method, the Continuous Greedy Randomised Adaptive Search Procedure (CGRASP) (Hirsch et al. 2010), is relevant for the work presented in this paper.

Another metaheuristic, adopted in this work, is the Cooperative Coevolutionary (CC) algorithm (Potter and De Jong 1994). This algorithm requires that the optimisation problem is decomposed into subproblems, each with a corresponding group of parameters. These subproblems are optimised separately, but with cooperation between the optimisations. The cooperation is necessary to ensure that the solutions for the subproblems are still optimal when combined into a solution for the entire problem. CC performs well, especially on large-scale optimisation problems. However, other works (Li and Yao 2009; Omidvar et al. 2014, 2010; Potter and De Jong 1994; Ray and Yao 2009; Yang et al. 2008) found that CC struggles to solve non-separable problems (i.e. problems with interactions/interdependencies between parameters). Those works propose different versions of CC to improve the performance on non-separable problems. These versions mainly focus on regrouping the parameters into different subproblems, with the aim of gathering interacting parameters in the same subproblem and optimising them together.

This work considers a different approach to improving CC’s performance, one that can still be used together with previously proposed regrouping strategies. The focus is on extending CC using the GRASP multi-start architecture. The algorithm proposed in this work is called the Constructive Cooperative Coevolutionary (\(\mathrm {C}^3\)) algorithm. A constructive heuristic is incorporated in \(\mathrm {C}^3\) to efficiently find good feasible solutions. These feasible solutions are used as initial solutions for CC. When the search of CC stagnates, \(\mathrm {C}^3\) restarts the constructive heuristic to obtain a new initial solution. \(\mathrm {C}^3\)’s aim is to increase the performance of CC, specifically on non-separable large-scale optimisation problems.

The main contribution of this paper is the evaluation of the performance and robustness of the \(\mathrm {C}^3\) algorithm, and the insight gained into its behaviour with respect to specific characteristics of large-scale optimisation problems (modality, separability, etc.). Compared to previous work (Glorieux et al. 2014, 2015), \(\mathrm {C}^3\) is improved to make it scalable in the number of subproblems. Thereby, it can be applied to a wider range of problems. \(\mathrm {C}^3\) is compared with other algorithms, such as the Cooperative Coevolutionary (CC) algorithm (Potter and De Jong 2000; Wiegand et al. 2001; Zamuda et al. 2008), Self-Adaptive Differential Evolution (jDErpo) (Brest et al. 2014) and a Particle Swarm Optimiser (PSO) (Nickabadi et al. 2011). The comparison shows that \(\mathrm {C}^3\) outperforms the other algorithms, both in terms of performance and robustness.

The remainder of this paper is organised as follows: in Sect. 2 the relevant background information for \(\mathrm {C}^3\) is given. In Sect. 3, the details of \(\mathrm {C}^3\) are presented. The implementation of the tests performed in this paper is described in Sect. 4. The results of these tests are presented and discussed in Sect. 5, and finally Sect. 6 concludes this work.

2 Background

In earlier work, \(\mathrm {C}^3\) was applied to practical optimisation problems concerning the control of interacting production stations (Glorieux et al. 2014, 2015). There, it was used within a simulation-based optimisation framework and it was shown that \(\mathrm {C}^3\) outperforms other optimisation methods. The previous version of \(\mathrm {C}^3\) was not scalable in the number of subproblems, which was a limitation of the algorithm. In this work, an improved version of \(\mathrm {C}^3\) is proposed in which this limitation is removed. Hence, it can be applied to a wider range of problems. In previous work on \(\mathrm {C}^3\), the algorithm has only been tested on problems with up to 100 dimensions. In this work, \(\mathrm {C}^3\) is tested on a wide range of large-scale problems with up to 1000 dimensions. This section provides the relevant background for the design of \(\mathrm {C}^3\).

2.1 Greedy randomised adaptive search procedure

A well-known multi-start method (Martí et al. 2010) is the Continuous Greedy Randomised Adaptive Search Procedure (CGRASP) (Hirsch et al. 2007, 2010), which is based on the discrete GRASP (Feo and Resende 1995). For each start (also referred to as iteration), two phases are executed: a constructive phase and a local improvement phase. The constructive phase builds a feasible solution. This is done by performing a line search separately in each search direction while keeping the parameters for the other directions fixed to a random initial value. This solution is then used as an initial solution for the optimisation algorithm used in the local improvement phase. The local improvement phase terminates when the search reaches a local optimum.

Hirsch et al. (2007, 2010) evaluated the performance of CGRASP on a set of standard benchmark functions and on real-world continuous optimisation problems. For some of the benchmark functions, CGRASP’s performance was not as good as that of other optimisation methods (Simulated Annealing and Tabu Search). Later, an improved version, DC-GRASP, was proposed by Araújo et al. (2015), which performs better, especially on high-dimensional problems.

\(\mathrm {C}^3\) adopts CGRASP’s multi-start architecture with a constructive and an improvement phase for each start. The constructive phase of \(\mathrm {C}^3\) differs in that entire subproblems are optimised stepwise (instead of a single parameter); only the previously optimised subproblems are kept fixed, and the remaining subproblems are not considered during the function evaluations. Another difference is that \(\mathrm {C}^3\)’s improvement phase incorporates CC.

2.2 Cooperative coevolutionary algorithm

The Cooperative Coevolutionary (CC) algorithm for continuous global function optimisation was first proposed by Potter and De Jong (1994). It requires that the problem is decomposed into subproblems. Typically, a natural decomposition is used that groups the D parameters into n sets, one set for each subproblem. For each subproblem, a subpopulation is initialised and optimised separately. To evaluate the cost of a member of a subpopulation, collaborator solutions are selected from the other subpopulations to form a complete solution. The combination of these collaborators is called the context solution. These collaborators are updated at specific intervals.

It has been proposed to use multiple collaborators from each subpopulation (Potter and De Jong 1994; Wiegand et al. 2001). When multiple collaborators are used, different combinations of these collaborators are evaluated to calculate the cost for a given subpopulation member. The function evaluation results of these different combinations are then combined into a single cost value for the subpopulation member. One of the questions when using CC is how to select the collaborators for the context solution and how many to select from each subpopulation. Wiegand et al. (2001) investigate the choice of the collaborator solutions, more specifically the collaborator selection pressure, the number of collaborators for a given function evaluation, and the credit assignment when using multiple collaborators. It was shown that how to select the collaborators depends on specific characteristics of the problem, especially the separability. Moreover, the selection pressure of collaborators is of less importance than the number of collaborators. However, increasing the number of collaborators is not always preferred because the cost calculation becomes computationally more expensive.
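To make the collaborator mechanism concrete, the following minimal sketch (in Python, with illustrative names; not the authors' implementation) evaluates one member of subpopulation i against a context solution assembled from randomly chosen collaborators:

```python
import numpy as np

def evaluate_member(member, i, groups, subpops, cost_fn, rng):
    # groups[j]: parameter indices of subproblem j (hypothetical layout)
    # subpops[j]: 2-D array, one row per candidate sub-solution of subproblem j
    D = sum(len(g) for g in groups)
    x = np.empty(D)                                  # complete (context) solution
    for j, group in enumerate(groups):
        if j == i:
            x[group] = member                        # the member being evaluated
        else:
            row = rng.integers(len(subpops[j]))
            x[group] = subpops[j][row]               # random collaborator
    return cost_fn(x)                                # cost of the complete solution
```

Using a single best collaborator per subpopulation instead of a random row would correspond to the greedier selection scheme discussed above.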

The decomposition or grouping of the parameters influences the performance of the cooperative coevolutionary algorithm. Random grouping has been proposed to increase the performance on non-separable high-dimensional problems (Omidvar et al. 2010; Yang et al. 2008). With random grouping, the parameters are frequently regrouped to increase the chance of having interacting parameters in the same subproblem. Omidvar et al. (2014) propose differential grouping to automatically uncover the underlying substructures of the problem for grouping the parameters. The parameter groups are then determined so that the interactions between the subproblems are kept to a minimum. Results show that this increases the performance significantly for non-separable problems. Ray and Yao (2009) propose to group the parameters according to their observed correlation during the optimisation. With this approach, a population of solutions for the entire problem is generated prior to the optimisation to determine the initial grouping. Thus, it requires additional computational effort. Decomposition strategies for \(\mathrm {C}^3\) are not investigated in the work presented in this paper but are considered a topic for future studies.

CC can have the tendency to limit its search to a single neighbourhood instead of exploring the search space more widely (Gendreau and Potvin 2010). This behaviour has been observed especially when the best solution in each subpopulation is used as collaborator. The search then converges towards a local optimum.

Shi et al. (2005) propose using Differential Evolution (DE) instead of a genetic algorithm for the subproblem optimisations in the cooperative coevolutionary algorithm. Furthermore, an alternative static decomposition scheme is proposed in which each subproblem takes half of the parameters. Typically, a problem is decomposed so that there is a subproblem for each parameter. The proposed algorithm (CCDE) and decomposition scheme showed an improved performance compared to DE and compared to the typical decomposition scheme. However, the proposed algorithm was not tested on large-scale non-separable problems.

Using the Particle Swarm Optimiser (PSO) for the subproblem optimisation in CC has been proposed by Bergh and Engelbrecht (2004). CCPSO shows increased performance and robustness compared to PSO, especially when the dimension of the optimisation problem increases. However, it was also noticed that the chance of converging to a sub-optimum increases. Li and Yao (2012) propose a more advanced version of CCPSO that incorporates a new PSO model and a random parameter regrouping scheme with dynamic group size for large-scale problems with up to 2000 parameters. The results showed an increased performance compared to PSO and other state-of-the-art optimisation algorithms.

3 Constructive cooperative coevolutionary algorithm

The details of the Constructive Cooperative Coevolutionary (\(\mathrm {C}^3\)) algorithm are described in this section. In Algorithm 1 the pseudocode of \(\mathrm {C}^3\) is presented. \(\mathrm {C}^3\) is based on CGRASP and adopts its multi-start architecture. Furthermore, \(\mathrm {C}^3\) also incorporates the CC algorithm. Hence, each iteration (or start) of \(\mathrm {C}^3\) includes a constructive heuristic (Phase I) and CC (Phase II).

The optimisation problem is decomposed into n subproblems. This is done by partitioning the D-dimensional set of search dimensions \(G=\{1,2,\dots ,D\}\) into n sets \(G_1,\dots ,G_n\) (line 1 in Algorithm 1). The decomposition or partitioning \({\mathcal {G}}\) can influence the performance of \(\mathrm {C}^3\) and is thus important. It is not investigated further in this work, but suggested as future work. The problems in this work are always decomposed randomly into equal-sized subproblems, as sketched below.
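A minimal sketch of such a random, equal-sized partitioning (assuming D is divisible by n; the names are illustrative, not the authors' implementation):

```python
import numpy as np

def random_equal_partition(D, n, rng):
    # Shuffle the dimension indices {0, ..., D-1} and split them into
    # n consecutive groups of size D/n, one group per subproblem.
    perm = rng.permutation(D)
    return np.array_split(perm, n)

# Example: D = 100 dimensions split into n = 10 subproblems of 10 parameters each.
groups = random_equal_partition(100, 10, np.random.default_rng(42))
```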

In Phase I, the constructive heuristic builds up a feasible solution for the entire optimisation problem (\(\mathbf {x}_{it,constr}\) on line 4 in Algorithm 1). A feasible solution is a solution that does not violate the constraints of the optimisation problem. Next, \(\mathbf {x}_{it,constr}\) is used in Phase II as the initial context solution for CC, which further improves this solution (line 5 in Algorithm 1). Phase II is terminated when CC’s search stagnates. In the next iteration, \(it+1\), the constructive heuristic (Phase I) is restarted to build up a new feasible solution. This is repeated until the termination criteria are met. An example of a termination criterion is a limit on the maximum number of function evaluations. The best solution \(\mathbf {x}^*\) found over all iterations is recorded and presented as the result when the optimisation ends.

In both Phase I and Phase II, the subproblems are optimised separately over n steps by an embedded optimisation algorithm. To calculate the cost of the members during Phase I, the subproblems’ trial solutions are assembled into a partial solution. A partial solution \(\mathbf {p}^i\) from Step i is a solution for the first i subproblems, with \( 1\le i\le n \), and neglects Subproblem \(i+1\) to Subproblem n. During the function evaluation, the neglected subproblems are thus not considered. Note that calculating the cost of a partial solution is equivalent to calculating the cost of a solution for a smaller problem that only considers the parameters included in the partial solution.
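As a sketch, evaluating a partial solution \(\mathbf {p}^i\) can therefore be seen as calling a reduced-dimension cost function on the parameters of the first i subproblems only (assuming the benchmark supports evaluation on a subset of dimensions; the interface below is hypothetical):

```python
import numpy as np

def evaluate_partial(p, i, groups, cost_fn):
    # p holds values for subproblems 1..i; subproblems i+1..n are neglected.
    included = np.concatenate(groups[:i])      # dimensions of the first i groups
    # Equivalent to the cost of a smaller problem over only these dimensions.
    return cost_fn(p, dims=included)           # hypothetical reduced-dim interface
```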

During Phase II, to calculate the cost of a member of a subpopulation, it is assembled into a context solution (a solution for the entire problem) in the same way as in CC. As mentioned earlier, the context solution consists of collaborators, one from each of the other subpopulations. In this work, different collaborators are randomly chosen for each function evaluation.

The embedded optimisation algorithm must be a population-based algorithm. When it is deployed for optimising a subproblem, it initialises a (sub)population for the corresponding subproblem’s parameters and evolves this subpopulation according to the procedure of the used population-based algorithm. After the optimisation, the best partial solutions in this subpopulation are used while the others are discarded. Hence, in general terms, \(\mathrm {C}^3\) can be combined with any suitable population-based optimisation algorithm. This is demonstrated in this work by using both an evolutionary algorithm (i.e. Differential Evolution) and a swarm-based algorithm (i.e. a Particle Swarm Optimiser).

[Algorithm 1: pseudocode of the \(\mathrm {C}^3\) algorithm]

3.1 Phase I: constructive heuristic

In Phase I, a constructive heuristic builds up a feasible solution \( \mathbf {x}_{it,constr} \) for the optimisation problem in a stepwise manner, without backtracking. It includes up to n steps, one for each subproblem. In each step, the corresponding subproblem is optimised by the embedded optimisation algorithm. In Algorithm 2, the pseudocode of Phase I is presented.

[Algorithm 2: pseudocode of Phase I, the constructive heuristic]

In the first iteration, \(it=0\), Step 1 starts with an empty solution \(\emptyset \) (lines 1–5 in Algorithm 2) and optimises only Subproblem 1, starting from a randomly initialised subpopulation \(pop_1\) (SubOpt on lines 2–3 in Algorithm 2). During Step 1, the function evaluations consider only Subproblem 1. The next subproblem is then added in the next step, and its subpopulation \(pop_i\) is initialised and optimised (lines 9–10 in Algorithm 2). During Step i, Subproblem 1 to Subproblem i are included, and Subproblem \(i+1\) to Subproblem n are neglected. Hence, the current parameter vector only contains the parameters of the first i subproblems and the function evaluations only take those subproblems into account. For the optimisation in each step (SubOpt on line 10 in Algorithm 2), only the parameters related to the most recently added subproblem are optimised (Subproblem i in Step i). All other included parameters are kept fixed to the values of the partial solution selected in the previous step. This is illustrated in Fig. 1 for the second and the third step.

At the end of the optimisation of Subproblem i, the k best (partial) solutions in subpopulation \(pop_i\) are stored in \(\{{\mathbf {X}^p}\}\) (line 11 in Algorithm 2). The purpose of the stored partially constructed solutions in \(\{{\mathbf {X}^p}\}\) is to be further constructed in Phase I of later iterations of \(\mathrm {C}^3\). For the next step, one of the k stored partial solutions is randomly chosen (line 12 in Algorithm 2). In the next step, Step \(i+1\), of the current iteration, the parameters of Subproblem i are kept fixed to the randomly chosen partial solution. Then, the parameters of Subproblem \(i+1\) are optimised in the same way as those of Subproblem i in Step i. Finally, in the last step, Step n, the found partial solution is a solution for the entire problem since all subproblems have been added. The best one, \(\mathbf {p}^n_{j_n}\), is then used as initial context solution for CC in Phase II.
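A condensed sketch of this constructive loop (Python; `subopt` stands in for the embedded optimiser and is assumed to return the subpopulation sorted by cost; all names are illustrative and the handling of \(\{\mathbf {X}^p\}\) is simplified):

```python
def phase_one(n, k, subopt, archive, rng):
    # Build a feasible solution stepwise, without backtracking.
    fixed = []                                        # empty solution at it = 0
    for i in range(n):                                # Step 1 .. Step n
        pop_i = subopt(i, fixed)                      # optimise only subproblem i;
                                                      # earlier subproblems stay fixed
        best_k = pop_i[:k]                            # k best partial solutions
        archive.extend(fixed + [p] for p in best_k)   # store for later restarts
        if i < n - 1:
            fixed = fixed + [best_k[rng.integers(k)]] # continue from a random one
        else:
            fixed = fixed + [best_k[0]]               # last step: keep the best
    return fixed                                      # solution for the entire problem
```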

When the constructive heuristic is restarted in the next iterations, it does not start constructing a new solution from scratch. Instead, it starts with the best unexplored partial solution in \(\{\mathbf {X}^p\}\) (line 6 in Algorithm 2). That partial solution is then further constructed in the same way as in the first iteration. Since all stored partial solutions in \(\{\mathbf {X}^p\}\) are unique, the constructed solutions will also all be different.

Note that the cost values of all stored partial solutions in \(\{{\mathbf {X}^p}\}\) are compared to select the best one, even though some include more parameters (subproblems) than others. This is possible if all subproblems have the same optimal cost value. In the other cases, a scale factor or a heuristic estimate can be introduced that compensates for the differences in cost value between stored partial solutions from different steps.

A constructive heuristic typically creates better feasible solutions, with the same effort (i.e. in the same number of cost calculations), compared to random sampling (Gendreau and Potvin 2010). Obviously, using better solutions as initial context solution for CC is beneficial for its convergence. The role of the constructive heuristic of Phase I is to construct a feasible solution in a greedy fashion. The greediness of the constructive heuristic comes from the fact that a single partial solution (one of the k best) is further constructed in each step. The constructive heuristic also avoids redundancy and guarantees that, in each iteration, a different feasible solution is constructed. This forces CC in Phase II to search in unexplored areas.

Fig. 1 Illustration of the second (a) and third step (b) of the constructive heuristic in Phase I

3.2 Phase II: cooperative coevolution

Phase II starts from the constructed feasible solution \(\mathbf {x}_{it,constr}\) and searches for better solutions using CC. The pseudocode of Phase II is presented in Algorithm 3. The optimisation in Phase II is organised in cycles. In one cycle, the same subproblems as in Phase I are optimised stepwise, in a round-robin fashion. Consequently, a cycle includes n steps, one for each subproblem.

[Algorithm 3: pseudocode of Phase II, the cooperative coevolution]

Subpopulation \(pop_i\) is optimised in the corresponding Step i by the embedded optimisation algorithm (line 11 in Algorithm 3). To evaluate the cost, an individual of the subpopulation is assembled into a context solution to form a complete solution. This context solution consists of collaborator solutions that are randomly chosen from the other subpopulations (line 10 in Algorithm 3). For each function evaluation, different collaborators are randomly selected, as proposed by Wiegand et al. (2001).

During the first cycle of Phase II, the context solution is initially the constructed solution \(\mathbf {x}_{it,constr}\) instead of collaborators from the other subpopulations (lines 6–9 in Algorithm 3). A collaborator from subpopulation \(pop_i\) of Subproblem i is used only after Step i has been completed. In other words, in Step i, the collaborators for Subproblem \(i+1\) to Subproblem n are taken from \(\mathbf {x}_{it,constr}\). Only in the first cycle are the subpopulations randomly initialised at the start of the subproblem optimisation (line 7 in Algorithm 3). A minimal sketch of one such cycle is given below.
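The sketch below (Python; `subopt_cc` stands in for the embedded optimiser evolving subpopulation i against a context built by `context`; names are illustrative, not the authors' implementation) shows how the constructed solution is used as fallback collaborator in the first cycle:

```python
def phase_two_cycle(cycle, n, subpops, x_constr, groups, subopt_cc, rng):
    # One round-robin pass over the n subproblems.
    for i in range(n):                            # Step 1 .. Step n
        def context(j):
            # Collaborator for subproblem j while optimising subproblem i.
            if cycle == 0 and j > i:
                return x_constr[groups[j]]        # not yet optimised this cycle:
                                                  # use the constructed solution
            pop = subpops[j]
            return pop[rng.integers(len(pop))]    # random collaborator per
                                                  # evaluation (Wiegand et al. 2001)
        subpops[i] = subopt_cc(i, context)        # evolve subpopulation i
    return subpops
```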

Phase II is terminated when the search stagnates, because it is likely that a local optimum has then been reached. When the relative difference between the best solution found during the current cycle and the best solution from the previous cycle is less than \(\varepsilon \), Phase II is terminated. This is shown in line 15 in Algorithm 3, where \(\mathbf {b}^{l*}\) is the best solution found in cycle l.
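Written out explicitly (a reconstruction from the description above, assuming the relative difference is taken with respect to the previous cycle's best cost and \(f(\cdot )\) denotes the cost function), Phase II stops after cycle l when

\[ \frac{\bigl | f(\mathbf {b}^{l*})-f(\mathbf {b}^{(l-1)*})\bigr |}{\bigl | f(\mathbf {b}^{(l-1)*})\bigr |}<\varepsilon . \]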

Because CC optimises the smaller subproblems separately, it is well-suited for large-scale problems. The context solution ensures that a subproblem is co-adaptively optimised, as a part of the complete problem, and not as an isolated optimisation problem. By using different collaborators in the context solution for each cost calculation, a subproblem is optimised to collaborate with the individuals of the other subpopulations and not with just a single specific context solution. On the other hand, using the constructed solution from Phase I as context solution in the first cycle ensures that CC starts its search in the region of the search space specified by this constructed solution. In each iteration, the constructed solution directs CC to search a different region of the search space.

4 Implementation

Two versions of \(\mathrm {C}^3\), \(\mathrm {C}^3\)jDErpo and \(\mathrm {C}^3\)PSO, are compared with 4 other algorithms. The 6 different optimisation algorithms compared in this work are: \(\mathrm {C}^3\)jDErpo, CCjDErpo, jDErpo, \(\mathrm {C}^3\)PSO, CCPSO and PSO. Here, \(\mathrm {C}^3\)jDErpo refers to \(\mathrm {C}^3\) with jDErpo as embedded algorithm to optimise the subproblems, and similarly for CCjDErpo, \(\mathrm {C}^3\)PSO and CCPSO. To evaluate the performance and robustness of \(\mathrm {C}^3\), 51 tests on large-scale benchmark functions were done for both versions of \(\mathrm {C}^3\). Of these, 36 are based on 12 benchmark functions (see Table 1 and Appendix 1), each tested with 3 different numbers of dimensions (\(D=100, D=500, D=1000\)). Additionally, tests are done on the test suite of the CEC’2013 special session on Large-Scale Global Optimisation (LSGO) (Li et al. 2013).

The jDErpo algorithm used as embedded algorithm in \(\mathrm {C}^3\)jDErpo for the subproblem optimisation was proposed by Brest et al. (2014) and has a self-adaptive mechanism to tune the control parameters, i.e. the mutation scale factor (F) and the crossover parameter (CR). The PSO algorithm used as embedded algorithm in \(\mathrm {C}^3\)PSO for the subproblem optimisation was proposed by Nickabadi et al. (2011) and has a dynamic inertia weight to progressively increase the greediness of the search, as this is beneficial for large-scale optimisation (Schutte and Groenwold 2005). The CC algorithm used for the comparison is based on Wiegand et al. (2001) and uses self-adaptation based on Zamuda et al. (2008).
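For intuition, the jDE-style self-adaptation that jDErpo builds on can be sketched as follows (the constants follow the original jDE scheme of Brest et al.; jDErpo adds further mechanisms not shown here, so this is only an illustrative approximation):

```python
import numpy as np

def self_adapt(F, CR, rng, tau1=0.1, tau2=0.1, F_l=0.1, F_u=0.9):
    # Each individual carries its own F and CR; with small probability
    # they are resampled before generating that individual's trial vector.
    if rng.random() < tau1:
        F = F_l + rng.random() * F_u      # new scale factor in [0.1, 1.0]
    if rng.random() < tau2:
        CR = rng.random()                 # new crossover rate in [0, 1]
    return F, CR
```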

The population size of standalone jDErpo is set to \(NP=100\) and that of standalone PSO to \(NP=75\). For \(\mathrm {C}^3\)jDErpo and CCjDErpo, the population size is set to \(NP=100\), and for \(\mathrm {C}^3\)PSO and CCPSO, the population size is set to \(NP=30\). All tests are repeated 25 times to obtain reliable mean results. All repetitions are run independently, with random starting values.

For all tests with CC and \(\mathrm {C}^3\), the problem is decomposed randomly into 10 equal-sized subproblems (\(n=10\)). During the steps of Phase I, only the parameters of the subproblems that are included so far are used to calculate the cost. For example, for \(D=100\), in the first step the dimension for the function evaluation is 10, in the second step it becomes 20, in the third step 30, and so on, until in the last step it finally becomes 100.

The termination criterion for the optimisation is \(3\hbox {E}{+}6\) function evaluations. The stop criterion for a subproblem optimisation in a step of \(\mathrm {C}^3\) and CC is 60,000 function evaluations. Consequently, in \(\mathrm {C}^3\) this allows up to five iterations (\(it\le 5\)), depending on how many cycles Phase II runs before stagnating in each iteration. The predefined value \(\varepsilon \), used to detect when the search of Phase II stagnates and the next iteration should start, was set to \(\varepsilon = 1\hbox {E}{-}6\). The number of stored partial solutions during each step of Phase I was set to 15 (\(k=15\)), to be able to construct more than enough different solutions.

The 3 different versions (i.e. \(\mathrm {C}^3\), CC, and stand-alone) are pairwise compared for each specific benchmark function and dimension using the Wilcoxon signed-rank test, a non-parametric test, as recommended by García et al. (2009). The null hypothesis is that there is no significant difference (i.e. the results belong to the same distribution), and it is rejected when the p value is smaller than the significance level of \(\alpha = 0.05\).
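As an illustration, such a pairwise comparison could be run with SciPy's implementation of the Wilcoxon signed-rank test (the cost arrays below are synthetic placeholders, not the paper's data):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
costs_a = rng.lognormal(-2.0, 0.5, size=25)   # 25 repetitions of algorithm A
costs_b = rng.lognormal(-1.5, 0.5, size=25)   # 25 repetitions of algorithm B

stat, p_value = wilcoxon(costs_a, costs_b)    # paired, non-parametric test
if p_value < 0.05:                            # significance level alpha = 0.05
    print("reject H0: significant difference between the algorithms")
else:
    print("fail to reject H0: no significant difference")
```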

Table 1 Specifications of the used benchmark functions (Jamil and Yang 2013; Li et al. 2013)

5 Results and discussion

In this section, the results of the tests performed in this work are presented, and the relevant aspects of the results are highlighted and discussed. For simplicity, in this section \(\mathrm {C}^3\) is used as a collective name for \(\mathrm {C}^3\)jDErpo and \(\mathrm {C}^3\)PSO, CC for CCjDErpo and CCPSO, and “stand-alone algorithms” for jDErpo and PSO.

5.1 Convergence analysis

The evaluation of \(\mathrm {C}^3\)’s convergence performance is presented in this section. The main indicator for this is the cost of the best solution found after all \(3\hbox {E}+6\) function evaluations. The results are shown in Table 2. These are the means over 25 independent repetitions. When there is a significant difference between the algorithms according to the pairwise comparison using the Wilcoxon signed-rank test, the best result(s) are highlighted in bold font in Table 2.

It can be seen that \(\mathrm {C}^3\) converges better for the majority of the tests compared to CC and the stand-alone algorithms. Considering the statistically significant differences, \(\mathrm {C}^3\)jDErpo is the best algorithm or among the best in 28 of the 36 tests compared to CCjDErpo and jDErpo, and \(\mathrm {C}^3\)PSO in 23 of the 36 tests compared to CCPSO and PSO. Furthermore, the pairwise comparison showed that \(\mathrm {C}^3\)jDErpo performs better than CCjDErpo in 20 of the 36 tests, and similarly in 10. \(\mathrm {C}^3\)PSO performs better in 20 of the 36 tests, and similarly in 5 tests. There is also no drastic deterioration in convergence performance when the number of dimensions increases. It can be concluded that in general there is a benefit to using \(\mathrm {C}^3\) instead of CC because it performs either better or at least similarly. It must be noted that \(\mathrm {C}^3\)’s convergence performance is better than CC’s on the non-separable functions, except for \( f_\mathrm {Rosenbrock}\). This indicates that \(\mathrm {C}^3\) struggles less with this type of optimisation problem.

The pairwise comparison between \(\mathrm {C}^3\) and the stand-alone algorithms showed that \(\mathrm {C}^3\)jDErpo performs significantly better than jDErpo in 33 of the 36 tests, and \(\mathrm {C}^3\)PSO performs significantly better in 31 of the 36 tests. It can be concluded that \(\mathrm {C}^3\) performs better on these large-scale problems than the stand-alone algorithms.

Table 2 Performance results of different algorithms on the 12 benchmark functions (when there is a significant difference, the best result is highlighted in bold)

It can be concluded that in general \(\mathrm {C}^3\)jDErpo shows better convergence performance than \(\mathrm {C}^3\)PSO. The same is true for CCjDErpo compared with CCPSO, and also when comparing jDErpo and PSO. Note that the embedded optimisation algorithm for the subproblem optimisations has a significant influence on the convergence of \(\mathrm {C}^3\). If the subproblems have very different characteristics, it might even be valuable to consider different optimisation algorithms for specific subproblems.

5.2 Computational effort

The difference in computational effort between \(\mathrm {C}^3\), CC and a stand-alone algorithm is analysed. This was done by recording the optimisation time on the Rosenbrock benchmark function (\(f_\mathrm {Rosenbrock}\)) with jDErpo. The results are presented in Table 3. Each test was repeated 25 times on the same computer. The specific time values differ for different problems, but the relations between the times for \(\mathrm {C}^3\)jDErpo, CCjDErpo and jDErpo will remain the same. The results show that \(\mathrm {C}^3\)jDErpo’s optimisation time is the shortest, and jDErpo’s is the longest. It can be assumed that \(\mathrm {C}^3\)jDErpo’s and CCjDErpo’s shorter optimisation times, compared to jDErpo, are due to separately optimising smaller subproblems. Furthermore, the difference between \(\mathrm {C}^3\)jDErpo and CCjDErpo is presumably due to only considering a subset of subproblems during the steps of Phase I.

Table 3 Average CPU times for 3E\(+\)6 FEs on \(f_\mathrm {Rosenbrock}\) (\(D=1000\))

5.3 Robustness analysis

The robustness of \(\mathrm {C}^3\) has also been analysed and compared with the other optimisation algorithms CC, jDErpo and PSO. In this analysis, the term robustness means that the algorithm succeeds in repeatedly finding a solution of a certain expected quality, as in Bergh and Engelbrecht (2004). Hence, a robust algorithm manages to consistently find high-quality solutions. The tests were done on the same 12 benchmark functions as before with dimension \(D=100\). The required quality of the solutions was set to a cost of at most \(10^{-9}\). In Table 4, the number of successful repetitions is shown for each algorithm. Each test was repeated 25 times.

Table 4 Robustness results for \(D=100\)

On the 6 separable functions (\(f_\mathrm {Ackley}\), \(f_\mathrm {Elliptic}\), \(f_\mathrm {Rastrigin}\), \(f_\mathrm {Sphere}\), \(f_\mathrm {SumOfSquares}\), \(f_\mathrm {W/Wavy}\)), the robustness of \(\mathrm {C}^3\)jDErpo and CCjDErpo is similar. Both \(\mathrm {C}^3\)jDErpo and CCjDErpo are successful in all 25 repetitions on these 6 functions, whereas jDErpo is less robust because this is the case on only 3 of the 6 separable functions. The same behaviour can also be seen when comparing the results of \(\mathrm {C}^3\)PSO, CCPSO and PSO. The robustness is very similar for \(\mathrm {C}^3\)PSO and CCPSO, whereas PSO is less robust.

On the 6 non-separable functions (\(f_\mathrm {DixonPrice}\), \(f_\mathrm {Rot.Ackley}\), \(f_\mathrm {Rot.Rastrigin}\), \(f_\mathrm {Rosenbrock}\), \(f_\mathrm {Schwefels}\), \(f_\mathrm {Griewank}\)), the results show that \(\mathrm {C}^3\)jDErpo is more robust than CCjDErpo and jDErpo. \(\mathrm {C}^3\)jDErpo is successful in all 25 repetitions on 4 of the 6 non-separable functions, whereas CCjDErpo has 25 successful repetitions on just 1 of these 6 functions, and on the other 5 functions all repetitions are unsuccessful. The difference between \(\mathrm {C}^3\)jDErpo and CCjDErpo is interesting because it is known that CC struggles to optimise non-separable problems, and these results indicate that non-separable problems are less problematic for \(\mathrm {C}^3\). jDErpo is less robust than \(\mathrm {C}^3\)jDErpo, and interestingly slightly more robust than CCjDErpo. Again, the same can be seen when comparing the results of \(\mathrm {C}^3\)PSO, CCPSO and PSO on the non-separable functions, although with smaller differences.

5.4 Results CEC’2013 LSGO functions

The two versions of \(\mathrm {C}^3\) were also evaluated on the test suite proposed for the CEC’2013 special session on Large-Scale Global Optimisation (LSGO) (Li et al. 2013). This test suite consists of 15 functions, each with 1000 dimensions (\(D=1000\)). The same user settings for \(\mathrm {C}^3\) were used, i.e. 3E\(+\)6 function evaluations, \(n=10\), \(\varepsilon =\) 1E−6, \(k=15\), \(NP=100\) for \(\mathrm {C}^3\)jDErpo and \(NP=30\) for \(\mathrm {C}^3\)PSO. Again, each test was repeated 25 times. The results of the tests with \(\mathrm {C}^3\)jDErpo and \(\mathrm {C}^3\)PSO are given in Table 6 in Appendix 6.1.

These results were compared with those of 9 other large-scale global optimisation algorithms representing the state-of-the-art, in addition to the previously used CCjDErpo, jDErpo, CCPSO and PSO algorithms. These included the following algorithms: MOS (LaTorre et al. 2013), IHDELS (Molina and Herrera 2015), CC-CMA-ES (Liu and Tang 2013), DECC-G (Yang et al. 2008), VMODE (López et al. 2015), MPS-CMA-ES (Bolufe-Rohler et al. 2015), jDEsps (Brest et al. 2012), FBG-CMA-CC (Liu et al. 2015) and DECC-DG (Omidvar et al. 2014). The results for these 9 algorithms were taken from the literature (LaTorre et al. 2015; López et al. 2015; Bolufe-Rohler et al. 2015; Liu et al. 2015).

The algorithms were ranked based on their reported mean performance for each of the 15 benchmark functions in the CEC’2013 LSGO test suite, and an overall ranking based on the average rank across the 15 functions was then calculated. The results of the ranking are given in Table 5.

Table 5 Algorithm ranking on the 15 benchmark functions of the CEC’2013 test suite for large-scale global optimisation with \(D=1000\) (Li et al. 2013)

Table 5 shows that one of the \(\mathrm {C}^3\) algorithms is the highest-ranked algorithm for 5 of the 15 benchmark functions (\(f_4\), \(f_5\), \(f_8\), \(f_9\), \(f_{13}\)). Furthermore, \(\mathrm {C}^3\)jDErpo is ranked 4th overall and \(\mathrm {C}^3\)PSO is ranked 6th overall. Both \(\mathrm {C}^3\) algorithms are thus in the top 6 of the 15 algorithms. This shows that \(\mathrm {C}^3\) is competitive with these other algorithms representing the state-of-the-art. It can thus be said that the proposed \(\mathrm {C}^3\) algorithm is effective for solving large-scale global optimisation problems.

The \(\mathrm {C}^3\) algorithms are ranked high specifically for the partially additively separable functions \((f_4-f_9)\), the overlapping functions \((f_{12}-f_{14})\) and the non-separable function \((f_{15})\). This indicates that \(\mathrm {C}^3\) is effective, with respect to the other algorithms, on all functions except the fully separable ones. This confirms the conclusion from the tests presented in Sect. 5.1.

6 Conclusions and future work

The Constructive Cooperative Coevolutionary \((\mathrm {C}^3)\) algorithm for bound-constrained large-scale global optimisation problems is presented in this paper. \(\mathrm {C}^3\) combines a novel constructive heuristic with the Cooperative Coevolutionary (CC) algorithm in a multi-start architecture. For each restart, a new good initial solution is created by the constructive heuristic. The region of the search space around the constructed solution is then explored by using it as initial solution for CC. The constructive heuristic ensures that a different solution is constructed for each restart. Thereby, it drives CC to search specific regions of the search space.

\(\mathrm {C}^3\) was compared with state-of-the-art algorithms on a set of large-scale benchmark functions with up to 1000 dimensions, and on the test suite of the CEC’2013 competition on large-scale global optimisation (Li et al. 2013). For the latter, 15 algorithms (including two versions of \(\mathrm {C}^3\)) were compared on the 15 benchmark functions of the CEC’2013 test suite. This comparison shows that a \(\mathrm {C}^3\) algorithm is the highest ranked for 5 of the 15 benchmark functions, outperforming the top algorithms from the most recent CEC’2015 competition on large-scale global optimisation.

Based on the overall ranking across all benchmark functions, the two proposed \(\mathrm {C}^3\) algorithms are in the top 6 out of 15 algorithms (i.e. \(\mathrm {C}^3\)jDErpo is 4th and \(\mathrm {C}^3\)PSO is 6th). The results also showed that \(\mathrm {C}^3\) outperforms the other algorithms on the partially separable functions and the overlapping functions. Results also showed that \(\mathrm {C}^3\) comes with no extra computational cost. It can thus be concluded that \(\mathrm {C}^3\) is a competitive and effective algorithm for large-scale global optimisation.

It was demonstrated that \(\mathrm {C}^3\) can be embedded with different population-based optimisation algorithms for the subproblem optimisation. Results showed that the embedded algorithm can significantly influence \(\mathrm {C}^3\)’s performance. Hence, it is important to select an optimisation algorithm that is well-suited for the specific subproblems at hand.

Future work with \(\mathrm {C}^3\) should investigate whether it is rewarding to use automatic decomposition strategies (i.e. parameter grouping), instead of a static decomposition as used in this work. An adaptive or dynamic decomposition strategy would be preferable in order to adjust the decomposition during the search. This could further improve the performance and abilities of the \(\mathrm {C}^3\) algorithm.