1 Introduction

Many combinatorial optimisation problems are computationally difficult to solve and require methods that exploit substantial knowledge of the problem domain. However, such methods cannot readily be reused for solving problems from other domains. Researchers have therefore been working on more general solution methods that aim to perform well across different problem domains. Hyper-heuristics have emerged as such methodologies and can be broadly divided into two categories: generation hyper-heuristics, which generate heuristics from existing components, and selection hyper-heuristics, which select the most appropriate heuristic from a set of low level heuristics [3]. This study focuses on selection hyper-heuristics.

A selection hyper-heuristic framework operates on a single solution: at each step it selects a heuristic from a set of low level heuristics and applies it to the candidate solution, and a move acceptance method then decides whether to accept or reject the newly generated solution. This process is repeated until a termination criterion is satisfied. In [5], a range of simple heuristic selection methods are introduced, including Simple Random (SR), which selects a heuristic at random at each step, and Random Descent, which works like SR except that the selected low level heuristic is applied repeatedly until it yields no further improvement in the solution. Most of the basic non-stochastic move acceptance methods are also tested in [5], including All Moves (AM), which accepts all moves, Only Improving (OI), which accepts only improving moves, and Improving or Equal (IE), which accepts all non-worsening moves. Late acceptance [4] accepts a candidate solution if its quality is better than that of the solution obtained a fixed number of steps earlier. More on selection hyper-heuristics can be found in [3].
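The generic select-apply-accept loop described above can be sketched in Python. This is a minimal illustration rather than HyFlex code: the toy objective (the number of 1-bits, to be minimised), the bit-flip low level heuristics and all function names are hypothetical. Simple Random selection is combined with pluggable AM, OI and IE acceptance criteria and with late acceptance; the late acceptance shown uses non-strict comparisons and additionally accepts non-worsening moves relative to the current solution, which is a common variant and an assumption here.

```python
import random
from typing import Callable, List

Solution = List[int]
Heuristic = Callable[[Solution, random.Random], Solution]


def cost(s: Solution) -> int:
    """Hypothetical toy objective: the number of 1-bits (to be minimised)."""
    return sum(s)


def make_flip_heuristic(k: int) -> Heuristic:
    """Hypothetical low level heuristic: flip k distinct random bits of a copy."""
    def flip(s: Solution, rng: random.Random) -> Solution:
        t = s[:]
        for i in rng.sample(range(len(t)), k):
            t[i] ^= 1
        return t
    return flip


# Simple, non-stochastic move acceptance criteria (minimisation).
def all_moves(new: int, cur: int) -> bool:
    return True                      # AM: accept every move


def only_improving(new: int, cur: int) -> bool:
    return new < cur                 # OI: strictly better moves only


def improving_or_equal(new: int, cur: int) -> bool:
    return new <= cur                # IE: any non-worsening move


class LateAcceptance:
    """Late acceptance with a circular list of the last L incumbent costs."""

    def __init__(self, length: int, initial_cost: int):
        self.history = [initial_cost] * length
        self.step = 0

    def __call__(self, new: int, cur: int) -> bool:
        slot = self.step % len(self.history)
        # Accept if not worse than the cost recorded L steps ago, or not
        # worse than the current cost (assumed variant).
        accept = new <= self.history[slot] or new <= cur
        self.history[slot] = new if accept else cur
        self.step += 1
        return accept


def simple_random_hh(start: Solution, heuristics: List[Heuristic],
                     accept: Callable[[int, int], bool],
                     iterations: int, seed: int = 0) -> int:
    """Simple Random selection with a pluggable acceptance; returns best cost."""
    rng = random.Random(seed)
    cur, cur_cost = start, cost(start)
    best_cost = cur_cost
    for _ in range(iterations):
        h = rng.choice(heuristics)            # SR: pick a heuristic uniformly
        cand = h(cur, rng)
        cand_cost = cost(cand)
        if accept(cand_cost, cur_cost):       # move acceptance decision
            cur, cur_cost = cand, cand_cost
            best_cost = min(best_cost, cur_cost)
    return best_cost
```

For example, `simple_random_hh([1] * 50, [make_flip_heuristic(1), make_flip_heuristic(2)], improving_or_equal, 2000)` returns the best cost found; passing `LateAcceptance(5, 50)` in place of `improving_or_equal` swaps in late acceptance with a list of length 5.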

HyFlex [14] (Hyper-heuristics Flexible framework) is a cross-domain heuristic search API, and HyFlex v1.0 is a software framework written in Java that provides an easy-to-use interface for developing selection hyper-heuristic search algorithms, along with implementations of several problem domains, each of which encapsulates problem-specific components such as the solution representation and the low level heuristics. We will refer to HyFlex v1.0 as HyFlex from this point onward. HyFlex was initially developed to support the first Cross-domain Heuristic Search Challenge (CHeSC) in 2011. Initially, six minimisation problem domains were implemented within HyFlex [14]. The set of HyFlex problem domains has since been extended with three more: the 0–1 Knapsack Problem (KP), the Quadratic Assignment Problem (QAP) and Max-Cut (MAC) [1]. In this study, we consider only these ‘unseen’ extended HyFlex problem domains, in order to investigate the performance and generality of several previously proposed, well-performing selection hyper-heuristics.

2 Selection Hyper-heuristics for the Extended HyFlex Problem Domains

In this section, we provide a description of the selection hyper-heuristic methods which are investigated in this study. These hyper-heuristics use different combinations of heuristic selection and move acceptance methods.

Sequence-based selection hyper-heuristic (SSHH) [10] is a relatively new method that aims to discover the best-performing sequences of heuristics for improving upon an initially generated solution. A hidden Markov model (HMM) is employed to learn the optimal sequence lengths of heuristics. The hidden states of the HMM correspond to the low level heuristics (LLHs), and the observations correspond to sequence-based acceptance strategies (AS). A transition probability matrix determines the movement between hidden states, and an emission probability matrix determines whether a particular sequence of heuristics will be applied to the candidate solution or will be extended with another LLH. The move acceptance method used in [10] accepts all improving moves and accepts non-improving moves using an adaptive threshold. SSHH showed excellent performance across the CHeSC 2011 problem domains, achieving better overall performance than Adap-HH, the winner of the challenge.
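A heavily simplified sketch of the HMM-style sequence construction behind SSHH follows. It is not the algorithm of [10]: the uniform initial scores, the two acceptance-strategy symbols (`EXTEND` to add another LLH, `APPLY` to apply the sequence built so far) and the reward rule that reinforces every matrix entry used by an improving sequence are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Pair = Tuple[int, int]


class SequenceBuilder:
    """Hypothetical HMM-style sequence builder: hidden states are low level
    heuristics, observations are acceptance-strategy (AS) symbols."""

    EXTEND, APPLY = 0, 1  # assumed AS symbols

    def __init__(self, n_heuristics: int, seed: int = 0):
        self.rng = random.Random(seed)
        # Unnormalised scores; each row acts as a probability distribution.
        self.transition = [[1.0] * n_heuristics for _ in range(n_heuristics)]
        self.emission = [[1.0, 1.0] for _ in range(n_heuristics)]
        self.state = self.rng.randrange(n_heuristics)

    def _sample(self, weights: List[float]) -> int:
        return self.rng.choices(range(len(weights)), weights=weights)[0]

    def next_sequence(self) -> Tuple[List[int], List[Pair], List[Pair]]:
        """Walk the hidden states until an APPLY symbol is emitted."""
        seq: List[int] = []
        trans_used: List[Pair] = []
        emit_used: List[Pair] = []
        while True:
            seq.append(self.state)
            symbol = self._sample(self.emission[self.state])
            emit_used.append((self.state, symbol))
            nxt = self._sample(self.transition[self.state])
            if symbol == self.APPLY:
                # Sequence complete; the next sequence starts from `nxt`,
                # so this transition is not credited to the current one.
                self.state = nxt
                return seq, trans_used, emit_used
            trans_used.append((self.state, nxt))
            self.state = nxt

    def reward(self, trans_used: List[Pair], emit_used: List[Pair]) -> None:
        """Assumed learning rule: reinforce entries used by an improving sequence."""
        for i, j in trans_used:
            self.transition[i][j] += 1.0
        for i, s in emit_used:
            self.emission[i][s] += 1.0
```

In use, the heuristics in the returned sequence would be applied in order to the candidate solution, the result passed to the move acceptance method, and `reward` called only when the sequence improved the solution.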

Dominance-based and random descent hyper-heuristic (DRD) [16] is an iterated multi-stage hyper-heuristic that hybridises dominance-based and random descent heuristic selection strategies, and uses a naïve move acceptance method that accepts improving moves and accepts non-improving moves with a given probability. The dominance-based stage uses a greedy-like method to identify a set of ‘active’ low level heuristics, considering the trade-off between the change in fitness and the number of iterations required to achieve that change. The random descent stage considers only the subset of low level heuristics recommended by the dominance-based stage. If the search stagnates, the dominance-based stage may be invoked again to detect a new subset of active heuristics. As reported in [16], the method performs relatively well on the MAX-SAT and 1D bin-packing problem domains.
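The dominance-based stage can be illustrated with a small Pareto filter. As a simplification, each low level heuristic is summarised here by two numbers gathered during a short greedy trial: the fitness improvement it achieved (larger is better) and the number of iterations it needed (smaller is better). Heuristics not dominated on both criteria form the ‘active’ subset passed to random descent. The heuristic names and statistics are hypothetical, and the exact trade-off measure used in [16] may differ.

```python
from typing import Dict, List, Tuple

Stat = Tuple[float, int]  # (fitness improvement, iterations used)


def dominates(a: Stat, b: Stat) -> bool:
    """True if a is at least as good as b on both criteria and strictly
    better on one (higher improvement, fewer iterations)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def active_heuristics(stats: Dict[str, Stat]) -> List[str]:
    """Return the non-dominated heuristics, preserving input order."""
    return [
        h for h, s in stats.items()
        if not any(dominates(other, s) for g, other in stats.items() if g != h)
    ]


# Hypothetical trial statistics for four low level heuristics.
stats = {"swap": (12.0, 40), "insert": (12.0, 25), "2opt": (30.0, 90), "ruin": (5.0, 60)}
print(active_heuristics(stats))  # ['insert', '2opt']
```

Here `swap` and `ruin` are discarded because `insert` achieves at least as much improvement in fewer iterations, while `2opt` survives because no other heuristic matches its improvement.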

Robinhood (round-robin neighbourhood) hyper-heuristic [11] is an iterated multi-stage hyper-heuristic comprising three selection hyper-heuristics, which share the same heuristic selection method but differ in their move acceptance. The Robinhood heuristic selection method allocates an equal share of time to each low level heuristic and applies the heuristics to the incumbent solution one at a time, cycling through them in round-robin order. The three move acceptance criteria employed by Robinhood are only improving, improving or equal, and an adaptive move acceptance method. The latter accepts all improving moves, and accepts non-improving moves with a probability that changes adaptively throughout the search. This selection hyper-heuristic outperformed eight ‘standard’ hyper-heuristics on a set of instances from the HyFlex problem domains. A detailed description of the Robinhood hyper-heuristic can be found in [11].
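The round-robin allocation can be sketched as below. For determinism, the sketch measures each heuristic's share in iterations rather than wall-clock time, and the number of rounds, the toy problem and the acceptance function are hypothetical; the actual time allocation and the adaptive acceptance rule are described in [11].

```python
from typing import Callable, List, Tuple


def robinhood_schedule(n_heuristics: int, total_steps: int,
                       rounds: int) -> List[Tuple[int, int]]:
    """Split a step budget into `rounds` cycles; within each cycle every
    heuristic gets the same number of consecutive steps, in a fixed order.
    Returns (heuristic index, steps) pairs."""
    per_slot = total_steps // (rounds * n_heuristics)
    return [(h, per_slot) for _ in range(rounds) for h in range(n_heuristics)]


def run_robinhood(x, cost: Callable, heuristics: List[Callable],
                  accept: Callable[[float, float], bool],
                  total_steps: int, rounds: int):
    """Apply the heuristics to the incumbent following the round-robin schedule."""
    cur, cur_cost = x, cost(x)
    for h, steps in robinhood_schedule(len(heuristics), total_steps, rounds):
        for _ in range(steps):
            cand = heuristics[h](cur)
            cand_cost = cost(cand)
            if accept(cand_cost, cur_cost):
                cur, cur_cost = cand, cand_cost
    return cur, cur_cost


# Toy usage: minimise |x - 7| with two hypothetical moves and IE acceptance.
result = run_robinhood(0, lambda x: abs(x - 7), [lambda x: x + 1, lambda x: x - 1],
                       lambda new, cur: new <= cur, total_steps=40, rounds=2)
print(result)  # (7, 0)
```

Swapping the `accept` argument is how the three Robinhood variants would differ in this sketch, since they share the same selection schedule.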

Modified choice function (MCF) [6] uses an improved version of the traditional choice function (CF) heuristic selection method used in [5], and has a better average performance than CF across the CHeSC 2011 competition problems. The basic idea of a choice function hyper-heuristic is to choose the best low level heuristic at each iteration; hence, no separate move acceptance is needed and all moves are accepted. In the traditional CF method, each low level heuristic is assigned a score based on three factors: the recent effectiveness of the given heuristic (\(f_1\)), the recent effectiveness of consecutive pairs of heuristics (\(f_2\)), and the amount of time since the given heuristic was last used (\(f_3\)), weighted by \(\alpha \), \(\beta \) and \(\delta \), respectively [5]. The CF study also reported that the hyper-heuristic was insensitive to these parameter settings when solving Sales Summit Scheduling problems, so the weights were fixed throughout the search. MCF extends CF by controlling the weight of each factor in order to improve its cross-domain performance [6]. In MCF, the weights for \(f_1\) and \(f_2\) are equal and given by the parameter \(\phi _t\), and the weight for \(f_3\) is set to \(1 - \phi _t\). \(\phi _t\) is controlled using a simple mechanism: if an improving move is made, then \(\phi _t = 0.99\); if a non-improving move is made, then \(\phi _t = \max \{\phi _{t-1} - 0.01, 0.01\}\).
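The MCF weight control reduces to a few lines. The sketch below assumes the factor values \(f_1\), \(f_2\) and \(f_3\) have already been computed for each heuristic (their exact definitions, including how \(f_2\) depends on the previously applied heuristic, follow [5, 6] and are not reproduced); it scores each heuristic as \(\phi _t (f_1 + f_2) + (1 - \phi _t) f_3\), picks the highest score, and updates \(\phi _t\) with the rule given above. The heuristic names and factor values are hypothetical.

```python
from typing import Dict, Tuple

Factors = Tuple[float, float, float]  # (f1, f2, f3) for one heuristic


def update_phi(phi_prev: float, improved: bool) -> float:
    """Reset to 0.99 after an improving move; otherwise decay by 0.01,
    never dropping below 0.01."""
    return 0.99 if improved else max(phi_prev - 0.01, 0.01)


def mcf_select(factors: Dict[str, Factors], phi: float) -> str:
    """Return the heuristic with the highest modified choice function score."""
    def score(f: Factors) -> float:
        f1, f2, f3 = f
        return phi * f1 + phi * f2 + (1.0 - phi) * f3
    return max(factors, key=lambda h: score(factors[h]))


# Hypothetical factors: h1 has been effective recently, h2 has been idle longest.
factors = {"h1": (5.0, 1.0, 0.0), "h2": (0.0, 0.0, 10.0)}
print(mcf_select(factors, 0.99))  # h1 (intensification)
print(mcf_select(factors, 0.01))  # h2 (diversification)
```

With \(\phi _t\) near 0.99 the recent-performance terms dominate, while a long run of non-improving moves drives \(\phi _t\) towards 0.01 and lets the time-since-last-use term \(f_3\) steer selection towards neglected heuristics.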

Fuzzy late acceptance-based hyper-heuristic (F-LAHH) [8] was proposed for solving MAX-SAT problems and showed promising results. F-LAHH uses a fitness proportionate selection mechanism (RUA1-F1FPS) [7] as its heuristic selection method and, as its move acceptance method, late acceptance whose list length is adaptively controlled by a fuzzy control system. In RUA1-F1FPS, the low level heuristics are assigned scores, which are updated based on the acceptance of the candidate solution as defined by the RUA1 scheme. A heuristic is chosen using a fitness proportionate (roulette wheel) selection mechanism based on Formula 1 (F1) ranking scores (F1FPS): each low level heuristic is ranked by its current score using F1 ranking and is assigned a selection probability proportional to its F1 rank score. The fuzzy control system, as defined in [8], adapts the list length of the late acceptance method at the start of each phase, promoting intensification or diversification in the subsequent phase of the search based on the amount of improvement achieved during the current phase. The F1FPS scoring mechanism used in this study is the RUA1 method, as in [7, 8]. The parameters of the fuzzy system are the same as those used in [8]: the universe of discourse of the list length fuzzy sets is \(U = [10000,30000]\), the initial list length of late acceptance is \(L_0 = 10000\), and the number of phases is 50.
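The F1FPS selection step can be sketched as follows, assuming the heuristic scores have already been produced by the RUA1 update, which is not reproduced here. Heuristics are sorted by score and awarded Formula 1 points (25, 18, 15, 12, 10, 8, 6, 4, 2, 1, with zero beyond tenth place, as in the current Formula 1 points system), then chosen by roulette wheel in proportion to those points. Breaking ties by input order and giving zero points beyond tenth place are assumptions of this sketch.

```python
import random
from typing import Dict, List

F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def f1_points(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank heuristics by score (higher is better) and award F1 points.
    Ties keep their input order because sorted() is stable (assumed rule)."""
    ranked: List[str] = sorted(scores, key=lambda h: scores[h], reverse=True)
    return {h: (F1_POINTS[i] if i < len(F1_POINTS) else 0)
            for i, h in enumerate(ranked)}


def roulette_select(scores: Dict[str, float], rng: random.Random) -> str:
    """Fitness proportionate (roulette wheel) selection on F1 points."""
    points = f1_points(scores)
    names = list(points)
    return rng.choices(names, weights=[points[h] for h in names])[0]
```

For example, with hypothetical scores `{"a": 0.2, "b": 0.9, "c": 0.5}` the points are 25 for `b`, 18 for `c` and 15 for `a`, so `b` is selected with probability 25/58. Using rank-based points rather than raw scores keeps the selection pressure independent of the scale of the RUA1 scores.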

Simple Random-Great Deluge (SR-GD) is a single-parameter selection hyper-heuristic. At each step, a heuristic is selected at random and applied to the current solution. The great deluge move acceptance method [9] always accepts improving solutions; a non-improving solution is accepted only if its quality is better than the current threshold level. Initially, the threshold level is set to the cost of the initially constructed solution, and it is then updated at each iteration at a linear rate given by the following formula:

$$\begin{aligned} T_t = c + \varDelta C \times \left( 1-\frac{t}{N}\right) \end{aligned}$$
(1)

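where \(T_t\) is the threshold level at time t, N is the time limit, \(\varDelta C\) is the expected range for the maximum change in the cost, and c is the expected final cost. At \(t = 0\) the threshold is \(c + \varDelta C\), corresponding to the initial cost, and it decreases linearly to c at \(t = N\).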
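A minimal sketch of great deluge acceptance with the linear threshold of Eq. (1) follows, for a minimisation problem. Setting \(\varDelta C\) to the difference between the initial cost and the expected final cost c makes \(T_0\) equal to the initial cost, matching the initialisation described above; the numbers in the worked example are hypothetical.

```python
def gd_threshold(t: float, N: float, c: float, delta_c: float) -> float:
    """Linear threshold of Eq. (1): T_t = c + delta_c * (1 - t / N)."""
    return c + delta_c * (1.0 - t / N)


def gd_accept(new_cost: float, cur_cost: float, t: float, N: float,
              c: float, delta_c: float) -> bool:
    """Accept improving moves; accept a non-improving move only if its
    cost is below the current threshold level (minimisation)."""
    if new_cost < cur_cost:
        return True
    return new_cost < gd_threshold(t, N, c, delta_c)


# Hypothetical run: initial cost 1000, expected final cost 200, time limit 100.
initial_cost, c, N = 1000.0, 200.0, 100.0
delta_c = initial_cost - c   # so that T_0 equals the initial cost
```

Worked example: halfway through this run the threshold is \(200 + 800 \times 0.5 = 600\), so a worsening move from cost 500 to 550 is accepted, whereas a move to 650 is rejected.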

3 Empirical Results

The methods presented in Sect. 2 are applied to 10 instances from each of the recently introduced HyFlex problem domains. The experiments are conducted on an i7-3820 CPU at 3.60 GHz with 16.00 GB of memory. Each method is run 31 times on each instance, with a termination criterion of 415 s, corresponding to 600 nominal seconds on the CHeSC 2011 challenge test machine. The following performance indicators are used for ranking the hyper-heuristics across all three domains:

  • rank: rank of a hyper-heuristic with respect to \(\mu _{norm}\).

  • \({{\mu }}_{rank}\): each algorithm is ranked based on the median objective value it produces over 31 runs on each instance. The best algorithm is assigned rank 1, while the worst algorithm’s rank equals the number of algorithms being ranked. In case of a tie, the tied algorithms share the average of the ranks involved. The ranks are then averaged over all instances, producing \(\mu _{rank}\).

  • \({{\mu }}_{norm}\): the objective function values are normalised to values in the range [0,1] based on the following formula:

    $$\begin{aligned} norm(o,i) = \frac{o(i)-o_{best}(i)}{o_{worst}(i)-o_{best}(i)} \end{aligned}$$
    (2)

    where o(i) is the objective function value on instance i, \(o_{best}(i)\) is the best objective function value obtained by all methods on instance i, and \(o_{worst}(i)\) is the worst objective function value obtained by all methods on instance i. \(\mu _{norm}\) is the normalised objective function value averaged over all runs and instances.

  • best: the number of instances for which the hyper-heuristic achieves the best median objective function value.

  • worst: the number of instances for which the hyper-heuristic delivers the worst median objective function value.
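Both aggregate indicators can be computed with a short script. The sketch below assumes minimisation and a hypothetical data layout, `results[algorithm][instance]` being the list of final objective values over all runs. It computes \(\mu _{rank}\) from per-instance medians with average ranks for ties, and \(\mu _{norm}\) by normalising every run with Eq. (2), using the best and worst values over all methods and runs on that instance, then averaging. Mapping the degenerate case \(o_{worst}(i) = o_{best}(i)\) to 0 is an assumption.

```python
from statistics import mean, median
from typing import Dict, List

Results = Dict[str, Dict[str, List[float]]]  # algorithm -> instance -> run values


def mu_rank(results: Results) -> Dict[str, float]:
    """Average rank of each algorithm based on per-instance medians."""
    algs = list(results)
    instances = list(results[algs[0]])
    ranks: Dict[str, List[float]] = {a: [] for a in algs}
    for inst in instances:
        med = {a: median(results[a][inst]) for a in algs}
        for a in algs:
            better = sum(1 for b in algs if med[b] < med[a])
            ties = sum(1 for b in algs if med[b] == med[a])  # includes a itself
            # Tied algorithms share the mean of positions better+1 .. better+ties.
            ranks[a].append(better + (ties + 1) / 2)
    return {a: mean(r) for a, r in ranks.items()}


def mu_norm(results: Results) -> Dict[str, float]:
    """Mean normalised objective value (Eq. (2)) over all runs and instances."""
    algs = list(results)
    instances = list(results[algs[0]])
    normed: Dict[str, List[float]] = {a: [] for a in algs}
    for inst in instances:
        pooled = [v for a in algs for v in results[a][inst]]
        lo, hi = min(pooled), max(pooled)
        for a in algs:
            for v in results[a][inst]:
                normed[a].append(0.0 if hi == lo else (v - lo) / (hi - lo))
    return {a: mean(vals) for a, vals in normed.items()}
```

The "rank" indicator then follows by sorting the algorithms on their \(\mu _{norm}\) values.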

As a performance indicator, \(\mu _{rank}\) focuses on median values and ignores how far apart those values are for the algorithms under consideration, whereas \(\mu _{norm}\) captures the mean performance of the algorithms by taking into account the relative performance of all algorithms over all runs on each problem instance.

Table 1. The performance comparison of SSHH, DRD, Robinhood, MCF, F-LAHH and SR-GD over 31 runs for each instance. The best median values for each instance are highlighted in bold. Based on the Mann-Whitney-Wilcoxon test, for each pair of algorithms SSHH versus X: SSHH > (<) X indicates that SSHH (X) is better than X (SSHH) and the difference is statistically significant at the 95 % confidence level, and SSHH \(\ge \) (\(\le \)) X indicates that there is no statistically significant difference between SSHH and X, but SSHH (X) is better than X (SSHH) on average.
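The comparison symbols in Table 1 can be derived as sketched below. The test is a two-sided Mann-Whitney U test using the normal approximation with a tie-corrected variance, a reasonable choice for 31 runs per method, although the exact variant used for Table 1 is not stated; lower objective values are taken to be better, and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import List


def mann_whitney_p(a: List[float], b: List[float]) -> float:
    """Two-sided p-value of the Mann-Whitney U test
    (normal approximation, tie-corrected variance)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1            # mid-rank of positions i..j (1-based)
        for k in range(i, j + 1):
            ranks[k] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0                       # all values identical
    z = (u1 - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def compare(sshh: List[float], other: List[float], alpha: float = 0.05) -> str:
    """Symbol for 'SSHH versus X' under minimisation."""
    sshh_better = mean(sshh) <= mean(other)
    if mann_whitney_p(sshh, other) < alpha:
        return ">" if sshh_better else "<"
    return "≥" if sshh_better else "≤"
```

Where SciPy is available, `scipy.stats.mannwhitneyu` offers the same test with exact and asymptotic methods and would be the more robust choice.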

Table 1 summarises the results. On KP, SSHH delivers the best median values for 8 instances, including 4 ties. Robinhood achieves the best median results on 5 instances, including a tie. SR-GD, F-LAHH and DRD show comparable performance. On the QAP problem domain, SR-GD performs the best on 6 instances, and F-LAHH also shows promising results on this domain. This suggests that simple selection methods may be the most suitable for solving QAP problems. SSHH ranks third on QAP based on the average rank. On MAC, SSHH clearly outperforms all other methods, followed by SR-GD and then Robinhood. The remaining hyper-heuristics perform relatively poorly, with MCF being the worst of the 6 hyper-heuristics. Overall, SSHH turns out to be the best, with \(\mu _{norm} = 0.16\) and \(\mu _{rank} = 2.28\). SR-GD also shows promising performance, ranking second. MCF consistently delivers weak performance on all instances of the three problem domains. Table 1 also provides the pairwise comparison of SSHH versus DRD, Robinhood, MCF, F-LAHH and SR-GD based on the Mann-Whitney-Wilcoxon statistical test. On all MAC instances, SSHH performs significantly better than every other hyper-heuristic except Robinhood, which performs better than SSHH on four out of ten instances. On the majority of the KP instances, SSHH is the best performing hyper-heuristic. SSHH performs poorly on QAP compared with F-LAHH and SR-GD, both of which produce significantly better results than SSHH on almost all instances. However, SSHH performs statistically significantly better than the remaining hyper-heuristics on QAP.

The performance of SSHH, the best hyper-heuristic in Table 1, is compared with that of the methods reported in [1], including Adap-HH, the winner of the CHeSC 2011 competition [13], an Evolutionary Programming Hyper-heuristic (EPH) [12], Fair-Share Iterated Local Search with (FS-ILS) and without restart (NR-FS-ILS), Simple Random-All Moves (SR-AM) (previously denoted AA-HH) and Simple Random-Improving or Equal (SR-IE) (previously denoted ANW-HH). Table 2 summarises the results based on \(\mu _{rank}\), \(\mu _{norm}\), and the best and worst counts. Adap-HH performs better than SSHH on KP and QAP, while SSHH performs the best on MAC. Overall, SSHH is the best method based on \(\mu _{norm}\), with a value of 0.113; however, Adap-HH is the top ranking algorithm based on \(\mu _{rank}\), with a value of 2.53, and SSHH is the second best with a value of 3.20.

Table 2. The performance comparison of SSHH, Adap-HH, FS-ILS, NR-FS-ILS, EPH, SR-AM and SR-IE

4 Conclusion

A hyper-heuristic is a search methodology designed to reduce the human effort involved in developing solution methods for multiple computationally difficult optimisation problems, by automating the mixing and generation of heuristics. The goal of this study was to assess the level of generality of a set of selection hyper-heuristics across three recently introduced HyFlex problem domains. The empirical results show that both Adap-HH and SSHH perform better than the previously proposed algorithms across the problem domains included in the HyFlex extension set. Both are adaptive algorithms that embed different online learning mechanisms, and both generalise well to the ‘unseen’ problems. It has also been observed that the choice of heuristic selection and move acceptance combination can lead to major performance differences across a diverse set of problem domains. This observation is in line with previous findings in [2, 15].