Level-Based Analysis of the Univariate Marginal Distribution Algorithm
Abstract
Estimation of Distribution Algorithms (EDAs) are stochastic heuristics that search for optimal solutions by learning and sampling from probabilistic models. Despite their popularity in real-world applications, there is little rigorous understanding of their performance. Even for the Univariate Marginal Distribution Algorithm (UMDA), a simple population-based EDA assuming independence between decision variables, the optimisation time on the linear problem OneMax was until recently undetermined. The incomplete theoretical understanding of EDAs is mainly due to the lack of appropriate analytical tools. We show that the recently developed level-based theorem for non-elitist populations, combined with anti-concentration results, yields upper bounds on the expected optimisation time of the UMDA. This approach results in the bound \(\mathcal {O}\left( n\lambda \log \lambda +n^2\right) \) on the LeadingOnes and BinVal problems for population sizes \(\lambda >\mu =\varOmega (\log n)\), where \(\mu \) and \(\lambda \) are parameters of the algorithm. We also prove that the UMDA with population sizes \(\mu \in \mathcal {O}\left( \sqrt{n}\right) \cap \varOmega (\log n)\) optimises OneMax in expected time \(\mathcal {O}\left( \lambda n\right) \), and for larger population sizes \(\mu =\varOmega (\sqrt{n}\log n)\), in expected time \(\mathcal {O}\left( \lambda \sqrt{n}\right) \). The facility and generality of our arguments suggest that this is a promising approach to derive bounds on the expected optimisation time of EDAs.
Keywords
Estimation of distribution algorithms, Runtime analysis, Level-based analysis, Anti-concentration
1 Introduction
Estimation of Distribution Algorithms (EDAs) are a class of randomised search heuristics with many practical applications [15, 20, 24, 47, 48]. Unlike traditional Evolutionary Algorithms (EAs), which search for optimal solutions using genetic operators such as mutation or crossover, EDAs build and maintain a probability distribution of the current population over the search space, from which the next generation of individuals is sampled. Several EDAs have been developed over the last decades. The algorithms differ in how they capture interactions among decision variables, as well as in how they build and update their probabilistic models. EDAs are often classified as either univariate or multivariate; the former treats each variable independently, while the latter also considers variable dependencies [40]. Well-known univariate EDAs include the compact Genetic Algorithm (cGA [21]), the Population-Based Incremental Learning Algorithm (PBIL [4]), and the Univariate Marginal Distribution Algorithm (UMDA [37]). Given a problem instance of size n, univariate EDAs represent probabilistic models as an n-vector, where each vector component is called a marginal. Some Ant Colony Optimisation (ACO) algorithms and even certain single-individual EAs can be cast in the same framework as univariate EDAs (or \(n\)-Bernoulli-\(\lambda \)-EDA, see, e.g., [18, 22, 25, 42]). Multivariate EDAs, such as the Bayesian Optimisation Algorithm, which builds a Bayesian network with nodes and edges representing variables and conditional dependencies respectively, attempt to learn relationships between decision variables [22]. The surveys [1, 22, 39] describe further variants and applications of EDAs.
Recently, EDAs have drawn growing attention from the theory community of evolutionary computation [10, 13, 18, 26, 27, 28, 32, 44, 45, 46]. The general aim of theoretical analyses of EDAs is to gain insights into the behaviour of the algorithms when optimising an objective function, especially in terms of the optimisation time, that is, the number of function evaluations required by the algorithm until an optimal solution has been found for the first time. Droste [14] provided the first rigorous runtime analysis of an EDA, specifically the cGA. Introduced in [21], the cGA samples two individuals in each generation and updates the probabilistic model according to the fitter of these individuals. A quantity of \(\pm 1/K\) is added to the marginals for each bit position where the two individuals differ. The reciprocal K of this quantity is often referred to as the abstract population size of a genetic algorithm that the cGA is supposed to model. Droste showed a lower bound \(\varOmega (K\sqrt{n})\) on the expected optimisation time of the cGA for any pseudo-Boolean function [14]. He also proved the upper bound \(\mathcal {O}(nK)\) for any linear function, where \(K=n^{1/2+\varepsilon }\) for any small constant \(\varepsilon >0\). Note that each marginal of the cGA considered in [14] is allowed to reach the extreme values zero and one. Such an algorithm is referred to as an EDA without margins, since in contrast it is possible to reinforce some margins (also called borders) on the range of values for each marginal to keep it away from the extreme probabilities, often within the interval \([1/n,1-1/n]\). An EDA without margins can prematurely converge to suboptimal solutions; thus, the runtime bounds of [14] were in fact conditioned on the event that early convergence never happens. Very recently, Witt [45] studied an effect called domino convergence on EDAs, where bits with heavy weights tend to be optimised before bits with light weights.
By deriving a lower bound of \(\varOmega (n^2)\) on the expected optimisation time of the cGA on BinVal for any value of \(K>0\), Witt confirmed the claim made earlier by Droste [14] that BinVal is a harder problem for the cGA than the OneMax problem is. Moreover, Lengler et al. [32] considered \(K=\mathcal {O}\left( \sqrt{n}/\log ^2 n\right) \), which was not covered by Droste in [14], and obtained a lower bound \(\varOmega (K^{1/3}n+n\log n)\) on the expected optimisation time of the cGA on OneMax. Note that if \(K=\varTheta (\sqrt{n}/\log ^2 n)\), the above lower bound will be \(\varOmega (n^{7/6}/\log ^2 n)\), which further tightens the bounds on the expected optimisation time of the cGA.
An algorithm closely related to the cGA with (reinforced) margins is the 2-Max-Min Ant System with iteration best (\(2\)-MMAS\(_{\text {ib}}\)). The two algorithms differ only in the update procedure of the model, and \(2\)-MMAS\(_{\text {ib}}\) is parameterised by an evaporation factor \(\rho \in (0,1)\). Sudholt and Witt [42] proved the lower bounds \(\varOmega (K\sqrt{n}+n\log {n})\) and \(\varOmega (\sqrt{n}/\rho +n\log {n})\) for the two algorithms on OneMax under any setting, and upper bounds \(\mathcal {O}(K\sqrt{n})\) and \(\mathcal {O}(\sqrt{n}/\rho )\) when K and \(1/\rho \) are in \(\varOmega (\sqrt{n}\log {n})\). Thus, the optimal expected optimisation time \(\varTheta (n\log {n})\) of the cGA and the \(2\)-MMAS\(_{\text {ib}}\) on OneMax is achieved by setting K and \(1/\rho \) to \(\varTheta (\sqrt{n}\log {n})\). The analyses revealed that choosing smaller values of K or \(1/\rho \) results in strong fluctuations that may cause many marginals (or pheromones in the context of ACO) to fix early at the lower margin, which then need to be repaired later. On the other hand, choosing larger values resolves the issue but may slow down the learning process.
Friedrich et al. [18] pointed out two behavioural properties of univariate EDAs at each bit position: a balanced EDA would be sensitive to signals in the fitness, while a stable one would remain uncommitted under a biasless fitness function. During the optimisation of LeadingOnes, when some bit positions are temporarily neutral while the others are not, both properties appear useful to avoid commitment to wrong decisions. Unfortunately, many univariate EDAs without margins, including the cGA, the UMDA, the PBIL and some related algorithms, are balanced but not stable [18]. A more stable version of the cGA, the so-called stable cGA (or scGA), was then introduced in [18]. Under appropriate settings, it yields an expected optimisation time \(\mathcal {O}(n\log n)\) on LeadingOnes with high probability. Furthermore, a recent study by Friedrich et al. [17] showed that the cGA can cope with higher levels of noise more efficiently than mutation-only heuristics do.
Introduced by Baluja [4], the PBIL is another univariate EDA. Unlike the cGA, which samples two solutions in each generation, the PBIL samples a population of \(\lambda \) individuals, from which the \(\mu \) fittest individuals are selected to update the probabilistic model. At each bit position, the new marginal is a convex combination, with a smoothing parameter \(\rho \in (0,1]\), of the current marginal and the frequency of ones among all selected individuals at that bit position. The PBIL can be seen as a special case of the cross-entropy method [38] on the binary hypercube \(\{0,1\}^n\). Wu et al. [46] analysed the runtime of the PBIL on OneMax and LeadingOnes. The authors argued that due to the use of a sufficiently large population size, it is possible to prevent the marginals from reaching the lower border early even when a large smoothing parameter \(\rho \) is used. Runtime results were proved for the PBIL without margins on OneMax and the PBIL with margins on LeadingOnes, and were then compared to the runtime of some Ant System approaches. However, the required population size is large, i.e. \(\lambda =\omega (n)\). Very recently, Lehre and Nguyen [28] obtained an upper bound \(\mathcal {O}(n\lambda \log \lambda +n^2)\) on the expected optimisation time for the PBIL with margins on BinVal and LeadingOnes, which improves the previously known upper bound \(\mathcal {O}(n^{2+\varepsilon })\) in [46] by a factor of \(n^{\varepsilon }\), where \(\varepsilon \) is some positive constant, for smaller population sizes \(\lambda =\varOmega (\log n)\).
The UMDA is a special case of the PBIL with the largest smoothing parameter \(\rho = 1\), that is, the probabilistic model for the next generation depends solely on the selected individuals in the current population. The algorithm has a wide range of applications, not only in computer science, but also in other areas like population genetics and bioinformatics [20, 48]. Moreover, the UMDA relates to the notion of linkage equilibrium [36, 41], which is a popular model assumption in population genetics. Thus, studies of the UMDA can contribute to the understanding of population dynamics in population genetics.
Despite an increasing momentum in the runtime analysis of EDAs over the last few years, our understanding of the UMDA in terms of runtime is still limited. The algorithm was first analysed in a series of papers [5, 6, 7, 8], where time-complexities of the UMDA on simple unimodal functions were derived. These results showed that the UMDA with margins often outperforms the UMDA without margins, especially on functions like BVLeadingOnes, which is a unimodal problem. A possible reason for the failure of the UMDA without margins is fixation, which prevents any further progress on the corresponding decision variables. The UMDA with margins is able to avoid this by ensuring that each search point always has a positive chance to be sampled. Shapiro investigated the UMDA with a different selection mechanism than truncation selection [40]. In particular, this variant of the UMDA selects individuals whose fitnesses are no less than the mean fitness of all individuals in the current population when updating the probabilistic model. By representing the UMDA as a Markov chain, the paper showed that the population size has to be at least \(\sqrt{n}\) for the UMDA to prevent the probabilistic model from quickly converging to the corners of the hypercube on the search space. This phenomenon is well-known as genetic drift [2]. A decade later, the first upper bound on the expected optimisation time of the UMDA on OneMax was revealed [10]. Working on the standard UMDA using truncation selection, Dang and Lehre [10] proved an upper bound \(\mathcal {O}(n\lambda \log \lambda )\) on the expected optimisation time of the UMDA on OneMax, assuming a population size \(\lambda =\varOmega (\log n)\). If \(\lambda = \varTheta (\log n)\), then the upper bound is \(\mathcal {O}(n\log n\log \log n)\).
Inspired by the previous work of [42] on cGA/\(2\)-MMAS\(_{\text {ib}}\), Krejca and Witt [26] obtained a lower bound \(\varOmega (\mu \sqrt{n}+n\log n)\) for the UMDA on OneMax via drift analysis, where \(\lambda = (1+\varTheta (1))\mu \). Compared to [42], the analysis is much more involved since, unlike in cGA/\(2\)-MMAS\(_{\text {ib}}\) where each change of marginals between consecutive generations is small and limited by the smoothing parameter, large changes are always possible in the UMDA. From these results, we observe that the latest upper and lower bounds for the UMDA on OneMax still differ by \(\varTheta (\log \log n)\). This raises the question of whether this gap could be closed.
This paper derives upper bounds on the expected optimisation time of the UMDA on the following problems: OneMax, BinVal, and LeadingOnes. The preliminary versions of this work appeared in [10] and [27]. Here we use the improved version of the level-based analysis technique [9]. The analyses for LeadingOnes and BinVal are straightforward and similar to each other, i.e. yielding the same runtime \(\mathcal {O}(n\lambda \log \lambda +n^2)\); hence, they will serve the purpose of introducing the technique in the context of EDAs. Particularly, we only require population sizes \(\lambda = \varOmega (\log {n})\) for LeadingOnes, which is much smaller than previously thought [6, 7, 8]. For OneMax, we give a more detailed analysis so that an expected optimisation time \(\mathcal {O}(n\log n)\) is derived if the population size is chosen appropriately. This significantly improves the results in [9, 10] and matches the recent lower bound in [26]. More specifically, we assume \(\lambda \ge b \mu \) for a sufficiently large constant \(b>0\), and separate two regimes of small and large selected populations: the upper bound \(\mathcal {O}(\lambda n)\) is derived for \(\mu = \varOmega (\log n) \cap \mathcal {O}(\sqrt{n})\), and the upper bound \(\mathcal {O}(\lambda \sqrt{n})\) is shown for \(\mu = \varOmega (\sqrt{n}\log n)\). These results exhibit the applicability of the level-based technique in the runtime analysis of (univariate) EDAs. Table 1 summarises the latest results about the runtime analyses of univariate EDAs on simple benchmark problems; see [25] for a recent survey on the theory of EDAs.
Related independent work. Witt [44] independently obtained the upper bounds \(\mathcal {O}(\lambda n)\) and \(\mathcal {O}(\lambda \sqrt{n})\) on the expected optimisation time of the UMDA on OneMax for \(\mu = \varOmega (\log n)\cap o(n)\) and \(\mu = \varOmega (\sqrt{n}\log n)\), respectively, and \(\lambda =\varTheta (\mu )\) using an involved drift analysis. While our results do not hold for \(\mu =\varOmega (\sqrt{n})\cap \mathcal {O}\left( \sqrt{n}\log n\right) \), our methods yield significantly easier proofs. Furthermore, our analysis also holds when the parent population size \(\mu \) is not proportional to the offspring population size \(\lambda \), which is not covered in [44].
Table 1 Expected optimisation time (number of fitness evaluations) of univariate EDAs on the three problems OneMax, LeadingOnes and BinVal

| Problem | Algorithm | Constraints | Runtime |
| --- | --- | --- | --- |
| OneMax | UMDA | \(\lambda =\varTheta (\mu ), \lambda =\mathcal {O}\left( \text {poly(n)}\right) \) | \(\varOmega (\lambda \sqrt{n}+n\log n)\) [26] |
| OneMax | UMDA | \(\lambda =\varTheta (\mu ),~ \mu =\varOmega (\log n)\cap o(n)\) | \(\mathcal {O}\left( \lambda n\right) \) [44] |
| OneMax | UMDA | \(\lambda =\varTheta (\mu ), ~ \mu = \varOmega (\sqrt{n}\log n)\) | \(\mathcal {O}\left( \lambda \sqrt{n}\right) \) [44] |
| OneMax | UMDA | \(\lambda =\varOmega (\mu ), ~\mu = \varOmega (\log n) \cap \mathcal {O}\left( \sqrt{n}\right) \) | \(\mathcal {O}\left( \lambda n\right) \) [Theorem 8] |
| OneMax | UMDA | \(\lambda =\varOmega (\mu ), ~\mu =\varOmega (\sqrt{n}\log n)\) | \(\mathcal {O}\left( \lambda \sqrt{n}\right) \) [Theorem 9] |
| OneMax | PBIL \(^{\!\!*}\) | \(\mu =\omega (n), \lambda =\omega (\mu )\) | \(\omega (n^{3/2})\) [46] |
| OneMax | cGA | \(K=n^{1/2+\epsilon }\) | \(\varTheta (K\sqrt{n})\) [14] |
| OneMax | cGA | \(K=\mathcal {O}\left( \sqrt{n}/\log ^2 n\right) \) | \(\varOmega (K^{1/3}n+n\log n)\) [32] |
| OneMax | \(\textsc {scGA}\) | \(\rho =\varOmega (1/\log n), a=\varTheta (\rho ), c>0\) | \(\varOmega (\min \{2^{\varTheta (n)},2^{c/\rho }\})\) [13] |
| LeadingOnes | UMDA | \(\mu = \varOmega (\log n), \lambda =\varOmega (\mu )\) | \(\mathcal {O}\left( n\lambda \log \lambda + n^2\right) \) [Theorem 7] |
| LeadingOnes | PBIL | \(\lambda =n^{1+\epsilon }, \mu = \mathcal {O}\left( n^{\epsilon /2}\right) , \epsilon \in (0,1)\) | \(\mathcal {O}\left( n^{2+\epsilon }\right) \) [46] |
| LeadingOnes | PBIL | \(\lambda = \varOmega (\mu ), \mu =\varOmega (\log n)\) | \(\mathcal {O}\left( n\lambda \log \lambda +n^2\right) \) [28] |
| LeadingOnes | \(\textsc {scGA}\) | \(\rho =\varTheta (1/\log n), a=\mathcal {O}\left( \rho \right) \) | \(\mathcal {O}\left( n\log n\right) \) [18] |
| BinVal | UMDA | \(\mu = \varOmega (\log n), \lambda =\varOmega (\mu )\) | \(\mathcal {O}\left( n\lambda \log \lambda +n^2\right) \) [Theorem 7] |
| BinVal | PBIL | \(\lambda = \varOmega (\mu ), \mu =\varOmega (\log n)\) | \(\mathcal {O}\left( n\lambda \log \lambda +n^2\right) \) [28] |
| BinVal | cGA | \(K=n^{1/2+\epsilon }\) | \(\varTheta (Kn)\) [14] |
| BinVal | cGA | any \(K>0\) | \(\varOmega (n^2)\) [45] |
2 Preliminaries
This section describes the three standard benchmark problems, the algorithm under investigation and the level-based theorem, which is a general method to derive upper bounds on the expected optimisation time of non-elitist population-based algorithms. Furthermore, a sharp upper bound on the sum of independent Bernoulli trials, which is essential in the runtime analysis of the UMDA on OneMax for a small population size, is presented, followed by Feige’s inequality.
We use the following notation throughout the paper. The natural logarithm is denoted as \(\ln (\cdot )\), and \(\log (\cdot )\) denotes the logarithm with base 2. Let [n] be the set \(\{1,2,\ldots ,n\}\). The floor and ceiling functions are \(\lfloor x\rfloor \) and \(\lceil x\rceil \), respectively, for \(x \in \mathbb {R}\). For two random variables X, Y, we use \(X \preceq Y\) to indicate that Y stochastically dominates X, that is \(\hbox {Pr}\left( X \ge k\right) \le \hbox {Pr}\left( Y \ge k\right) \) for all \(k \in \mathbb {R}\).
We consider a partition of the finite search space \(\mathcal {X}=\{0,1\}^n\) into m ordered subsets \(A_1,\ldots ,A_m\) called levels, i.e. \(A_i \cap A_j = \emptyset \) for any \(i \ne j\) and \(\cup _{i=1}^{m}A_i = \mathcal {X}\). The union of all levels above j inclusive is denoted \(A_{\ge j}:=\cup _{i=j}^m A_i\). An optimisation problem on \(\mathcal {X}\) is assumed, without loss of generality, to be the maximisation of some function \(f:\mathcal {X} \rightarrow \mathbb {R}\). A partition is called fitness-based (or f-based) if for any \(j \in [m-1]\) and all \(x \in A_{j}\), \(y \in A_{j+1} :f(y)>f(x)\). An f-based partitioning is called canonical when \(x,y \in A_j\) if and only if \(f(x) =f(y)\).
Given the search space \(\mathcal {X}\), each \(x\in \mathcal {X}\) is called a search point (or individual), and a population is a vector of search points, i.e. \(P \in \mathcal {X}^{\lambda }\). For a finite population \(P= \left( x^{(1)},\ldots ,x^{(\lambda )}\right) \), we define \(|P \cap A_j| := |\{i \in [\lambda ] \mid x^{(i)} \in A_j\}|\), i.e. the number of individuals in population P which are in level \(A_j\). Truncation selection, denoted as \((\mu ,\lambda )\)-selection for some \(\mu <\lambda \), applied to population P transforms it into a vector \(P'\) (called selected population) with \(|P'|=\mu \) by discarding the \(\lambda -\mu \) worst search points of P with respect to some fitness function f, where ties are broken uniformly at random.
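The two population-level notions above, counting the individuals in levels \(A_{\ge j}\) and \((\mu ,\lambda )\)-selection, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the representation of individuals as tuples of bits, and the use of a random secondary sort key for tie-breaking are assumptions made here, not part of the paper.

```python
import random

def count_in_levels_at_least(population, level_of, j):
    """Return |P ∩ A_{>=j}|: the number of individuals in level j or above.

    `level_of` is a hypothetical helper mapping a search point to its level index.
    """
    return sum(1 for x in population if level_of(x) >= j)

def truncation_select(population, fitness, mu, rng=random):
    """(mu, lambda)-selection: keep the mu fittest individuals.

    Ties are broken uniformly at random by attaching an independent random
    number to each individual and using it as a secondary sort key.
    """
    keyed = [(fitness(x), rng.random(), x) for x in population]
    keyed.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [x for _, _, x in keyed[:mu]]

# Usage with a OneMax-like fitness (number of ones); level j holds the
# bitstrings with exactly j-1 ones.
pop = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
selected = truncation_select(pop, sum, mu=2)
in_upper_levels = count_in_levels_at_least(pop, lambda x: sum(x) + 1, 3)
```

In this small example all fitness values are distinct, so the random tie-breaker has no effect on the outcome.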
2.1 Three Problems
We consider the three pseudo-Boolean functions: OneMax, LeadingOnes and BinVal, which are defined over the finite binary search space \(\mathcal {X}=\{0,1\}^n\) and widely used as theoretical benchmark problems in runtime analyses of EDAs [10, 14, 26, 28, 44, 46]. Note in particular that these problems are only required to describe and compare the behaviour of the EDAs on problems with well-understood structures. The first problem, as its name may suggest, simply counts the number of ones in the bitstring and is widely used to test the performance of EDAs as a hill climber [25]. While the bits in OneMax have the same contributions to the overall fitness, BinVal, which aims at maximising the binary value of the bitstring, has exponentially scaled weights relative to bit positions. In contrast, LeadingOnes counts the number of leading ones in the bitstring. Since bits in this particular problem are highly correlated, it is often used to study the ability of EDAs to cope with dependencies among decision variables [25].
The global optimum for all functions is the all-ones bitstring, i.e. \(1^n\). For any bitstring \(x=(x_1,\ldots ,x_n) \in \mathcal {X}\), these functions are defined as follows:
Definition 1
\(\textsc {OneMax} (x) := \sum \limits _{i=1}^{n}x_i\).
Definition 2
\(\textsc {LeadingOnes} (x) := \sum \limits _{i=1}^{n}\prod \limits _{j=1}^{i}x_j\).
Definition 3
\(\textsc {BinVal} (x) := \sum \limits _{i=1}^{n}2^{n-i}x_i\).
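Definitions 1 to 3 translate directly into code. The following Python sketch assumes that a bitstring is given as a sequence of 0/1 integers with \(x_1\) stored at index 0; the function names are chosen here for illustration.

```python
def onemax(x):
    """OneMax(x): the number of ones in the bitstring."""
    return sum(x)

def leading_ones(x):
    """LeadingOnes(x): the length of the longest all-ones prefix."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def binval(x):
    """BinVal(x): the bitstring read as a binary number, x_1 most significant."""
    n = len(x)
    # Position i (1-based) has weight 2^(n-i); index k = i-1 gives 2^(n-1-k).
    return sum(bit << (n - 1 - k) for k, bit in enumerate(x))

x = (1, 0, 1, 1)
values = (onemax(x), leading_ones(x), binval(x))  # (3, 1, 11)
```

For \(x=1011\), the three functions evaluate to 3, 1 and \(8+2+1=11\), respectively, and all three are maximised by the all-ones bitstring.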
2.2 Univariate Marginal Distribution Algorithm
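As a rough illustration of the algorithm analysed in this paper, the following Python sketch implements one plausible reading of the UMDA with margins, pieced together from the description in the introduction (sample \(\lambda \) individuals from a product distribution, select the \(\mu \) fittest, set each marginal to the frequency of ones among them) and from the margins \([1/n,1-1/n]\). Several details are assumptions made here rather than statements from the paper: the uniform initial model \(p_0(i)=1/2\), the stopping rule (the all-ones bitstring is sampled), the generation cap, and all identifier names.

```python
import random

def umda(fitness, n, mu, lam, max_generations=10_000, seed=None):
    """Sketch of the UMDA with margins on {0,1}^n (maximisation).

    Returns the number of sampled individuals up to and including the
    generation in which the all-ones bitstring first appears, or None if the
    (hypothetical) generation cap is reached first.
    """
    rng = random.Random(seed)
    lower, upper = 1.0 / n, 1.0 - 1.0 / n      # margins on each marginal
    p = [0.5] * n                              # assumed initial model
    optimum = tuple([1] * n)
    for t in range(max_generations):
        # Sample lambda individuals independently from the product distribution.
        population = [
            tuple(1 if rng.random() < p[i] else 0 for i in range(n))
            for _ in range(lam)
        ]
        if optimum in population:
            return (t + 1) * lam
        # Truncation selection; the random second key breaks ties uniformly.
        ranked = sorted(population,
                        key=lambda x: (fitness(x), rng.random()),
                        reverse=True)
        selected = ranked[:mu]
        # New marginal: frequency of ones among the selected individuals,
        # restricted to the interval [1/n, 1 - 1/n].
        for i in range(n):
            ones = sum(x[i] for x in selected)
            p[i] = min(max(ones / mu, lower), upper)
    return None

evaluations = umda(sum, n=20, mu=10, lam=50, seed=1)  # fitness = OneMax
```

Passing `sum` as the fitness function corresponds to OneMax; any function of a bit tuple whose unique optimum is the all-ones string can be substituted.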
2.3 LevelBased Theorem
We are interested in the optimisation time of the UMDA, which is a non-elitist algorithm; thus, tools for analysing runtime for this class of algorithms are of importance. Currently in the literature, drift theorems have often been used to derive upper and lower bounds on the expected optimisation time of the UMDA, see, e.g., [26, 44], because they allow us to examine the dynamics of each marginal in the vector-based probabilistic model. In this paper, we take another perspective where we consider the population of individuals. To do this, we make use of the so-called level-based theorem.
Introduced by Corus et al. [9], the level-based theorem is a general tool that provides upper bounds on the expected optimisation time of many non-elitist population-based algorithms on a wide range of optimisation problems [9]. It has been applied to analyse the expected optimisation time of Genetic Algorithms with or without crossover on various pseudo-Boolean functions and combinatorial optimisation problems [9], self-adaptive EAs [11], the UMDA with margins on OneMax and LeadingOnes [10], and very recently the PBIL with margins on LeadingOnes and BinVal [28].
Furthermore, the theorem assumes a partition \(A_1,\ldots ,A_m\) of the finite search space \(\mathcal {X}\) into m subsets, which we call levels. We assume that the last level \(A_m\) consists of all optimal solutions. Given a partition of the search space \(\mathcal {X}\), we can state the level-based theorem as follows:
Theorem 4
 (G1) for each level \(j\in [m-1]\), if \(|P_t\cap A_{\ge j}|\ge \gamma _0\lambda \) then $$\begin{aligned} {\hbox {Pr}}_{y \sim \mathcal {D}(P_t)}\left( y \in A_{\ge j+1}\right) \ge z_j. \end{aligned}$$
 (G2) for each level \(j\in [m-2]\) and all \(\gamma \in (0,\gamma _0]\), if \(|P_t\cap A_{\ge j}|\ge \gamma _0\lambda \) and \(|P_t\cap A_{\ge j+1}|\ge \gamma \lambda \) then $$\begin{aligned} {\hbox {Pr}}_{y \sim \mathcal {D}(P_t)}\left( y \in A_{\ge j+1}\right) \ge \left( 1+\delta \right) \gamma . \end{aligned}$$
 (G3) and the population size \(\lambda \in \mathbb {N}\) satisfies $$\begin{aligned} \lambda \ge \left( \frac{4}{\gamma _0\delta ^2}\right) \ln \left( \frac{128m}{z_*\delta ^2}\right) , \end{aligned}$$ where \(z_* :=\min _{j\in [m-1]}\{z_j\}\), then $$\begin{aligned} \mathbb {E}\left[ T\right] \le \left( \frac{8}{\delta ^2}\right) \sum _{j=1}^{m-1}\left[ \lambda \ln \left( \frac{6\delta \lambda }{4+z_j\delta \lambda }\right) +\frac{1}{z_j}\right] . \end{aligned}$$
Informally, the first condition (G1) requires that the probability of sampling an individual in levels \(A_{\ge j+1}\) is at least \(z_j\) given that at least \(\gamma _0\lambda \) individuals in the current population are in levels \(A_{\ge j}\). Condition (G2) further requires that, if in addition at least \(\gamma \lambda \) of them are in levels \(A_{\ge j+1}\), then the probability of sampling an offspring in levels \(A_{\ge j+1}\) is at least \((1+\delta )\gamma \). The last condition (G3) sets a lower limit on the population size \(\lambda \). As long as the three conditions are satisfied, an upper bound on the expected time to reach the last level \(A_m\) of a population-based algorithm is guaranteed.
To apply the level-based theorem, it is recommended to follow the five-step procedure in [9]: (1) identifying a partition of the search space; (2) finding appropriate parameter settings such that condition (G2) is met; (3) estimating a lower bound \(z_j\) to satisfy condition (G1); (4) ensuring that the population size is large enough; and (5) deriving the upper bound on the expected time to reach level \(A_m\).
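Steps (4) and (5) amount to evaluating the two closed-form expressions of Theorem 4 once \(z_1,\ldots ,z_{m-1}\), \(\delta \) and \(\gamma _0\) are fixed. The Python sketch below does exactly that for concrete numbers; it is meant as a numerical aid under the assumption that conditions (G1) and (G2) have already been verified by hand, and the function names are hypothetical.

```python
import math

def min_population_size(m, z, delta, gamma0):
    """Right-hand side of (G3): the smallest admissible population size.

    `z` is the list (z_1, ..., z_{m-1}) of lower bounds from condition (G1).
    """
    z_star = min(z)
    return (4.0 / (gamma0 * delta ** 2)) * math.log(128.0 * m / (z_star * delta ** 2))

def level_based_bound(z, delta, lam):
    """Upper bound on E[T] from Theorem 4, summed over levels 1..m-1."""
    return (8.0 / delta ** 2) * sum(
        lam * math.log(6.0 * delta * lam / (4.0 + zj * delta * lam)) + 1.0 / zj
        for zj in z
    )

# Toy instance: m = 4 levels, z_j = 1/2 for all j, delta = 1, gamma0 = 1/2.
z = [0.5, 0.5, 0.5]
threshold = min_population_size(4, z, delta=1.0, gamma0=0.5)   # 8 * ln(1024)
bound = level_based_bound(z, delta=1.0, lam=100)
```

The toy values are arbitrary; in the analyses of this paper \(m\), \(1/z_j\) and \(\lambda \) all depend on \(n\), and only the asymptotic form of the bound matters.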
Note in particular that Algorithm 2 assumes a mapping \(\mathcal {D}\) from the space of populations \(\mathcal {X}^{\lambda }\) to the space of probability distributions over the search space. The mapping \(\mathcal {D}\) is often said to depend on the current population only [9]; however, this is not strictly necessary. Very recently, Lehre and Nguyen [28] applied Theorem 4 to analyse the expected optimisation time of the PBIL with a sufficiently large offspring population size \(\lambda =\varOmega (\log n)\) on LeadingOnes and BinVal, when the population for the next generation is sampled using a mapping that depends on the previous probabilistic model \(p_t\) in addition to the current population \(P_t\). The rationale behind this is that, in each generation, the PBIL draws \(\lambda \) samples from the probability distribution (1), that correspond to \(\lambda \) individuals in the current population. If the number of samples \(\lambda \) is sufficiently large, it is highly likely that the empirical distributions for all positions among the entire population cannot deviate too far from the true distributions, i.e. marginals \(p_t(i)\) [28], due to the Dvoretzky–Kiefer–Wolfowitz inequality [34].
2.4 Feige’s Inequality
In order to verify conditions (G1) and (G2) of Theorem 4 for the UMDA on OneMax using a canonical f-based partition \(A_1,\ldots ,A_m\), we later need a lower bound on the probability of sampling an offspring in given levels, that is \(\hbox {Pr}_{y\sim p_t}(y \in A_{\ge j})\), where y is the offspring sampled from the joint probability distribution (1). Let Y denote the number of ones in the offspring y. It is well-known that the random variable Y follows a Poisson–Binomial distribution with expectation \(\mathbb {E}\left[ Y\right] =\sum _{i=1}^{n}p_t(i)\) and variance \(\sigma _n^2=\sum _{i=1}^{n}p_t(i)\left( 1-p_t(i)\right) \). A general result due to Feige [16] provides such a lower bound when \(Y<\mathbb {E}\left[ Y\right] \); however, for our purposes, it will be more convenient to use the following variant [10].
Theorem 5
2.5 Anti-concentration Bound
In addition to Feige’s inequality, it is also necessary to compute an upper bound on the probability of sampling an offspring in a given level, that is \(\hbox {Pr}_{y\sim p_t}\left( y \in A_j \right) \) for any \(j \in [m]\), where \(y\sim \hbox {Pr}(\cdot \mid p_t)\) as defined in (1). Let Y be the random variable that follows a Poisson–Binomial distribution as introduced in the previous subsection. Baillon et al. [3] derived the following sharp upper bound on the probability \(\hbox {Pr}_{y\sim p_t}\left( y \in A_j \right) \).
Theorem 6
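To get a feeling for the quantities involved, the Poisson–Binomial distribution of the number of ones Y can be computed exactly by a simple dynamic programme over the bit positions. The Python sketch below returns the probability mass function together with the mean, the standard deviation \(\sigma _n\), and the product \(\sigma _n\cdot \max _y \hbox {Pr}(Y=y)\). Reading the anti-concentration bound as a constant upper bound on this product is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

def poisson_binomial_pmf(p):
    """Exact pmf of Y = sum of independent Bernoulli(p[i]) variables.

    Returns a list pmf with pmf[y] = Pr(Y = y) for y = 0..len(p).
    """
    pmf = [1.0]                       # distribution of the empty sum
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for y, mass in enumerate(pmf):
            nxt[y] += mass * (1.0 - pi)   # current bit sampled as 0
            nxt[y + 1] += mass * pi       # current bit sampled as 1
        pmf = nxt
    return pmf

def summary(p):
    """Return (mean, sigma, sigma * max_y Pr(Y = y)) for marginals p."""
    pmf = poisson_binomial_pmf(p)
    mean = sum(p)
    sigma = math.sqrt(sum(pi * (1.0 - pi) for pi in p))
    return mean, sigma, sigma * max(pmf)

mean, sigma, scaled_peak = summary([0.5] * 100)
```

For 100 marginals equal to 1/2 the mean is 50 and \(\sigma _n=5\); the scaled peak stays bounded while the peak probability itself shrinks like \(1/\sigma _n\).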
3 Runtime of the UMDA on LeadingOnes and BinVal
As a warm-up example, and to illustrate the method of level-based analysis, we consider the two functions LeadingOnes and BinVal, as defined in Definitions 2 and 3. It is well-known that the expected optimisation time of the (1+1) EA on LeadingOnes is \(\varTheta (n^2)\), and that this is optimal for the class of unary unbiased black-box algorithms [29]. Early analysis of the UMDA on LeadingOnes [8] required an excessively large population, i.e. \(\lambda = \omega (n^2\log n)\). Our analysis below shows that a population size \(\lambda = \varOmega (\log n)\) suffices to achieve the expected optimisation time \(\mathcal {O}(n^2)\).
Theorem 7
The UMDA (with margins) with parent population size \(\mu \ge c \log {n}\) for a sufficiently large constant \(c>0\), and offspring population size \(\lambda \ge (1+\delta )e\mu \) for any constant \(\delta >0\), has expected optimisation time \(\mathcal {O}(n\lambda \log {\lambda }+n^2)\) on LeadingOnes and BinVal.
Proof
We apply Theorem 4 by following the guidelines from [9].
Step 4 Considering (G3), because \(\delta \) is a constant, and both \(1/z_*\) and m are \(\mathcal {O}(n)\), there must exist a constant \(c>0\) such that \(\mu \ge c \log n \ge (4/\delta ^2)\ln (128 m / (z_* \delta ^2))\). Note that \(\lambda = \mu /\gamma _0\), so (G3) is satisfied.
\(\square \)
4 Runtime of the UMDA on OneMax
We consider the problem in Definition 1, i.e., maximisation of the number of ones in a bitstring. It is well-known that OneMax can be optimised in expected time \(\varTheta (n\log n)\) using the simple \((1+1)\) EA. The level-based theorem yielded the first upper bound \(\mathcal {O}(n\lambda \log \lambda )\) on the expected optimisation time of the UMDA on OneMax, assuming that \(\lambda =\varOmega (\log n)\) [10]. This leaves open whether an improved bound \(\mathcal {O}(n\lambda )\) can be obtained for the UMDA (with margins) on the OneMax problem.

Let \(Y:=(Y_1,Y_2,\ldots ,Y_n)\) denote an offspring sampled from the probability distribution (1) in generation t, where \(\hbox {Pr}(Y_i=1)=p_t(i)\) for each \(i\in [n]\).

Let \(Y_{i,j}:=\sum _{k=i}^j Y_k\) denote the number of ones sampled from the subvector \(\left( p_t(i),p_t(i+1),\ldots ,p_t(j)\right) \) of the model \(p_t\) where \(1\le i\le j\le n\).
4.1 Small Parent Population Size

for all \(i\in [1,k]\), \(1\le X_i \le \mu -1\) and \(p_t(i)=X_i/\mu \),

for all \(i\in (k,k+\ell ]\), \(X_i = \mu \) and \(p_t(i)=1-1/n\), and

for all \(i\in (k+\ell ,n]\), \(X_i =0\) and \(p_t(i)=1/n\).
We aim at obtaining an upper bound \(\mathcal {O}(n\lambda )\) on the expected optimisation time of the UMDA on OneMax using the level-based theorem. The logarithmic factor \(\mathcal {O}(\log \lambda )\) in the previous upper bound \(\mathcal {O}(n\lambda \log \lambda )\) in [10] stems from the lower bound \(\varOmega (1/\mu )\) on the parameter \(z_j\) in the condition (G1) of Theorem 4. We aim for the stronger bound \(z_j=\varOmega (\frac{n-j+1}{n})\). Note that in the following proofs, we choose the parameter \(\gamma _0:=\mu /\lambda \).
 1. \(k\ge \mu \). In this situation, the variance of \(Y_{1,k}\) is not too small. By the result of Theorem 6, the distribution of \(Y_{1,k}\) cannot be too concentrated on its mean \(\mathbb {E}\left[ Y_{1,k}\right] =j-\ell -1\), and with probability at least \(\varOmega (1)\), the algorithm can sample at least \(j-\ell \) ones from the first k bit positions to obtain an offspring with at least \((j-\ell )+\ell =j\) ones. Thus, the probability of sampling at least j ones is bounded from below by $$\begin{aligned} \hbox {Pr}(Y_{1,n}\ge j) \ge \hbox {Pr}(Y_{1,k}\ge j-\ell ) \hbox {Pr}(Y_{k+1, k+\ell }=\ell ) =\varOmega (1). \end{aligned}$$
 2. \(k<\mu \) and \(j\ge n+1-\frac{n}{\mu }\). In this case, the current level is very close to the last level \(A_{n+1}\), and the bitstring has few zeros. As already obtained from [10], the probability of sampling an offspring in \(A_{\ge j+1}\) in this case is \(\varOmega (\frac{1}{\mu })\). Since the condition can be rewritten as \(\frac{1}{\mu }\ge \frac{n-j+1}{n}\), it ensures that \(z_j=\varOmega (\frac{1}{\mu })=\varOmega (\frac{n-j+1}{n})\).
 3. The remaining cases. We will later prove that if \(\mu \le \sqrt{n(1-c)}\) for some constant \(c\in (0,1)\), then excluding the two cases above implies \(0\le k<(1-c)(n-j+1)\). In this case, k is relatively small, and \(\ell \) is not too large since the current level is not very close to the last level \(A_{n+1}\). This implies that most zeros must be located among bit positions \(i\in (k+\ell ,n]\), and it suffices to sample an extra one from this region to get at least \((j-\ell -1)+\ell +1=j\) ones. The probability of sampling an offspring in levels \(A_{\ge j+1}\) is then \(z_j=\varOmega (\frac{n-j+1}{n})\).
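The effect of the stronger choice of \(z_j\) on the bound of Theorem 4 can be checked numerically. Each summand \(\lambda \ln \left( \frac{6\delta \lambda }{4+z_j\delta \lambda }\right) \) is at most \(\lambda \ln (6/z_j)\), so the size of the level sum is governed by \(\sum _j \ln (1/z_j)\). The Python sketch below compares this sum for \(z_j=1/\mu \) and for \(z_j=(n-j+1)/n\) over the levels \(j\in [n]\); setting the hidden constants in the \(\varOmega \)-terms to one is an assumption made purely for illustration, as are the function names.

```python
import math

def log_term_sum(z):
    """Sum of ln(1/z_j) over all levels; drives the lambda * ln(.) part of the bound."""
    return sum(math.log(1.0 / zj) for zj in z)

def compare(n, mu):
    """Return the log-sums for z_j = 1/mu and for z_j = (n-j+1)/n, j = 1..n."""
    uniform = [1.0 / mu] * n
    graded = [(n - j + 1) / n for j in range(1, n + 1)]
    return log_term_sum(uniform), log_term_sum(graded)

uniform_sum, graded_sum = compare(n=1000, mu=30)
# uniform_sum equals n * ln(mu); graded_sum equals n * ln(n) - ln(n!) <= n.
```

With \(z_j=1/\mu \) the sum grows like \(n\ln \mu \), which is the source of the logarithmic factor, whereas with \(z_j=(n-j+1)/n\) it equals \(n\ln n-\ln n!\le n\), which is consistent with an overall bound of order \(n\lambda \).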
Theorem 8
For some constant \(a>0\) and any constant \(c\in (0,1)\), the UMDA (with margins) with parent population size \(a\ln (n)\le \mu \le \sqrt{n(1-c)}\), and offspring population size \(\lambda \ge (13e/(1-c))\mu \), has expected optimisation time \(\mathcal {O}\left( n\lambda \right) \) on OneMax.
Proof
We rearrange the bit positions as explained above and follow the recommended 5-step procedure for applying Theorem 4 [9].
Step 1 The levels are defined as in Eq. (2). There are exactly \(m=n+1\) levels from \(A_1\) to \(A_{n+1}\), where level \(A_{n+1}\) consists of the optimal solution.
Step 3 We now consider condition (G1) for any level j. Let \(P_t\) be any population where \(|P_t\cap A_{\ge j}|\ge \gamma _0\lambda =\mu \). For a lower bound on \(\hbox {Pr}\left( Y_{1,n}\ge j\right) \), we modify the population such that any individual in levels \(A_{\ge j+1}\) is moved to level \(A_j\). Thus, the \(\mu \) fittest individuals belong to level \(A_j\). By the definition of the UMDA, this will only reduce the probabilities \(p_{t+1}(i)\) on the OneMax problem. Hence, by Lemma 13, the distribution of \(Y_{1,n}\) for the modified population is stochastically dominated by \(Y_{1,n}\) for the original population. A lower bound \(z_j\) that holds for the modified population therefore also holds for the original population. All the \(\mu \) fittest individuals in the current sorted population \(P_t\) have exactly \(j-1\) ones, and, therefore, \( \sum _{i=1}^{n}X_i= \mu \left( j-1\right) \) and \( \sum _{i=1}^{k}X_i= \mu \left( j-\ell -1\right) \). There are four distinct cases that cover all situations according to different values of variables k and j. We aim to show that in all four cases, we can use the parameter \(z_j=\varOmega (\frac{n-j+1}{n})\).
4.2 Large Parent Population Size
For larger parent population sizes, i.e., \(\mu = \varOmega (\sqrt{n}\log n)\), we prove the upper bound \(\mathcal {O}(\lambda \sqrt{n})\) on the expected optimisation time of the UMDA on OneMax. Witt [44] obtained a similar result, and we rely on one of his lemmas to derive our improved result. Overall, our proof is not only significantly simpler but also holds for different settings of \(\mu \) and \(\lambda \), that is, \(\lambda =\varOmega (\mu )\) instead of \(\lambda =\varTheta (\mu )\).
Theorem 9
For sufficiently large constants \(a>1\) and \(c>0\), the UMDA (with margins) with offspring population size \(\lambda \ge a\mu \), and parent population size \(\mu \ge c\sqrt{n}\log n\), has expected optimisation time \(\mathcal {O}\left( \lambda \sqrt{n}\right) \) on OneMax.

\(p_t(i)\in \left[ p_{\min }, 1-\frac{1}{\mu }\right] \) for all \(1\le i \le k\),

\(p_t(i) =1-\frac{1}{n}\) for all \(k+1\le i \le n\).
Proof of Theorem 9
We apply Theorem 4.
5 Empirical Results
We have proved upper bounds on the expected optimisation time of the UMDA on OneMax, LeadingOnes and BinVal. However, they are only asymptotic upper bounds as growth functions of the problem and population sizes. They provide no information on the multiplicative constants or the influences of lower order terms. Our goal is also to investigate the runtime behaviour for larger populations. To complement the theoretical findings, we therefore carried out some experiments by running the UMDA on the three functions.
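For readers who wish to reproduce experiments of this kind, the following is a minimal sketch of the UMDA with margins together with the three benchmark functions. It is not the code used for the reported experiments: it assumes the standard formulation in which each marginal is the frequency of ones among the \(\mu \) fittest of \(\lambda \) sampled individuals, restricted to \([1/n, 1-1/n]\), and it counts the optimisation time in whole generations of \(\lambda \) fitness evaluations. The function names and the `max_generations` cut-off are hypothetical.

```python
import random


def onemax(x):
    """Number of one-bits."""
    return sum(x)


def leading_ones(x):
    """Length of the longest prefix consisting only of one-bits."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count


def binval(x):
    """Binary value of the bitstring; the first bit carries the largest weight."""
    n = len(x)
    return sum(bit << (n - 1 - i) for i, bit in enumerate(x))


def umda(fitness, n, mu, lam, optimum, rng, max_generations=10_000):
    """Return the number of fitness evaluations until the optimum is sampled,
    or None if max_generations (an assumed safety cut-off) is exceeded."""
    lo, hi = 1.0 / n, 1.0 - 1.0 / n          # lower and upper margins
    p = [0.5] * n                            # initial marginals
    evaluations = 0
    for _ in range(max_generations):
        population = [[1 if rng.random() < p_i else 0 for p_i in p]
                      for _ in range(lam)]
        # The sample is i.i.d., so a stable sort breaks ties in random order.
        ranked = sorted(population, key=fitness, reverse=True)
        evaluations += lam
        if fitness(ranked[0]) == optimum:
            return evaluations
        parents = ranked[:mu]
        # New marginal = frequency of ones among the mu fittest, clamped to the margins.
        p = [min(hi, max(lo, sum(ind[i] for ind in parents) / mu))
             for i in range(n)]
    return None


if __name__ == "__main__":
    n = 100
    rng = random.Random(2024)
    print(umda(onemax, n, mu=int(n ** 0.5), lam=n, optimum=n, rng=rng))
```

The optimum values to pass are n for OneMax and LeadingOnes and \(2^n-1\) for BinVal.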
For each function, the parameters were chosen consistently with the theoretical analyses. Specifically, we set \(\lambda =n\), and \(n \in \{100, 200,\ldots ,4500\}\). Although the theoretical results imply that significantly smaller population sizes would suffice, e.g. \(\lambda =O(\log n)\) for Theorem 8, we chose a larger population size in the experiments to more easily observe the impact of \(\lambda \) on the running time of the algorithm. The results are shown in Figs. 1, 2 and 3. For each value of n, the algorithm is run 100 times, and the average runtime is computed. The average runtime for each value of n is estimated with \(95\%\) confidence intervals using the bootstrap percentile method [30] with 100 bootstrap samples. Each average point is plotted with two error bars to illustrate the upper and lower margins of the confidence intervals.
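The confidence intervals described above can be computed along the following lines. This is a sketch under assumptions: the percentile indices are picked with a simple rounding rule, which is one of several common conventions, and the function name and the synthetic runtimes in the usage example are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(samples, rng, num_resamples=100, level=0.95):
    """Return (lower, upper) bounds of a percentile bootstrap interval for the mean."""
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(num_resamples)
    )
    alpha = (1.0 - level) / 2.0
    # Simple index rule for the empirical quantiles (an assumed convention).
    lower_idx = int(alpha * (num_resamples - 1))
    upper_idx = int(round((1.0 - alpha) * (num_resamples - 1)))
    return means[lower_idx], means[upper_idx]


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic stand-in for 100 measured runtimes at one value of n.
    runtimes = [rng.gauss(1000.0, 50.0) for _ in range(100)]
    print(statistics.fmean(runtimes), bootstrap_percentile_ci(runtimes, rng))
```

With only 100 bootstrap samples the interval endpoints are themselves noisy, so a larger `num_resamples` may be preferable when the error bars matter.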
5.1 OneMax

\(\mathcal {O}\left( \lambda n\right) \) with parent population sizes \(\mu =\varOmega (\log n)\cap \mathcal {O}(\sqrt{n})\),

\(\mathcal {O}(\lambda \sqrt{n})\) with parent population sizes \(\mu = \varOmega (\sqrt{n}\log (n))\).
Table 2 Correlation coefficient \(\rho \) for the best-fit models in the experiments with OneMax shown in Fig. 1a, b
Setting | Model | \(\rho \)
\(\mu = \sqrt{n}\) | \(5.8297 \;n\log n\) | 0.9968
\(\mu = \sqrt{n}\) | \(0.8104 \;n^{3/2}\) | 0.9996
\(\mu = \sqrt{n}\) | \(0.0133 \;n^{2}\) | 0.9910
\(\mu = \sqrt{n}\log n\) | \(7.7544 \;n\log n\) | 0.9974
\(\mu = \sqrt{n}\log n\) | \(1.0767 \;n^{3/2}\) | 0.9995
\(\mu = \sqrt{n}\log n\) | \(0.0177 \;n^{2}\) | 0.9903
In Table 2, we observe that for small parent populations (i.e. \(\mu = \sqrt{n}\)), the model \(0.8104\;n^{3/2}\) fits the empirical data best, while the quadratic model gives the worst fit. For larger parent populations (i.e. \(\mu = \sqrt{n}\log n\)), the model \(1.0767~n^{3/2}\) fits the empirical data best among the three models. Since \(0.8104~n^{3/2} \in \mathcal {O}(n^2)\), these findings are consistent with the theoretical expected optimisation time and may further suggest that the quadratic bound for small populations is not tight.
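A comparison of candidate models of this kind can be sketched as follows. The text does not state how the constants and correlation coefficients were obtained, so the procedure here is an assumption: each model \(c\cdot f(n)\) is fitted by least squares through the origin, and \(\rho \) is taken to be the Pearson correlation between the measured runtimes and \(f(n)\). The data in the usage example is synthetic, and all names are hypothetical.

```python
import math

# Candidate growth functions f(n); the fitted model is c * f(n).
MODELS = {
    "n log n": lambda n: n * math.log(n),
    "n^(3/2)": lambda n: n ** 1.5,
    "n^2": lambda n: n ** 2,
}


def fit_constant(ns, runtimes, f):
    """Least-squares constant c minimising sum((runtime - c * f(n))^2)."""
    fs = [f(n) for n in ns]
    return sum(a * b for a, b in zip(fs, runtimes)) / sum(a * a for a in fs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_models(ns, runtimes):
    """Return {model name: (fitted constant, correlation coefficient)}."""
    return {
        name: (fit_constant(ns, runtimes, f), pearson([f(n) for n in ns], runtimes))
        for name, f in MODELS.items()
    }


if __name__ == "__main__":
    ns = list(range(100, 1100, 100))
    runtimes = [2.0 * n ** 1.5 for n in ns]      # synthetic data, not measurements
    for name, (c, rho) in compare_models(ns, runtimes).items():
        print(f"{name}: c = {c:.4f}, rho = {rho:.4f}")
```

Because all three candidate functions are increasing and convex on the tested range, their correlation coefficients lie close together, so small differences in \(\rho \) should be read with caution.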
5.2 LeadingOnes
Table 3 Correlation coefficient \(\rho \) for the best-fit models in the experiments with LeadingOnes shown in Fig. 2
Setting | Model | \(\rho \)
\(\mu = \sqrt{n}\) | \(646.14 \;n\log n\) | 0.9756
\(\mu = \sqrt{n}\) | \(91.160 \;n^{3/2}\) | 0.9928
\(\mu = \sqrt{n}\) | \(1.5223 \;n^{2}\) | 0.9999
\(\mu = \sqrt{n}\) | \(0.1851 \;n^{2}\log n\) | 0.9999
Figure 2 and Table 3 show that both the model \(1.5223~n^2\) and the model \(0.1851~n^2\log n\), having the same correlation coefficient, fit well with the empirical data (i.e. the empirical data lie between these two curves). This finding is consistent with the theoretical runtime bound \(\mathcal {O}(n^2\log n)\). Note also that these two models differ asymptotically by \(\varTheta (\log n)\), suggesting that our analysis of the UMDA on LeadingOnes is nearly tight.
5.3 BinVal
Table 4 Correlation coefficient \(\rho \) for the best-fit models in the experiments with BinVal shown in Fig. 3a, b
Setting | Model | \(\rho \)
\(\mu = \sqrt{n}\) | \(10.489 \;n\log n\) | 0.9952
\(\mu = \sqrt{n}\) | \(1.4605 \;n^{3/2}\) | 0.9999
\(\mu = \sqrt{n}\) | \(0.0240 \;n^{2}\) | 0.9933
\(\mu = \sqrt{n}\log n\) | \(11.973 \;n\log n\) | 0.9972
\(\mu = \sqrt{n}\log n\) | \(1.6596 \;n^{3/2}\) | 0.9994
\(\mu = \sqrt{n}\log n\) | \(0.0272 \;n^{2}\) | 0.9903
Theorem 7 gives the upper bound \(\mathcal {O}(n^2\log n)\) on the expected runtime of the UMDA on BinVal. However, Fig. 3 and Table 4 show clearly that the model \(1.4605 \;n^{3/2}\) fits the empirical runtime best for \(\mu =\sqrt{n}\). On the other hand, the empirical runtime lies between the two models \(11.973 \;n\log n\) and \(1.6586 \;n^{3/2}\) when \(\mu =\sqrt{n}\log n\). While these observations are consistent with the theoretical upper bound, since \(\mathcal {O}(n^{3/2})\) and \(\mathcal {O}(n\log n)\) are both contained in \(\mathcal {O}(n^2\log n)\), they also suggest that our analysis of the UMDA on BinVal given by Theorem 7 may be loose.
6 Conclusion
Despite the popularity of EDAs in real-world applications, little has been known about their optimisation time, even for apparently simple settings such as the UMDA on toy functions. More results for the UMDA on these simple problems with well-understood structures provide a way to describe and compare the performance of the algorithm with other search heuristics. Furthermore, results about the UMDA are not only relevant to evolutionary computation, but also to population genetics where it corresponds to the notion of linkage equilibrium [36, 41].
We have analysed the expected optimisation time of the UMDA on three benchmark problems: OneMax, LeadingOnes and BinVal. For both LeadingOnes and BinVal, we proved the upper bound \(\mathcal {O}(n\lambda \log \lambda +n^2)\), which holds for \(\lambda =\varOmega (\log n)\). For OneMax, two upper bounds of \(\mathcal {O}(\lambda n)\) and \(\mathcal {O}(\lambda \sqrt{n})\) were obtained for \(\mu = \varOmega (\log n)\cap \mathcal {O}(\sqrt{n})\) and \(\mu = \varOmega (\sqrt{n}\log n)\), respectively. Although our result assumes that \(\lambda \ge (1+\beta )\mu \) for some positive constant \(\beta >0\), it no longer requires that \(\lambda =\varTheta (\mu )\) as in [44]. Note that if \(\lambda =\varTheta (\log n)\), a tight bound \(\varTheta (n\log n)\) on the expected optimisation time of the UMDA on OneMax is obtained, matching the well-known tight bound \(\varTheta (n\log n)\) for the (\(1+1\)) EA on the class of linear functions. Although we did not obtain a runtime bound when the parent population size is \(\mu = \varOmega (\sqrt{n})\cap \mathcal {O}(\sqrt{n}\log n)\), our results finally close the existing \(\varTheta (\log \log n)\)-gap between the first upper bound \(\mathcal {O}(n\log n \log \log n)\) for \(\lambda =\varOmega (\mu )\) [10] and the relatively new lower bound \(\varOmega (\mu \sqrt{n}+n\log n)\) for \(\lambda = (1+\varTheta (1))\mu \) [26].
Our analysis further demonstrates that the level-based theorem can yield, relatively easily, asymptotically tight upper bounds for non-trivial, population-based algorithms. An important additional component of the analysis was the use of anti-concentration properties of the Poisson–Binomial distribution. Unless the variance of the sampled individuals is too small, the distribution of the population cannot be too concentrated anywhere, even around the mean, yielding sufficient diversity to discover better solutions. We expect that similar arguments will lead to new results in runtime analysis of evolutionary algorithms.
Footnotes
 1. This and some other lemmas are stated in the Appendix.
Notes
Acknowledgements
Funding was provided by Seventh Framework Programme (Grant No. 618091).
References
 1. Armañanzas, R., Inza, I., Santana, R., Saeys, Y., Flores, J.L., Lozano, J.A., Van de Peer, Y., Blanco, R., Robles, V., Bielza, C., Larrañaga, P.: A review of estimation of distribution algorithms in bioinformatics. BioData Min. 1(1), 6 (2008)
 2. Asoh, H., Mühlenbein, H.: On the mean convergence time of evolutionary algorithms without selection and mutation. In: Proceedings of the 3rd International Conference on Parallel Problem Solving from Nature, PPSN III, pp. 88–97 (1994)
 3. Baillon, J.B., Cominetti, R., Vaisman, J.: A sharp uniform bound for the distribution of sums of Bernoulli trials. Comb. Probab. Comput. 25(3), 352–361 (2016)
 4. Baluja, S.: Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning. Technical Report, Carnegie Mellon University (1994)
 5. Chen, T., Lehre, P.K., Tang, K., Yao, X.: When is an estimation of distribution algorithm better than an evolutionary algorithm? In: Proceedings of 2009 IEEE Congress on Evolutionary Computation, pp. 1470–1477 (2009)
 6. Chen, T., Tang, K., Chen, G., Yao, X.: On the analysis of average time complexity of estimation of distribution algorithms. In: Proceedings of 2007 IEEE Congress on Evolutionary Computation, pp. 453–460 (2007)
 7. Chen, T., Tang, K., Chen, G., Yao, X.: Rigorous time complexity analysis of univariate marginal distribution algorithm with margins. In: Proceedings of 2009 IEEE Congress on Evolutionary Computation, pp. 2157–2164 (2009)
 8. Chen, T., Tang, K., Chen, G., Yao, X.: Analysis of computational time of simple estimation of distribution algorithms. IEEE Trans. Evol. Comput. 14(1), 1–22 (2010)
 9. Corus, D., Dang, D.C., Eremeev, A.V., Lehre, P.K.: Level-based analysis of genetic algorithms and other search processes. IEEE Trans. Evol. Comput. https://doi.org/10.1109/TEVC.2017.2753538 (2017)
 10. Dang, D.C., Lehre, P.K.: Simplified runtime analysis of estimation of distribution algorithms. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’15, pp. 513–518 (2015)
 11. Dang, D.C., Lehre, P.K.: Self-adaptation of mutation rates in non-elitist populations. In: Proceedings of the 14th International Conference on Parallel Problem Solving from Nature, PPSN XIV, pp. 803–813 (2016)
 12. Doerr, B.: Probabilistic tools for the analysis of randomized optimization heuristics. CoRR. arXiv:1801.06733 (2018)
 13. Doerr, B., Krejca, M.S.: Significance-based estimation-of-distribution algorithms. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’18, pp. 1483–1490 (2018)
 14. Droste, S.: A rigorous analysis of the compact genetic algorithm for linear functions. Natl. Comput. 5(3), 257–283 (2006)
 15. Ducheyne, E.I., De Baets, B., De Wulf, R.: Probabilistic Models for Linkage Learning in Forest Management, pp. 177–194. Springer, Berlin (2005)
 16. Feige, U.: On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM J. Comput. 35(4), 964–984 (2006)
 17. Friedrich, T., Kötzing, T., Krejca, M., Sutton, A.M.: The compact genetic algorithm is efficient under extreme Gaussian noise. IEEE Trans. Evol. Comput. 21(3), 477–490 (2017)
 18. Friedrich, T., Kötzing, T., Krejca, M.S.: EDAs cannot be balanced and stable. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’16, pp. 1139–1146 (2016)
 19. Gleser, L.J.: On the distribution of the number of successes in independent trials. Ann. Probab. 3(1), 182–188 (1975)
 20. Gu, W., Wu, Y., Zhang, G.Y.: A hybrid univariate marginal distribution algorithm for dynamic economic dispatch of units considering valve-point effects and ramp rates. Int. Trans. Electr. Energy Syst. 25(2), 374–392 (2015)
 21. Harik, G.R., Lobo, F.G., Goldberg, D.E.: The compact genetic algorithm. IEEE Trans. Evol. Comput. 3(4), 287–297 (1999)
 22. Hauschild, M., Pelikan, M.: An introduction and survey of estimation of distribution algorithms. Swarm Evol. Comput. 1(3), 111–128 (2011)
 23. Jogdeo, K., Samuels, S.M.: Monotone convergence of binomial probabilities and a generalization of Ramanujan's equation. Ann. Math. Stat. 39(4), 1191–1195 (1968)
 24. Kollat, J.B., Reed, P.M., Kasprzyk, J.R.: A new epsilon-dominance hierarchical Bayesian optimization algorithm for large multiobjective monitoring network design problems. Adv. Water Resour. 31(5), 828–845 (2008)
 25. Krejca, M.S., Witt, C.: Theory of estimation-of-distribution algorithms. CoRR. arXiv:1806.05392 (2018)
 26. Krejca, M.S., Witt, C.: Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. In: Proceedings of Foundations of Genetic Algorithms XIV, FOGA’17, pp. 65–79 (2017)
 27. Lehre, P.K., Nguyen, P.T.H.: Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’17, pp. 1383–1390 (2017)
 28. Lehre, P.K., Nguyen, P.T.H.: Level-based analysis of the population-based incremental learning algorithm. In: Proceedings of the 15th International Conference on Parallel Problem Solving from Nature, PPSN XV, pp. 105–116 (2018)
 29. Lehre, P.K., Witt, C.: Black-box search by unbiased variation. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’10, pp. 1441–1448 (2010)
 30. Lehre, P.K., Yao, X.: Runtime analysis of the (1+1) EA on computing unique input output sequences. Inf. Sci. 259, 510–531 (2014)
 31. Leiserson, C.E., Stein, C., Rivest, R., Cormen, T.H.: Introduction to Algorithms. MIT Press, Cambridge (2009)
 32. Lengler, J., Sudholt, D., Witt, C.: Medium step sizes are harmful for the compact genetic algorithm. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’18, pp. 1499–1506 (2018)
 33. Marshall, A.W., Olkin, I., Arnold, B.C.: Inequalities: Theory of Majorization and its Applications. Springer, New York (2011)
 34. Massart, P.: The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann. Probab. 18(3), 1269–1283 (1990)
 35. Mitrinovic, D.S.: Analytic Inequalities. Springer, Berlin (1970)
 36. Mühlenbein, H., Mahnig, T.: Evolutionary computation and Wright's equation. Theor. Comput. Sci. 287, 145–165 (2002)
 37. Mühlenbein, H., Paaß, G.: From recombination of genes to the estimation of distributions I. Binary parameters. In: Proceedings of the 9th International Conference on Parallel Problem Solving from Nature, PPSN IV, pp. 178–187 (1996)
 38. Rubinstein, R.Y., Kroese, D.P.: The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte–Carlo Simulation (Information Science and Statistics). Springer, New York (2004)
 39. Santana, R., Mendiburu, A., Lozano, J.A.: A review of message passing algorithms in estimation of distribution algorithms. Natl. Comput. 15(1), 165–180 (2016)
 40. Shapiro, J.L.: Drift and scaling in estimation of distribution algorithms. Evol. Comput. 13(1), 99–123 (2005)
 41. Slatkin, M.: Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 9(6), 477–485 (2008)
 42. Sudholt, D., Witt, C.: Update strength in EDAs and ACO: how to avoid genetic drift. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’16, pp. 61–68 (2016)
 43. van der Waerden, B.L.: Algebra, vol. 1. Springer, New York (1991)
 44. Witt, C.: Upper bounds on the runtime of the univariate marginal distribution algorithm on OneMax. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’17, pp. 1415–1422 (2017)
 45. Witt, C.: Domino convergence: why one should hill-climb on linear functions. In: Proceedings of Genetic and Evolutionary Computation Conference, GECCO’18, pp. 1539–1546 (2018)
 46. Wu, Z., Kolonko, M., Möhring, R.H.: Stochastic runtime analysis of the cross-entropy algorithm. IEEE Trans. Evol. Comput. 21(4), 616–628 (2017)
 47. Yu, T.L., Santarelli, S., Goldberg, D.E.: Military Antenna Design Using a Simple Genetic Algorithm and hBOA, pp. 275–289. Springer, Berlin (2006)
 48. Zinchenko, L., Mühlenbein, H., Kureichik, V., Mahnig, T.: Application of the univariate marginal distribution algorithm to analog circuit design. In: Proceedings of 2002 NASA/DoD Conference on Evolvable Hardware, pp. 93–101 (2002)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.