The bilevel programming problem (abbreviated: BPP) is a mathematical program in two groups of variables x and θ, in which x = x°(θ) is required to be an optimal solution of another program. Specifically, the BPP can be formulated in terms of two ordered objective functions φ and Ψ as follows:

(1)   min_θ φ(x, θ)   subject to   g_j(x, θ) ≤ 0,   j ∈ Q,

where x = x°(θ) is an optimal solution of the program

(2)   min_x Ψ(x, θ)   subject to   f_i(x, θ) ≤ 0,   i ∈ P.

Here the functions φ, Ψ, f_i, g_j : R^n × R^m → R, i ∈ P, j ∈ Q, are assumed to be continuous; x ∈ R^n, θ ∈ R^m; P and Q are finite index sets. Program (1) is often called the upper (first level, outer, leader's) problem; then (2) is the lower (second level, inner, follower's) problem. Many mathematical programs, such as minimax problems, linear integer, bilinear and quadratic programs, can be stated as special cases of bilevel programs. In view of the so-called Reduction Ansatz, developed in [18], [44], semi-infinite programs can also be considered as special cases of bilevel programs; for stability and deformations of these see, e.g., [20], [21]. Problems appearing in such seemingly unrelated areas as best approximation and data envelopment analysis can likewise be viewed as bilevel programs. In the former, one is often interested in finding a least-norm solution in the set of all best approximate solutions, while, in the latter, one wants to rank, or decrease the number of, efficient decision making units by a ‘post-optimality analysis’. For the history of bilevel programs, reviews of numerical methods and applications, and especially for connections with von Stackelberg games of market economy, see, e.g., [14], [22], [30], [39]. In this contribution we focus only on optimality conditions and duality.

Basic Difficulties

The study of bilevel programming problems requires some familiarity with point-to-set topology; see, e.g., [1], [2], [6], [15]. Since the lower level optimal solution mapping x° : θ ↦ x°(θ) is a point-to-set mapping (rather than a vector function), the optimal value function of the BPP may be discontinuous. This is illustrated by the following example:

EXAMPLE 1

Consider the bilevel program with the upper level objective φ(x, θ) = −x_1/θ, the lower level objective Ψ(x, θ) = −x_1 − x_2, and the lower level feasible set determined by x_1 + θx_2 ≤ 1, x_1 ≥ 0, x_2 ≥ 0. The lower level optimal solutions x = x°(θ) form the segment {x : x_1 + x_2 = 1, x_1 ≥ 0, x_2 ≥ 0} for θ = 1, and the singleton {[0, 1/θ]} for 0 < θ < 1. The corresponding upper level optimal solutions, i.e., the BPP optimal solutions, are the points [1, 0] and [0, 1/θ], respectively. Hence the optimal value of the BPP equals 0 for 0 < θ < 1 and jumps to −1 as θ assumes the value 1.
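The jump in Example 1 can be verified numerically. The sketch below (an illustration added here, not part of the original example) enumerates the vertices of the lower level polyhedron and applies the optimistic rule that the leader picks the best point among the lower level optima.

```python
# Numerical check of Example 1 under the optimistic bilevel rule.
# Lower level: min -x1 - x2  s.t.  x1 + theta*x2 <= 1, x1 >= 0, x2 >= 0;
# its optima lie among the vertices of the feasible polyhedron.

def lower_level_optima(theta, tol=1e-9):
    """Set of lower level optimal vertices for a fixed theta in (0, 1]."""
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0 / theta)]
    psi = lambda x: -x[0] - x[1]
    best = min(psi(v) for v in vertices)
    return [v for v in vertices if psi(v) <= best + tol]

def bpp_value(theta):
    """Optimistic upper level value: best phi over the lower level optima."""
    phi = lambda x: -x[0] / theta
    return min(phi(x) for x in lower_level_optima(theta))

assert bpp_value(0.9) == 0.0    # unique lower optimum [0, 1/theta]
assert bpp_value(1.0) == -1.0   # leader picks [1, 0] from the optimal segment
```

For 0 < θ < 1 the lower level optimum is unique and the value stays at 0; at θ = 1 the whole segment becomes optimal and the value drops to −1.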

Note that the lower level feasible set mapping, in Example 1, is lower semicontinuous (open) at θ = 1. Hence we conclude that discontinuity of the optimal value can occur even if the lower level model is stable.

The fact that the set of optimal solutions is generally discontinuous, even in a stable situation, is well known in linear programming. It may manifest itself in chaotic behavior of the optimal solutions, but not of the optimal value, when the program is solved repeatedly by computer with small perturbations of the data; see Nondifferentiable optimization: Parametric programming. This topological loss of continuity is generally unrelated to conditioning, which describes the numerical sensitivity of the solutions to roundoff errors. In particular, a linear program with an ill-conditioned coefficient matrix can be stable.

Another difficulty results from the fact that the optimal solutions mapping x° : θ ↦ x°(θ) is not generally closed. Hence a BPP may not have an optimal solution even if the feasible set of the lower program is compact:

EXAMPLE 2

Consider the bilinear BPP:

where x = x°(θ) solves

Here the optimal solutions mapping is the function x°(θ) = 0 for θ > 0, together with x°(0) = 1. The feasible set of the lower level problem is the unit square in the (θ, x)-plane, while the feasible set of the BPP is a disconnected, noncompact set consisting of the segment {(θ, 0) : 0 < θ ≤ 1} and the point [0, 1]. Since the origin is not a feasible point, the BPP does not have an optimal solution. Note that the function x°(θ) is not continuous here because the lower level feasible set mapping is not lower semicontinuous at the origin, i.e., the lower level problem is unstable.

Optimality

A popular approach to the study of optimality in BPP is to reduce the program to a one-level program. This can be done as follows: denote the optimal value of the lower level program (2) by Ψ°(θ) and introduce the new function f°(x, θ) = Ψ(x, θ) − Ψ°(θ). Since f° is nonnegative on the lower level feasible set, requiring x to be lower level optimal amounts to the constraint f°(x, θ) ≤ 0, and the BPP can be reformulated as

(3)   min φ(x, θ)   subject to   f°(x, θ) ≤ 0,   f_i(x, θ) ≤ 0, i ∈ P,   g_j(x, θ) ≤ 0, j ∈ Q.
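On Example 1 the ingredients of the reformulation (3) can be written down explicitly. In the sketch below (an added illustration, with the closed form Ψ°(θ) = −1/θ for 0 < θ ≤ 1 computed by hand) the value-function constraint f°(x, θ) ≤ 0 carves the lower level optimal set out of the lower level feasible set.

```python
# Value-function reformulation (3) on Example 1 (illustrative sketch).
# Lower level: min -x1 - x2  s.t.  x1 + theta*x2 <= 1, x >= 0,
# with optimal value psi0(theta) = -1/theta for 0 < theta <= 1.

TOL = 1e-9

def psi(x, theta):
    return -x[0] - x[1]

def psi0(theta):
    return -1.0 / theta  # closed form of the lower level optimal value

def lower_feasible(x, theta):
    return x[0] + theta * x[1] <= 1 + TOL and x[0] >= 0 and x[1] >= 0

def reformulated_feasible(x, theta):
    """Feasibility in (3): the lower level constraints plus the value-function
    constraint f0(x, theta) = psi(x, theta) - psi0(theta) <= 0."""
    return lower_feasible(x, theta) and psi(x, theta) - psi0(theta) <= TOL

theta = 0.5
assert reformulated_feasible((0.0, 2.0), theta)       # the lower level optimum
assert not reformulated_feasible((1.0, 0.0), theta)   # feasible but suboptimal
```

The second assertion shows how the constraint f° ≤ 0 excludes lower level feasible points that are not lower level optimal.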

Difficulties with this formulation generally include discontinuity of the leading constraint f° and the lack of classical constraint qualifications. The latter can be handled in the convex case using the results on optimality conditions from, e.g., [5], [15], [47]. One of the first attempts to formulate optimality conditions for bilevel programming problems, using (3), was made in [2]; however, counterexamples to these conditions were given in [4], [12], [17]; see also [10]. The one-level approach leads, under assumptions that guarantee Lipschitz continuity of the optimal value function, to necessary conditions of the Fritz John type. Under a partial calmness condition, and a constraint qualification for the lower level problem, one obtains conditions of the Karush-Kuhn-Tucker type. The concept of partial calmness is equivalent to ‘exact penalization’; it is satisfied, in particular, for the minimax problem and when the lower level problem is linear. This approach in a nonsmooth framework is used in, e.g., [11] and [46]. The relationship between the BPP and an associated exact penalty function was also explored in [7] to derive other types of necessary and sufficient optimality conditions. Other approaches to optimality conditions that use nonsmooth analysis include [13], [19], [32].

Another approach to reducing the BPP to a single-level program is to replace the lower level problem by an optimality condition; this is usually done in formulations of numerical methods, see, e.g., [42]. There are also approaches that use the specific geometry of BPP: one of these applies properties of the steepest descent directions to BPP and yields a necessary condition for optimality, see [33]. Adaptations of the well-known first and second order optimality conditions of mathematical programming to BPP appeared in [40]. Checking local optimality for linear BPP is NP-hard; see [41]. Examples of linear BPPs with an exponential number of local minima can be generated by a technique proposed in [9].
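Replacing the inner problem by its optimality condition can be made concrete on a small smooth instance (constructed here for illustration; it does not come from the cited references): for a strongly convex inner problem with box constraints, the KKT system reduces to a projection, and the BPP collapses to a one-level program in θ.

```python
# Hypothetical smooth instance (illustration only): lower level
#   min_x (x - theta)**2   s.t.  0 <= x <= 1,
# whose KKT system is solved by the projection x = clamp(theta, 0, 1).

def lower_kkt_solution(theta):
    """Unique KKT point (here also the global minimum) of the inner problem."""
    return min(1.0, max(0.0, theta))

# One-level reformulation: minimize the (made-up) upper objective
# phi(x, theta) = (x - 0.75)**2 with x eliminated via the KKT solution.
thetas = [i / 1000 for i in range(-1000, 2001)]
t_star = min(thetas, key=lambda t: (lower_kkt_solution(t) - 0.75) ** 2)
assert lower_kkt_solution(t_star) == 0.75
```

Because the inner solution is unique here, the single-level substitution is exact; when the inner problem has multiple KKT points or multiple optima, this substitution is the delicate step the text warns about.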

Many authors have studied links between two-objective and bilevel programming, looking for conditions that guarantee that an optimal solution of a given BPP is Pareto optimal for both the upper and lower level objective functions, and vice versa; see, e.g., [28], [29], [30], [37]. The idea is to find an optimal solution of the BPP by solving a bi-objective program. It was shown in [43] that an optimal solution of a linear BPP may not be a Pareto optimum for the objective function of the outer program paired with the optimal value function of the lower program, contrary to a claim made in [38]; the authors of [43] also give a sufficient condition for the implication to hold. If an optimal solution exists in the linear BPP case with a compact feasible set at the lower level, then at least one optimal solution is attained at a vertex of this set; see [3].

Necessary conditions for optimality can also be stated using marginal value formulas for optimal value functions. However, such formulas must not rely on a usual constraint qualification if they are to be applied to the formulation (3). One such formula in parametric convex programming is given in [48] and, under slightly different assumptions, in [49]. In the latter, it is used in the context of data envelopment analysis to rank efficiently administered university libraries by their radii of rigidity. Existence of optimal solutions is studied in [16], [23], [24]; the constraints in [24] are defined by an implicit variational problem. Both existence and stability of solutions and of approximate solutions are studied in [27]. Optimality conditions are important for checking optimality, for formulating duality theories, and for numerical methods.

Parametric Approach To Optimality

A parametric approach to characterizing global and local optimal solutions in convex BPP can be described as follows: denote, for every fixed θ, the optimal value of (3) by φ°(θ). Also, denote the feasible set in the x variable by F(θ) = {x : f_i(x, θ) ≤ 0, i ∈ R} and the feasible set in the θ variable by F. A parametric formulation of the BPP is

(4)   min φ°(θ)   subject to   θ ∈ F.

Here we optimize the optimal value of the outer problem over the feasible set in the variable θ, considered as a ‘parameter’. A problem of the form (4) is a basic problem of parametric programming; cf. Nondifferentiable optimization: Parametric programming. It has been extensively studied in the literature from both the theoretical and the numerical side. In particular, various optimality conditions have been formulated for it, e.g., in the context of input optimization; see [48].

The key observation in the parametric approach is that, under the assumption that the feasible set of the lower program is compact, every θ∗ that globally solves the parametric program (4), with a corresponding optimal solution x∗ of the program (3), yields a global optimal solution of the bilevel program, and vice versa. However, even under the compactness assumption, both sets of global solutions can be empty (as demonstrated by Example 2).

A necessary and sufficient condition for global optimality in convex BPP can be given over a ‘region of cooperation’ in terms of the existence of a saddle point; see [15]. Given a candidate θ∗ for global optimality and the sets of optimal solutions at the lower level {x°(θ)}, θ ∈ F, denote by K(θ∗) the region in the θ-space where the minimal index set of active constraints R=(θ) = {i ∈ R : x ∈ {x°(θ)} ⇒ f_i(x, θ) = 0} does not strictly increase, i.e., K(θ∗) = {θ ∈ F : R=(θ) ⊂ R=(θ∗)}. The region of cooperation at θ∗ is then defined as the set {(θ, x) : θ ∈ K(θ∗), x ∈ F(θ)}.

One can characterize global optimality on the entire feasible set for linear BPP, and also for convex BPP provided that the constraints are ‘LFS functions’; see, e.g., [35], [48]. These functions form a large class of convex functions that includes all linear and polyhedral functions. Characterizations of global optimality are simplified under the so-called sandwich condition, a two-sided global inclusion involving the set of optimal solutions of the inner program; see, e.g., [15].
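Under the optimistic convention, formulation (4) also suggests a brute-force scheme, sketched below on a made-up smooth instance (not an algorithm from the cited literature): evaluate φ°(θ) by an inner minimization, then minimize the resulting value function over θ.

```python
# Brute-force sketch of the parametric formulation (4) on a made-up instance:
#   lower level:  min_x (x - theta)**2,  0 <= x <= 1,
#   upper value:  phi(x, theta) = x + theta**2.

def x0(theta):
    """Grid stand-in for the lower level optimal solution x°(theta)."""
    xs = [i / 200 for i in range(201)]  # grid on [0, 1]
    return min(xs, key=lambda x: (x - theta) ** 2)

def phi0(theta):
    """Optimal value of the inner-eliminated problem for fixed theta."""
    return x0(theta) + theta ** 2

thetas = [i / 200 for i in range(-200, 201)]  # grid on [-1, 1]
t_star = min(thetas, key=phi0)
assert t_star == 0.0 and x0(t_star) == 0.0
```

The scheme is only reliable when x°(θ) is single-valued and continuous; Examples 1 and 3 show what can happen on a grid when it is not.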
Characterizations of locally optimal parameters θ∗ for convex (4) require lower semicontinuity of the optimal solutions mapping x°. The results apply to the convex BPP under the additional assumption that the corresponding optimal solution x∗ ∈ {x°(θ∗)} is unique; see, e.g., [15]. The uniqueness assumption in the characterization of local optimality cannot be replaced by the requirement that the set {x°(θ∗)} be compact. The following example illustrates a situation where a local optimum of the BPP cannot be recovered by the parametric approach.

EXAMPLE 3

Consider the program min φ(x, θ) = xθ², where x solves the lower level problem with the objective Ψ(x, θ) ≡ 0, subject to −1 ≤ x ≤ 1, −1 ≤ θ ≤ 1. Here x∗ = 1, θ∗ = 0 is a local minimum of the bilevel program. But φ°(θ) = −θ², so θ∗ = 0 is not its local minimum; in fact, it is an isolated global maximum!
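The failure in Example 3 is easy to confirm numerically (an added check): since Ψ ≡ 0, every x ∈ [−1, 1] is lower level optimal, so φ°(θ) = min over x ∈ [−1, 1] of xθ² = −θ², which is maximized, not minimized, at θ∗ = 0.

```python
# Example 3: phi(x, theta) = x * theta**2 with Psi identically 0,
# so the whole interval [-1, 1] is lower level optimal and
# phi0(theta) = min over x in [-1, 1] of x * theta**2 = -theta**2.

def phi0(theta):
    xs = [i / 100 for i in range(-100, 101)]  # grid on [-1, 1]
    return min(x * theta ** 2 for x in xs)

assert phi0(0.5) == -0.25  # attained at x = -1
# theta* = 0 is a global maximum of phi0, not a local minimum:
assert all(phi0(t) <= phi0(0.0) for t in (-0.3, -0.1, 0.1, 0.3))
```

The local minimum (x∗, θ∗) = (1, 0) of the bilevel program is invisible to φ°, because φ° always evaluates the worst-for-the-leader point x = −1.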

Duality

Duality theories for bilevel programming problems can be formulated by adjusting the duality theories of mathematical programming (see, e.g., [34]) to the single-objective model (3). Let us outline how this works using a parametric approach; we follow the ideas from [15]. Instead of a single ‘dual’ one obtains a collection of several ‘subduals’, each closely related to the original (primal) program. The number of these subduals is the cardinality of the set

Π = {Ω ⊂ R : Ω = R=(θ) for some θ ∈ F}.
First, with each Ω ∈ Π, one associates the feasible subregion SΩ = {θ ∈ F : R=(θ) = Ω}, the Lagrangian LΩ(x, θ; u) = φ(x, θ) + Σ_{i ∈ R\Ω} u_i f_i(x, θ), and the point-to-set mapping FΩ : F → R^n defined by FΩ(θ) = {x : f_i(x, θ) ≤ 0, i ∈ Ω}. The corresponding subdual function is

ΦΩ(u) = inf {LΩ(x, θ; u(θ)) : θ ∈ SΩ, x ∈ FΩ(θ)},
and the subdual (D, Ω) is defined as

(5)   max ΦΩ(u)   subject to   u ∈ [SΩ → R+^card(R\Ω)].

Here u belongs to the set of all nonnegative vector functions defined on SΩ. The duality results stated for partly convex programs in, e.g., [47] can be reformulated for the outer convex model and hence for the BPP. In particular, if, for some Ω ∈ Π, some u∗ ∈ [SΩ → R+^card(R\Ω)], and an optimal solution x∗ of the inner program for some fixed θ∗ ∈ SΩ, one has ΦΩ(u∗) = φ(x∗, θ∗), then u∗ solves the subdual (5) and θ∗ solves (4) on SΩ.

If optimization of the optimal value function in (4) is performed from some fixed ‘initial’ θ, but using only parameter paths that preserve continuity of the optimal solutions mapping of the lower problem, then we speak of stable BPP. In the convex case this approach guarantees that the optimal solutions mapping of the BPP is closed and that the optimal value function is continuous, thus removing the two basic difficulties mentioned above. However, the optimal solutions now depend on the initial choice of the parameter and on the particular class of stable paths used. Stable parametric programming has been studied in [48]; stable BPP is mentioned (but not studied) in [15]; see also [36].

See also: Bilevel programming: Introduction, history and overview; Bilevel linear programming; Bilevel linear programming: Complexity, equivalence to minmax, concave programs; Bilevel programming; Bilevel programming: Algorithms; Bilevel programming: Global optimization; Bilevel programming in management; Bilevel programming: Applications in engineering; Bilevel optimization: Feasibility test and flexibility index; Bilevel programming: Implicit function approach; Bilevel programming: Applications; Stochastic bilevel programs; Bilevel fractional programming; Multilevel optimization in mechanics; Multilevel methods for optimal design.