1 Introduction

In the business-to-consumer market, many decisions made by companies influence customers’ choices and thus the resulting demand or market share. For instance, in assortment optimization, the decision about the assortment of products and their prices might directly affect the demand for a single product due to its dependence on the other products on offer and their prices. Likewise, in location planning, the decision about the position of a facility might influence the demand occurring in other facilities and vice versa (e.g., park and ride facilities). Therefore, if such decisions are supported by methods of Operations Research, the underlying mathematical formulations need to endogenously consider this choice behavior.

In the academic literature, one of the most widely used models to incorporate choice behavior is the attraction choice model, which draws on Bradley and Terry [4] and Luce [14]. In its basic form, the attraction choice model (ACM) explains demand indirectly by reference to the market share and states that the market share of a choice alternative is the ratio of the alternative’s attraction to the overall attraction of all available alternatives (including the alternative to choose nothing). Thus, if an alternative is not available, the market share of this alternative is recaptured by the available alternatives (including the no-choice alternative) in proportion to their attractions. Note that the ACM can account for different observable customer segments by incorporating segment-specific characteristics, such as sociodemographic variables, into the attractions’ specifications. However, if unobserved segments, i.e., latent classes of customers, are to be captured, it is common to model each of these segments by its own ACM weighted by the segment’s share of the population, which yields a more complex overall model. As a special case of the ACM, the multinomial logit model [16] is one of today’s most prominent choice models to represent probabilistic demand in econometrics and marketing. In the case of multiple, non-observable customer segments, the corresponding overall model is known as the finite-mixture logit or latent-class model. Customer segments are discussed in detail in Train [24] and Müller and Haase [19].

Over the last decade, the number of contributions on business optimization problems that integrate choice behavior following the ACM has increased tremendously. However, due to the ACM’s properties, its straightforward consideration in mathematical optimization leads to nonlinear formulations. Therefore, in order to apply standard software for mixed-integer linear programming (MILP), quite a number of coexisting publications from different research communities and fields are dedicated to the exact linearization of the resulting nonlinear formulation and sometimes claim this linearization as one of their key contributions. Examples include publications from the field of revenue management and assortment optimization in the operations community [7, 18], from product line selection [20, 21], which originates more from the marketing community and is technically very similar to assortment optimization, as well as from location planning [1, 9].

In this paper, we contribute to the literature by providing a unifying analysis of linear reformulations proposed in major publications of different research fields and by clarifying their relationship to each other. Based on a generic problem formulation that covers the majority of the investigated problems of the different fields of research (Sect. 2), first, we describe two linearization ideas to which the proposed approaches can be traced back and present the appropriate mathematical formulations in the generic context. Second, we show that the resulting formulations can straightforwardly be transformed into each other, thereby also confirming that they indeed model the same problem (Sect. 3). Third, based upon the generic formulations, we systematically discuss the specific linear formulations proposed in major works of the academic literature. In particular, for each formulation, we explain to what extent it is a special case of (one of) the presented generic formulations. This also makes clear under which context-specific conditions certain elements of the generic linearization can be omitted, potentially serving as a helpful guideline for future applications of such linearizations (Sect. 4). Finally, some concluding remarks are given (Sect. 5).

2 Generic Problem Definition

Let \(J=\left\{1,\dots , m\right\}\) be a set of different alternatives that can be made available to customers. Further, let \(N\) be the set of customer segments. Then, following the ACM, the choice probability of customer segment \(n\in N\) for alternative \(j\in {S}_{0}=S\cup \left\{0\right\}\) when subset \(S\subseteq J\) is made available—with \(j=0\) representing the no-choice alternative (always available)—is given by

$${P}_{nj}\left(S\right)=\frac{{A}_{nj}}{{A}_{n0}+{\sum }_{i\in S}{A}_{ni}} \text{,}$$
(1)

with \({A}_{nj}\ge 0\) \(({A}_{n0}>0)\) being a segment-specific measure of attraction preassigned to alternative \(j\in S\). In the special case that demand follows the multinomial logit model, in line with random utility theory, \({A}_{nj}={e}^{{v}_{nj}}\), with \({v}_{nj}\) being the deterministic part of the utility of customer segment \(n\in N\) for alternative \(j\). Note that the no-choice alternative may also include other alternatives available to customers but not being within the decision-making scope.

Since the choice probability of each alternative \(j\in {S}_{0}\) is equal to its attraction \({A}_{nj}\) relative to the attraction of all available alternatives, for each customer segment \(n\in N\), the choice probabilities sum up to one:

$${\sum }_{j\in S}{P}_{nj}\left(S\right)+{P}_{n0}\left(S\right)=1\text{.}$$
(2)
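As a minimal illustration of (1) and (2), the following Python sketch computes the choice probabilities of a single segment for a given offer set. The function name and the attraction values are hypothetical and serve only to make the recapture effect visible: removing alternative 3 raises the shares of the remaining alternatives in proportion to their attractions.

```python
def choice_probabilities(attraction, a0, offer_set):
    """Return the ACM choice probabilities (1) of one segment.

    attraction: dict mapping alternative j -> A_nj >= 0
    a0:         attraction A_n0 > 0 of the no-choice alternative
    offer_set:  iterable with the offered alternatives S
    """
    offered = set(offer_set)
    denom = a0 + sum(attraction[j] for j in offered)
    probs = {j: attraction[j] / denom for j in offered}
    probs[0] = a0 / denom  # key 0 denotes the no-choice alternative
    return probs


# Hypothetical attractions of three alternatives for a single segment.
A = {1: 2.0, 2: 1.0, 3: 1.0}
A0 = 1.0

full = choice_probabilities(A, A0, {1, 2, 3})
reduced = choice_probabilities(A, A0, {1, 2})

# Property (2): the probabilities of each offer set sum up to one.
assert abs(sum(full.values()) - 1.0) < 1e-12
assert abs(sum(reduced.values()) - 1.0) < 1e-12
```

With these values, alternative 1 has a share of 0.4 under the full offer set and 0.5 once alternative 3 is removed.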

The problem is now to decide on the offer set \(S\) of available alternatives with respect to a predefined, problem-specific objective (e.g., profit maximization) under the assumption that customer segments are not necessarily observable. For this purpose, we define binary decision variables \({x}_{j}\in \{0,1\}\), \(j\in J\), that equal 1 if alternative \(j\) is made available and 0 otherwise. The corresponding offer set is \(S\left({\varvec{x}}\right):=\left\{ \left.j\in J\right|{x}_{j}=1\right\}\text{.}\) A generic formulation of the resulting objective function, incorporating the demand of all customer segments, is given by

$$\underset{{\varvec{x}}}{\mathrm{Max}}{\sum }_{n\in N}{\omega }_{n}{\sum }_{j\in J}{\theta }_{nj}{P}_{nj}\left(S({\varvec{x}})\right)=\underset{{\varvec{x}}}{\mathrm{Max}}{\sum }_{n\in N}{\omega }_{n}\frac{{\sum }_{j\in J}{\theta }_{nj}{A}_{nj}{x}_{j}}{{A}_{n0}+{\sum }_{i\in J}{A}_{ni}{x}_{i}}\text{,}$$
(3)

where \({\omega }_{n}\) is the segment’s share of the population and \({\theta }_{nj}\) is a context-specific constant associated with each alternative \(j\in J\) and each segment \(n\in N\).

The objective in (3) maximizes the segment-weighted sum of the values \({\theta }_{nj}\), each multiplied by the corresponding choice probability, by deciding on the available alternatives. Depending on the context, this might, for instance, be the expected overall profit or market share. The resulting problem in (3) is a binary, nonlinear fractional program containing a sum of ratios. Importantly, note that if \(\left|N\right|=1\), i.e., if only one segment exists, the problem becomes much easier to handle (also see Sect. 4).
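To make the structure of (3) concrete, the following Python sketch evaluates the sum of ratios for a binary vector and finds the best offer set by full enumeration. It is only a reference procedure for tiny instances, not a solution approach proposed in the literature discussed here; the instance data and function names are hypothetical.

```python
from itertools import product


def objective(x, omega, theta, A, A0):
    """Evaluate objective (3) for a binary offer vector x (tuple of 0/1)."""
    total = 0.0
    for n in range(len(omega)):
        num = sum(theta[n][j] * A[n][j] * x[j] for j in range(len(x)))
        den = A0[n] + sum(A[n][j] * x[j] for j in range(len(x)))
        total += omega[n] * num / den
    return total


def brute_force(omega, theta, A, A0):
    """Enumerate all 2^m offer vectors; viable only for small m."""
    m = len(A[0])
    return max(product((0, 1), repeat=m),
               key=lambda x: objective(x, omega, theta, A, A0))


# Hypothetical instance: two segments, three alternatives.
omega = [0.6, 0.4]                             # segment shares
theta = [[10.0, 6.0, 4.0], [10.0, 6.0, 4.0]]   # e.g., profit margins
A = [[1.0, 2.0, 4.0], [3.0, 1.0, 0.5]]         # attractions A_nj
A0 = [1.0, 2.0]                                # no-choice attractions A_n0

best_x = brute_force(omega, theta, A, A0)
best_value = objective(best_x, omega, theta, A, A0)
```

Because the denominators differ between segments, the contribution of an alternative to one segment's ratio cannot be separated from its effect on the other segment's ratio, which is what makes the multi-segment case harder than the case \(\left|N\right|=1\).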

3 Generic Linearization Approaches

3.1 Method-Based Linearization

The first linearization idea consists of applying global formal methods developed to linearize nonlinear terms in fractional formulations (referred to as ML, short for “method-based linearization”). For example, the linearizations presented by Schön [20, 21] as well as by Miranda-Bront et al. [18] are in line with this idea.

Applying such techniques [12, 25], the linearization can be accomplished in two steps: Regarding the generic formulation (3), in the first step, we substitute \(\frac{1}{{A}_{n0}+{\sum }_{i\in J}{A}_{ni}{x}_{i}}\) by non-negative decision variables \({y}_{n}\ \forall n\in N\). This substitution draws on the idea of Charnes and Cooper [5] who first proposed it in a similar way for continuous fractional functions and one segment. The variable \({y}_{n}\) is from the interval \(\left[\frac{1}{{A}_{n0}+{\sum }_{i\in J}{A}_{ni}}; \frac{1}{{A}_{n0}}\right]\). The lower bound of \({y}_{n}\) is reached when all alternatives are available, i.e., \({x}_{j}=1 \ \forall j\in J\). The upper bound is reached when none of the alternatives is offered, i.e., \({x}_{j}=0\ \forall j\in J\). The resulting nonlinear program is given by

$$\underset{{\varvec{x}},{\varvec{y}}}{\mathrm{max}}{\sum }_{n\in N}{\omega }_{n}{\sum }_{j\in J}{\theta }_{nj}{A}_{nj}{x}_{j}{y}_{n}$$
(4)

subject to

$$A_{n0} y_{n}+\sum\nolimits_{j \in J} {A_{nj} x_{j} y_{n} = 1\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N}$$
(5)

with \({x}_{j}\in \left\{0, 1\right\}\ \forall j\in J\) and \({y}_{n}\ge 0\ \forall n\in N\). Constraints (5) ensure the correct substitution by \({y}_{n}\) as described above. Note that this substitution is generally valid for the ACM since \({A}_{nj}\ge 0\ \forall j\in J,\ \forall n\in N\) and \({A}_{n0}>0\ \forall n\in N\), and thus, the variables \({y}_{n}\) are always positive.
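The first substitution step can be checked numerically. The sketch below, with hypothetical attractions for one segment, computes \({y}_{n}\) for every binary vector and verifies that constraint (5) holds and that \({y}_{n}\) stays within the stated interval.

```python
from itertools import product


def y_value(x, A_n, a0):
    """Substituted variable y_n = 1 / (A_n0 + sum_i A_ni * x_i)."""
    return 1.0 / (a0 + sum(a * xi for a, xi in zip(A_n, x)))


# Hypothetical attractions of one segment.
A_n = [1.0, 2.0, 4.0]
a0 = 1.0

lower = 1.0 / (a0 + sum(A_n))  # reached when all alternatives are offered
upper = 1.0 / a0               # reached when nothing is offered

for x in product((0, 1), repeat=len(A_n)):
    y = y_value(x, A_n, a0)
    # Constraint (5): A_n0 * y_n + sum_j A_nj * x_j * y_n = 1
    lhs = a0 * y + sum(a * xi * y for a, xi in zip(A_n, x))
    assert abs(lhs - 1.0) < 1e-12
    assert lower - 1e-12 <= y <= upper + 1e-12
```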

In the second step, we eliminate the resulting bilinear term \({x}_{j}{y}_{n}\) [25]. For this purpose, we define new decision variables \({z}_{nj}:={x}_{j}{y}_{n}\ \forall j\in J,\ \forall n\in N\). To guarantee \({z}_{nj}={x}_{j}{y}_{n}\) depending on the value of the variables \({x}_{j}\), the logical conditions (I) \({x}_{j}=0 \Rightarrow {z}_{nj}=0\) and (II) \({x}_{j}=1 \Rightarrow {z}_{nj}={y}_{n}\) must be imposed by a number of linear constraints. The resulting mixed-integer linear program, which is equivalent to problem (3), is given by

$$\underset{{\varvec{z}}}{\mathrm{max}}{\sum }_{n\in N}{\omega }_{n}{\sum }_{j\in J}{\theta }_{nj}{A}_{nj}{z}_{nj}$$
(6)

subject to

$$A_{n0} y_{n}+\sum\nolimits_{j \in J} {A_{nj} z_{nj} = 1\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N}$$
(7)
$$z_{nj}\ge0\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(8)
$$z_{nj}\le y_{n} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(9)
$$z_{nj}\le K_{nj(10)} x_{j} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(10)
$$z_{nj}\ge y_{n}+K_{nj(11)} (x_{j}- 1)\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(11)

with \({x}_{j}\in \left\{0, 1\right\}\ \forall j\in J\), \({y}_{n}\ge 0\ \forall n\in N\), and \({K}_{nj\left(10\right)}\) as well as \({K}_{nj\left(11\right)}\) \(\forall n\in N,\ \forall j\in J\) being sufficiently large numbers. Constraints (8) and (10) impose implication (I), whereas implication (II) is represented by constraints (9) and (11). For tight definitions of the parameters \({K}_{nj(10)}\) and \({K}_{nj(11)}\), see Appendix 1.
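The following Python sketch illustrates how constraints (7)-(11) pin down \({z}_{nj}={x}_{j}{y}_{n}\) and that objective (6) then coincides with objective (3). It does not call a MILP solver; it only checks feasibility for enumerated binary vectors. As an assumption, both big-M parameters are set to \(1/{A}_{n0}\), the upper bound of \({y}_{n}\), which is valid but not necessarily the tight choice of Appendix 1. All names and data are hypothetical.

```python
from itertools import product


def ml_values(x, A, A0):
    """Values of y_n and z_nj implied by a binary offer vector x."""
    m = len(x)
    y = [1.0 / (A0[n] + sum(A[n][j] * x[j] for j in range(m)))
         for n in range(len(A))]
    z = [[x[j] * y[n] for j in range(m)] for n in range(len(A))]
    return y, z


def ml_feasible(x, y, z, A, A0, tol=1e-9):
    """Check constraints (7)-(11) for given x, y, z."""
    m = len(x)
    for n in range(len(A)):
        # Assumption: K = 1/A_n0 for both (10) and (11).
        K = 1.0 / A0[n]
        lhs = A0[n] * y[n] + sum(A[n][j] * z[n][j] for j in range(m))
        if abs(lhs - 1.0) > tol:                                # (7)
            return False
        for j in range(m):
            ok = (z[n][j] >= -tol                               # (8)
                  and z[n][j] <= y[n] + tol                     # (9)
                  and z[n][j] <= K * x[j] + tol                 # (10)
                  and z[n][j] >= y[n] + K * (x[j] - 1) - tol)   # (11)
            if not ok:
                return False
    return True


def objective_3(x, omega, theta, A, A0):
    """Nonlinear objective (3)."""
    m = len(x)
    return sum(omega[n]
               * sum(theta[n][j] * A[n][j] * x[j] for j in range(m))
               / (A0[n] + sum(A[n][j] * x[j] for j in range(m)))
               for n in range(len(A)))


def objective_6(z, omega, theta, A):
    """Linear objective (6)."""
    return sum(omega[n] * sum(theta[n][j] * A[n][j] * z[n][j]
                              for j in range(len(z[n])))
               for n in range(len(A)))


# Hypothetical instance: two segments, three alternatives.
omega = [0.6, 0.4]
theta = [[10.0, 6.0, 4.0], [10.0, 6.0, 4.0]]
A = [[1.0, 2.0, 4.0], [3.0, 1.0, 0.5]]
A0 = [1.0, 2.0]

for x in product((0, 1), repeat=3):
    y, z = ml_values(x, A, A0)
    assert ml_feasible(x, y, z, A, A0)
    assert abs(objective_6(z, omega, theta, A)
               - objective_3(x, omega, theta, A, A0)) < 1e-9
```

Setting some \({z}_{nj}\) to a positive value although \({x}_{j}=0\) makes the check fail, since (10) then forces \({z}_{nj}\le 0\).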

3.2 Property-Based Linearization

The second linearization idea is motivated by specific properties of the ACM (referred to as PL, short for “property-based linearization”). This approach is followed, for instance, by Davis et al. [7], Haase [9], and Aros-Vera et al. [1]. More precisely, the fundamental property of demand models whose structure follows (1), such as the multinomial logit model, is the so-called independence of irrelevant alternatives (IIA) property. This property states that the ratio of two available alternatives’ choice probabilities is constant and thus independent of the availability of other and hence irrelevant alternatives. From definition (1) of the choice probabilities in the ACM, it follows that this constant ratio is equal to

$$\frac{{P_{nj} }}{{P_{ni} }} = \frac{{A_{nj} }}{{A_{n0} \,\, + \,\,\sum\nolimits_{k \in S} {A_{nk} } }}/\frac{{A_{ni} }}{{A_{n0} \,\, + \,\,\sum\nolimits_{k \in S} {A_{nk} } }} = \frac{{A_{nj} }}{{A_{ni} }}\ \forall n \in N,\ \forall j,i \in S_{0}$$
(12)

Note that demand models not following (1), such as the nested logit model or the probit model, do not exhibit the IIA property.
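For models that do follow (1), the constant ratio in (12) can be verified numerically. In the sketch below, with hypothetical attractions, adding alternative 3 to the offer set changes the individual probabilities of alternatives 1 and 2 but leaves their ratio at \({A}_{n1}/{A}_{n2}\).

```python
def choice_probability(A_n, a0, offer_set, j):
    """ACM choice probability (1) of alternative j within offer_set."""
    denom = a0 + sum(A_n[k] for k in offer_set)
    return A_n[j] / denom


# Hypothetical attractions of one segment.
A_n = {1: 2.0, 2: 1.0, 3: 4.0}
a0 = 1.0

ratios = []
for offer_set in ({1, 2}, {1, 2, 3}):
    p1 = choice_probability(A_n, a0, offer_set, 1)
    p2 = choice_probability(A_n, a0, offer_set, 2)
    ratios.append(p1 / p2)

# Both ratios equal A_n1 / A_n2, independent of alternative 3.
assert all(abs(r - A_n[1] / A_n[2]) < 1e-12 for r in ratios)
```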

In the mathematical program, it is necessary to ensure the IIA property and hence the ratios in (12). Therefore, we further exploit the fact \(\frac{{P}_{nj}}{{P}_{ni}}=\frac{{P}_{nj}}{{P}_{n0}}/\frac{{P}_{ni}}{{P}_{n0}}\). This means that every ratio of two alternatives can be expressed by two ratios comprising the no-choice alternative. Since the no-choice alternative is always available, we can ensure the IIA property in the mathematical program by merely imposing

$$\frac{{P_{nj} }}{{P_{n0} }} = \frac{{A_{nj} }}{{A_{n0} }}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in S$$
(13)

So instead of \(|N|\cdot{\left|{S}_{0}\right|}^{2}\), only \(|N|\cdot|S|\) ratios have to be determined.

For the PL, we define non-negative decision variables \({p}_{nj}\ \forall n\in N,\ \forall j\in J\cup \left\{0\right\}\) which represent the choice probabilities of alternatives \(j\in J\cup \{0\}\) for customers belonging to segment \(n\in N\) depending on the offered alternatives. The model formulation building on the IIA property is given by

$$\underset{{\varvec{p}}}{\mathrm{max}}{\sum }_{n\in N}{\omega }_{n}{\sum }_{j\in J}{\theta }_{nj}{p}_{nj}$$
(14)

subject to

$$p_{n0}+\sum\nolimits_{j \in J} {p_{nj} } \,\, = 1\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n\in N$$
(15)
$$p_{nj}\ge 0 \,\,\,\,\,\,\,\,\,\,\,\,\,\forall n\in N,\ \forall j \in J$$
(16)
$$p_{nj}\le\frac{{A_{nj} }}{{A_{n0} }}p_{n0} \,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(17)
$$p_{nj}\le M_{nj(18)} x_{j} \,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(18)
$$p_{nj}\ge \frac{{A_{nj} }}{{A_{n0} }}\,\left[{p_{n0}+M_{nj(19)} (x_{j}-1)} \right]\,\,\,\,\,\,\,\,\,\,\,\,\,\forall n \in N,\ \forall j \in J$$
(19)

with \({x}_{j}\in \left\{0, 1\right\}\ \forall j\in J\) and \({M}_{nj(18)}\) as well as \({M}_{nj(19)}\) \(\forall n\in N,\ \forall j\in J\) being sufficiently large numbers. First of all, constraints (15) reflect the ACM’s property stated in (2). For the IIA property as stated in (13) to hold, the two logical conditions (III) \({x}_{j}=0 \Rightarrow {p}_{nj}=0\) and (IV) \({x}_{j}=1 \Rightarrow {p}_{nj}=\frac{{A}_{nj}}{{A}_{n0}}{p}_{n0}\) must be ensured. While constraints (16) and (18) impose implication (III), implication (IV) is modeled by constraints (17) and (19). For tight definitions of the parameters \({M}_{nj(18)}\) and \({M}_{nj(19)}\), see Appendix 1.
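Analogously to the ML, the PL constraints can be checked for enumerated binary vectors. The Python sketch below computes the choice probabilities implied by an offer vector and tests (15)-(19). As an assumption, both \({M}_{nj(18)}\) and \({M}_{nj(19)}\) are set to 1, which is valid because all probabilities are at most 1, but it is not necessarily the tight choice of Appendix 1. Names and data are hypothetical.

```python
from itertools import product


def pl_values(x, A, A0):
    """Choice probabilities p_n0 and p_nj implied by offer vector x."""
    m = len(x)
    p0, p = [], []
    for n in range(len(A)):
        denom = A0[n] + sum(A[n][j] * x[j] for j in range(m))
        p0.append(A0[n] / denom)
        p.append([A[n][j] * x[j] / denom for j in range(m)])
    return p0, p


def pl_feasible(x, p0, p, A, A0, M18=1.0, M19=1.0, tol=1e-9):
    """Check constraints (15)-(19); M18 = M19 = 1 is an assumption."""
    m = len(x)
    for n in range(len(A)):
        if abs(p0[n] + sum(p[n]) - 1.0) > tol:                  # (15)
            return False
        for j in range(m):
            ratio = A[n][j] / A0[n]
            lower = ratio * (p0[n] + M19 * (x[j] - 1))
            ok = (p[n][j] >= -tol                               # (16)
                  and p[n][j] <= ratio * p0[n] + tol            # (17)
                  and p[n][j] <= M18 * x[j] + tol               # (18)
                  and p[n][j] >= lower - tol)                   # (19)
            if not ok:
                return False
    return True


# Hypothetical instance: two segments, three alternatives.
A = [[1.0, 2.0, 4.0], [3.0, 1.0, 0.5]]
A0 = [1.0, 2.0]

for x in product((0, 1), repeat=3):
    p0, p = pl_values(x, A, A0)
    assert pl_feasible(x, p0, p, A, A0)
```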

The general ML (6)-(11) and the general PL (14)-(19) presented in Sects. 3.1 and 3.2 are equivalent mixed-integer linear formulations of problem (3), as they can straightforwardly be transformed into each other by variable substitution. The proof is given in Appendix 2.
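The substitution behind this equivalence can be sketched as follows. This is our reconstruction from the two formulations above and not a reproduction of the proof in Appendix 2; it assumes \({A}_{nj}>0\) for all \(j\in J\), so that the substitution can be inverted. Define

$$p_{n0}=A_{n0}\,y_{n},\qquad p_{nj}=A_{nj}\,z_{nj}\qquad \forall n\in N,\ \forall j\in J\text{.}$$

Inserting these definitions into (6) and (7) and multiplying (8)-(11) by \({A}_{nj}\) yields

$$\begin{aligned} (6)&:\ \sum\nolimits_{n\in N}\omega_{n}\sum\nolimits_{j\in J}\theta_{nj}A_{nj}z_{nj}=\sum\nolimits_{n\in N}\omega_{n}\sum\nolimits_{j\in J}\theta_{nj}p_{nj} &&\Rightarrow (14)\\ (7)&:\ A_{n0}y_{n}+\sum\nolimits_{j\in J}A_{nj}z_{nj}=p_{n0}+\sum\nolimits_{j\in J}p_{nj}=1 &&\Rightarrow (15)\\ (8)&:\ A_{nj}z_{nj}\ge 0\ \Leftrightarrow\ p_{nj}\ge 0 &&\Rightarrow (16)\\ (9)&:\ A_{nj}z_{nj}\le A_{nj}y_{n}\ \Leftrightarrow\ p_{nj}\le \frac{A_{nj}}{A_{n0}}p_{n0} &&\Rightarrow (17)\\ (10)&:\ p_{nj}\le A_{nj}K_{nj(10)}x_{j} &&\Rightarrow (18)\\ (11)&:\ p_{nj}\ge \frac{A_{nj}}{A_{n0}}\left[p_{n0}+A_{n0}K_{nj(11)}(x_{j}-1)\right] &&\Rightarrow (19) \end{aligned}$$

Under this reading, the big-M parameters correspond via \({M}_{nj(18)}={A}_{nj}{K}_{nj(10)}\) and \({M}_{nj(19)}={A}_{n0}{K}_{nj(11)}\).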

4 Classification of Specific Linearization Approaches

Based upon the two generic approaches presented in Sect. 3, we are now able to systematically discuss and compare major publications’ linearization approaches. We argue that most of the resulting programs can be traced back to either the presented ML or PL. Further, we show that problem-specific characteristics and the considered setting lead to special and simplified cases of ML or PL regarding the linearization part, i.e., the required constraints.

Table 1 presents the comparison. Column 1 states the research field to which the reference in column 2 is dedicated. Column 3 states if the work referenced in column 2 considers a market divided into different customer segments or not. The last block of columns classifies whether the referenced work’s proposed linearization is based on the methodological (Sect. 3.1) or property-driven approach (Sect. 3.2), and thus, if they can be directly traced back to either ML or PL. Further, it is shown which of the constraints of the general formulations are applied as a result of the specific setting considered.

Table 1 Systematic comparison of context-specific linearizations of (3) proposed in the literature

In Schön’s [20, 21] optimization approaches for the product line selection problem, an ML is used with additional constraints reflecting pricing decisions. The setting in Schön [20] allows each segment to be considered separately with regard to the linearization; each segment-specific objective function is quasi-convex and quasi-concave, and the model has a unimodular (price) constraint matrix. Thus, an optimal binary solution can be obtained without explicitly imposing integrality [6]. Hence, Schön [20] can drop the integrality requirement on the decision variables \({x}_{j}\) and thus does not require any constraints like (10) and (11). As the author states herself, the applied linearization therefore resembles the classical Charnes-Cooper transformation for continuous variables [5]. In Schön [21], pricing is continuous rather than based on a discrete set of prices as in Schön [20]. This would normally result in a non-concave objective function, which is circumvented by defining the continuous probability as the central decision variable. Hence, constraints (10) and (11) are not necessary either. In Bechler et al. [2], the product line selection problem is extended by the empirically proven effect that customers tend to choose compromise alternatives. This results in a non-unimodular formulation such that integrality constraints cannot be dropped, and thus, their proposed formulation comprises the full set of ML’s constraints as given in (7)-(11).

In the context of revenue management and assortment optimization, Talluri and van Ryzin [23] study problem (3) for the multinomial logit model and only one customer segment. They confirm an earlier result from fractional programming [22], stating that in this particular case without any further constraints, an optimal assortment can easily be obtained by greedily adding products into the offer set in order of decreasing revenues, such that a model-based approach is not necessary at all. Miranda-Bront et al. [18] consider the case of multiple customer segments and propose an ML exactly as stated in (7)-(11). In their setting, each customer segment is characterized by one consideration set (i.e., the set of products this segment considers choosing from). They assume that the different consideration sets do not need to be disjoint but can overlap to some extent.
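The single-segment result attributed above to Talluri and van Ryzin [23] can be illustrated with a short Python sketch. One way to read the greedy rule is to sort the products by decreasing revenue and to evaluate the nested sets formed by the first \(k\) products; the sketch does exactly this and compares the outcome with full enumeration. The revenues, attractions, and function names are hypothetical, and revenue ties are not treated specially.

```python
from itertools import combinations


def expected_revenue(S, r, A, a0):
    """Single-segment objective (3) with theta_j = r_j and omega = 1."""
    return sum(r[j] * A[j] for j in S) / (a0 + sum(A[j] for j in S))


def revenue_ordered(r, A, a0):
    """Best assortment among the nested sets of highest-revenue products."""
    order = sorted(range(len(r)), key=lambda j: r[j], reverse=True)
    best_set, best_val = (), 0.0
    for k in range(1, len(order) + 1):
        S = tuple(order[:k])
        val = expected_revenue(S, r, A, a0)
        if val > best_val:
            best_set, best_val = S, val
    return set(best_set), best_val


def brute_force(r, A, a0):
    """Reference value: enumerate every subset (small instances only)."""
    m = len(r)
    return max(expected_revenue(S, r, A, a0)
               for k in range(m + 1)
               for S in combinations(range(m), k))


# Hypothetical instance; products are indexed 0..3, and the no-choice
# attraction is passed separately as a0.
r = [10.0, 8.0, 6.0, 4.0]
A = [1.0, 2.0, 3.0, 4.0]
a0 = 1.0

S_best, v_best = revenue_ordered(r, A, a0)
assert abs(v_best - brute_force(r, A, a0)) < 1e-12
```

For this instance, the two highest-revenue products form the best assortment with an expected revenue of 6.5, and adding further products only dilutes it.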

As one of their contributions, Davis et al. [7] present a PL for a setting with one customer segment and several alternative additional types of side constraints, such as price constraints. Similar to Schön [20, 21], these constraints’ coefficients form a unimodular constraint matrix, which allows constraints (18) and (19) to be dropped. Thus, even though developed independently, from a technical point of view, the linearization proposed by Davis et al. [7] resembles the classical Charnes-Cooper transformation. Méndez-Díaz et al. [17] present an ML which is a problem-related extension of Miranda-Bront et al. [18].

In the area of location planning, the objective is mostly the maximization of the market share without the consideration of cost, but under consideration of different customer segments. In this context, customer segments are denoted as demand nodes. Benati and Hansen [3] propose an ML but, in contrast to the linear formulations mentioned so far, completely substitute the objective function (3). In this case, constraints (9) and (10) can be omitted, since the variables substituting the resulting bilinear terms enter the objective function with a negative sign and thus are minimized. Hence, only the lower bounds represented by (8) and (11) need to be ensured. Haase [9] proposes a PL. However, in constraints (19), he explicitly formulates the IIA property drawing on (12) for every possible pair of alternatives. This automatically includes constraints (17) but results in many redundant constraints (for details see Sect. 3.2). Zhang et al. [26] propose an ML by substituting the single probabilities for the different alternatives (in contrast to Benati and Hansen [3], who substitute the sum of all probabilities). For the linearization of the resulting nonlinear terms, \(|N|\cdot{\left|J\right|}^{2}\) instead of \(|N|\cdot|J|\) variables and \(|N|\cdot{\left|J\right|}^{2}\) of each of the constraints (7)-(11) are necessary. In line with Haase [9], Aros-Vera et al. [1] propose a PL considering all possible pairs of alternatives to formulate the IIA property. In contrast to Haase [9], constraints (16) are omitted since the objective of market share maximization automatically favors the largest values for the choice probabilities. Haase and Müller’s [10] reformulation of Haase [9] omits the redundant constraints and formulates the IIA property as given by (17). Due to the objective of market share maximization, constraints (19) are not necessary and constraints (15) can be formulated as an inequality.

Haase and Müller [10] consider \({M}_{nj\left(18\right)}\) as defined in Appendix 1, which represents the tightest upper bound for the choice probabilities in the PL in general. However, in the special case of facility location planning, a predefined and fixed number of \(r\) facilities is required to be open, which is considered in the MILP formulation as an additional constraint. Based on this, a stronger formulation of constraints (18) can be derived. The resulting tighter bound for \({M}_{nj\left(18\right)}\) is presented by Freire et al. [8] in the context of Haase and Müller’s [10] linear formulation (see Appendix 1 for its definition).

Note that in the context of location planning, other linearizations have recently been discussed by Ljubić and Moreno [13] and Mai and Lodi [15]. Ljubić and Moreno’s [13] approach relies on the outer-approximation of the continuous relaxation of the objective function and its submodularity property. Mai and Lodi’s [15] approach creates a set of piecewise linear functions that outer-approximate separate parts of the objective function. The corresponding models arise in the specific context of the branch-and-cut or cutting-plane solution procedures the authors develop and are therefore omitted in Table 1.

5 Discussion

In this paper, we argue that major publications’ linearizations of attraction choice behavior in business optimization problems can be traced back to one of two different but equivalent MILP formulations, each relying on a specific linearization idea. By a systematic analysis, we revealed that the differences between the publications’ linearizations and the presented ones result from problem-specific characteristics depending on the field of application. Thus, our analysis can serve as a helpful guideline for future applications of such linearizations.

Note that, basically, both linearization schemes rely on the same number of (binary and nonnegative real-valued) variables and constraints. Further, given that their equivalence can be shown by variable substitution, there are no specific indications that one is generally more suitable than the other one. Besides the equivalence, it can be seen from the substitution that the defined bounds in Appendix 1 lead to the same tightness of constraints in both formulations. Hence, no solution time differences can be expected in general. However, with regard to the future development of context-specific linearization approaches on the basis of these generic models, it is important to keep considering both variants. In particular, one could be more intuitive than the other with regard to the required model adjustments, potentially leading to differences in efficiency of the resulting specific linearizations.

Further, we want to emphasize that the two presented MILP formulations are of special interest in the case of only one customer segment, since then, the formulations can be solved very efficiently and utilized for a broad range of applications [7]. In the case of several latent segments, even though the underlying problem is NP-hard, standard MILP solvers have been reported to work quite fast in many cases, or at least, the formulations can serve as helpful starting points for the derivation of promising heuristic solution procedures [18]. Additionally, as discussed in this paper, problem-specific circumstances can further reduce the linearization effort needed.