1 Introduction

Many real-world problems can be considered resource allocation problems. For example, consider the loading of a cargo plane with (potential) goods of different weights. Each item raises a certain profit for the airline if it is transported; however, not all goods can be loaded due to airplane weight restrictions. Clearly, the dispatcher seeks a maximum-profit packing that fulfills the capacity constraint. This example from [1] illustrates the knapsack problem: Given a set of n items, each specified by a size and a profit value, and a resource (called knapsack) of fixed capacity, the goal is to find a subset of items (called packing) of maximum total profit whose total size does not exceed the capacity. Besides being a fundamental and extensively studied problem in combinatorial optimization, knapsack problems arise in many and varied practical settings. We refer the reader to the textbooks [1, 2] and to the surveys of previous work in [3, 4] for further references.

The introductory example from cargo logistics can be generalized naturally to multiple airplanes of different capacities. Here, the size and the profit of an item may depend on the airplane to which it is assigned. This leads to the generalized assignment problem (GAP) [2], where resources of different capacities are given, and the size and the profit of an item depend on the resource to which it is assigned. The GAP includes many prominent problems, such as the (multiple) knapsack problem [5], weighted bipartite matching [6], AdWords [7], and the display ads problem [8]. Further applications of GAP are outlined in the survey articles [9, 10].

We study online variants of the knapsack problem and GAP. Here, n items are presented sequentially, and the decision for each item must be made immediately upon arrival. This setting would arise in our logistics example if the dispatcher needs to answer customer requests immediately without knowledge of future requests. In fact, many real-world optimization problems occur as online problems, since decisions must often be made in the face of uncertainty. The online knapsack problem has been studied in particular in the context of online auctions [11, 12].

Typically, the performance measure for online algorithms is the competitive ratio, which is defined as the ratio between the values of the algorithmic solution and an optimal offline solution for a worst-case input. The knapsack problem admits no randomized algorithm of bounded competitive ratio in the general online setting [12]. This holds even if only a single item can be packed, as known from the secretary problem [13, 14]. However, these hardness results are based on a worst-case input presented in adversarial order. In the random order model, the performance of an algorithm is evaluated for a worst-case input, but the adversary has no control over the input order; the input sequence is drawn uniformly at random among all permutations.

In order to define the competitive ratio of an algorithm \(\mathcal {A}\) in this model formally, let \(\mathcal {A}(\mathcal {I})\) and \({\text {OPT}}(\mathcal {I})\) denote the profits of the solutions of \(\mathcal {A}\) and an optimal offline algorithm, respectively, for input \(\mathcal {I}\). We say that \(\mathcal {A}\) is r-competitive (or has competitive ratio r) in the random order model if

$$\begin{aligned} {\mathbf{E}}\left[ {\mathcal {A}}({\mathcal {I}})\right] \ge (r - o(1)) \cdot {\text {OPT}}({\mathcal {I}}) \end{aligned}$$

holds for all inputs \(\mathcal {I}\). Here, the expectation is over the random permutation as well as over random choices of the algorithm. The o(1)-term is asymptotic with respect to the number n of items in the input.

The random order model has become increasingly popular in the field of online algorithms. An early and well-known example is the secretary problem [13, 14]. Nowadays, the matroid secretary problem [15, 16] is considered one of the most central problems in this field. Further multiple-choice generalizations [17, 18] are part of active research as well. The model has also been applied successfully to other problem classes, including scheduling [19,20,21], packing [22,23,24,25,26], graph problems [27,28,29], facility location [30], budgeted allocation [31], and submodular welfare maximization [32].

1.1 Related Work

1.1.1 Online Knapsack Problem

The online knapsack problem was first studied by Marchetti-Spaccamela and Vercellis [33], who showed that no deterministic online algorithm for this problem can obtain a constant competitive ratio. Moreover, Chakrabarty et al. [12] demonstrated that this fact cannot be overcome by randomization.

Given such hardness results, several relaxations have been introduced and investigated. Most relevant to our work are results in the random order model. The problem was introduced as the secretary knapsack problem by Babaioff et al. [34], who developed a randomized algorithm of competitive ratio \(1/(10e) < 1/27\). Kesselheim et al. [25] achieved a significant improvement by developing a (1/8.06)-competitive randomized algorithm for the generalized assignment problem. Finally, Vaze [35] showed that there exists a deterministic algorithm of competitive ratio \(1/(2e) \approx 1/5.44\), assuming that the maximum profit of a single item is small compared to the profit of the optimal solution.

Apart from the random order model, several further relaxations have been considered. Marchetti-Spaccamela and Vercellis [33] studied a stochastic model wherein item sizes and profits are drawn from a fixed distribution. Lueker [36] obtained improved bounds in this model. Chakrabarty et al. [12] studied the problem when the density (profit-size ratio) of each item lies in a fixed range \([L, U]\). Under the further assumption that item sizes are small compared to the knapsack capacity, Chakrabarty et al. proposed an algorithm of competitive ratio \(\ln (U/L) + 1\) and provided a lower bound of \(\ln (U/L)\). Another branch of research considers removable models, where the algorithm can remove previously packed items. Removing such items can incur no cost [37, 38] or a cancellation cost (buyback model, [39,40,41]). Recently, Vaze [42] considered the problem under a (weaker) expected capacity constraint. This variant admits a competitive ratio of \(1/(4e)\).

1.1.2 Online GAP

Since all hardness results for online knapsack also hold for online GAP, research focuses on stochastic variants or modified online settings. Currently, the only result for the random order model is the previously mentioned (1/8.06)-competitive randomized algorithm proposed by Kesselheim et al. [25]. To the best of our knowledge, the earliest paper considering online GAP is due to Feldman et al. [8]. They obtained an algorithm of competitive ratio tending to \(1-1/e\) in the free disposal model. In this model, the total size of items assigned to a resource might exceed its capacity; in addition, no item consumes more than a small fraction of any resource. A stochastic variant of online GAP was studied by Alaei et al. [43]. Here, the size of an item is drawn from an individual distribution that is revealed upon arrival of the item, together with its profit. However, the algorithm learns the actual item size only after the assignment. If no item consumes more than a (1/k)-fraction of any resource, the algorithm proposed by Alaei et al. has competitive ratio \(1 - 1/\sqrt{k}\).

1.1.3 Online Packing LPs

Packing problems where requests can consume \(d \ge 1\) different resources lead to general online packing LPs. Note that the special case of \(d=1\) is the generalized assignment problem. Buchbinder and Naor [44] initiated the study of online packing LPs in the adversarial model. The random order model admits \((1-\varepsilon )\)-competitive algorithms assuming large capacity ratios, i.e., the capacity of any resource is large compared to the maximum demand for it. This has been shown in a sequence of papers [22, 23, 25, 26]. Recently, Kesselheim et al. [25] gave an algorithm of competitive ratio \(1 - O(\sqrt{(\log d) / B})\) where B is the capacity ratio. Consequently, their algorithm is \((1-\varepsilon )\)-competitive if \(B = \varOmega ((\log d) / \varepsilon ^2)\). For \(d=1\), this result matches the lower bound by Kleinberg [18].

1.2 Our Contributions

As outlined above, for online knapsack and GAP in the adversarial input model, nearly all previous works attain constant competitive ratios at the cost of either (a) imposing structural constraints on the input or (b) significantly relaxing the original online model. Therefore, we study both problems in the random order model, which is less pessimistic than the adversarial model but still considers worst-case instances without further constraints on the item properties. For the knapsack problem, our main result is the following.

Theorem 1

There exists a (1/6.65)-competitive randomized algorithm for the online knapsack problem in the random order model.

One challenge in the design of knapsack algorithms is that the optimal packing can have, on a high level, at least two different structures: Either there are a few large items constituting the majority of the packing’s profit, or there are many small items. Previous work [25, 34] is based on splitting the input according to item sizes and then employing algorithms tailored for these restricted instances. However, the algorithms from [25, 34] choose a single item type via an initial random choice, and then pack items of that type exclusively. In contrast, our approach considers different item types in distinct time intervals, rather than discarding items of a specific type in advance. More precisely, we develop algorithms \(\mathcal {A}_L\) and \(\mathcal {A}_S\) which are combined in a novel sequential approach: While large items appearing in early rounds are packed using \(\mathcal {A}_L\), algorithm \(\mathcal {A}_S\) is applied to pack small items revealed in later rounds. We think that this approach may be helpful for other problems in similar online settings as well.

The proposed algorithm \(\mathcal {A}_L\) deals with the knapsack problem where all items consume more than 1/3 of the capacity (we call this problem 2-KS). The 2-KS problem is closely related to the k-secretary problem [18] for \(k=2\). We also develop a general framework that allows us to employ any algorithm for the 2-secretary problem to obtain an algorithm for 2-KS. As a byproduct, we obtain a simple (1/3.08)-competitive deterministic algorithm for 2-KS in the random order model. For items whose size is at most 1/3 of the resource capacity, we give a simple and efficient algorithm \(\mathcal {A}_S\). Here, a challenging constraint is that \(\mathcal {A}_L\) and \(\mathcal {A}_S\) share the same resource, so we need to argue carefully that the decisions of \(\mathcal {A}_S\) are feasible, given the packing of \(\mathcal {A}_L\) from previous rounds.

Finally, we show that the proposed sequential approach also improves the current best result for GAP [25] from competitive ratio 1/8.06 to 1/6.99.

Theorem 2

There exists a (1/6.99)-competitive randomized algorithm for the online generalized assignment problem in the random order model.

For this problem, we use the algorithmic building blocks \(\mathcal {A}_L\), \(\mathcal {A}_S\) developed in [25, 28]. However, we need to verify that \(\mathcal {A}_L\), an algorithm for edge-weighted bipartite matching [28], satisfies the desired properties for the sequential approach. We point out that the assignments of our algorithm differ structurally from the assignments of the algorithm proposed in [25]. In the assignments of the latter algorithm, all items are either large or small compared to the capacity of the assigned resource. In our approach, both situations can occur, because resources are managed independently.

1.2.1 Roadmap

We focus on the result on the knapsack problem (Theorem 1) in the first sections of this paper. For this purpose, we provide elementary definitions and facts in Sect. 2. Our main technical contribution is formally introduced in Sect. 3: Here, we describe an algorithmic framework performing two algorithms \(\mathcal {A}_L\), \(\mathcal {A}_S\) sequentially. In Sects. 4 and 5, we design and analyze the algorithms \(\mathcal {A}_L\) and \(\mathcal {A}_S\) for the knapsack problem. Finally, in Sect. 6 we describe how the sequential approach can be applied to GAP.

2 Preliminaries

Let \([n] := \{1,\ldots ,n\}\). Further, let \(\mathbb {Q}_{\ge 0}\) and \(\mathbb {Q}_{>0}\) denote the set of non-negative and positive rational numbers, respectively.

2.1 Knapsack Problem

We are given a set of items \(I = [n]\), where each item \(i \in I\) has a size \(s_i \in \mathbb {Q}_{> 0}\) and a profit (value) \(v_i \in \mathbb {Q}_{\ge 0}\). The goal is to find a maximum profit packing into a knapsack of size \(W \in \mathbb {Q}_{> 0}\), i.e., a subset \(M \subseteq I\) such that \(\sum _{i \in M} s_i \le W\) and \(\sum _{i \in M} v_i\) is maximized. W.l.o.g. we can assume \(s_i \le W\) for all \(i \in I\). In the online variant of the problem, a single item i is revealed together with its size and profit in each round \(\ell \in [n]\). The online algorithm must decide immediately and irrevocably whether to pack i. We call an item visible in round \(\ell\) if it arrived in round \(\ell\) or earlier.

We classify items as large or small, depending on their size compared to W and a parameter \(\delta \in (0,1)\) to be determined later.

Definition 1

We say an item i is \(\delta\)-large if \(s_i > \delta W\) and \(\delta\)-small if \(s_i \le \delta W\). Whenever \(\delta\) is clear from the context, we say an item is large or small for short. Based on the given item set I, we define two modified item sets \(I_L\) and \(I_S\), which are obtained as follows:

  • \(I_L\): Replace each small item by a large item of profit 0

  • \(I_S\): Replace each large item by a small item of profit 0.

Therefore, \(I_L\) only contains large items and \(I_S\) only contains small items. We can assume that no algorithm packs a zero-profit item, thus any algorithmic packing of \(I_L\) or \(I_S\) can be turned into a packing of I having the same profit. Let \({\text {OPT}}\), \({\text {OPT}}_L\), and \({\text {OPT}}_S\) be the total profits of optimal packings for I, \(I_L\), and \(I_S\), respectively. A useful upper bound for \({\text {OPT}}\) is

$$\begin{aligned} {\text {OPT}}\le {\text {OPT}}_L + {\text {OPT}}_S. \end{aligned}$$
(1)
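To make this construction concrete, the following Python sketch (ours, not from the paper) derives \(I_L\) and \(I_S\) from a list of (size, profit) pairs; the placeholder sizes chosen for the zero-profit items are an arbitrary valid choice.

```python
# Sketch of the item split from Definition 1. Since zero-profit items are
# never packed, the placeholder sizes only need to be large (> delta * W)
# or small (<= delta * W), respectively.

def split_items(items, W, delta):
    I_L, I_S = [], []
    for size, profit in items:
        if size > delta * W:              # delta-large item
            I_L.append((size, profit))
            I_S.append((delta * W, 0.0))  # zero-profit small placeholder
        else:                             # delta-small item
            I_L.append((W, 0.0))          # zero-profit large placeholder
            I_S.append((size, profit))
    return I_L, I_S
```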

2.2 Bounding Sums by Integrals

In order to obtain lower or upper bounds on sums in closed form, we often make use of the following facts.

Fact 1A Let f be a non-negative real-valued function and let \(a,b \in \mathbb {N}\). If f is monotonically decreasing, then \(\int _{a}^{b+1} f(i) ~\mathrm {d} {i}~ \le \sum _{i=a}^{b} f(i) \le \int _{a-1}^{b} f(i) ~\mathrm {d} {i}~\).

Fact 1B Let f be a non-negative real-valued function and let \(a,b \in \mathbb {N}\). If f is monotonically increasing, then \(\int _{a-1}^{b} f(i) ~\mathrm {d} {i}~ \le \sum _{i=a}^{b} f(i) \le \int _{a}^{b+1} f(i) ~\mathrm {d} {i}~\).
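As a quick numeric illustration of Fact 1A (with example values of our choosing), take the decreasing function \(f(i) = 1/i\):

```python
# Sanity check of Fact 1A for f(i) = 1/i with a = 10, b = 100:
# ln((b+1)/a) <= sum_{i=a}^{b} 1/i <= ln(b/(a-1)).
import math

a, b = 10, 100
s = sum(1.0 / i for i in range(a, b + 1))
lower = math.log((b + 1) / a)   # integral of 1/x from a to b+1
upper = math.log(b / (a - 1))   # integral of 1/x from a-1 to b
assert lower <= s <= upper
print(f"{lower:.4f} <= {s:.4f} <= {upper:.4f}")  # 2.3125 <= 2.3584 <= 2.4079
```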

[Algorithm 1: pseudocode figure, not reproduced here.]

3 Sequential Approach

A common approach in the design of algorithms for secretary problems is to split the input sequence into two phases: a sampling phase, in which all items are rejected, followed by a decision phase, in which some items are accepted according to a decision rule. Typically, this rule is based on the information gathered in the sampling phase. We take this concept a step further: The key idea of our sequential approach is to use a part of the sampling phase of one algorithm as the decision phase of another algorithm, which itself can have a sampling phase. This way, two algorithms are performed sequentially, which makes better use of the entire instance. We combine this idea with using different strategies for small and large items.

Formally, let \(\mathcal {A}_L\) and \(\mathcal {A}_S\) be two online knapsack algorithms and \(I_L\) and \(I_S\) be the item sets constructed according to Definition 1. Further, let \(0<c<d<1\) be two parameters to be specified later. Our proposed algorithm samples the first cn rounds; no item is packed during this time. From round \(cn+1\) to dn, the algorithm considers large items exclusively. In this interval it follows the decisions of \(\mathcal {A}_L\). After round dn, the algorithm processes only small items and follows the decisions of \(\mathcal {A}_S\). However, it might be the case that an item accepted by \(\mathcal {A}_S\) cannot be packed because the knapsack capacity is exhausted due to the packing of \(\mathcal {A}_L\) in earlier rounds. Note that all rounds \(1,\ldots ,dn\) can be considered as the sampling phase for \(\mathcal {A}_S\). A formal description is given in Algorithm 1. Here, for a given input sequence \(\pi\) of I, let \(\pi _L\) and \(\pi _S\) denote the corresponding sequences from \(I_L\) and \(I_S\), respectively. Note that \(\pi\) is revealed sequentially and \(\pi _L\), \(\pi _S\) can be constructed online. For any input sequence \(\pi\), let \(\pi (\ell )\) denote the item at position \(\ell \in [n]\).
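Since the pseudocode of Algorithm 1 is given only as a figure in the original, the following Python sketch captures our reading of it; the observe/process interface of the sub-algorithms is our own modeling assumption.

```python
# Sketch of the sequential approach (Algorithm 1). A_L and A_S stand for
# the sub-algorithms of Sects. 4 and 5: observe() feeds them an item of
# their sampling phase, process() additionally reports whether they would
# pack the current item.

def sequential_approach(pi, n, W, c, d, delta, A_L, A_S):
    packing, load = [], 0.0
    for ell, (size, profit) in enumerate(pi, start=1):
        is_large = size > delta * W
        item_L = (size, profit) if is_large else (W, 0.0)              # pi_L(ell)
        item_S = (size, profit) if not is_large else (delta * W, 0.0)  # pi_S(ell)
        if ell <= c * n:                   # sampling phase
            A_L.observe(item_L)
            A_S.observe(item_S)
        elif ell <= d * n:                 # decision phase of A_L ...
            A_S.observe(item_S)            # ... still sampling for A_S
            if A_L.process(item_L) and profit > 0 and load + size <= W:
                packing.append((size, profit))
                load += size
        else:                              # decision phase of A_S
            if A_S.process(item_S) and profit > 0 and load + size <= W:
                packing.append((size, profit))
                load += size
    return packing
```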

In the final algorithm, we set the threshold for small items to \(\delta =1/3\) and use Algorithm 1 with parameters \(c=0.42291\) and \(d=0.64570\). The choice of c and d maximizes the minimum of \({\mathbf{E}}\left[ \mathcal {A}_L\right] / {\text {OPT}}_L\) and \({\mathbf{E}}\left[ \mathcal {A}_S\right] / {\text {OPT}}_S\). For simplicity, we assume \(cn,dn \in \mathbb {N}\). If n is large enough, this assumption does not affect the competitive ratio substantially. We next give a high-level description of the proof of Theorem 1.

[Algorithm 2: pseudocode figure, not reproduced here.]

Proof (of Theorem 1)

Let \(\mathcal {A}\) be Algorithm 1 and let \(\mathcal {A}_L\), \(\mathcal {A}_S\) be the algorithms developed in Sects. 4 and 5. In the next sections, we prove the following results for \(r= 1/6.65 - o(1)\) (see Lemmas 6 and 11): The expected profit from \(\mathcal {A}_L\) in rounds \(cn+1,\ldots ,dn\) is at least \(r \cdot {\text {OPT}}_L\), and the expected profit from \(\mathcal {A}_S\) in rounds \(dn+1,\ldots ,n\) is at least \(r \cdot {\text {OPT}}_S\). Together with inequality (1), we obtain

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}\right] \ge {\mathbf{E}}\left[ \mathcal {A}_L\right] + {\mathbf{E}}\left[ \mathcal {A}_S\right] \ge r \cdot {\text {OPT}}_L + r \cdot {\text {OPT}}_S \ge \left( \frac{1}{6.65} - o(1) \right) {\text {OPT}}\,. \end{aligned}$$

\(\square\)

The order in which \(\mathcal {A}_L\) and \(\mathcal {A}_S\) are arranged in Algorithm 1 follows from two observations. Algorithm \(\mathcal {A}_S\) is powerful if it samples roughly 2n/3 rounds; a part of this long sampling phase can be used as the decision phase of \(\mathcal {A}_L\), for which a shorter sampling phase is sufficient. Moreover, the first algorithm should either pack high-profit items, or should leave the knapsack empty for the following algorithm with high probability. The algorithm \(\mathcal {A}_L\) we propose in Sect. 4 has this property (see Lemma 7), in contrast to \(\mathcal {A}_S\). If \(\mathcal {A}_S\) preceded \(\mathcal {A}_L\), the knapsack would be empty after round dn only with very small probability, so we would hardly benefit from \(\mathcal {A}_L\) at all.

Finally, note that stronger algorithms for the respective sub-problems can be obtained by choosing different parameters or algorithmic approaches (see Lemma 5 and [25]). However, we seek to maximize the competitive ratio of Algorithm 1 and therefore need algorithms \(\mathcal {A}_L\) and \(\mathcal {A}_S\) that perform well within the sequential framework.

4 Large Items

The approach presented in this section is based on the connection between the online knapsack problem under random arrival order and the k-secretary problem [18]. In the latter problem, the algorithm can accept up to k items and the goal is to maximize the sum of their profits. Here, we assume that a k-secretary algorithm can observe the actual profits of the items, in contrast to the ordinal version of the problem, where an algorithm can only decide based on relative merits. This way, the k-secretary problem generalizes the classical secretary problem [13, 14] and is itself a special case of the online knapsack problem under random arrival order (if all knapsack items have size W/k).

In our setting, each large item consumes more than \(\delta =1/3\) of the knapsack capacity. We call this problem 2-KS, since at most two items can be packed completely. Therefore, any 2-secretary algorithm can be employed to identify two high-profit items for the knapsack packing. However, after packing the first item, the resource might be exhausted, such that the second item identified by the 2-secretary algorithm cannot be packed.

Although this idea can be generalized to any k-secretary algorithm and corresponding \(\delta\)-large items, the approach seems stronger for small k: While 1-KS is exactly 1-secretary, the characteristics of k-KS and k-secretary deviate with growing k. Our results show that the problems 2-secretary and 2-knapsack are still close enough to benefit from such an approach.

In the following, let \(\mathcal {A}_L\) be Algorithm 2. This is an adaptation of the algorithm single-ref, developed for the k-secretary problem in [45]. As discussed above, 2-secretary and 2-KS are similar, but different problems; therefore, in our setting it is not possible to apply the existing analysis of [45], or of any other k-secretary algorithm, directly. We further note that, in principle, any 2-secretary algorithm can be employed in the approach described below. In Sect. 4.4, we discuss several alternative algorithms.

Assumption. For this section, we assume that all profits are distinct. This is without loss of generality, as ties can be broken by adjusting the profits slightly, using the items’ identifiers. Further, we assume \(v_1> v_2> \ldots > v_n\) and say that i is the rank of item i.
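Before analyzing the packing types, we give a minimal self-contained sketch (ours) of Algorithm 2, as we read it from the description above and from the proof of Lemma 3: the first two items beating the best sampling profit are the acceptance candidates; that a non-fitting second candidate is simply discarded is our assumption.

```python
# Sketch of Algorithm 2 (adaptation of single-ref to 2-KS): sample the
# first cn items and remember the best sampling profit a; in rounds
# cn+1,...,dn, accept the first two items beating a and pack each of
# them if it still fits (the first one always fits, as s_i <= W).

def a_large(pi, n, W, c, d):
    cn, dn = round(c * n), round(d * n)
    a = max((profit for _, profit in pi[:cn]), default=0.0)
    packing, load, candidates = [], 0.0, 0
    for size, profit in pi[cn:dn]:
        if candidates == 2:
            break
        if profit > a:                 # beats the reference item
            candidates += 1
            if load + size <= W:
                packing.append((size, profit))
                load += size
    return packing
```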

4.1 Packing Types

As outlined above, in contrast to the 2-secretary problem, not all combinations of two knapsack items can be packed completely. Therefore, we analyze the probability that \(\mathcal {A}_L\) selects a feasible set of items whose profit can be bounded from below. We restrict our analysis to packings where an item \(i \in \{1,2,3,4\}\) is packed as the first item and group such packings into several packing types A-M defined in the following. Although covering more packings might lead to further insights into the problem and to a stronger result, we expect the improvement to be marginal.

Let \(p_X\) be the probability that \(\mathcal {A}_L\) returns a packing of type \(X \in \{\text {A},\ldots ,\text {M}\}\). In addition, let \(p_i\) for \(i \in [n]\) be the probability that \(\mathcal {A}_L\) packs i as the first item. Finally, let \(p_{ij}\) for \(i,j \in [n]\) be the probability that \(\mathcal {A}_L\) packs i as the first item and j as the second item.

Table 1 Definition of packing types A-M. We use set notation \(\{i,j\}\) if i and j can be packed in any order, and tuple notation (i, j) if the packing order must be as given:

  • Type A: \(\{1,2\}\), probability \(p_\text {A} = p_{12} + p_{21}\)

  • Type B: \(\{1,3\}\), probability \(p_\text {B} = p_{13} + p_{31}\)

  • Type C: \(\{2,3\}\), probability \(p_\text {C} = p_{23} + p_{32}\)

  • Type D: (1, j) for any j or no second item, probability \(p_\text {D} = p_1\)

  • Type E: (2, j) for any j or no second item, probability \(p_\text {E} = p_2\)

  • Type F: (3, j) for any j or no second item, probability \(p_\text {F} = p_3\)

  • Type G: (4, j) for any j or no second item, probability \(p_\text {G} = p_4\)

  • Type H: (1, j) with \(j \ne 2\) or no second item, probability \(p_\text {H} = p_1 - p_{12}\)

  • Type I: (1, j) with \(j \ne 3\) or no second item, probability \(p_\text {I} = p_1 - p_{13}\)

  • Type J: (2, j) with \(j \ne 1\) or no second item, probability \(p_\text {J} = p_2 - p_{21}\)

  • Type K: (2, j) with \(j \ne 3\) or no second item, probability \(p_\text {K} = p_2 - p_{23}\)

  • Type L: (3, j) with \(j \ne 1\) or no second item, probability \(p_\text {L} = p_3 - p_{31}\)

  • Type M: (3, j) with \(j \ne 2\) or no second item, probability \(p_\text {M} = p_3 - p_{32}\)

In a packing of type A, the items 1 and 2 are packed in any order. Therefore, \(p_\text {A} = p_{12} + p_{21}\). The types B and C are defined analogously using the items \(\{1,3\}\) and \(\{2,3\}\), respectively. In a packing of type D, the item 1 is accepted as the first item, together with no or any second item j. This happens with probability \(p_\text {D} = p_1\). Accordingly, we define types E, F, and G using the items 2, 3, and 4, respectively. Finally, for each item \(i \in \{1,2,3\}\), we introduce two further packing types. For \(i=1\), types H and I cover packings where the first accepted item is 1, the second accepted item j is not 2 (type H) and not 3 (type I), respectively. Therefore, we get \(p_\text {H} = p_1 - p_{12}\) and \(p_\text {I} = p_1 - p_{13}\). Packing types J-K and L-M describe analogous packings for \(i=2\) and \(i=3\), respectively. Table 1 shows all packing types A-M and their probabilities expressed by \(p_i\) and \(p_{ij}\).

In Sect. 4.3, we use the packing types to describe a subset of packings whose profit can be bounded against \({\text {OPT}}_L\). For example, suppose that \({\text {OPT}}_L = v_1 + v_2\). Then, all relevant packings are of type A, H, or J. As these types are disjoint by definition, we immediately obtain \({\mathbf{E}}\left[ \mathcal {A}_L\right] \ge p_\text {A} (v_1 + v_2) + p_\text {H} v_1 + p_\text {J} v_2\).

4.2 Acceptance Probabilities of Algorithm 2

In the following, we compute the probabilities \(p_i\) and \(p_{ij}\) from Table 1 as functions of c and d. Throughout the following proofs, we denote the position of an item i in a given permutation by \({{\,\mathrm{pos}\,}}(i) \in [n]\). Further, let a be the maximum-profit item from the sampling phase.

We think of the random permutation as being constructed sequentially. The fact given below follows from the hypergeometric distribution and will be helpful in the proofs of Lemmas 1 and 2.

Fact 2

Suppose there are N balls in an urn from which M are blue and \(N-M\) red. The probability of drawing K blue balls without replacement in a sequence of length K is \(h(N,M,K) := \left( {\begin{array}{c}M\\ K\end{array}}\right) / \left( {\begin{array}{c}N\\ K\end{array}}\right) \,.\)
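For example, with hypothetical values \(N=10\), \(M=6\), and \(K=3\), the probability of seeing only blue balls in the first three draws is \(h(10,6,3) = \binom{6}{3} / \binom{10}{3} = 20/120 = 1/6\).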

In the first lemma, we provide the exact probability \(p_i\) for all \(i \in [n]\) and give lower bounds for \(p_i\) when \(i \in [4]\).

Fig. 1 (not reproduced): Input sequence considered in Lemma 1. The gray dashed slots represent items of rank greater than a

Lemma 1

The probability that item \(i \in [n]\) is accepted as the first item is

$$\begin{aligned} p_i = \frac{c}{n-1} \sum _{k=cn+1}^{dn} \frac{\left( {\begin{array}{c}n-i\\ k-1\end{array}}\right) }{\left( {\begin{array}{c}n-2\\ k-2\end{array}}\right) } \,. \end{aligned}$$

Moreover, we have the lower bound

$$\begin{aligned} p_i \ge {\left\{ \begin{array}{ll} c \ln \frac{d}{c} - o(1) &{} i=1 \\ c \left( \ln \frac{d}{c} - d + c \right) - o(1) &{} i=2 \\ c \left( \ln \frac{d}{c} - 2(d-c) + \frac{1}{2} (d^2 - c^2) \right) - o(1) &{} i=3 \\ c \left( \ln \frac{d}{c} -3(d-c) + \frac{3}{2} (d^2-c^2) - \frac{1}{3} (d^3-c^3) \right) - o(1) &{} i=4 \,. \end{array}\right. } \, \end{aligned}$$

Proof

In the first part of this proof, we analyze the probability that item i is accepted as the first item at a fixed position \(k \ge cn+1\). As a is defined as the best sampling item, \({{\,\mathrm{pos}\,}}(a) \le cn\) must hold. A permutation uniformly drawn at random satisfies \({{\,\mathrm{pos}\,}}(i)=k\) and \({{\,\mathrm{pos}\,}}(a) \le cn\) with probability \(\frac{1}{n} \frac{cn}{n-1} = \frac{c}{n-1}\).

Next, we draw the remaining \(k-2\) items for the positions before k (see Fig. 1). Since i is packed as the first item, all previous items (except for a) must have rank greater than a. As these items are drawn from the remaining \(n-2\) items (of which \(n-a\) have rank greater than a), the probability for this step is \(h(n-2,n-a,k-2)\) according to Fact 2. Using the law of total probability for \(k \in \{cn+1,\ldots ,dn\}\) and \(a \in \{i+1,\ldots ,n\}\), we obtain

$$\begin{aligned} p_i&= \frac{c}{n-1} \sum _{k=cn+1}^{dn} \sum _{a=i+1}^{n} h(n-2,n-a,k-2) \nonumber \\&= \frac{c}{n-1} \sum _{k=cn+1}^{dn} \frac{1}{\left( {\begin{array}{c}n-2\\ k-2\end{array}}\right) } \sum _{a=i+1}^{n} \left( {\begin{array}{c}n-a\\ k-2\end{array}}\right) \nonumber \\&= \frac{c}{n-1} \sum _{k=cn+1}^{dn} \frac{\left( {\begin{array}{c}n-i\\ k-1\end{array}}\right) }{\left( {\begin{array}{c}n-2\\ k-2\end{array}}\right) } \,. \end{aligned}$$
(2)

Here, the last identity follows from \(\sum _{a=i+1}^{n} \left( {\begin{array}{c}n-a\\ k-2\end{array}}\right) = \sum _{a=0}^{n-i-1} \left( {\begin{array}{c}a\\ k-2\end{array}}\right) = \left( {\begin{array}{c}n-i\\ k-1\end{array}}\right) \,.\)

In the second part of the proof, we derive a lower bound for \(p_i\). We first consider the quotient of binomial coefficients from Eq. (2) and observe

$$\begin{aligned} \frac{\left( {\begin{array}{c}n-i\\ k-1\end{array}}\right) }{\left( {\begin{array}{c}n-2\\ k-2\end{array}}\right) }&= \frac{(n-i)!}{(k-1)! \cdot (n-i-k+1)!} \cdot \frac{(k-2)! \cdot (n-k)!}{(n-2)!} \nonumber \\&= \frac{(n-i)!}{(n-2)!} \cdot \frac{(n-k)!}{(n-i-k+1)!} \cdot \frac{1}{k-1} \nonumber \\&> \frac{1}{(n-2)^{i-2}} \cdot \frac{(n-k)!}{(n-i-k+1)!} \cdot \frac{1}{k} \nonumber \\&> \frac{(n-k-i)^{i-1}}{n^{i-2}} \cdot \frac{1}{k} \,. \end{aligned}$$
(3)

Combining Equation (2) and inequality (3) yields

$$\begin{aligned} p_i> \frac{c}{n-1} \sum _{k=cn+1}^{dn} \frac{(n-k-i)^{i-1}}{n^{i-2}} \cdot \frac{1}{k} > \frac{c}{n^{i-1}} \sum _{k=cn+1}^{dn} \frac{(n-k-i)^{i-1}}{k} \,. \end{aligned}$$
(4)

Now, the goal is to find a closed expression which bounds the last sum in inequality (4) from below. We have

$$\begin{aligned} \sum _{k=cn+1}^{dn} \frac{(n-k-i)^{i-1}}{k} = \sum _{k=cn+1+i}^{dn+i} \frac{(n-k)^{i-1}}{k-i} > \sum _{k=cn+1+i}^{dn+i} \frac{(n-k)^{i-1}}{k} \end{aligned}$$
(5)

and define \(f(k) = (n-k)^{i-1} / k\). Since f is monotonically decreasing in k and \(i-1 \ge 0\), we have

$$\begin{aligned}&\sum _{k=cn+1+i}^{dn+i} \frac{(n-k)^{i-1}}{k} = \sum _{k=cn}^{dn-1} f(k) + \sum _{k=dn}^{dn+i} f(k) - \sum _{k=cn}^{cn+i} f(k) \nonumber \\&\quad > \int _{cn}^{dn} f(k) ~\mathrm {d} {k}~ - (i+1) \cdot f(cn) = \int _{cn}^{dn} f(k) ~\mathrm {d} {k}~ - (i+1) \cdot \frac{(n-cn)^{i-1}}{cn} \,, \end{aligned}$$
(6)

where we used Fact 1A. Let F be a function such that \(\int _{cn}^{dn} f(k) ~\mathrm {d} {k}~ = F(dn)-F(cn)\). By combining inequalities (4)-(6), we obtain

$$\begin{aligned} p_i > \frac{c}{n^{i-1}} \cdot \left( F(dn) - F(cn) \right) - (i+1) \cdot \frac{(1-c)^{i-1}}{n} \,. \end{aligned}$$
(7)

Below we provide suitable functions F for \(i \in [4]\):

  • \(i=1\): \(f(k) = \frac{1}{k}\), \(F(k) = \ln k\), \(F(dn)-F(cn) = \ln \frac{d}{c}\)

  • \(i=2\): \(f(k) = \frac{n-k}{k}\), \(F(k) = n \ln k - k\), \(F(dn)-F(cn) = n \ln \frac{d}{c} - dn + cn\)

  • \(i=3\): \(f(k) = \frac{(n-k)^2}{k}\), \(F(k) = n^2 \ln k - 2nk + \frac{k^2}{2}\), \(F(dn)-F(cn) = n^2 \ln \frac{d}{c} - 2n(dn-cn) + \frac{d^2 n^2 - c^2 n^2}{2}\)

  • \(i=4\): \(f(k) = \frac{(n-k)^3}{k}\), \(F(k) = n^3 \ln k - 3 n^2 k + \frac{3}{2} nk^2 - \frac{k^3}{3}\), \(F(dn)-F(cn) = n^3 \ln \frac{d}{c} - 3n^3 (d-c) + \frac{3}{2} n^3 (d^2-c^2) - \frac{1}{3} n^3(d^3-c^3)\)

The claim follows by substituting \(F(dn)-F(cn)\) in inequality (7) by the corresponding expression from the table and noting that \((i+1) \cdot \frac{(1-c)^{i-1}}{n} = o(1)\). \(\square\)

Next, we analyze the probabilities \(p_{ij}\) with \(i < j\) and give lower bounds for \(p_{12}\), \(p_{13}\), and \(p_{23}\).

Lemma 2

Let i and j be two items with \(i < j\). The probability that i is selected as the first item and j is selected as the second item is

$$\begin{aligned} p_{ij} = \frac{c}{n-1} \cdot \frac{1}{n-2} \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{\left( {\begin{array}{c}n-j\\ l-2\end{array}}\right) }{\left( {\begin{array}{c}n-3\\ l-3\end{array}}\right) } \,. \end{aligned}$$

Moreover, it holds that

$$\begin{aligned} p_{12}&\ge c \left( d - c \ln \frac{d}{c} -c \right) - o(1) \,, \\ p_{13} = p_{23}&\ge c \left( d - c \ln \frac{d}{c} -c - \frac{d^2}{2} +cd - \frac{c^2}{2} \right) - o(1) \,. \end{aligned}$$
Fig. 2 (not reproduced): Input sequence considered in Lemma 2. The gray dashed slots represent items of rank greater than a

Proof

Let i, j be two items with \(i < j\). The proof follows the same structure as the proof of Lemma 1. Again, we construct the permutation by drawing the positions for items i, j, and a first and afterwards all remaining items with position up to \({{\,\mathrm{pos}\,}}(j)\) (see Fig. 2).

Fix positions \(k={{\,\mathrm{pos}\,}}(i)\) and \(l={{\,\mathrm{pos}\,}}(j)\). Again, \({{\,\mathrm{pos}\,}}(a) \le cn\) must hold by definition of a. The probability that a random permutation satisfies these three position constraints is \(\beta := \frac{1}{n} \frac{1}{n-1} \frac{cn}{n-2} = \frac{c}{n-1} \cdot \frac{1}{n-2}\).

All remaining items up to position l must have rank greater than a. Thus, we need to draw \(l-3\) items from a set of \(n-3\) remaining items, from which \(n-a\) have rank greater than a. This happens with probability \(h(n-3,n-a,l-3)\). Using the law of total probability for k, l with \(cn+1 \le k < l \le dn\) and \(a \in \{j+1,\ldots ,n\}\), we obtain

$$\begin{aligned} p_{ij}&= \beta \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \sum _{a=j+1}^{n} h(n-3,n-a,l-3) \\&= \beta \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{1}{\left( {\begin{array}{c}n-3\\ l-3\end{array}}\right) } \sum _{a=j+1}^{n} \left( {\begin{array}{c}n-a\\ l-3\end{array}}\right) \,. \end{aligned}$$

Again, by observing \(\sum _{a=j+1}^{n} \left( {\begin{array}{c}n-a\\ l-3\end{array}}\right) = \sum _{a=0}^{n-j-1} \left( {\begin{array}{c}a\\ l-3\end{array}}\right) = \left( {\begin{array}{c}n-j\\ l-2\end{array}}\right)\), we finally obtain

$$\begin{aligned} p_{ij} = \beta \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{\left( {\begin{array}{c}n-j\\ l-2\end{array}}\right) }{\left( {\begin{array}{c}n-3\\ l-3\end{array}}\right) } \,. \end{aligned}$$
(8)

To prove the second part of the lemma, first note that Equation (8) does not depend on i, thus we have \(p_{13} = p_{23}\). It remains to find lower bounds for \(p_{12}\) and \(p_{23}\). We start with \(p_{12}\). By Equation (8) and the definition of \(\beta\), it holds that

$$\begin{aligned} p_{12} = \beta \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{\left( {\begin{array}{c}n-2\\ l-2\end{array}}\right) }{\left( {\begin{array}{c}n-3\\ l-3\end{array}}\right) } > \frac{c}{n} \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{1}{l-2} \,. \end{aligned}$$
(9)

Since \(\sum _{l=k+1}^{dn} \frac{1}{l-2} = \sum _{l=k-1}^{dn-2} \frac{1}{l} = \left( \sum _{l=k}^{dn-1} \frac{1}{l}\right) + \frac{1}{k-1} - \frac{1}{dn-1}\) and 1/l is monotonically decreasing, we have \(\sum _{l=k}^{dn-1} \frac{1}{l} \ge \int _{k}^{dn} \frac{1}{l} ~\mathrm {d} {\ell }~ = \ln \frac{dn}{k}\) by Fact 1A. Therefore,

$$\begin{aligned} \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{1}{l-2}&\ge \sum _{k=cn+1}^{dn-1} \left( \ln \frac{dn}{k} + \frac{1}{k-1} - \frac{1}{dn-1} \right) \nonumber \\&= \left( \sum _{k=cn+1}^{dn-1} \ln \frac{dn}{k}\right) + \left( \sum _{k=cn+1}^{dn-1} \frac{1}{k-1} \right) - \frac{dn-1-cn}{dn-1} \,. \end{aligned}$$
(10)

Similarly, using Fact 1A, we obtain

$$\begin{aligned} \sum _{k=cn+1}^{dn-1} \ln \frac{dn}{k} = \left( \sum _{k=cn}^{dn-1} \ln \frac{dn}{k}\right) - \ln \frac{d}{c} \ge \left( \int _{cn}^{dn} \ln \frac{dn}{k} ~\mathrm {d} {k}~\right) - \ln \frac{d}{c} \nonumber \\ = dn -cn \cdot \ln \frac{d}{c} -cn - \ln \frac{d}{c} \end{aligned}$$
(11)

and

$$\begin{aligned}&\sum _{k=cn+1}^{dn-1} \frac{1}{k-1} = \sum _{k=cn}^{dn-2} \frac{1}{k} = \left( \sum _{k=cn}^{dn-1} \frac{1}{k}\right) - \frac{1}{dn-1} \ge \left( \int _{cn}^{dn} \frac{1}{k} ~\mathrm {d} {k}~ \right) - \frac{1}{dn-1} \nonumber \\&\quad = \ln \frac{d}{c} - \frac{1}{dn-1} \,. \end{aligned}$$
(12)

By combining inequalities (9) to (12), we obtain

$$\begin{aligned} p_{12}&> \frac{c}{n} \cdot \left( \left( dn -cn \cdot \ln \frac{d}{c} -cn - \ln \frac{d}{c}\right) + \left( \ln \frac{d}{c} - \frac{1}{dn-1} \right) - \frac{dn-1-cn}{dn-1} \right) \\&=c \cdot \left( d - c \ln \frac{d}{c} -c \right) - \frac{c}{n} \cdot \left( 1 - \frac{cn-1}{dn-1} \right) \,. \end{aligned}$$

Since \(\frac{c}{n} \cdot \left( 1 - \frac{cn-1}{dn-1} \right) = o(1)\), this gives the claim for \(p_{12}\).

Next, we find a lower bound for \(p_{23}\). Equation (8) with \(j=3\) gives

$$\begin{aligned} p_{23} = \beta \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{\left( {\begin{array}{c}n-3\\ l-2\end{array}}\right) }{\left( {\begin{array}{c}n-3\\ l-3\end{array}}\right) } = \beta \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{n-l}{l-2} \,. \end{aligned}$$
(13)

By splitting this expression into two parts we obtain

$$\begin{aligned} p_{23}&= \left( \beta \cdot n \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{1}{l-2} \right) - \beta \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{l}{l-2} \\&> p_{12} - \beta \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{l}{l-2} \,, \end{aligned}$$

where the inequality follows from inequality (9). Hence, using the lower bound for \(p_{12}\), the claim for \(p_{23}\) follows if we can show \(\beta \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{l}{l-2} \le c \cdot \left( d^2/2 - cd + c^2/2 \right) + o(1)\). Since \(\frac{l}{l-2}\) decreases monotonically in l, Fact 1A implies

$$\begin{aligned} \sum _{l=k+1}^{dn} \frac{l}{l-2} \le \int _{k}^{dn} \frac{l}{l-2} ~\mathrm {d} {l}~ = dn + 2 \cdot \ln (dn-2) - k - 2 \cdot \ln (k-2) \,. \end{aligned}$$
(14)

Therefore, with \(\xi = \sum _{k=cn+1}^{dn-1} \ln (k-2)\), we have

$$\begin{aligned} \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{l}{l-2}&\le \sum _{k=cn+1}^{dn-1} \left( dn + 2 \cdot \ln (dn-2) - k - 2 \cdot \ln (k-2) \right) \\&= (dn-1-cn) \cdot ( dn + 2 \cdot \ln (dn-2) ) - \left( \sum _{k=cn+1}^{dn-1} k\right) - 2 \xi \,. \end{aligned}$$

Since \(\sum _{k=cn+1}^{dn-1} k = \frac{(dn-1) \cdot dn}{2} - \frac{cn\cdot (cn+1)}{2}\), it follows that

$$\begin{aligned}&\sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{l}{l-2} \\&\quad \le \frac{(dn-1)\cdot dn}{2} +2(dn-1) \ln (dn-2) \\&\qquad - cn \cdot \left( dn + 2 \cdot \ln (dn-2) - \frac{cn+1}{2} \right) - 2\xi \\&\quad < \frac{(dn)^2}{2} - n^2cd + \frac{(cn)^2}{2} + \frac{cn}{2} + 2n(d-c) \cdot \ln (dn-2) - 2\xi \,. \\ \end{aligned}$$

Using \(\beta = \frac{c}{n-1} \cdot \frac{1}{n-2} < \frac{c}{n^2} \cdot ( 1 + \frac{3}{n-3})\), we get

$$\begin{aligned} \beta \cdot \sum _{k=cn+1}^{dn-1} \sum _{l=k+1}^{dn} \frac{l}{l-2} < c \cdot \left( \frac{d^2}{2} -cd + \frac{c^2}{2} \right) + \eta _1 + \eta _2 \,, \end{aligned}$$

where \(\eta _1 = \frac{3c}{n-3} \cdot \left( \frac{d^2}{2} -cd + \frac{c^2}{2} \right) = o(1)\) and

$$\begin{aligned} \eta _2&= \frac{c}{n^2} \cdot \left( 1 + \frac{3}{n-3} \right) \cdot \left( \frac{cn}{2} + 2n(d-c) \cdot \ln (dn-2) - 2\xi \right) \,. \end{aligned}$$

We observe that

$$\begin{aligned} \xi&= \sum _{k=cn+1}^{dn-1} \ln (k-2) \\&= \left( \sum _{k=cn+1}^{dn} \ln k \right) - \left( \sum _{k=0}^{2} \ln (dn-k) \right) + \ln (cn) + \ln (cn-1) \\&\ge \left( \int _{cn}^{dn} \ln k ~\mathrm {d} {k}~ \right) - \ln \frac{d}{c} - \ln \frac{dn-1}{cn-1} - \ln (dn-2) \\&= n \cdot \left( d \ln dn - d - c \ln cn + c \right) - \ln \frac{d}{c} - \ln \frac{dn-1}{cn-1} - \ln (dn-2) \end{aligned}$$

by Fact 1B. This implies \(\eta _2 = o(1)\) and concludes the proof. \(\square\)

The remaining probabilities \(p_{21}\) and \(p_{32}\) can be obtained from the symmetry property stated in the next lemma.

Lemma 3

For any two items i and j it holds that \(p_{ij} = p_{ji}\).

Proof

Suppose that i is accepted as the first and j as the second item on the input sequence \(\pi\). Consider the sequence \(\pi '\) obtained from \(\pi\) by swapping i with j. Since j and i are the first two elements beating the best sampling item in \(\pi '\), Algorithm 2 selects j and i on input \(\pi '\). This swap defines a bijection between the permutations in which i is packed first and j second, and those in which j is packed first and i second; hence both events occur with the same probability, which implies the claim. \(\square\)

Therefore, we can obtain all probabilities from Table 1 using Lemmas 1, 2, and 3.
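These closed-form expressions can be cross-checked empirically. The following Monte Carlo sketch (ours) simulates the selection rule of Algorithm 2 on random rank permutations, where rank 1 is the most profitable item, and compares the estimates for \(p_1\) and \(p_{12}\) with the asymptotic lower bounds of Lemmas 1 and 2:

```python
# Monte Carlo estimates of p_1 and p_12 for Algorithm 2. Only ranks
# matter for the selection rule: "beating" means having a smaller rank
# than the best (minimum-rank) sampling item.
import math
import random

def estimate(n=1000, c=0.42291, d=0.64570, trials=20000, seed=0):
    random.seed(seed)
    cn, dn = round(c * n), round(d * n)
    hits1 = hits12 = 0
    for _ in range(trials):
        perm = random.sample(range(1, n + 1), n)       # random permutation
        a = min(perm[:cn])                             # best sampling item
        beaters = [r for r in perm[cn:dn] if r < a][:2]
        if beaters and beaters[0] == 1:                # item 1 selected first
            hits1 += 1
            if len(beaters) == 2 and beaters[1] == 2:  # item 2 selected second
                hits12 += 1
    print(f"p_1  ~ {hits1 / trials:.4f} (bound {c * math.log(d / c):.4f})")
    print(f"p_12 ~ {hits12 / trials:.4f} "
          f"(bound {c * (d - c * math.log(d / c) - c):.4f})")

estimate()
```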

4.3 Analysis

Let T be the set of items in the optimal packing of \(I_L\). This set may contain a single item, may be a two-item subset of \(\{1,2,3\}\), or may be a two-item subset containing an item \(j \ge 4\). In the following, we analyze the performance of Algorithm 2 for each case.

4.3.1 Single-item Case

If the optimal packing contains a single item, it is the most profitable one. We refer to this as case 1. Here, we have \(T=\{1\}\) and \({\mathbf{E}}\left[ {\mathcal{A}}_L\right] \ge p_{{\text{D}}}\,{\text{OPT}}_L\).

4.3.2 Two-item Cases

In cases 2-4, we consider packings of the form \(T = \{i,j\}\) with \(1 \le i < j \le 3\). We define cases 2, 3, and 4 as \(T = \{1,2\}\), \(T = \{1,3\}\), and \(T = \{2,3\}\), respectively. We want to consider all algorithmic packings whose profit can be bounded in terms of \({\text {OPT}}_L = v_i + v_j\). For this purpose, for each case 2-4 we build three groups of feasible packing types, according to whether the profit of a packing is \({\text {OPT}}_L\), at least \(v_i\), or in the interval \([v_j,v_i)\). We ensure that no packing is counted multiple times by (a) choosing appropriate packing types and (b) grouping these packing types in a disjoint way, according to their profit. Let \(\alpha _w\) be the probability that the algorithm returns the optimal packing in case \(w \in \{2,3,4\}\). It holds that \(\alpha _2 = p_\text {A}\), \(\alpha _3 = p_\text {B}\), and \(\alpha _4 = p_\text {C}\). In addition, let \(\beta _w\) be the probability that an item \(k \le i\) is packed as the first item in case \(w \in \{2,3,4\}\). We have \(\beta _2 = p_\text {H}\), \(\beta _3 = p_\text {I}\), and \(\beta _4 = p_\text {D} + p_\text {K}\). Finally, let \(\gamma _w\) be the probability that an item k with \(i < k \le j\) is packed as the first item in case \(w \in \{2,3,4\}\). It holds that \(\gamma _2 = p_\text {J}\), \(\gamma _3 = p_\text {E} + p_\text {L}\), and \(\gamma _4 = p_\text {M}\).

Finally, we define case 5 as \(T = \{i, j\}\) with \(i \ge 1\), \(j \ge 4\), and \(i < j\). In this case, note that packings of type D contain an item of value at least \(v_i\), and packings of type E, F, and G contain an item of value at least \(v_j\). Hence, we can slightly abuse the notation and set \(\alpha _5 = 0\), \(\beta _5=p_\text {D}\), and \(\gamma _5=p_\text {E}+p_\text {F}+p_\text {G}\), such that it holds that

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \alpha _w (v_i+v_j) + \beta _w v_i + \gamma _w v_j \quad \text {in case } w \in \{2,3,4,5\} \,. \end{aligned}$$

To bound this term against \({\text {OPT}}_L = v_i + v_j\), consider the following two cases: If \(\beta _w \ge \gamma _w\), we obtain from Chebyshev’s sum inequality

$$\begin{aligned} \beta _w v_i + \gamma _w v_j \ge \frac{1}{2} \left( \beta _w + \gamma _w \right) (v_i + v_j) \,. \end{aligned}$$

If \(\beta _w < \gamma _w\), we trivially have \(\beta _w v_i + \gamma _w v_j > \beta _w (v_i + v_j)\).

4.3.3 Competitive Ratio

The competitive ratio of \(\mathcal {A}_L\) is the minimum over all cases 1-5. Hence, setting \(\alpha _1=p_\text {D}\) and \(\beta _1 = \gamma _1 = 0\), we obtain

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \min _{w=1,\ldots ,5} \left\{ \alpha _w + \min \left\{ \frac{\beta _w + \gamma _w }{2} , \beta _w \right\} \right\} \cdot {\text {OPT}}_L \,. \end{aligned}$$
(15)

Clearly, inequality (15) simplifies depending on whether \(\beta _w \ge \gamma _w\) or \(\beta _w < \gamma _w\). The following lemma gives a sufficient condition for \(\beta _w \ge \gamma _w\).

Lemma 4

Let \(f(x)=2 \ln x -6x + 2x^2 - \frac{x^3}{3}\). For parameters c, d with \(f(c) \ge f(d)\) and \(n \rightarrow \infty\), it holds that \(\beta _w \ge \gamma _w\), where \(2 \le w \le 5\).

Proof

We first show that \(f(c) \ge f(d)\) is equivalent to \(\beta _5 \ge \gamma _5\). Note that \(\beta _5 = p_\text {D} = p_1\) and \(\gamma _5 = p_\text {E} + p_\text {F} + p_\text {G} = p_2 + p_3 + p_4\). Now, using Lemma 1 and ignoring lower order terms, we have

$$\begin{aligned}p_1&\ge p_2 + p_3 + p_4 \\ \Leftrightarrow c \ln \frac{d}{c}&\ge c \left( 3 \ln \frac{d}{c} - 6(d-c) + 2(d^2-c^2) - \frac{1}{3} (d^3-c^3) \right) \\ \Leftrightarrow 0&\ge 2 \ln \frac{d}{c} - 6(d-c) + 2(d^2-c^2) - \frac{1}{3} (d^3-c^3) \\ \Leftrightarrow 0&\ge 2 \ln d - 2 \ln c - 6d + 6c + 2 d^2 -2c^2 - \frac{d^3}{3} + \frac{c^3}{3} \\ \Leftrightarrow f(c)&\ge f(d) \,. \end{aligned}$$

Therefore, the claim for \(w=5\) holds by assumption. For \(2 \le w \le 4\), the claims follow immediately from \(f(c) \ge f(d)\) and the symmetry property of Lemma 3:

$$\begin{aligned} \beta _2 =&~ p_\text {H} = p_1 - p_{12} = p_1 - p_{21} \ge p_2 - p_{21} = p_\text {J} = \gamma _2 \\ \beta _3 =&~ p_\text {I} = p_1 - p_{13} = p_1 - p_{31} \ge p_2 + p_3 - p_{31} = p_\text {E} + p_\text {L} = \gamma _3 \\ \beta _4 =&~ p_\text {D} + p_\text {K} = p_1 + p_2 - p_{23} \ge p_1 - p_{32} \ge p_3 - p_{32} = p_\text {M} = \gamma _4 \,. \end{aligned}$$

\(\square\)

We obtain the following two lemmas. If \(\mathcal {A}_L\) uses the entire input sequence (\(d=1\)), this algorithm is (1/3.08)-competitive.

Lemma 5

With \(c = 0.23053\) and \(d = 1\) as parameters, we have \({\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \left( \frac{1}{3.08} -o(1) \right) {\text {OPT}}_L\).

Note that 2-KS includes the secretary problem (case 1); thus, no algorithm for 2-KS can have a better competitive ratio than \(1/e < 1/2.71\). In the final algorithm we set \(d<1\) to benefit from \(\mathcal {A}_S\). The next lemma has already been used to prove Theorem 1 in Sect. 3.

Lemma 6

With \(c = 0.42291\) and \(d = 0.64570\) as parameters, we have \({\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \left( \frac{1}{6.65} - o(1) \right) {\text {OPT}}_L\).

Table 2 Competitive ratios of Algorithm 2 for the parameters from Lemmas 5 and 6 in different cases

Proof (of Lemmas 5 and 6)

Let f be the function defined in Lemma 4 and let \((c_1,d_1) = (0.23053,1)\) and \((c_2,d_2) = (0.42291,0.64570)\) be the two parameter pairs from Lemmas 5 and 6, respectively. It holds that

$$\begin{aligned} f(c_1) = f(0.23053)> -4.22 > -\frac{13}{3} = f(1) = f(d_1) \end{aligned}$$

and

$$\begin{aligned} f(c_2) = f(0.42291)> -3.93> -4.00 > f(0.64570) = f(d_2) \,. \end{aligned}$$

Hence, by Lemma 4 we have \(\beta _w \ge \gamma _w\) for any case \(w \in \{2,3,4,5\}\). Therefore, inequality (15) simplifies to \({\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \min _{w=1,\ldots ,5} \left\{ \alpha _w + \frac{\beta _w + \gamma _w }{2}\right\} \cdot {\text {OPT}}_L\). Using the definitions of \(\alpha _w\), \(\beta _w\), and \(\gamma _w\) from Sect. 4.3, the definitions of \(p_X\) from Table 1, and the symmetry property of Lemma 3, we obtain after simplifying terms

$$\begin{aligned} \alpha _2 + \frac{\beta _2 + \gamma _2}{2}&= p_\text {A} + \frac{p_\text {H} + p_\text {J}}{2} = \frac{p_1 + p_2}{2} + p_{12} \\ \alpha _3 + \frac{\beta _3 + \gamma _3}{2}&= p_\text {B} + \frac{p_\text {I} + (p_\text {E} + p_\text {L})}{2} = \frac{p_1 + p_2 + p_3}{2} + p_{13} \\ \alpha _4 + \frac{\beta _4 + \gamma _4}{2}&= p_\text {C} + \frac{(p_\text {D} + p_\text {K}) + p_\text {M}}{2} = \frac{p_1 + p_2 + p_3}{2} + p_{23} \\ \alpha _5 + \frac{\beta _5 + \gamma _5}{2}&= 0 + \frac{p_\text {D} + (p_\text {E} + p_\text {F} + p_\text {G})}{2} = \frac{p_1 + p_2 + p_3 + p_4}{2} \,. \end{aligned}$$

Note that the algorithm attains the same competitive ratio in cases 3 and 4, since \(p_{13} = p_{23}\) by Lemma 2. Table 2 shows the competitive ratios for all five cases. For the overall competitive ratio, we have

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \min \left\{ p_1, p_{12} + \frac{p_1 + p_2}{2}, p_{23} + \frac{p_1 + p_2 + p_3}{2}, \frac{\sum _{i=1}^4 p_i}{2} \right\} {\text {OPT}}_L \,. \end{aligned}$$

Evaluating this expression for the parameter pairs \((c_1,d_1)\) and \((c_2,d_2)\) yields \(0.32471 \ge 1/3.08\) and \(0.15039 \ge 1/6.65\) as competitive ratios, respectively. This concludes the proofs of Lemmas 5 and 6. \(\square\)
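The final evaluation is easy to reproduce; the following Python snippet (ours) plugs the asymptotic lower bounds on \(p_i\) and \(p_{ij}\) from Lemmas 1 and 2 into the case-wise minimum and checks the condition of Lemma 4:

```python
# Evaluate the case-wise minimum from Table 2 for given parameters (c, d),
# using the asymptotic (o(1)-free) lower bounds on p_i and p_ij.
import math

def competitive_ratio(c, d):
    f = lambda x: 2 * math.log(x) - 6 * x + 2 * x**2 - x**3 / 3
    assert f(c) >= f(d)                    # Lemma 4: beta_w >= gamma_w
    ln = math.log(d / c)
    p1 = c * ln
    p2 = c * (ln - (d - c))
    p3 = c * (ln - 2 * (d - c) + (d**2 - c**2) / 2)
    p4 = c * (ln - 3 * (d - c) + 1.5 * (d**2 - c**2) - (d**3 - c**3) / 3)
    p12 = c * (d - c * ln - c)
    p23 = c * (d - c * ln - c - d**2 / 2 + c * d - c**2 / 2)
    return min(p1,                         # case 1
               p12 + (p1 + p2) / 2,        # case 2
               p23 + (p1 + p2 + p3) / 2,   # cases 3 and 4
               (p1 + p2 + p3 + p4) / 2)    # case 5

print(competitive_ratio(0.23053, 1.0))      # ~0.32471 >= 1/3.08 (Lemma 5)
print(competitive_ratio(0.42291, 0.64570))  # ~0.15039 >= 1/6.65 (Lemma 6)
```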

Recall that in Algorithm 1, we can only benefit from \(\mathcal {A}_S\) if \(\mathcal {A}_L\) has not filled the knapsack completely. Thus, the following property is crucial in the final analysis.

Lemma 7

With a probability of at least c/d, no item is packed by \(\mathcal {A}_L\).

Proof

Fix any set of dn items arriving in rounds \(1,\ldots ,dn\). The item of maximum profit \(v^*\) in this set arrives in the sampling phase with probability c/d. If this event occurs, no item in rounds \(cn+1,\ldots ,dn\) beats \(v^*\), and \(\mathcal {A}_L\) will not select any item. \(\square\)

4.4 Discussion of other 2-Secretary Algorithms

As mentioned in the introduction of Sect. 4, the approach of this section and its analysis are general enough to accommodate any two-choice secretary algorithm. Therefore, a natural question is which algorithm is a good choice within this framework. Algorithm 2 is based on the algorithm single-ref developed for the k-secretary problem in [45]. In the following, we discuss several algorithms for 2-secretary and related problems.

The optimistic algorithm by Babaioff et al. [34] was developed for the k-secretary problem and performs slightly better than single-ref in the case \(k=2\); the competitive ratios of the two algorithms are 0.4168 and 0.4119, respectively [45]. However, optimistic has a weaker threshold for accepting the first item than single-ref, so the probability considered in Lemma 7 would fall below c/d. In the present analysis of the sequential approach, we can only benefit from the second algorithm \(\mathcal {A}_S\) if \(\mathcal {A}_S\) starts with an empty knapsack (we will use this property later in Lemma 11). Hence, it is not clear if the slight gain in the expected profit compensates for the drawback of an early resource consumption.

A strong algorithm for the 2-secretary problem has been developed by Chan et al. [17]. The algorithm is based on a sophisticated set of decision rules, leading to a competitive ratio of 0.49. Again, the probability considered in Lemma 7 would be smaller for this algorithm. Moreover, it seems overly elaborate to find equivalents of Lemmas 1, 2, 4, and 7.

Another candidate is due to Nikolaev [46] and Tamaki [47], who proposed an algorithm for a slightly different secretary problem: Here, the objective is to maximize the probability of selecting the best two items. This algorithm depends on two parameters \(0 \le c_1 \le c_2 \le 1\). The first item is selected just as in single-ref with sampling size \(c_1 n\) (select the first item beating the best sampling item). The second item must beat the first item if it arrives before round \(c_2 n\), or (merely) the best sampling item if it arrives after this round. The success probability tends asymptotically to 0.2254 with \(c_1 = 0.2291\) and \(c_2=0.6065\), which is best possible [47]. If we use this algorithm within our framework, it turns out that the best competitive ratio is achieved for \(c_1 = c_2\); however, for \(c_1 = c_2\), the algorithm coincides with single-ref in the case \(k=2\).

Therefore, we conclude that even though various algorithms for the 2-secretary problem stronger than single-ref exist, it is not clear whether they can improve the performance of the overall algorithm within the sequential framework. On the other hand, Algorithm 2 (based on single-ref) is fairly easy to analyze and selects high-profit items with sufficiently high probability.

5 Small Items

For (1/3)-small items, we use solutions of the fractional problem variant and obtain an integral packing via randomized rounding. This approach has been applied successfully to packing LPs [25]; however, for the knapsack problem it is not necessary to solve LP relaxations in each round (as in [25]). Instead, we use the solutions of a greedy algorithm, which is well known to be optimal for the fractional knapsack problem. In particular, this algorithm is both efficient in terms of running time and easy to analyze.

We next formalize the greedy solution for any set T of items. Let the density of an item be the ratio of its profit to its size. Consider any list L containing the items from T ordered by non-increasing density. We define the rank \(\rho (i)\) of item i as its position in L and \(\sigma (l)\) as the item at position l in L. Thus, \(\sigma (l) = \rho ^{-1}(l)\) denotes the l-th densest item. Let k be such that \(\sum _{i=1}^{k-1} s_{\sigma (i)} < W \le \sum _{i=1}^{k} s_{\sigma (i)}\). The fraction of item i in the greedy solution \(\alpha\) is now defined as

$$\begin{aligned} \alpha _i = {\left\{ \begin{array}{ll} 1 &{} \text {if } \rho (i) < k \\ \left( W - \sum _{l=1}^{k-1} s_{\sigma (l)}\right) / s_i &{} \text {if } \rho (i) = k \\ 0 &{} \text {else} \,, \end{array}\right. } \end{aligned}$$

i.e., the \(k-1\) densest items are packed integrally and the remaining space is filled by the maximum feasible fraction of the k-th densest item. Let \({\text {OPT}}(T)\) and \({\text {OPT}}^*(T)\) denote the profits of optimal integral and fractional packings of T, respectively. It is easy to see that \(\alpha\) satisfies \(\sum _{i \in T} \alpha _i v_i = {\text {OPT}}^*(T) \ge {\text {OPT}}(T)\) and \(\sum _{i \in T} \alpha _i s_i = W\).
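In code, the greedy solution \(\alpha\) can be computed as follows (a sketch; items are (size, profit) pairs, and ties in density are broken arbitrarily):

```python
# Greedy solution of the fractional knapsack problem: pack items in order
# of non-increasing density v_i / s_i; the first item that no longer fits
# integrally is packed fractionally, all remaining items get coefficient 0.

def greedy_fractional(items, W):
    order = sorted(range(len(items)),
                   key=lambda i: items[i][1] / items[i][0],
                   reverse=True)
    alpha, space = [0.0] * len(items), W
    for i in order:
        size = items[i][0]
        if size <= space:              # one of the k-1 densest items
            alpha[i] = 1.0
            space -= size
        else:                          # the k-th densest item
            alpha[i] = space / size    # maximal feasible fraction
            break
    return alpha
```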

5.1 Algorithm

The algorithm \(\mathcal {A}_S\) for (1/3)-small items, which is formally defined in Algorithm 3, works as follows. During the initial sampling phase of dn rounds, the algorithm rejects all items. In each round \(\ell \ge dn+1\), the algorithm computes a greedy solution \(x^{(\ell )}\) for \(I_S(\ell )\). Here, \(I_S(\ell )\) denotes the subset of \(I_S\) revealed up to round \(\ell\). The algorithm packs the current online item i with probability \(x^{(\ell )}_i\). However, this is only possible if item i still fits into the knapsack; since \(s_i \le (1/3) \cdot W\), a remaining capacity of at least \((1/3) \cdot W\) is sufficient.

Note that in case of an integral coefficient \(x^{(\ell )}_i \in \{0,1\}\), the packing step is completely deterministic. Moreover, in any greedy solution \(x^{(\ell )}\), there is at most one item i with fractional coefficient \(x^{(\ell )}_i \in (0,1)\). Therefore, in expectation, there is only a small number of rounds where the algorithm actually requests randomness. Although this is not relevant for the proof of the competitive ratio, we provide a short proof of this observation in the following.

Observation 1

Let X denote the number of rounds where Algorithm 3 packs an item with probability \(x_i \in (0,1)\). It holds that \({\mathbf{E}}\left[ X\right] \le \ln (1/d) \le 0.44\).

Proof

Consider any round \(\ell\) and let \(x^{(\ell )}\) be the greedy knapsack solution computed by Algorithm 3. By definition of \(x^{(\ell )}\), at most one of the \(\ell\) visible items has a fractional coefficient \(x^{(\ell )}_i \in (0,1)\). The probability that this item i arrives in round \(\ell\) is \(1/\ell\) in a random permutation. Let \(X_\ell\) be an indicator variable for the event that Algorithm 3 packs an item at random in round \(\ell\). By the above argument, we have \({\mathbf{Pr}}\left[ X_\ell = 1\right] \le 1/\ell\). Since Algorithm 3 selects items starting in round \(dn+1\), we obtain

$$\begin{aligned} {\mathbf{E}}\left[ X\right] = \sum _{\ell =dn+1}^{n} {\mathbf{E}}\left[ X_\ell \right] \le \sum _{\ell =dn+1}^{n} \frac{1}{\ell } \le \ln \frac{1}{d} \le 0.44 \,. \end{aligned}$$

\(\square\)

Note that Algorithm 2 and the sequential approach (Algorithm 1) are deterministic algorithms. Therefore, in expectation, our overall algorithm requires randomness in fewer than one round.

[Algorithm 3: pseudocode figure, not reproduced here.]
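In place of the figure, here is our sketch of \(\mathcal {A}_S\), reusing greedy_fractional from above; the capacity already consumed by \(\mathcal {A}_L\) is passed in as load.

```python
# Sketch of Algorithm 3 (A_S): reject all items during the sampling phase;
# afterwards, recompute the greedy fractional solution over all small
# items seen so far and pack the current item with probability equal to
# its fractional coefficient, provided it still fits.
import random

def a_small(pi_S, n, W, d, load=0.0):
    packing, seen = [], []
    for ell, (size, profit) in enumerate(pi_S, start=1):
        seen.append((size, profit))
        if ell <= d * n:
            continue                        # sampling phase
        x = greedy_fractional(seen, W)      # greedy solution for I_S(ell)
        if random.random() < x[-1] and load + size <= W:
            packing.append((size, profit))  # x[-1] is the coefficient of
            load += size                    # the current online item
    return packing
```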

5.2 Analysis

Before we analyze the competitive ratio of \(\mathcal {A}_S\) in a sequence of lemmas, we make a few technical observations and introduce further notation.

In round \(dn+1\), the knapsack might already have been filled by \(\mathcal {A}_L\) with large items from previous rounds. For now, we assume an empty knapsack after round dn and denote this event by \(\xi\). In the final analysis, we will use the fact that \({\mathbf{Pr}}\left[ \xi \right]\) can be bounded from below according to Lemma 7.

The description of Algorithm 3 is tailored to (1/3)-small items, in order to complement Algorithm 2. However, it is straightforward to generalize the algorithm to an arbitrary maximum item size \(\delta\). In order to highlight the similarities with the analysis of GAP in Sect. 6 later on, we state the following lemmas with \(\delta\) as a parameter. For this purpose, we define \(\varDelta = \frac{1}{1-\delta }\) (and obtain \(\varDelta =3/2\) in the final analysis).

Finally, let \(\alpha\) be a greedy (offline) solution for \(I_S\). By the following lemma, the probability that an item \(i \in I_S\) is packed by \(\mathcal {A}_S\) is proportional to \(\alpha _i\). Treating \(\alpha _i\) as a parameter in the next two lemmas spares us from analyzing the expected profit of each round over all items; the latter approach appears in related work [28], where stochastic dependencies need to be handled carefully.

Lemma 8

Let \(i \in I_S\) and \(E_i(\ell )\) be the event that the item i is packed by \(\mathcal {A}_S\) in round \(\ell\). For \(\ell \ge dn+1\), it holds that \({\mathbf{Pr}}\left[ E_i(\ell ) \mid \xi \right] \ge \frac{1}{n} \alpha _i (1 - \varDelta \ln \frac{\ell }{dn})\).

Proof

In a random permutation, item i arrives in round \(\ell\) with probability 1/n. In round \(\ell \ge dn+1\), the algorithm decides to pack i with probability \(x^{(\ell )}_i\). Note that the rank of item i in \(I_S(\ell )\) is less than or equal to its rank in \(I_S\). According to the greedy solution’s definition, this implies \(x^{(\ell )}_i \ge \alpha _i\).

Finally, the \(\delta\)-small item i can be packed successfully if the current resource consumption X is at most \((1 - \delta ) W\). In the following, we investigate the expectation of X to give a probability bound using Markov’s inequality at the end of this proof.

Let \(X_k\) be the resource consumption in round \(k < \ell\). By assumption, the knapsack is empty after round dn, thus \(X = \sum _{k=dn+1}^{\ell -1} X_k\). Let Q be the set of k visible items in round k. The set Q can be seen as uniformly drawn from all k-item subsets and any item \(j \in Q\) is the current online item of round k with probability 1/k. The algorithm packs any item j with probability \(x_j^{(k)}\), thus

$$\begin{aligned} {\mathbf{E}}\left[ X_k\right] = \sum _{j \in Q} {\mathbf{Pr}}\left[ j \text { occurs in round } k\right] s_j x_j^{(k)} = \frac{1}{k} \sum _{j \in Q} s_j x_j^{(k)} \le \frac{W}{k} \,, \end{aligned}$$

where the last inequality holds because \(x^{(k)}\) is a feasible solution for a knapsack of size W. By the linearity of expectation and the previous inequality, the expected resource consumption up to round \(\ell\) is

$$\begin{aligned} {\mathbf{E}}\left[ X\right] = \sum _{k=dn+1}^{\ell -1} {\mathbf{E}}\left[ X_k\right] \le \sum _{k=dn+1}^{\ell -1} \frac{W}{k} \le W \ln \frac{\ell }{dn} \,. \end{aligned}$$

Using Markov’s inequality, we obtain

$$\begin{aligned} {\mathbf{Pr}}\left[ X < (1-\delta ) W\right] = 1 - {\mathbf{Pr}}\left[ X \ge (1-\delta ) W\right] \ge 1 - \frac{{\mathbf{E}}\left[ X\right] }{(1-\delta )W} \ge 1 - \varDelta \ln \frac{\ell }{dn} \,, \end{aligned}$$

which concludes the proof. \(\square\)

Using Lemma 8, we obtain the total probability that a specific item is packed.

Lemma 9

Let \(i \in I_S\) and \(E_i\) be the event that the item i is packed by \(\mathcal {A}_S\). It holds that \({\mathbf{Pr}}\left[ E_i \mid \xi \right] \ge \alpha _i \left( (1-d)(1+\varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d} \right)\).

Proof

Summing the probabilities from Lemma 8 over all rounds \(\ell \ge dn+1\) gives

$$\begin{aligned} {\mathbf{Pr}}\left[ E_i \mid \xi \right]&= \sum _{\ell =dn+1}^{n} {\mathbf{Pr}}\left[ E_i(\ell ) \mid \xi \right] \nonumber \\&\ge \sum _{\ell =dn+1}^{n} \frac{1}{n} \alpha _i \left( 1 - \varDelta \ln \frac{\ell }{dn} \right) \nonumber \\&= \frac{1}{n} \alpha _i \left( n - dn - \varDelta \sum _{\ell =dn+1}^{n} \ln \frac{\ell }{dn} \right) \nonumber \\&= \alpha _i \left( 1 - d - \frac{\varDelta }{n} \sum _{\ell =dn+1}^{n} \ln \frac{\ell }{dn} \right) \,. \end{aligned}$$
(16)

By Fact 1B, we obtain

$$\begin{aligned} \sum _{\ell =dn+1}^{n} \ln \frac{\ell }{dn} = \left( \sum _{\ell =dn}^{n-1} \ln \frac{\ell }{dn}\right) + \ln \frac{1}{d} \le \left( \int _{dn}^{n} \ln \frac{\ell }{dn} \, \mathrm {d}\ell \right) + \ln \frac{1}{d} \end{aligned}$$

and resolving the integral yields

$$\begin{aligned} \sum _{\ell =dn+1}^{n} \ln \frac{\ell }{dn}&\le n \cdot \left( \ln \frac{n}{dn} - 1 \right) - dn \cdot \left( \ln \frac{dn}{dn} - 1 \right) + \ln \frac{1}{d} \nonumber \\&= n \cdot \ln \frac{1}{d} - n + dn + \ln \frac{1}{d} \,. \end{aligned}$$
(17)

Combining inequalities (16) and (17) and rearranging terms yields

$$\begin{aligned} {\mathbf{Pr}}\left[ E_i \mid \xi \right]&\ge \alpha _i \left( 1 - d - \frac{\varDelta }{n} \left( n \ln \frac{1}{d} - n + dn + \ln \frac{1}{d} \right) \right) \\&= \alpha _i \left( (1 - d) + \varDelta (1 - d) - \varDelta \ln \frac{1}{d} - \frac{\varDelta }{n} \ln \frac{1}{d} \right) \\&= \alpha _i \left( (1 - d)(1 + \varDelta ) - \varDelta \left( 1 + \frac{1}{n} \right) \ln \frac{1}{d} \right) \,, \end{aligned}$$

which is the claimed bound. \(\square\)

The following lemma bounds the expected profit of the packing of \(\mathcal {A}_S\), assuming the event \(\xi\).

Lemma 10

We have \({\mathbf{E}}\left[ \mathcal {A}_S \mid \xi \right] \ge \left( (1 - d) (1 + \varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d} \right) {\text {OPT}}_S\).

Proof

Let \(\beta = (1 - d) (1 + \varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d}\). By Lemma 9, the probability that an item i is packed, assuming \(\xi\), is \({\mathbf{Pr}}\left[ E_i \mid \xi \right] \ge \alpha _i \beta\). Therefore,

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S \mid \xi \right] = \sum _{i \in I_S} {\mathbf{Pr}}\left[ E_i \mid \xi \right] v_i \ge \sum _{i \in I_S} \alpha _i \beta v_i \ge \beta {\text {OPT}}_S \,. \end{aligned}$$

\(\square\)

The conditioning on \(\xi\) can be resolved using Lemma 7. We obtain the following lemma, which is the second pillar in the proof of Theorem 1 and concludes this section.

Lemma 11

We have \({\mathbf{E}}\left[ \mathcal {A}_S\right] \ge \frac{c}{d} \left( (1 - d) (1 + \varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d} \right) {\text {OPT}}_S\). In particular, the algorithm \(\mathcal {A}_S\) is (1/6.65)-competitive with respect to \({\text {OPT}}_S\) setting \(\varDelta = 3/2\), \(c = 0.42291\), and \(d = 0.64570\).

Proof

By Lemma 7, the probability for an empty knapsack after round dn is \({\mathbf{Pr}}\left[ \xi \right] \ge \frac{c}{d}\). Since all profits are nonnegative, we obtain from Lemma 10

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S\right]&\ge {\mathbf{Pr}}\left[ \xi \right] {\mathbf{E}}\left[ \mathcal {A}_S \mid \xi \right] \\&\ge \frac{c}{d} \left( (1 - d) (1 + \varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d} \right) {\text {OPT}}_S \,. \end{aligned}$$

Setting \(\varDelta = 3/2\), which corresponds to \(\delta =1/3\), leads to

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S\right] \ge \frac{c}{d} \left( \frac{5}{2} (1 - d) - \frac{3}{2} \ln \frac{1}{d} - \frac{3}{2n} \ln \frac{1}{d}\right) {\text {OPT}}_S \,. \end{aligned}$$

Noting that \(\frac{c}{d} \frac{3}{2n} \ln \frac{1}{d} = o(1)\), we obtain that \({\mathbf{E}}\left[ \mathcal {A}_S\right] \ge \left( \frac{1}{6.65} - o(1) \right) {\text {OPT}}_S\) for \(c = 0.42291\) and \(d = 0.64570\). \(\square\)
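The constants are easy to verify numerically. The following snippet (a sanity check on our part, not part of the proof) evaluates the bound of Lemma 11 without the o(1) term:

```python
from math import log

# Constants from Lemma 11: Delta = 3/2 (i.e., delta = 1/3).
c, d, Delta = 0.42291, 0.64570, 1.5
ratio = (c / d) * ((1 - d) * (1 + Delta) - Delta * log(1 / d))
print(ratio, 1 / 6.65)  # ~0.150387 >= ~0.150376
```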

5 Extension to GAP

In this section, we show that the sequential approach introduced in Sect. 3 can be easily adapted to GAP, yielding a (1/6.99)-competitive randomized algorithm. We first define the problem formally.

GAP. We are given a set of items \(I = [n]\) and a set of resources \(R = [m]\) with capacities \(W_r \in \mathbb {Q}_{> 0}\) for \(r \in R\). If item \(i \in I\) is assigned to resource \(r \in R\), this raises profit (value) \(v_{i,r} \in \mathbb {Q}_{\ge 0}\), but consumes \(s_{i,r} \in \mathbb {Q}_{> 0}\) of the resource’s capacity. The goal is to assign each item to at most one resource such that the total profit is maximized and no resource exceeds its capacity. We call the tuple \((v_{i,r}, s_{i,r})\) an option of item i and assume w.l.o.g. that options exist for all resources; this can be ensured by introducing dummy options with \(v_{i,r} = 0\). In the online version of the problem, in each round an item is revealed together with its set of options. The online algorithm must decide immediately and irrevocably whether the item is assigned. If so, it must specify the resource according to one of the item’s options.

Again, we construct restricted instances \(\mathcal {I}_L\) and \(\mathcal {I}_S\) according to the following definition, which generalizes Definition 1. Let \(\delta \in (0,1)\).

Definition 2

We call an option \((v_{i,r}, s_{i,r})\) \(\delta\)-large if \(s_{i,r} > \delta W_r\) and \(\delta\)-small if \(s_{i,r} \le \delta W_r\). Whenever \(\delta\) is clear from the context, we say an option is large or small for short. Based on a given instance \(\mathcal {I}\) for GAP, we define two modified instances \(\mathcal {I}_L\) and \(\mathcal {I}_S\) which are obtained from \(\mathcal {I}\) as follows.

  • \(\mathcal {I}_L\): Replace each small option \((v_{i,r}, s_{i,r})\) by the large option \((0,W_r)\).

  • \(\mathcal {I}_S\): Replace each large option \((v_{i,r}, s_{i,r})\) by the small option \((0, \delta W_r)\).

Thus, \(\mathcal {I}_L\) contains only large options and \(\mathcal {I}_S\) contains only small options. The dummy options are harmless, as by construction no reasonable algorithm assigns an item according to a zero-profit option. We define \({\text {OPT}}\), \({\text {OPT}}_L\), and \({\text {OPT}}_S\) accordingly. Note that the inequality \({\text {OPT}}\le {\text {OPT}}_L + {\text {OPT}}_S\) also holds for GAP.
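A minimal sketch of the construction in Definition 2, assuming a GAP instance is given as a per-item list of (value, size) options (the representation and all names are ours):

```python
def split_instance(options, W, delta):
    """Build the restricted instances I_L and I_S of Definition 2.
    options[i][r] = (v, s) is the option of item i on resource r."""
    inst_L, inst_S = [], []
    for opts in options:
        row_L, row_S = [], []
        for r, (v, s) in enumerate(opts):
            if s > delta * W[r]:                    # delta-large option
                row_L.append((v, s))
                row_S.append((0.0, delta * W[r]))   # zero-profit small dummy
            else:                                   # delta-small option
                row_L.append((0.0, W[r]))           # zero-profit large dummy
                row_S.append((v, s))
        inst_L.append(row_L)
        inst_S.append(row_S)
    return inst_L, inst_S
```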

The sequential framework of Algorithm 1 can be adapted in a straightforward manner by replacing terms like packing with assignment to resource r. Here, we set the threshold parameter to \(\delta =1/2\). In the following subsections, we specify algorithms \(\mathcal {A}_L\) and \(\mathcal {A}_S\) for (1/2)-large and (1/2)-small options, respectively.

5.1 Large Options

If each item consumes more than half of a resource’s capacity, no two items can be assigned to that resource. Thus, we obtain the following matching problem.

Edge-weighted bipartite matching. Given a bipartite graph \(G=(L \cup R, E)\) and a weight function \(w :E \rightarrow \mathbb {Q}_{\ge 0}\), the goal is to find a bipartite matching \(M \subseteq E\) such that \(w(M) := \sum _{e \in M} w(e)\) is maximized. In the online version, the (offline) nodes from R and the number \(n=|L|\) are known in advance, whereas the nodes from L are revealed online together with their incident edges. In the case of GAP, L is the set of items, R is the set of resources, and the weight of an edge \(e=\{l,r\}\) is \(w(e)=v_{l,r}\).

Kesselheim et al. [28] developed an optimal (1/e)-competitive algorithm for the online problem under random arrival order. Adapting this algorithm to the sequential approach with parameters c and d leads to the following algorithm \(\mathcal {A}_L\): During the first cn rounds, no edge is added to the matching. Then, in each round \(\ell\), the algorithm computes a maximum edge-weighted matching \(M^{(\ell )}\) for the graph revealed up to this round. Let \(l \in L\) be the online vertex of round \(\ell\). If l is matched in \(M^{(\ell )}\) to some node \(r \in R\), we call \(e^{(\ell )} = \{l,r\}\) the tentative edge of round \(\ell\). Now, if r is still unmatched and \(\ell \le dn\), the tentative edge is added to the matching.

[Algorithm 4 (pseudocode figure)]
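For illustration, here is a compact Python sketch of this procedure. The use of scipy’s linear_sum_assignment as the offline matching oracle and all names are our choices, not part of the original algorithm; zero entries encode missing edges.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_algorithm(weights, c, d):
    """Sketch of Algorithm 4. weights[l, r] is the weight of edge {l, r};
    online vertices arrive in row order (assumed uniformly permuted)."""
    n, m = weights.shape
    matched = set()    # offline vertices that are already matched
    matching = {}      # online vertex -> offline vertex
    # Rounds 1..cn are the sampling phase; selection ends after round dn.
    for ell in range(int(c * n) + 1, int(d * n) + 1):
        revealed = weights[:ell, :]   # graph revealed up to round ell
        rows, cols = linear_sum_assignment(revealed, maximize=True)
        for l, r in zip(rows, cols):
            # Tentative edge: the online vertex of this round, if matched
            # in M^(ell) and the offline endpoint is still free.
            if l == ell - 1 and revealed[l, r] > 0 and r not in matched:
                matched.add(r)
                matching[l] = r
    return matching
```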

A formal description of this algorithm is given in Algorithm 4. The proof of the approximation guarantee relies mainly on the following two lemmas; for completeness, we give the proofs from [28] here. The first lemma shows that the expected weight of any tentative edge can be bounded from below.

Lemma 12

( [28]) In any round \(\ell\), the tentative edge (if it exists) has expected weight \({\mathbf{E}}\left[ w(e^{(\ell )})\right] \ge \frac{1}{n} {\text {OPT}}_L\).

Proof

We use the fact that the random sequence of visible items in round \(\ell\) can be obtained from the following process: First, the set Q of visible items in round \(\ell\) is drawn uniformly at random from all \(\ell\)-element subsets of L. Then, the online vertex of round \(\ell\) is drawn uniformly at random from Q. Note that these random experiments are independent.

After the first step, the matching \(M^{(\ell )}\) is already fixed. Let \(M^* = M^{(n)}\) be a maximum weight (offline) matching and \(M^*_Q = \{e = \{l,r\} \in M^* \mid l \in Q\}\) the matching \(M^*\) projected to visible nodes. We have \(w(M^{(\ell )}) \ge w(M^*_Q)\), since \(M^{(\ell )}\) is an optimal and \(M^*_Q\) a feasible matching for the graph revealed in round \(\ell\). As described above, each vertex \(l \in L\) has probability \(\ell /n\) to be in Q, thus

$$\begin{aligned} {\mathbf{E}}\left[ w(M^{(\ell )})\right] \ge {\mathbf{E}}\left[ w(M^*_Q)\right] = \sum _{e=\{l,r\} \in M^*} {\mathbf{Pr}}\left[ l \in Q\right] w(e) = \frac{\ell }{n} w(M^*) \,. \end{aligned}$$
(18)

For the second step, we observe that each vertex from Q has the same probability of \(1/\ell\) to arrive in round \(\ell\). Let \(\mathcal {M}\) be the domain of the random variable \(M^{(\ell )}\). We have

$$\begin{aligned} {\mathbf{E}}\left[ w(e^{(\ell )})\right]&= \sum _{M' \in \mathcal {M}} {\mathbf{E}}\left[ w(e^{(\ell )}) \mid M^{(\ell )} = M'\right] \cdot {\mathbf{Pr}}\left[ M^{(\ell )} = M'\right] \nonumber \\&= \sum _{M' \in \mathcal {M}} \left( \sum _{e=\{l,r\} \in M'} \frac{1}{\ell } w(e) \right) \cdot {\mathbf{Pr}}\left[ M^{(\ell )} = M'\right] \nonumber \\&= \frac{1}{\ell } \cdot \sum _{M' \in \mathcal {M}} w(M') \cdot {\mathbf{Pr}}\left[ M^{(\ell )} = M'\right] \nonumber \\&= \frac{1}{\ell } \cdot {\mathbf{E}}\left[ w(M^{(\ell )})\right] \,. \end{aligned}$$
(19)

Combining (19) and (18) yields \({\mathbf{E}}\left[ w(e^{(\ell )})\right] = \frac{1}{\ell } {\mathbf{E}}\left[ w(M^{(\ell )})\right] \ge \frac{1}{\ell } \cdot \frac{\ell }{n} \, w(M^*) = \frac{1}{n} {\text {OPT}}_L\), which concludes the proof. \(\square\)

However, we only gain the weight of the tentative edge \(e^{(\ell )} = \{l,r\}\) if it can be added to the matching, i.e., if r has not been matched previously. The next lemma bounds the probability for this event from below.

Lemma 13

( [28]) Let \(\xi (r,\ell )\) be the event that the offline vertex \(r \in R\) is unmatched after round \(\ell \ge cn+1\). It holds that \({\mathbf{Pr}}\left[ \xi (r,\ell )\right] \ge \frac{cn}{\ell }\).

Proof

In each round k, the vertex r can only be matched if it is incident to the tentative edge \(e^{(k)} \in M^{(k)}\) of this round, i.e., \(e^{(k)} = \{l,r\}\), where \(l \in L\) is the online vertex of round k. As l can be seen as uniformly drawn among the k visible nodes (in particular, independently of the order of the previous \(k-1\) items), l arrives in round k with probability 1/k. Consequently, r is not matched in round k with probability at least \(1- 1/k\). This argument applies to all rounds \(cn+1,\ldots ,\ell\). Therefore,

$$\begin{aligned} {\mathbf{Pr}}\left[ \xi (r,\ell )\right] \ge \prod _{k=cn+1}^{\ell } \left( 1 - \frac{1}{k} \right) = \prod _{k=cn+1}^{\ell } \frac{k-1}{k} = \frac{cn}{\ell } \,. \end{aligned}$$

\(\square\)

Using Lemmas 12 and 13, we can bound the competitive ratio of \(\mathcal {A}_L\) in the following lemma. Note that we obtain the optimal (1/e)-competitive algorithm from [28] for \(c=1/e\) and \(d=1\).

Lemma 14

It holds that \({\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \left( c \ln \frac{d}{c} - o(1) \right) {\text {OPT}}_L\).

Proof

Let \(A_\ell\) be the gain of the matching weight in round \(\ell\). As the tentative edge \(e^{(\ell )} = \{l,r\}\) can only be added if r has not been matched in a previous round, we have \({\mathbf{E}}\left[ A_\ell \right] = {\mathbf{E}}\left[ w(e^{(\ell )})\right] {\mathbf{Pr}}\left[ \xi (r,\ell )\right]\) for the event \(\xi (r,\ell )\) from Lemma 13. Therefore, from Lemmas 12 and 13, we have \({\mathbf{E}}\left[ A_\ell \right] \ge \frac{1}{n} {\text {OPT}}_L \frac{cn}{\ell } = \frac{c}{\ell } {\text {OPT}}_L\). Summing over all rounds from \(cn+1\) to dn yields

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_L\right] = \sum _{\ell =cn+1}^{dn} {\mathbf{E}}\left[ A_\ell \right] \ge \left( c \sum _{\ell =cn+1}^{dn} \frac{1}{\ell } \right) {\text {OPT}}_L \ge \left( c \ln \frac{d}{c} - \frac{1 - c/d}{n}\right) {\text {OPT}}_L \,. \end{aligned}$$

The last inequality follows from \(\sum _{\ell =cn+1}^{dn} \frac{1}{\ell } = \left( \sum _{\ell =cn}^{dn-1} \frac{1}{\ell } \right) - \frac{1}{cn} + \frac{1}{dn}\) and, according to Fact 1A, \(\sum _{\ell =cn}^{dn-1} \frac{1}{\ell } \ge \int _{cn}^{dn} \frac{1}{\ell } ~\mathrm {d} {\ell }~ = \ln \frac{d}{c}\). \(\square\)

5.2 Small Options

For small options, we use the LP-based algorithm from [25, Sec. 3.3] and analyze it within our algorithmic framework. In order to make this paper self-contained, we give a linear program for fractional GAP (LP 1), the algorithm, and its corresponding proofs.

$$\begin{aligned} \text {maximize }&\sum _{\begin{array}{c} i \in I_S \\ r \in R \end{array}} v_{i,r} x_{i,r} \\ \text {subject to }&\sum _{i \in I_S} s_{i,r} x_{i,r} \le W_r&\forall r \in R \\&\sum _{r \in R} x_{i,r} \le 1&\forall i \in I_S \\&0 \le x_{i,r} \le 1&\forall (i,r) \in I_S \times R \end{aligned}$$
(LP 1)
[Algorithm 5 (pseudocode figure)]

Let \(\mathcal {A}_S\) be Algorithm 5. After a sampling phase of dn rounds, in each round \(\ell\), the algorithm computes an optimal solution \(x^{(\ell )}\) of LP 1 for \(I_S(\ell )\). Here, \(I_S(\ell )\) denotes the instance of small options revealed so far. The decision of whether and to which resource the current online item i is assigned is then made at random using \(x^{(\ell )}\): resource \(r \in R\) is chosen with probability \(x^{(\ell )}_{i,r}\), and the item stays unassigned with probability \(1 - \sum _{r \in R} x^{(\ell )}_{i,r}\). Note that the item can only be assigned to the chosen resource if its remaining capacity is at least \((1/2) \cdot W_r\).
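To make the round-\(\ell\) step concrete, the following sketch solves LP 1 with scipy’s linprog and then performs the randomized choice; the matrix encoding and all names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp1(values, sizes, W):
    """Solve LP 1 for the k visible items: values and sizes are (k x m)
    arrays, W the m capacities. Returns the fractional assignment x."""
    k, m = values.shape
    c = -values.ravel()                  # linprog minimizes, so negate
    A, b = [], []
    for r in range(m):                   # capacity constraint per resource
        row = np.zeros(k * m)
        row[r::m] = sizes[:, r]
        A.append(row)
        b.append(W[r])
    for i in range(k):                   # each item assigned at most once
        row = np.zeros(k * m)
        row[i * m:(i + 1) * m] = 1.0
        A.append(row)
        b.append(1.0)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=(0, 1))
    return res.x.reshape(k, m)

def choose_resource(x_i, rng):
    """Randomized rounding for the current item: resource r is chosen
    with probability x_i[r]; with the remaining probability, none."""
    p_none = max(0.0, 1.0 - x_i.sum())
    probs = np.append(x_i, p_none)
    j = rng.choice(len(probs), p=probs / probs.sum())
    return None if j == len(x_i) else j
```

In round \(\ell\), solve_lp1 would be called on the small options revealed so far, and choose_resource (with, e.g., rng = np.random.default_rng()) applied to the row of the current online item.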

To analyze Algorithm 5, we consider the gain of profit in round \(\ell \ge dn+1\), denoted by \(A_\ell\). For this purpose, let \(i^{(\ell )}\) be the item of that round and \(r^{(\ell )}\) the resource chosen by the algorithm. It holds that \({\mathbf{E}}\left[ A_\ell \right] = {\mathbf{E}}\left[ v_{i^{(\ell )},r^{(\ell )}}\right] \cdot {\mathbf{Pr}}\left[ i^{(\ell )}\text { can be assigned to }r^{(\ell )}\right]\), where in the first term the expectation is over the item arriving in round \(\ell\) and the resource chosen by the algorithm; the second term depends only on the resource consumption of \(r^{(\ell )}\) in earlier rounds. In the next two lemmas, we give lower bounds for both terms. As in the proofs of Sect. 5.1, it is helpful to construct the random permutation of the first \(\ell\) items in two independent steps: first, the set of \(\ell\) visible items is drawn uniformly, without determining the order of the items; second, the online item arriving in round \(\ell\) is drawn uniformly from this set.

Lemma 15

( [25, Sec. 3.3]) For any round \(\ell \ge dn+1\), we have \({\mathbf{E}}\left[ v_{i^{(\ell )},r^{(\ell )}}\right] \ge \frac{1}{n} {\text {OPT}}_S\).

Proof

The proof is similar to the proof of Lemma 12. As we consider a fixed round \(\ell\), we write i and r instead of \(i^{(\ell )}\) and \(r^{(\ell )}\) for ease of presentation. Further, we write \(v(\alpha ) := \sum _{j \in I_S} \sum _{s \in R} \alpha _{j,s} v_{j,s}\) for the profit of a fractional assignment \(\alpha\).

First, the set of visible items Q in round \(\ell\) is drawn uniformly at random among all subsets of \(\ell\) items. Let \(x^{(n)}\) be an optimal (offline) solution to LP 1 and let \(x^{(n)}|_Q\) denote the restriction of \(x^{(n)}\) to the items in Q, i.e., \((x^{(n)}|_Q)_{j,s} = x^{(n)}_{j,s}\) if \(j \in Q\) and \((x^{(n)}|_Q)_{j,s} = 0\) if \(j \notin Q\). Since \(x^{(n)}|_Q\) is a feasible and \(x^{(\ell )}\) is an optimal solution for Q, we have \({\mathbf{E}}\left[ v(x^{(\ell )})\right] \ge {\mathbf{E}}\left[ v(x^{(n)}\mid _Q)\right]\).

As each item has the same probability of \(\ell /n\) to be in Q, it holds that

$$\begin{aligned}&{\mathbf{E}}\left[ v(x^{(\ell )})\right] \ge {\mathbf{E}}\left[ v(x^{(n)}\mid _Q)\right] = \sum _{j \in I_S} \sum _{s \in R} {\mathbf{Pr}}\left[ j \in Q\right] \cdot x^{(n)}_{j,s} \cdot v_{j,s}\nonumber \\&\quad = \frac{\ell }{n} v(x^{(n)}) \ge \frac{\ell }{n} {\text {OPT}}_S \,. \end{aligned}$$
(20)

In the second step, the online item of round \(\ell\) is determined by choosing one item from Q uniformly at random. Let \(\mathcal {X}\) be the domain of \(x^{(\ell )}\) and \(x' \in \mathcal {X}\). We have

$$\begin{aligned} {\mathbf{E}}\left[ v_{i,r} \mid x^{(\ell )} = x'\right]&= \sum _{j \in Q} \sum _{s \in R} {\mathbf{Pr}}\left[ j=i, s=r\right] v_{j,s} \\&= \sum _{j \in Q} \sum _{s \in R} \frac{1}{\ell } \cdot x'_{j,s} \cdot v_{j,s} = \frac{1}{\ell } v(x') \,, \end{aligned}$$
(21)

where we used that each item from Q arrives in round \(\ell\) with probability \(1/\ell\) and the algorithm assigns item j to resource s with probability \(x'_{j,s}\), given \(x^{(\ell )} = x'\). By the law of total expectation, it follows that \({\mathbf{E}}\left[ v_{i,r}\right] = \frac{1}{\ell } {\mathbf{E}}\left[ v(x^{(\ell )})\right]\). Combining with (20) gives the claim. \(\square\)

Hence, by the previous lemma, the expected gain of profit in each round is at least a (1/n)-fraction of \({\text {OPT}}_S\), provided that the remaining resource capacity is large enough. The probability of the latter event is considered in the following lemma. Here, a crucial property is that we deal with \(\delta\)-small options. As in Sect. 4.2, we define \(\varDelta =\frac{1}{1-\delta }\).

Lemma 16

For any round \(\ell \ge dn+1\), it holds that

$$\begin{aligned} {\mathbf{Pr}}\left[ i^{(\ell )}\text { can be assigned to }r^{(\ell )}\right] \ge \frac{c}{d} \left( 1 - \varDelta \ln \frac{\ell }{dn}\right) \,. \end{aligned}$$

Proof

Let \(\xi\) be the event that no item is assigned to r after round dn. Note that \(\xi\) does not necessarily hold, since \(\mathcal {A}_L\) might already have assigned items to r in earlier rounds. By Lemma 13, \({\mathbf{Pr}}\left[ \xi \right] \ge \frac{c}{d}\). Therefore, it is sufficient to show \({\mathbf{Pr}}\left[ i^{(\ell )}\text { can be assigned to }r^{(\ell )} \mid \xi \right] \ge 1 - \varDelta \ln \frac{\ell }{dn}\).

For this purpose, assume that \(\xi\) holds and let X denote the resource consumption of r after round \(\ell - 1\). Further, let \(X_k\) be the resource consumption of r in round \(k < \ell\). We have \(X = \sum _{k=dn+1}^{\ell -1} X_k\). Let Q be the set of k visible items in round k. The set Q can be seen as uniformly drawn from all k-item subsets and any item \(j \in Q\) is the current online item of round k with probability 1/k. Now, the algorithm assigns any item j to resource r with probability \(x_{j,r}^{(k)}\), thus

$$\begin{aligned} {\mathbf{E}}\left[ X_k\right] = \sum _{j \in Q} {\mathbf{Pr}}\left[ j \text { occurs in round } k\right] s_{j,r} x_{j,r}^{(k)} = \frac{1}{k} \sum _{j \in Q} s_{j,r} x_{j,r}^{(k)} \le \frac{W_r}{k}, \end{aligned}$$
(22)

where the last inequality follows from the capacity constraint for resource r in LP 1. By linearity of expectation and inequality (22), the expected resource consumption up to round \(\ell\) is thus

$$\begin{aligned} {\mathbf{E}}\left[ X\right] = \sum _{k=dn+1}^{\ell -1} {\mathbf{E}}\left[ X_k\right] \le \sum _{k=dn+1}^{\ell -1} \frac{W_r}{k} \le W_r \ln \frac{\ell }{dn} \,. \end{aligned}$$
(23)

Now, since \(i^{(\ell )}\) is \(\delta\)-small, \(X < (1-\delta ) W_r\) implies \(X + s_{i^{(\ell )},r^{(\ell )}} \le W_r\), in which case the assignment is feasible. Using (23) and Markov’s inequality, we obtain

$$\begin{aligned} {\mathbf{Pr}}\left[ X < (1-\delta ) W_r \right] = 1 - {\mathbf{Pr}}\left[ X \ge (1-\delta ) W_r\right] \ge 1 - \frac{{\mathbf{E}}\left[ X\right] }{(1-\delta ) W_r} \ge 1 - \varDelta \ln \frac{\ell }{dn} \,. \end{aligned}$$

\(\square\)

The next lemma finally gives the competitive ratio of \(\mathcal {A}_S\).

Lemma 17

It holds that

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S\right] \ge \frac{c}{d} \left( (1-d) (1+\varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d} \right) {\text {OPT}}_S \,. \end{aligned}$$

Proof

We sum the expected profits of the single rounds using Lemmas 15 and 16:

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S\right]&= \sum _{\ell =dn+1}^{n} {\mathbf{E}}\left[ A_\ell \right] \\&= \sum _{\ell =dn+1}^{n} {\mathbf{E}}\left[ v_{i^{(\ell )},r^{(\ell )}}\right] {\mathbf{Pr}}\left[ i^{(\ell )}\text { can be assigned to }r^{(\ell )}\right] \\&\ge \sum _{\ell =dn+1}^{n} \frac{1}{n} {\text {OPT}}_S \cdot \frac{c}{d} \left( 1 - \varDelta \ln \frac{\ell }{dn}\right) \\&= \frac{c}{dn} \left( \sum _{\ell =dn+1}^{n} \left( 1 - \varDelta \ln \frac{\ell }{dn} \right) \right) {\text {OPT}}_S \\&= \frac{c}{dn} \left( n - dn - \varDelta \sum _{\ell =dn+1}^{n} \ln \frac{\ell }{dn} \right) {\text {OPT}}_S \,. \end{aligned}$$

Since \(\frac{\ell }{dn}\) is monotonically increasing in \(\ell\), we have

$$\begin{aligned} \sum _{\ell =dn+1}^{n} \ln \frac{\ell }{dn} = \left( \sum _{\ell =dn}^{n-1} \ln \frac{\ell }{dn}\right) + \ln \frac{1}{d} \le \left( \int _{dn}^{n} \ln \frac{\ell }{dn} ~\mathrm {d} {\ell }~\right) + \ln \frac{1}{d} \end{aligned}$$

by Fact 1B. The integral \(\int _{dn}^{n} \ln \frac{\ell }{dn} ~\mathrm {d} {\ell }~\) evaluates to \(n \cdot \left( \ln \frac{1}{d} - 1 + d \right)\), so combining the previous inequalities yields

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S\right]&\ge \frac{c}{d} \left( 1 - d - \varDelta \cdot \left( \ln \frac{1}{d} - 1 + d \right) - \frac{\varDelta }{n} \cdot \ln \frac{1}{d}\right) {\text {OPT}}_S \\&= \frac{c}{d} \left( (1 - d) (1+\varDelta ) - \varDelta \cdot \left( 1 + \frac{1}{n} \right) \cdot \ln \frac{1}{d} \right) {\text {OPT}}_S \,. \end{aligned}$$

Note that we obtain the same competitive ratio as in Lemma 11. \(\square\)

5.2.1 Remark

The setting of large capacities (compared to the respective resource demands) has been addressed in several papers [8, 12, 43]. For instance, such settings arise in online auctions, where the budgets are very high compared to single bids. Although the algorithm \(\mathcal {A}_S\) is not tailored to this setting, a corresponding bound can be obtained easily from Lemma 17. Setting \(c=d\) clearly maximizes the performance of \(\mathcal {A}_S\) with respect to \({\text {OPT}}_S\); the factor c/d then equals 1 and disappears from the bound. As the maximum resource demand \(\delta \rightarrow 0\), we have \(\varDelta \rightarrow 1\), so the competitive ratio of \(\mathcal {A}_S\) tends to \(2 (1-d) - \ln \frac{1}{d}\). This function is maximized for \(d=1/2\), yielding a competitive ratio of \(1 - \ln 2 \ge 0.3068\).
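For completeness, the maximizer is obtained by elementary calculus:

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}d} \left( 2(1-d) - \ln \frac{1}{d} \right) = -2 + \frac{1}{d} = 0 \quad \Longleftrightarrow \quad d = \frac{1}{2} \,, \end{aligned}$$

and the second derivative \(-1/d^2 < 0\) confirms a maximum; plugging in \(d = 1/2\) gives \(2 \cdot \frac{1}{2} - \ln 2 = 1 - \ln 2\).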

5.3 Proof of Theorem 2

Finally, we prove our main theorem for GAP.

Proof (of Theorem 2)

We set the threshold between large and small options to \(\delta = 1/2\) and consider Algorithm 1 with the algorithms \(\mathcal {A}_L\) and \(\mathcal {A}_S\) as defined previously. By Lemma 14, the expected gain of profit in rounds \(cn+1,\ldots ,dn\) is \({\mathbf{E}}\left[ \mathcal {A}_L\right] \ge \left( c \ln \frac{d}{c} - o(1) \right) {\text {OPT}}_L\). In the following rounds, we gain

$$\begin{aligned} {\mathbf{E}}\left[ \mathcal {A}_S\right] \ge \frac{c}{d} \left( 3 (1-d) - 2 \ln \frac{1}{d} - o(1) \right) {\text {OPT}}_S \end{aligned}$$

according to Lemma 17 (with \(\varDelta = 2\)). Setting \(c=0.5261\) and \(d=0.6906\) gives \(c \ln \frac{d}{c} \approx \frac{c}{d} \left( 3 (1-d) - 2 \ln \frac{1}{d}\right)\) and thus, using \({\text {OPT}}_L + {\text {OPT}}_S \ge {\text {OPT}}\),

$$\begin{aligned}&{\mathbf{E}}\left[ \mathcal {A}_L\right] + {\mathbf{E}}\left[ \mathcal {A}_S\right] \ge \frac{c}{d} \left( 3 (1-d) - 2 \ln \frac{1}{d} - o(1) \right) \left( {\text {OPT}}_L + {\text {OPT}}_S \right) \\&\quad \ge \left( \frac{1}{6.99} - o(1) \right) {\text {OPT}}\,. \end{aligned}$$

\(\square\)
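As a purely numerical sanity check on our part (not part of the proof), both bounds can be evaluated for the chosen constants:

```python
from math import log

# Constants from the proof of Theorem 2.
c, d = 0.5261, 0.6906
bound_L = c * log(d / c)                            # Lemma 14, without o(1)
bound_S = (c / d) * (3 * (1 - d) - 2 * log(1 / d))  # Lemma 17, Delta = 2
print(bound_L, bound_S, 1 / 6.99)  # ~0.14314, ~0.14307, ~0.14306
```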