Competitive clustering of stochastic communication patterns on a ring
Abstract
This paper studies a fundamental dynamic clustering problem. The input is an online sequence of pairwise communication requests between n nodes (e.g., tasks or virtual machines). Our goal is to minimize the communication cost by partitioning the communicating nodes into \(\ell \) clusters (e.g., physical servers) of size k (e.g., number of virtual machine slots). We assume that if the communicating nodes are located in the same cluster, the communication request costs 0; if the nodes are located in different clusters, the request is served remotely using intercluster communication, at cost 1. Additionally, we can migrate a node from one cluster to another at cost \(\alpha \ge 1\). We initiate the study of a stochastic problem variant where the communication pattern follows a fixed distribution, set by an adversary. Thus, the online algorithm needs to find a good tradeoff between the benefit of quickly moving to a seemingly good configuration (of low intercluster communication costs), and the risk of prematurely ending up in a configuration which later turns out to be bad, entailing high migration costs. Our main technical contribution is a deterministic online algorithm which is \(O(\log {n})\)-competitive with high probability (w.h.p.) for a specific but fundamental class of problems: namely on ring graphs. We also provide first insights into slightly more general models, where the adversary is not restricted to a fixed distribution or the ring.
Keywords
Clustering · Repartition · Migration · Online algorithms · Randomization
Mathematics Subject Classification
68W20 (Randomized algorithms) · 68W27 (Online algorithms)
1 Introduction
Modern distributed systems are often highly virtualized and feature unprecedented resource allocation flexibilities. For example, these flexibilities can be exploited to improve resource utilization, making it possible to multiplex more applications over the same shared physical infrastructure, reducing operational costs and increasing profits. However, exploiting these resource allocation flexibilities is nontrivial, especially since workloads and resource requirements are time-varying.
This paper studies a fundamental dynamic resource allocation problem underlying many network-intensive distributed applications, e.g., batch processing or streaming applications, or scale-out databases. To minimize the resource footprint (in terms of bandwidth) of such applications as well as latency, we want to collocate frequently communicating tasks or virtual machines on the same physical server, saving communication across the network. The underlying problem can be seen as a clustering problem [4]: nodes (the tasks or virtual machines) need to be partitioned into different clusters (the physical servers), minimizing intercluster communication.
The clustering problem is challenging as the detailed communication patterns are often stochastic and the specific distribution unknown ahead of time. In other words, a clustering algorithm must deal with uncertainties: although two nodes may have communicated frequently in the past, it can turn out later that it is better to collocate different node pairs. Accordingly, clustering decisions may have to be reconsidered, which entails migrations.
Our contributions This paper initiates the study of a natural dynamic clustering problem where communication patterns follow an unknown distribution, chosen by an adversary: the distribution represents the worst case for the given online algorithm, and communication requests are drawn i.i.d. from this distribution. Our goal is to devise online algorithms which perform well against an optimal offline algorithm which has perfect knowledge of the distribution. Our main technical contribution is a deterministic online algorithm which, for a special but fundamental request pattern family, namely the ring, achieves a competitive ratio of \(O(\log {n})\) with high probability (w.h.p.), i.e., with probability at least \(1-1/n^c\), where n is the total number of nodes and c is a constant.
We also initiate the discussion of slightly more general models, where the adversary is not restricted to a fixed distribution or a ring, but can pick arbitrary requests from a perfect partition. We present an O(n)-competitive algorithm for this more general model of learning a perfect partition.
Novelty and challenges Our work presents an interesting new perspective on several classic problems. For example, our problem is related to the fundamental statistical problem of guessing the most likely distribution (and its parameters) from which a small set of samples is drawn. Indeed, one natural strategy for the online algorithm could be to first simply sample requests and, once a good estimate of the actual distribution emerges, move directly to the optimal clustering configuration. However, as we will show in this paper, the competitive ratio of this strategy can be very bad: the communication cost paid by the online algorithm during sampling can be high. Accordingly, the online algorithm is forced to eliminate distributions early on, i.e., it needs to migrate to seemingly low-cost configurations. And here lies another difference to classic distribution learning problems: in our model, an online algorithm needs to pay for changing configurations, i.e., for revising the “guessed distribution”. In other words, our problem features an interesting combination of distribution learning and efficient searching. It turns out, however, that amortizing the migration costs with the expected benefits (i.e., the reduced communication costs) at the new configuration is not easy. For example, if the request distribution is uniform, i.e., if all clustering configurations have the same probability, the best strategy is not to move: the migration costs cannot be amortized. However, if the distribution is “almost uniform”, migrations are required and “pay off”. Clearly, distinguishing between uniform and almost uniform distributions is difficult from an online perspective.
2 Model
We consider the problem of partitioning n nodes \(V= \{v_1,v_2,\ldots ,v_n\}\) into \(\ell \) clusters of capacity k each. We assume that \(n=\ell \cdot k\), i.e., nodes perfectly fit into the available clusters, and there is no slack. We call a specific node-cluster assignment a configuration c. We assume that communication requests are generated from a fixed distribution \({\mathscr {D}}\), chosen in a worst-case manner by the adversary. The sequence of actual requests \(\sigma ({\mathscr {D}})=(\sigma _1,\sigma _2,\ldots ,\sigma _T)\) is sampled i.i.d. from this distribution: the communication event at time t is a (directed) node pair \(\sigma _t=(v_i,v_j)\). Alternatively, we represent the distribution \({\mathscr {D}}\) as a weighted graph \(G=(V,E)\). For an edge \((v_i,v_j) \in E(G)\), the weight \(p(v_i,v_j)\) denotes the probability of a communication request between \(v_i\) and \(v_j\): each edge \(e\in E\) has a certain probability p(e) and \(\sum _{e\in E} p(e)=1\). A request (i.e., edge in G) \(\sigma _t=(v_i,v_j)\) is called internal if \(v_i\) and \(v_j\) belong to the same cluster in the current configuration (i.e., at the time of the request); otherwise, the request (edge) is called external. We assume that the communication cost of an external request is 1 and the cost of an internal request is 0.
Note that each configuration uniquely defines the external edges that form a “cut” interconnecting the \(\ell \) clusters in G. Therefore, in the following, we will treat the terms “configuration” and “cut” as synonyms and use them interchangeably; we will refer to both by c. Moreover, we define the probability of a cut (or, equivalently, a configuration) c as the sum of the probabilities of its external edges: \(p(c)=\sum _{e\in c} p(e)\). We also note that many configurations are symmetric, i.e., equivalent up to cluster renaming. Accordingly, in the following, we will only focus on the actually different (i.e., non-isomorphic) configurations.
To reduce external communication costs, an algorithm can change the current configuration by using node swaps. Swapping a node pair costs \(2\alpha \) (two node migrations of cost \(\alpha \) each). Since the request probability of different configurations/cuts differs, the goal of the algorithm will be to quickly guess and move toward a good cut, a configuration that reduces its future cost. Figure 1 shows an example.
In particular, we are interested in the online problem variant: we assume that the distribution \({\mathscr {D}}\) of the communication pattern (from which the observed \(\sigma \) is generated) is initially unknown to the online algorithm. Nevertheless, we want the performance of an online clustering algorithm, ON, to be similar to that of a hypothetical offline algorithm, OFF, which knows the request distribution as well as the number of requests in \(\sigma \), henceforth denoted by \(|\sigma |\), ahead of time. In particular, OFF can move before any request occurs or \(\sigma \) is generated.
As a first step, we focus on partitioning problems where \(\ell =2\). We consider a fundamental case, the ring communication pattern. That is, the communication graph G is the cycle graph (a ring) and the event space is defined over the edges \(E=\{(v_1,v_2),(v_2,v_3),\ldots ,(v_{n-1},v_n),(v_n,v_1)\}\). Moreover, we only consider configurations that minimize the cut, that is, nodes are partitioned according to contiguous subsequences of the identifier space: each cluster is (up to modulo) of the form \(\{v_{i},v_{i+1}, \ldots ,v_{i+k-1}\}\). This communication pattern is not only fundamental but also captures the aspects and inherent tradeoffs rendering the problem nontrivial. In this model, an algorithm changes configurations using rotations (either clockwise or counterclockwise). A rotation swaps two nodes incident to opposite cut edges (hence incurring cost \(2\alpha \)). See Fig. 2.
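To make the model concrete: on the ring with \(\ell =2\), each contiguous configuration \(c_i\) has exactly two external edges, so \(p(c_i)\) is the sum of two edge probabilities. The following is a minimal sketch (our own illustrative encoding, not code from the paper), where p[j] denotes the probability of the ring edge \((v_j,v_{j+1})\), indices taken modulo n:

```python
def cut_probability(p, i, k):
    """p(c_i) for a ring partitioned into two contiguous clusters of size k:
    configuration c_i cuts exactly the two ring edges i and i+k (mod n), so
    its probability is the sum of those two edge probabilities."""
    n = len(p)
    return p[i % n] + p[(i + k) % n]
```

For a uniform distribution over the n ring edges, every cut has probability \(2/n\), matching the intuition that all configurations then look alike.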
3 The challenge of dynamic clustering
In order to acquaint ourselves with the problem and understand the fundamental challenges involved in dynamic clustering, we first provide some examples and discuss naive strategies. Let us consider an example with \(n=2k\) nodes divided into \(\ell =2\) clusters of size k. There are k possible configurations/cuts: \(\{c_0,c_1,\ldots , c_{k-1}\}\). At one end of the algorithmic spectrum lies a lazy algorithm which never moves; let’s call it LAZY. At the other end of the spectrum lies a very proactive algorithm which greedily moves to the configuration which so far received the fewest external requests; let’s call it GREEDY. Both LAZY and GREEDY are doomed to fail, i.e., they have a large competitive ratio: LAZY fails under a request distribution where the initial external cut has probability 1, i.e., \(p(c_0)=1\) and \(p(c_{i})=0\) for any \(i>0\): LAZY pays for all requests, while after a single node swap all communication costs would be 0. GREEDY fails under the uniform distribution, i.e., if \(p(c_i)=1/k\) for all i: the best configuration is continuously changing, and in particular, the best cut is likely to be at distance \(\Omega (k)\) from the initial configuration \(c_0\): GREEDY quickly incurs migration costs in the order of \(\Omega (\alpha \cdot k)\), while staying at the same location would cost \(1/k\) per request in expectation. Thus, the competitive ratios grow quickly in the number of requests and in the number of nodes.
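The two failure modes can be checked with back-of-the-envelope arithmetic. The sketch below uses hypothetical parameter values of our own choosing (they appear nowhere in the analysis) purely to illustrate how the two ratios scale:

```python
# Hypothetical parameters: T requests, k possible cuts, migration cost alpha.
T, k, alpha = 10_000, 64, 8

# Distribution 1: all probability mass on the initial cut c_0.
lazy = T          # LAZY pays 1 per request, forever
opt1 = 2 * alpha  # a single node swap makes all later requests internal

# Distribution 2: uniform over the k cuts.
stay = T / k        # expected cost of never moving
greedy = alpha * k  # GREEDY chases the best cut across Omega(k) rotations

ratio_lazy = lazy / opt1      # grows linearly in T
ratio_greedy = greedy / stay  # grows as alpha * k^2 / T
```

Both ratios are unbounded: the first in the number of requests, the second in \(\alpha \) and k.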
Another intuitive strategy could be to wait in the initial configuration \(c_0\) for some time, simply observing and sampling the actual distribution, until a “sufficiently accurate” estimate of the distribution is obtained, and then move directly to the (hopefully) optimal configuration. Thus, the problem boils down to the classic statistical problem of estimating a distribution (and its parameters) from samples. However, it is easy to see that waiting for the optimal distribution to emerge is costly. Imagine for example a scenario where the initial configuration/cut \(c_0\) has a high probability, and there are two additional cuts \(c_1\) and \(c_2\) which have almost the same low probability (for example, polynomially low probability). Clearly, waiting at \(c_0\) to learn whether \(c_1\) or \(c_2\) is better is not only very costly, but it may also be pointless: even if the online algorithm ended up at \(c_1\) although \(c_2\) was a little bit better, the resulting competitive ratio would still be small.
1. Migrate early...: An online algorithm should migrate away from a suboptimal configuration early, possibly long before the optimal configuration can be guessed.
2. ... but not too early...: An online algorithm should avoid frequent migrations, e.g., due to a wrong or poor estimate of the actual request distribution.
3. ... and locally: Especially if the length of \(\sigma \) is small (small number of requests), it may not make sense to migrate to an optimal but faraway location, even if the distribution is known: even OFF would not move there.
4 Deterministic and competitive clustering
With these intuitions and challenges in mind, we present our solution. Let us first start with the offline algorithm. It is easy to see that OFF, knowing the distribution as well as the number of requests, moves only once: namely in the beginning (where one move may consist of multiple migrations, i.e., node swaps), to the configuration providing an optimal expected cost-benefit tradeoff. Concretely, OFF computes for each configuration \(c_i\) its expected cost-benefit tradeoff: the communication cost of configuration \(c_i\) is \(|\sigma | \cdot p(c_i)\) and the cost of moving there is \(2\alpha \cdot d(c_0,c_i)\), where \(d(\cdot ,\cdot )\) is the rotation distance between the two configurations (the smallest number of rotation moves to reach the other configuration). Thus, OFF will move to \(c_{OFF}:=\arg \min _{c_i} |\sigma | \cdot p(c_i)+2\alpha \cdot d(c_0,c_i)\) (note that this configuration is not necessarily unique). In the following, we will use the short form \(d_i = d(c_0,c_i)\) to denote distances relative to \(c_0\), the initial configuration.
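OFF’s one-shot decision rule can be written as a direct argmin. A sketch under our own encoding (p_cut[i] holds the cut probabilities, and the rotation distance on the ring of k configurations is computed inline):

```python
def offline_optimum(p_cut, T, alpha):
    """Return the index i minimizing T * p(c_i) + 2 * alpha * d(c_0, c_i),
    where d is the rotation distance on a ring of k configurations."""
    k = len(p_cut)

    def d(i):  # rotation distance from c_0 to c_i (shorter of the two ways)
        return min(i, k - i)

    return min(range(k), key=lambda i: T * p_cut[i] + 2 * alpha * d(i))
```

Ties are broken by the first minimizer, reflecting that \(c_{OFF}\) is not necessarily unique.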

Eliminating bad configurations: We define conditions for configurations which, if met, allow us to eliminate the corresponding configurations once and for all. In particular, we will guarantee (w.h.p.) that an online algorithm remains competitive (even) if it never moves back to such a configuration in the future. In other words, our online algorithm will only move between configurations for which this condition is not yet true.

Local migrations and growingradius search strategy: In order to avoid high migration costs, our online algorithm is local in the sense that it only moves to nearby cuts/configurations once the condition of the current configuration is met and it needs to be eliminated. Concretely, our online algorithm is based on a growingradius search strategy: we only migrate to valid configurations lying within the given radius. Only if no such configurations exist, the search radius is increased.
 Amortization: The radius growth strategy alone is not sufficient to provide the necessary amortization for being competitive. Two additions are required:
1. Directed search: An online algorithm may still incur a high migration cost when frequently moving back and forth within a given radius, chasing the next best configuration. Therefore, our proposed online algorithm first moves in one direction only (clockwise), and then in the other direction, bounding the number of times the \(c_0\) configuration is crossed.
2. Lazy expansion: Even once all configurations within this radius have been eliminated, the online algorithm should not immediately move to configurations in the next larger interval. Rather, the algorithm waits until a certain number of requests has accumulated, allowing it to amortize the migrations (an “insurance”).
Let us now elaborate more on the moving strategy. Before going into the details, however, let us note that for ease of presentation, we will use two different but equivalent numbering schemes to refer to configurations, depending on what is more useful in the current context. In particular, when talking about the number of requests, r[], we often enumerate configurations globally, \(0,1,2, \ldots , k-1\). When discussing moving strategies, we often enumerate configurations relative to \(c_0\), i.e., \(1,-1,2,-2,\ldots ,\pm k/2\), depending on whether they are located clockwise or counterclockwise from \(c_0\).
This is reminiscent of classic line searching [13] type problems, like “the goat searches for the hole in the fence” escape problems: moving in one direction only, the goat may risk missing a nearby hole in the other direction. That is, moving greedily in one direction is only \(\Omega (F)\)-competitive, where F is the circumference of the fence, which in our case means that the competitive ratio is \(\Omega (k)\). Accordingly, some combination of search-left and search-right is required. Our search radius R is centered around \(c_0\) at any time during the execution of the algorithm, and we always first explore all remaining non-eliminated configurations in one direction, and then explore the remaining configurations in the other direction. In other words, starting from \(c_0\), we alternate the search between the positive and negative configurations following the sequence: \((0,1,2,-1,-2,-3,-4,3,4,\ldots , 8, -5 , \ldots ,-16, \ldots , 2^{2i-1}+1, \ldots ,2^{2i+1}, -2^{2i}-1, \ldots ,-2^{2i+2}, \ldots )\). Thus, configuration \(c_0\) is crossed only a constant number of times per given radius R. We call this sequence the searching path.
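The searching path can be generated mechanically. The sketch below is our reading of the sequence above (the exact phase lengths are an assumption on our part): the radius doubles in each phase and the direction alternates, so each direction’s frontier quadruples between its consecutive visits.

```python
def searching_path(max_radius):
    """Exploration order around c_0: 0, 1, 2, -1, ..., -4, 3, ..., 8,
    -5, ..., -16, ...; positive indices are clockwise, negative ones
    counterclockwise, and c_0 is crossed only O(1) times per radius."""
    path = [0]
    frontier = {+1: 0, -1: 0}  # farthest configuration explored per direction
    j, sign = 1, +1
    while 2 ** j <= max_radius:
        # extend the current direction from its old frontier out to radius 2^j
        for x in range(frontier[sign] + 1, 2 ** j + 1):
            path.append(sign * x)
        frontier[sign] = 2 ** j
        j += 1
        sign = -sign  # alternate direction; the radius doubles each phase
    return path
```

For instance, with radius 16 the path visits 1, 2, then −1 to −4, then 3 to 8, then −5 to −16.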
Given a moving strategy, we next note that we should not move too fast: we introduce a second condition for when it is safe to move. When in configuration \(-2^{2i}\), before we explore the configurations in \([2^{2i-1}+1,2^{2i+1}]\), we wait in the configuration \(c_{\min }\) between configurations \(-2^{2i}\) and \(2^{2i-1}\), until this configuration fulfills \(r[c_{\min }] \ge \alpha \cdot 2^{2i+1}\). Similarly, when moving from configuration \(2^{2i+1}\) to explore the configurations in \([-2^{2i}-1,-2^{2i+2}]\), we will wait at the \(c_{\min }\) between \(-2^{2i}\) and \(2^{2i+1}\), until \(r[c_{\min }] \ge \alpha \cdot 2^{2i+2}\).
5 Analysis
We first make some general observations on our elimination condition. Subsequently, we will present a cost breakdown which will be helpful to analyze the competitive ratio of ON: we will show that each cost component is competitive with respect to the optimal offline algorithm. We first prove the following helper claim.
Claim 1
Proof
\(\square \)
The next lemma provides an intuition of our algorithm and its condition.
Lemma 1
Proof
5.1 A cost breakdown
It is convenient to break down the algorithm costs into different components. In the case of OFF, the situation is fairly easy: OFF simply incurs a migration cost, henceforth denoted by \({\textsc {Off}} _{ {mig}}\), of \({\textsc {Off}} _{ {mig}}=2\alpha \cdot d_{ {OFF}}\) to move to the optimal location \(c_{ {OFF}}\), where \(d_{ {OFF}}\) is the rotation distance between \(c_0\) and \(c_{ {OFF}}\), plus an expected communication cost \({\textsc {Off}} _{ {comm}}\) of \(|\sigma | \cdot p(c_{ {OFF}})\).
In case of ON, the situation is more complicated. In particular, while we do not distinguish between different migration costs for ON either, we consider three types of communication costs for ON: \({\textsc {On}} _{ {elim}}\) is the elimination cost, i.e., the total communication cost incurred while ON is waiting on every configuration that has not been eliminated yet, until the condition \(\texttt {Cond}{(j,i, \epsilon )}\) is fulfilled for the current configuration. \({\textsc {On}} _{ {ins}}\) is the “insurance” cost paid by ON when waiting in an already eliminated configuration, until being allowed to actually move beyond the current radius to a noneliminated configuration. Finally, \({\textsc {On}} _{ {final}}\) is the communication cost paid by ON once it reached its final configuration and all other configurations have been eliminated. (Note that the cost incurred at the final configuration while there are still other, noneliminated configurations, is counted toward elimination costs.)
The total communication cost \({\textsc {On}} _{ {comm}}\) is the sum of these three costs. In the following, we will prove that all these cost components are competitive compared to OFF’s overall costs, from which the bound on the competitive ratio is obtained.
5.2 Competitive ratio
We now prove that our online algorithm ON performs well with high probability (w.h.p.). That is, we derive a competitive ratio of \(O(\log {k})\) which holds with probability at least \(1-1/n^q\) for some constant q.
Theorem 1
The competitive ratio achieved by ON is \(\rho \in O(\log {n})\) with high probability.
5.2.1 Elimination costs
To calculate the elimination cost (the total cost resulting from waiting at different configurations until \(\texttt {Cond}()\) holds for the current configuration), we divide all configurations into two sets: configurations c for which \(p(c) \le 20p_{\min }\) and configurations \(c'\) for which \(p(c') > 20p_{\min }\). We consider the elimination cost for these two sets in turn.
All configurations c for which \(p(c) \le 20p_{\min }\). We again consider two cases. Let e[c] be the cost of elimination at a configuration c (the number of requests served until the elimination condition of c is fulfilled). Either \(e[c] \le 20\log {n}\) or \(e[c] > 20\log {n}\). In the first case, the number of configurations we have to eliminate is in \(O({\textsc {On}} _{ {migr}})\), and so \(\sum \nolimits _{e(c_i)\le 20\log {n}}{e(c_i)}\le O(\log {n} \cdot {\textsc {On}} _{ {migr}}) = O(\log {n} \cdot {\textsc {Off}})\).
For the other case, where \(e(c_i) > 20\cdot \log {n}\), we use the following claim:
Claim 2
Let \(\Delta = [t_1,t_2]\) be a time interval. We write \(r[c](\Delta ) = r[c](t_2) - r[c](t_1) \), where r[c](t) is the number of requests on the configuration c at time t. Then:
If \(p(c_j) \le 20p(c_i)\) and \(r[c_j](\Delta ) \ge 20\log n\) then w.h.p. \(r[c_j](\Delta ) \le 40r[c_i](\Delta )\).
Proof
First note that from the bound of Eq. (3) w.h.p. \(r[c_j](\Delta ) \le 2E[r[c_j](\Delta )]\). Similarly since \(E[r[c_i]] \ge \frac{1}{20}E[r[c_j]]\) we have that w.h.p. \(r[c_i](\Delta ) \ge \frac{1}{2}E[r[c_i](\Delta )] \ge \frac{1}{40}E[r[c_j](\Delta )]\). So w.h.p. \(r[c_j](\Delta ) \le 40r[c_i](\Delta )\). \(\square \)
From Claim 2 and a union bound over at most n states, we get that w.h.p. \(r[c_j](\Delta _j) \le 40 r[c_{\min }](\Delta _j)\) for all such configurations, with \(\Delta _j\) denoting the time interval during which we stayed on the configuration \(c_j\) and \(c_j\) was not eliminated (which means \(\bigcup _{c_j}\Delta _j=[0,|\sigma |]\)).

All configurations \(c'\) for which \(p(c') > 20p_{\min }\). For this we claim:
Claim 3
If \(p(c_j) \ge 20p(c_i)\) and \(r[c_j] \ge 20\log n\) then w.h.p. \(r[c_j] > 5r[c_i]\) and \(\texttt {Cond}(r[c_j], r[c_i],\epsilon )\) is True for \(\epsilon = \frac{1}{n^2}\).
Proof
Since \(r[c_j] \ge 20\log n\) w.h.p. \(E[r[c_j]] \le 2r[c_j]\). If \(r[c_i] > \frac{1}{5}r[c_j]\) then w.h.p. \(E[r[c_i]] > \frac{1}{10}r[c_j]\), but this contradicts the assumption that \(E[r[c_i]] \le \frac{1}{20}E[r[c_j]]\). So we have \(\frac{r[c_i]}{r[c_j]} \le \frac{1}{5}\) and \(\texttt {Cond}(j,i, \epsilon )\) holds for \(\epsilon = \frac{1}{n^2}\). \(\square \)
5.2.2 Migration cost

In the first case, \(d_{ {OFF}} \ge d_{ {far}}\), we can prove
Lemma 2
If \(d_{ {OFF}} \ge d_{ {far}}\), then \({\textsc {On}} _{ {mig}} \le 6\cdot {\textsc {Off}} _{ {mig}}(\sigma ) \).
Proof

If \(d_{ {OFF}} < d_{ {far}}\), then from Claims 2 and 3 with \(\Delta = [0,|\sigma |]\) it follows that w.h.p. \(r[c_{ {OFF}}] \in \Omega (\alpha \cdot d_{ {far}})\): recall that in our algorithm (line 15) we only move beyond the current radius if the corresponding costs have been amortized. Hence \({\textsc {On}} _{ {mig}} \le {\textsc {Off}} _{ {comm}}\).
5.2.3 Insurance costs
For the insurance cost we also consider several cases. Let \(c_{ {far}}\) be the farthest configuration reached by our online algorithm. Let \(c_{ {OFF}}\) denote the location of the offline algorithm. We split \({\textsc {On}} _{ {ins}}\) into two parts: \({\textsc {On}} _{ {ins<far}}\) and \({\textsc {On}} _{ {ins=far}}\). \({\textsc {On}} _{ {ins<far}}\) is the insurance cost up to (not including) \(c_{ {far}}\) while \({\textsc {On}} _{ {ins=far}}\) is the insurance cost paid on \(c_{ {far}}\). The last insurance cost, paid before the last migration to \(c_{ {far}}\), is \(\alpha d_{ {far}}\), so we have \({\textsc {On}} _{ {ins<far}} \le O({\textsc {On}} _{ {mig}}) = O({\textsc {Off}})\) (see the migration cost analysis).

\(c_{ {OFF}}\) is in \({\mathscr {E}}\) (eliminated configurations). Since \(c_{ {OFF}}\) was eliminated before \(c_{ {far}}\), it follows from Claims 2 and 3 that w.h.p. \(r[c_{ {OFF}}] \in \Omega (r[c_{ {far}}])\), so \({\textsc {On}} _{ {ins=far}} \in O({\textsc {Off}} _{ {comm}})\).

\(c_{ {OFF}}\) is in \(\overline{{\mathscr {E}}}\). In this case because of our searching path and the selection of \(c_{ {next}}\), we have \(d_{ {OFF}} \ge d_{ {next}}/2\). Therefore \({\textsc {On}} _{ {ins=far}} \le O({\textsc {Off}} _{ {mig}})\).
5.2.4 Final costs
By definition, in the final configuration, all other configurations have been eliminated. Thus, our condition, \(\texttt {Cond}(r[c_j], r[c_i],\epsilon )\), has been fulfilled at some point for any \(c_j\), with respect to some \(c_i\). The probability that we eliminate a minimum configuration and end up at a suboptimal configuration is small. This follows from Lemma 1, when setting \(\epsilon :=\frac{1}{n^2}\): once we stopped in a configuration, it is, with high probability, a (not necessarily unique) minimal configuration. Since OFF directly moves to a minimum configuration (which may not be unique), ON cannot incur a higher cost than OFF on a specific minimum configuration, i.e., not more than \(r[c_{\min }]\). As the offline algorithm moves from the start to a configuration \(c_{OFF}\), and \(c_{\min }\) is the configuration with the lowest number of requests, \(r[c_{ {OFF}}] \ge r[c_{\min }]\). Thus, \({\textsc {On}} _{ {final}}(\sigma ) \le {\textsc {Off}} (\sigma )\), and also \({\textsc {On}} _{ {final}}(\sigma )/{\textsc {Off}} (\sigma ) =O(1)\).
5.2.5 Overall costs
6 Beyond stochastic adversary
So far we assumed that the adversary is restricted to sample requests i.i.d. from a distribution of its choice. In this section we make a first attempt to relax this assumption and consider an adversary who can adapt the communication frequencies depending on ON’s deterministic choices. However, we require that the requests come from a perfect partition in the following sense: there exists a configuration without any intercluster communication. An optimal offline algorithm may hence simply move to such a perfect partition (the closest one from the initial configuration), and the goal of the online algorithm is to learn a perfect partition.
6.1 Ring communication pattern
We start by assuming that the perfect partition is a subset of a ring communication pattern (but frequencies can change arbitrarily over time). Observe that in this model, ON must move as soon as a remote request hits the current configuration: otherwise the adversary will simply repeat this request arbitrarily often. A naive strategy would be to move to the next configuration, e.g., in clockwise direction. This algorithm is O(k)-competitive: if \(d_F=d(c_0,c_F)\ge 1\) (where \(c_F\) is the perfect partition closest to the initial configuration), OFF pays at least \(\Omega (\alpha )\) to reach \(c_F\), whereas ON may pay \(O(\alpha k)\) if the optimal configuration is in the counterclockwise direction.
We can improve this algorithm by replacing the search strategy, in the spirit of our previous algorithms. Starting from \(c_0\), ON visits all configurations within the current radius R before moving to a configuration \(c_i\) s.t. \(R < d(c_0, c_i) \le 2R\). Concretely, ON moves according to the sequence \((0, 1, -1, 2, -2, 3, 4, -3, -4, \ldots , 8, -5, \ldots , -8, \ldots , 2^{i-1}+1, \ldots , 2^{i}, -2^{i-1}-1, \ldots , -2^{i}, \ldots )\) when the current radius is \(R=2^i\) and \(i>0\) is even. Similarly, for \(j>0\) being odd, the search sequence within the current radius \(R=2^j\) is \( -2^{j-1}-1, \ldots , -2^{j}, 2^{j-1}+1, \ldots , 2^{j}\). Notice that whenever the search within the current radius \(R=2^h\) is complete, we extend the search to the next radius \(2^{h+1}\) without changing direction.
6.2 More general communication pattern
Let us now remove the constraint that requests need to come from a ring, but allow the adversary to choose request sequences from an arbitrary perfect partition. Again, the goal of the online algorithm is to learn this perfect partition, at low cost.
The rebalancing implements a standard dynamic program known from Partition or Subset Sum problems. In addition, “rebalance()” embeds a heuristic necessary to achieve a good competitive ratio. Specifically, we compute the partitions that are closest to the initial configuration, denoted by \(A_I\). Let \((A',B')\) be the current configuration (i.e., the content of the clusters \({\mathscr {A}} \) and \({\mathscr {B}} \)). We define the current distance as \(d(A',A_I) = |A' \triangle A_I|\).
Lines 13 and 15 reflect this choice of partitioning. More technically, the dynamic program in Algorithm 3 computes the minimum-distance partition for all possible cluster sizes (up to k), stored as subsolutions P(i, j). Each subproblem, identified by the pair (i, j), corresponds to a min-distance partition of the first j components into two clusters of size i. Each P(i, j) is computed by considering whether or not to take the last component \({\mathscr {C}}_j\) into the cluster \({\mathscr {A}} \). If the component was originally located on \({\mathscr {A}} \) (i.e., \(j \in A_I\)), then not putting it back there increases the distance by \(|{\mathscr {C}}_j|\). Conversely, relocating a component to \({\mathscr {A}} \) that was not initially there increases the distance in the same way.
Next, at line 24, the algorithm traces the dynamic program choices in reverse direction beginning with the topmost solution P(k, m) and constructs the new partition. Eventually at line 27, the actual rebalancing takes place by swapping nodes between the clusters until nodes that belong to the same component are collocated on the same cluster.
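Since the pseudocode of Algorithm 3 is not reproduced here, the following is a hedged sketch of such a subset-sum-style dynamic program. All names are our own: sizes[j] is the size of component \({\mathscr {C}}_j\), in_A_initially[j] records whether it belongs to \(A_I\), and the table is rolled over one dimension rather than stored as the full P(i, j):

```python
def min_distance_partition(sizes, in_A_initially, k):
    """Choose components for cluster A whose sizes sum to exactly k while
    minimizing |A' symmetric-difference A_I|, i.e., the number of nodes
    placed on a different cluster than in the initial configuration."""
    INF = float("inf")
    P = [0] + [INF] * k  # P[i]: min distance with cluster-A load exactly i
    choice = []          # choice[j][i]: did component j go to cluster A?
    for j, s in enumerate(sizes):
        nxt, row = [INF] * (k + 1), [False] * (k + 1)
        for i in range(k + 1):
            # leave component j on cluster B: penalty s if it started on A
            cost_b = P[i] + (s if in_A_initially[j] else 0)
            # put component j on cluster A: penalty s if it started on B
            cost_a = P[i - s] + (0 if in_A_initially[j] else s) if i >= s else INF
            nxt[i], row[i] = (cost_a, True) if cost_a < cost_b else (cost_b, False)
        P = nxt
        choice.append(row)
    # trace the choices backwards to reconstruct the new cluster A
    A, i = [], k
    for j in reversed(range(len(sizes))):
        if choice[j][i]:
            A.append(j)
            i -= sizes[j]
    return P[k], sorted(A)
```

The traceback mirrors line 24 of Algorithm 3: starting from the topmost solution, each step reveals whether the corresponding component was taken into \({\mathscr {A}} \).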
We have the following result.
Theorem 2
The online Algorithm 2 has a competitive ratio in O(k) and runs in polynomial time (per request).
Proof
For any intermediate configuration \({\mathscr {K}}'\), we define the distance measure as \({\textit{dist}} ({\mathscr {K}}',{\mathscr {K}}_I) = d(A',A_I) = |A' \triangle A_I|\). Obviously, \({\textit{dist}} ({\mathscr {K}}_F, {\mathscr {K}}_I) = d(A_F, A_I)=2x\).
Since any intercluster communication is followed by a rebalancing, we only need to bound the migration cost PPL pays over the course of all requests until it reaches \({\mathscr {K}}_F\). Let \(\textit{PPL}_{ {mig}}(\sigma _t)\) denote the number of nodes that migrate during the rebalancing triggered by \(\sigma _t\). First, note that there are at most \(2(k-1)\) calls to rebalance(): after each call, the number of components decreases by at least 1, and there are initially k components of each type, which eventually collocate in two large components (of size k). Now we analyze the worst-case cost of the rebalancing upon a request \(\sigma _t\), i.e., \(\textit{PPL}_{ {mig}}(\sigma _t)\).
1. The majority are black. In this case, after the rebalancing, there are more than x black nodes on cluster \({\mathscr {A}} \), which means the same number of red nodes exist on cluster \({\mathscr {B}} \). This in turn implies that the distance is more than 2x after the rebalancing, i.e., \(d(A_t, A_I) > 2x\). On the other hand, we know by assumption that moving to the configuration \({\mathscr {K}}_F\) would yield distance exactly 2x. Thus, the partition \((A',B')=(A_t,B_t)\) computed by Algorithm 3 is not optimal.
 2.
the majority are red. Since these red nodes are on \({\mathscr {B}} \) before the rebalancing (i.e., in \(B_{t-1}\)), the same number of black nodes exists in \(A_{t-1}\). Therefore, the distance before the rebalancing is \(d(A_{t-1}, A_I) > 2x\). Similarly to the first case (but for time \(t-1\)), this contradicts the optimality of Algorithm 3.
It remains to show the polynomial runtime. It is easy to see that the running time is dominated by line (7) when PPL computes a new partition. The dynamic program computes a table of \((k+1)\cdot (2k+1) \in \Theta (k^2)\) integers. Then we trace the optimal path in the table in time \(\Theta (k)\). Thus, the total computation for each request takes time in \(\Theta (k^2)\). \(\square \)
7 Related work
Our paper takes a novel perspective on a range of classic problems. First, clustering and graph partitioning problems as well as repartitioning problems [22] have been studied for many years and in many contexts. These problems are usually NP-complete and even hard to approximate [2]. Especially partitioning problems for two clusters (\(\ell =2\) in our case), known as minimum bisection problems [10], have been studied intensively. Minimum bisection problems are known to allow for good, \(O(\log ^{1.5} n)\)-factor approximations [14]. Problem variants with \(k=2\) correspond to maximum matching problems, which are polynomial-time solvable. In contrast to our work, however, these models assume an offline perspective where the problem input is given ahead of time. In the online world, our problem is related to page (resp. file) migration [5, 7] and server migration [6] problems: in these problems, a server needs to be migrated close to requests occurring on a graph, trading off access and migration costs. In the former problem variant, migration costs relate to distance; in the latter, migration costs relate to the available bandwidth along migration paths. Moreover, in our problem, a ski-rental resp. rent-or-buy like tradeoff between migration and communication costs needs to be found. However, migrations do not occur along a graph but between clusters, and multiple nodes can be migrated simultaneously. The large configuration space also renders solutions based on metrical task system approaches [8] inefficient. Another interesting connection exists to k-server problems [12], where multiple servers can “collaboratively” serve requests. In some sense, our problem can be seen as the opposite problem: rather than aiming to move servers to the locations where the requests occur, we aim to move away from and avoid configurations (i.e., cuts) where requests occur.
More importantly, compared to classic online migration problems where requests define a unique optimal location from which they can be served at minimal cost (namely at the corresponding graph vertex), in our case, a request only reveals very limited information about the optimal (minimal cost) configuration. In other words, a single request only contains very limited information about how good a current clustering is, and how far (in terms of migrations) we are from an optimal offline location.
Our model can be seen as a generalization of online paging [11, 15, 16, 21, 23], and especially its variants with bypassing [1, 9]. However, in general, in our model, the “cache” is distributed: requests occur between nodes and not to nodes, and costs can be saved by collocation.
Our problem also has connections to online packing problems, where items of different sizes arriving over time need to be packed into a minimal number of bins [19, 20]. In contrast to these problems, however, in our case the objective is not to minimize the number of bins but rather the number of “links” between bins, given a fixed number of bins.
The paper closest to ours is [4], which studies online partitioning problems from a deterministic perspective, i.e., \(\sigma \) is generated in a deterministic manner. In this setting, it has been shown that the competitive ratio is inherently high, at least linear in k, even if the online algorithm is allowed to use larger clusters than the offline algorithm (the scenario with augmentation). In this paper, we initiate the study of stochastic models where request patterns are drawn from an unknown but fixed distribution, and show that polylogarithmic bounds can be achieved under ring patterns, even without augmentation.
In general, we believe that a key conceptual contribution of our model itself regards the underlying combination of learning and searching. Indeed, while the fundamental problem of how to efficiently learn a distribution has been explored for many decades [18], our perspective comes with an additional locality requirement, namely that searching induces costs (i.e., migrations).
8 Conclusion
This paper initiated the study of a natural cluster learning problem where the search procedure entails costs: communication costs occur in “suboptimal” clustering configurations and migration costs occur when switching between configurations. In particular, we presented an efficient online clustering algorithm which performs well even if compared to an offline algorithm which knows the distribution of the communication pattern ahead of time. Indeed, the \(O(\log {k})\) competitive ratio is interesting as k is likely to be small in the applications considered in this paper: k corresponds to the number of virtual machines that can be hosted on the same server, e.g., the number of cores. Moreover, we believe that our online approach is interesting in practice as it does not rely on any assumptions on the communication distribution, which may turn out to be wrong.
We believe that our work sheds an interesting new light on multiple classic problems, and opens an interesting field for future research. In particular, it would be interesting to know whether similar competitive ratios can be achieved even for more general communication patterns. Moreover, so far we have only focused on deterministic algorithms, and the exploration of randomized algorithms constitutes another interesting avenue for future research.
Acknowledgements
Open access funding provided by University of Vienna. Research supported by the German-Israeli Foundation for Scientific Research and Development (GIF), Grant no. I1245407.6/2014.
References
1. Adamaszek A, Czumaj A, Englert M, Räcke H (2012) An O(log k)-competitive algorithm for generalized caching. In: Proceedings of 23rd SODA, pp 1681–1689
2. Andreev K, Räcke H (2006) Balanced graph partitioning. Theory Comput Syst 39(6):929–939
3. Avin C, Cohen L, Schmid S (2017) Competitive clustering of stochastic communication patterns on the ring. In: Proceedings of 5th international conference on networked systems (NETYS)
4. Avin C, Loukas A, Pacut M, Schmid S (2016) Online balanced repartitioning. In: Proceedings of 30th international symposium on distributed computing (DISC)
5. Bartal Y, Charikar M, Indyk P (2001) On page migration and other relaxed task systems. Theor Comput Sci 268(1):43–66. Also appeared in Proc. of the 8th SODA, pp 43–52, 1997
6. Bienkowski M, Feldmann A, Grassler J, Schaffrath G, Schmid S (2014) The wide-area virtual service migration problem: a competitive analysis approach. IEEE/ACM Trans Netw 22:165–178
7. Black DL, Sleator DD (1989) Competitive algorithms for replication and migration problems. Carnegie Mellon University, Department of Computer Science, Pittsburgh, USA
8. Borodin A, Linial N, Saks ME (1992) An optimal on-line algorithm for metrical task system. J ACM 39(4):745–763. Also appeared in Proc. of the 19th STOC, pp 373–382, 1987
9. Epstein L, Imreh C, Levin A, Nagy-György J (2015) Online file caching with rejection penalties. Algorithmica 71(2):279–306
10. Feige U, Krauthgamer R (2002) A polylogarithmic approximation of the minimum bisection. SIAM J Comput 31(4):1090–1118
11. Fiat A, Karp RM, Luby M, McGeoch LA, Sleator DD, Young NE (1991) Competitive paging algorithms. J Algorithms 12(4):685–699
12. Fiat A, Rabani Y, Ravid Y (1994) Competitive k-server algorithms. J Comput Syst Sci 48(3):410–428
13. Franck W (1965) An optimal search problem. SIAM Rev 7(4):503–512
14. Krauthgamer R, Feige U (2006) A polylogarithmic approximation of the minimum bisection. SIAM Rev 48(1):99–130
15. McGeoch LA, Sleator DD (1991) A strongly competitive randomized paging algorithm. Algorithmica 6(6):816–825
16. Mendel M, Seiden SS (2004) Online companion caching. Theor Comput Sci 324(2–3):183–200
17. Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New York
18. Pöschel T, Ebeling W, Rosé H (1995) Guessing probability distributions from small samples. J Stat Phys 80(5–6):1443–1452
19. Ramanan PV, Brown DJ, Lee CC, Lee DT (1989) On-line bin packing in linear time. J Algorithms 10(3):305–326
20. Seiden SS (2002) On the online bin packing problem. J ACM 49(5):640–671
21. Sleator DD, Tarjan RE (1985) Amortized efficiency of list update and paging rules. Commun ACM 28(2):202–208
22. Vaquero L, Cuadrado F, Logothetis D, Martella C (2013) Adaptive partitioning for large-scale dynamic graphs. In: Proceedings of 4th annual symposium on cloud computing (SOCC), pp 35:1–35:2
23. Young NE (1991) On-line caching as cache size varies. In: Proceedings of the 2nd ACM-SIAM symposium on discrete algorithms (SODA), pp 241–250
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.