1 Introduction

The model we study in the current paper is a “marriage” between Jante’s law process and the Bak–Sneppen model. Jante’s law process refers to the interacting particle model studied in [6] under the name “Keynesian beauty contest process”, and generalized in [8]. This model runs as follows. Fix integers \(N\ge 3\) and \(d\ge 1\), and some d-dimensional random variable \(\zeta \). Let the initial configuration consist of N arbitrary points in \({\mathbb {R}}^d\). The process runs in discrete time according to the following algorithm: first, compute the centre of mass \(\mu \) of the given configuration of N points; then replace the point which is the most distant from \(\mu \) by a new \(\zeta \)-distributed point, drawn independently each time. It was shown in [6] that if \(\zeta \) has a uniform distribution on the unit cube, then all but one of the points converge to some random point in \({\mathbb {R}}^d\). This result was further generalized in [8], by allowing \(\zeta \) to have an arbitrary distribution, and additionally removing not just one, but \(K\ge 1\) points chosen to minimize a certain functional. The term “Jante’s law process” was also coined in [8], to reflect that this process is reminiscent of the “Law of Jante” principle, which describes a pattern of group behaviour towards individuals within Scandinavian countries that criticises individual success and achievement as unworthy and inappropriate; in other words, it is better to be “like everyone else”. The origin of this “law” dates back to Aksel Sandemose [12]. Another modification of this model in one dimension, called the p-contest, was introduced in [6, 7] and later studied e.g. in [9]. This model runs as follows: fix some constant \(p\in (0,1)\cup (1,\infty )\), and replace the point which is the farthest from \(p\mu \) (rather than from \(\mu \)).

Finally, we want to mention that the phenomenon of conformity is observed in many large social networks, see, for example, [4, 10, 13] and references therein.

Pieter Trapman (2018, personal communication) suggested studying Jante’s law model with local interactions, thus making it somewhat similar to the famous Bak–Sneppen (BS) model, see e.g. [1]. In the BS model, N species are located around a circle, and each of them is associated with a so-called “fitness”, which is a real number. The algorithm consists of choosing the least fit individual, and then replacing it and its two closest neighbours by new species with new random and independent fitnesses. After a long time, there will be a minimum fitness, below which species do not survive. The model proceeds through certain events, called “avalanches”, until it reaches a state of relative stability where all fitnesses are above a certain threshold level. There is a version of the model where fitnesses take only values 0 and 1 (see [2] and [15]), but even this simplified version turns out to be notoriously difficult to analyse, see e.g. [11]. Some more recent results can be found in [3, 14].

The barycentric Bak–Sneppen model, or, equivalently, Jante’s law process with local interactions, is defined as follows. Unlike the classical Bak–Sneppen model, our model is driven by a local quantity, namely the deviation of a node’s fitness from the average fitness of its two neighbours, which makes it much more tractable mathematically, and hence we are able to obtain substantial rigorous results.

Fix an integer \(N\ge 3\), and let \(S=\{1,2,\dots ,N\}\) be the set of nodes uniformly spaced on a circle. At time t, each node \(i\in S\) has a certain “fitness” \(X_i(t)\in {\mathbb {R}}\); let \(X(t)=(X_1(t),\dots ,X_N(t))\). Next, for the vector \(x=(x_1,\dots ,x_N)\), define

$$\begin{aligned} d_i(x)=\left| x_i-\frac{x_{i+1}+x_{i-1}}{2}\right| , \end{aligned}$$

as the measure of local “non-conformity” of the fitness at node i (here and in what follows we use the convention that \(N+1\equiv 1\), \(N+2\equiv 2\), and \(1-1\equiv N\) for indices of x). Let also \(d(x)=\max _{i\in S} d_i(x)\).

The process runs as follows. Let \(\zeta \) be some fixed one-dimensional random variable. At time t, \(t=0,1,2,\dots \), we choose the “least conformist” node i, i.e. the one maximizing \(d_i(X(t))\), and replace its fitness by an independent \(\zeta \)-distributed random variable. By \(\jmath (x)\) we denote the index of such a node in the configuration \(x=(x_1,\dots ,x_N)\), that is

$$\begin{aligned} d_{\jmath (x)}(x)=d(x) \end{aligned}$$

(see Fig. 1). If there is more than one such node, we choose any of them with equal probability, thus \(\jmath (x)\) is, in general, a random variable. Also assume that all the coordinates of the initial configuration X(0) lie in the support of \(\zeta \). We are interested in the long-term dynamics of this process.
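For concreteness, here is a minimal simulation sketch of these dynamics (our own code, not taken from the paper), with \(\zeta \sim U[0,1]\) and ties broken uniformly at random; the parameters and the number of steps are arbitrary.

```python
import numpy as np

def d_values(x):
    """Local non-conformity d_i(x) = |x_i - (x_{i-1} + x_{i+1})/2| on the cycle."""
    return np.abs(x - (np.roll(x, 1) + np.roll(x, -1)) / 2.0)

def step(x, rng):
    """Replace the fitness of a least conformist node by a fresh zeta-distributed value."""
    d = d_values(x)
    worst = np.flatnonzero(d == d.max())   # possible ties broken uniformly at random
    j = rng.choice(worst)
    x = x.copy()
    x[j] = rng.random()                    # here zeta ~ U[0,1]
    return x

rng = np.random.default_rng(0)
x = rng.random(10)                         # N = 10 initial fitnesses in supp(zeta) = [0, 1]
for _ in range(5000):
    x = step(x, rng)
print(np.round(x, 3))                      # all but the most recently updated entry are close
```

In our runs, already after a few thousand steps all fitnesses except the freshly resampled one nearly coincide, in line with the convergence results discussed below.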

We start with a somewhat easier version of the problem, where \(\zeta \) takes finitely many distinct values (Sect. 2), and then extend this result to the case where \(\zeta \sim U[0,1]\) (Sect. 3). We will show that all the fitnesses (except the one which has just been updated) converge to the same (random) value. This will hold for each of the two models.

Remark 1

One can naturally extend this model to any finite connected non-oriented graph G with vertex set V, as follows. For any two vertices \(v,u\in V\) that are connected by an edge we write \(u\sim v\). To each vertex v assign a fitness \(x_v\in {\mathbb {R}}\), and define the measure of non-conformity of this vertex as

$$\begin{aligned} d_v(x)=\left| x_v-\frac{\sum _{u:\ u\sim v} x_u}{N_v}\right| , \end{aligned}$$

where \(N_v=|\{u\in V:\ u\sim v\}|\) denotes the number of neighbours of v, and the replacement algorithm runs exactly as described earlier.

In particular, if G is a cycle graph, we obtain the model studied in the current paper. On the other hand, if G is a complete graph, we obtain the model equivalent to that studied in [6, 8].
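As a small illustration (our own sketch; the adjacency lists and fitness values are made up), \(d_v\) is straightforward to compute for any such graph:

```python
def d_v(x, neighbours, v):
    """Non-conformity of vertex v: |x_v - average fitness over its neighbours|."""
    nbrs = neighbours[v]
    return abs(x[v] - sum(x[u] for u in nbrs) / len(nbrs))

# a cycle on 5 vertices recovers the d_i used in this paper;
# a complete graph would recover the model of [6, 8]
cycle = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
x = [0.1, 0.9, 0.2, 0.4, 0.5]
print([round(d_v(x, cycle, v), 3) for v in range(5)])
```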

Remark 2

Unfortunately, our results cannot be extended to a general model, described in Remark 1. Indeed, assume that \(\mathop {\mathrm {supp}}\zeta =\{0,1\}\). It is not hard to show that if for some v we have \(N_v=1\), then the statement of Theorem 1 does not have to hold.

Moreover, it turns out that even when all the vertices have at least two neighbours (i.e., \(N_v\ge 2\) for all \(v\in V\)), there are still counterexamples: see Fig. 2.

Fig. 1 Illustration of the distances from the average of the two neighbours; \(\jmath =6\)

Fig. 2 On this graph with \(N=6\) vertices, only the values x and \(y\in \{0,1\}\) are updated all the time; infinitely often half of the fitnesses equal 0, while the other half equal 1

The rest of the paper is organized as follows. In Sect. 2 we study the easier, discrete, case. We show the convergence by explicitly finding all the absorbing classes for the finite-state Markov chain.

Section 3 contains the main result of our paper, Theorem 2, which shows that all but one fitness converge to the same (random) limit, similarly to the main result of [6].

2 Discrete Case

In this section we study the case when the fitnesses take finitely many equally spaced values. Due to the shift- and scale-invariance of the model, without loss of generality we may assume that \(\mathop {\mathrm {supp}}\zeta =\{1,2,\dots ,M\}=:\mathcal{M}\), and that \(\displaystyle p=\min _{j\in \mathcal{M}} {\mathbb {P}}(\zeta =j)>0\). In this case X(t) becomes a finite state-space Markov chain on \(\mathcal{M}^N\).

Note that if \(N-1\) fitnesses coincide and are equal to some \(a\in \mathcal{M}\), then it is the fitness that differs from a that will keep being replaced, until it finally coincides with the others. When this happens, we will have to choose randomly one among all the vertices, and replace its fitness. The replaced fitness may or may not differ from a, and then this procedure will repeat over and over again. Hence, to simplify the rest of the argument, we can (and will) safely modify the process as follows:

$$\begin{aligned} X(t+1)\equiv X(t)\text { as soon as }d(X(t))=0\text { i.e. all }X_i(t)=a\text { for some }a\in \mathcal{M}. \end{aligned}$$

We will say that the process is absorbed at value a.

Remark 3

The fact that the values of \(\zeta \) are equally spaced is, surprisingly, crucial. Let \(\mathop {\mathrm {supp}}\zeta =\{0,1,5,6\}=:\mathcal{M}\) and \(N=8\). Then the set of configurations

$$\begin{aligned} [0,1,x,5,6,5,y,1],\quad x,y\in \mathcal{M}\end{aligned}$$

is stable; the maximum distance from the average of the fitnesses of the neighbours is always at nodes 3 or 7, and it equals 2 or 3, while the other distances are at most 1.5 or 2 respectively.

Theorem 1

The process X(t) gets absorbed at some value \(a\in \mathcal{M}\), regardless of its starting configuration \(X(0)\in \mathcal{M}^N\).

First, observe that since X(t), \(t=0,1,2,\dots \) is a finite-state Markov chain on \(\mathcal{M}^N\) with the set of absorbing states

$$\begin{aligned} \mathrm{O}=\{(1,1,\dots ,1)\} \cup \{(2,2,\dots ,2)\} \cup \dots \cup \{(M,M,\dots ,M)\} \subset \mathcal{M}^N \end{aligned}$$

it suffices to show that \(\mathrm O\) is accessible (can be reached with a positive probability in some number of steps) from any starting configuration X(0).

First, for \(x=(x_1,x_2,\dots ,x_N)\in \mathcal{M}^N\), define

$$\begin{aligned} \mathrm{Max}(x)&=\max _{1\le i\le N} x_i,\\ S(x)&=\left\{ j\in \{1,2,\dots ,N\}:\ x_j=\mathrm{Max}(x)\right\} . \end{aligned}$$

that is, the maximum of x, and the set of indices of x where this maximum is achieved. Let us also define

$$\begin{aligned} f(x)=\sum _{i=1}^N (x_i-x_{i+1})^2 \end{aligned}$$

with the convention \(x_{N+1}\equiv x_1\), which we will use as some sort of Lyapunov function. The following two algebraic statements are not difficult to prove.

Claim 1

\(f(x)=0\) if and only if \(d(x)=0\).

Proof

Let \(x=(x_1,\dots ,x_N)\). One direction is trivial: if \(f(x)=0\), then \(x_i\equiv x_1\) for all \(i\in S\), and hence \(d_i(x)=0\) for all \(i\in S\), i.e. \(d(x)=0\).

On the other hand, suppose that \(d_i(x)=0\) for all i. If not all \(x_i\)’s are equal, there must be an index j for which \(x_j=\max _{i\in S}x_i\), and either \(x_{j-1}< x_j\) or \(x_{j+1}< x_j\). This, in turn, implies that \( 2d_j(x)=|(x_j -x_{j-1})+(x_j-x_{j+1})|=(x_j -x_{j-1})+(x_j-x_{j+1})>0 \) yielding a contradiction. \(\square \)

Claim 2

Let \(x=(x_1,\dots ,x_{i-1},x_i,x_{i+1},\dots ,x_N)\) and \(x'=(x_1,\dots ,x_{i-1},a,x_{i+1},\dots ,x_N)\), where \(a=\left\lfloor \frac{x_{i-1}+x_{i+1}}{2}\right\rfloor \). Then

(a) \(f(x')\le f(x)\);

(b) if additionally \(d_i(x)\ge 1\), then \(f(x')\le f(x)-1\).

Remark 4

One may expect that there are simpler Lyapunov functions; while we cannot rule this out, let us illustrate two natural candidates that, unfortunately, fail. First, consider d(x) itself; this function does not work, as the next example shows. Let \(x= [1, 3, 9, 18, 24, 27, 27, 24, 18, 9, 3, 1]\). Then \(d_i(x)\) is the largest at \(i=2\) and \(i=11\); thus \(d(x)=d_2(x)=2\). If we replace the first “3” by “5”\(=(1+9)/2\), then \(x'= [1, 5, 9, 18, 24, 27, 27, 24, 18, 9, 3, 1]\), so \(d(x')=d_3(x')=2.5>d(x)\).

Another possible candidate, \(\tilde{f}(x)=\sum _i d_i(x)^2\), does not work either: let \(x=[1,6,9,6,1]\) and \(x'=[1,6,6,6,1]\) (the worst fitness, 9, is replaced by the average of its neighbours); then \(\tilde{f}(x')> \tilde{f}(x)\), so \(\tilde{f}\) is not a Lyapunov function either.
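Both counterexamples are easy to verify numerically; here is a short sketch of ours (the helper d simply lists all the \(d_i\)):

```python
def d(x):
    N = len(x)
    return [abs(x[i] - (x[i - 1] + x[(i + 1) % N]) / 2) for i in range(N)]

x  = [1, 3, 9, 18, 24, 27, 27, 24, 18, 9, 3, 1]
x1 = [1, 5, 9, 18, 24, 27, 27, 24, 18, 9, 3, 1]   # the first "3" replaced by 5 = (1+9)/2
print(max(d(x)), max(d(x1)))                       # 2.0, 2.5: d(x) has increased

y  = [1, 6, 9, 6, 1]
y1 = [1, 6, 6, 6, 1]
tilde_f = lambda z: sum(v * v for v in d(z))
print(tilde_f(y), tilde_f(y1))                     # 23.5, 25.0: tilde f has increased
```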

Proof of Claim 2

From simple algebra it follows that

$$\begin{aligned} \frac{f(x')-f(x)}{2}&=(a-x_i)(a+x_i-x_{i-1}-x_{i+1})\\&=\left( a-\frac{x_{i-1}+x_{i+1}}{2}\right) ^2 -\left( x_i-\frac{x_{i-1}+x_{i+1}}{2}\right) ^2 =d_i(x')^2-d_i(x)^2=:(*). \end{aligned}$$

Note that if \(d_i(x)=0\) or \(d_i(x)=1/2\), then \(d_i(x')=d_i(x)\) and thus \((*)=0\). On the other hand, if \(d_i(x)\ge 1\), then, since \(d_i(x')\le 1/2\), we get \((*)\le 1/4-1\le -1/2\), and hence \(f(x')\le f(x)-1\). \(\square \)
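Claim 2 is also easy to confirm by brute force; the following sanity check of ours (with arbitrary choices of M and N) tests it on random integer configurations:

```python
import random

def f(x):
    N = len(x)
    return sum((x[i] - x[(i + 1) % N]) ** 2 for i in range(N))

def d_i(x, i):
    N = len(x)
    return abs(x[i] - (x[i - 1] + x[(i + 1) % N]) / 2)

random.seed(1)
M, N = 6, 7
for _ in range(20000):
    x = [random.randint(1, M) for _ in range(N)]
    i = random.randrange(N)
    a = (x[i - 1] + x[(i + 1) % N]) // 2           # floor of the neighbours' average
    y = x[:i] + [a] + x[i + 1:]
    assert f(y) <= f(x)                            # Claim 2(a)
    if d_i(x, i) >= 1:
        assert f(y) <= f(x) - 1                    # Claim 2(b)
print("Claim 2 verified on 20000 random configurations")
```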

To simplify notations, denote

$$\begin{aligned} \jmath _t=\jmath (X(t)),\qquad {\Delta }_t=d(X(t)),\qquad f_t=f(X(t)). \end{aligned}$$

Now we are going to construct an explicit path through which X(t) can reach \(\mathrm O\) starting from any initial state. Let

$$\begin{aligned} A_t&=\left\{ \right. X_{\jmath _t}(t)\text { is replaced by }X_{\jmath _t}(t+1)=\left\lfloor \frac{X_{\jmath _t-1}(t)+X_{\jmath _t+1}(t)}{2}\right\rfloor ,\\&\quad \left. \text { and }\jmath _t\in S(X(t))\text { if possible}\right\} . \end{aligned}$$

Note that the second condition is always possible to satisfy when \(\Delta _t=1/2\). Indeed, if \(\Delta _t=1/2\) for \(X(t)=x\), then there must be a j such that \(x_j=\mathrm{Max}(x)\) but \(x_{j+1}\le \mathrm{Max}(x)-1\). As a result, \(d_j(x)\ge 1/2\) and hence \(x_j\) is one of the points which can be potentially replaced.

Now the statement of Theorem 1 will follow from the following Lemma.

Lemma 1

For any X(0) there is a \(T\ge 0\) such that on the event

$$\begin{aligned} A_0\cap A_1\cap ... \cap A_T \end{aligned}$$

we have \(X(T)\in \mathrm{O}\).

This Lemma, in turn, immediately follows from the next statement and the observation that \(0\le f(x)\le M^2N\), as well as the fact that \(f(X(T))=0\Longleftrightarrow \Delta _T=0 \Longleftrightarrow X(T)\in \mathrm{O}\) (see Claim 1).

Claim 3

If \(f_s>0\) then \(f_{s+N-2}\le f_s-1\) on \(A_s \cap A_{s+1} \cap \dots \cap A_{s+N-2}\).

Proof

Note that \(\Delta _t\) can take only values \(\{0,\frac{1}{2},1,\frac{3}{2},2,\dots \}\). W.l.o.g. we assume that \(s=0\).

First, if \(\Delta _t=0\) for some \(0\le t\le N-2\), then \(f_t=0\) by Claim 1; since f takes only integer values and \(f_0\ge 1\), Claim 2(a) then gives \(f_{N-2}\le f_t=0\le f_0-1\). From now on suppose that \(\min _{0\le t\le N-2}\Delta _t\ge 1/2\).

We will show that it is impossible to have \(\Delta _t= \frac{1}{2}\) simultaneously for all \(t=0,1,2,\dots ,N-3\) (observe that the case \(\Delta _t=1/2\) contains, quite counter-intuitively, a very rich set of states, see Fig. 3). Indeed, if \(\Delta _t=1/2\), the set S(X(t)) of indices of the maximum fitnesses must contain between 2 and \(N-2\) elements (a single maximal fitness, or a single non-maximal fitness, would force \(d_j(X(t))\ge 1\) at some node j, since the fitnesses are integers). However, on \(A_t\) we have \(S(X(t+1))\subset S(X(t))\) and \(|S(X(t+1))|=|S(X(t))|-1\) by construction. Since \(|S(X(0))|\le N-2\), the value \(\Delta _t\) cannot stay equal to 1/2 for \(N-2\) consecutive steps, and thus this case is impossible.

Fig. 3 A configuration with \(\Delta _t = 1/2\) (note the periodic boundary conditions), \(\mathcal{M}=\{1,2, \dots , 13\}\) and \(N = 24\). Observe that if \(\Delta _t=1/2\) then there will be a number of “plateaus”, each containing at least two maximal fitnesses; moreover, any two such plateaus will be separated by at least two non-maximal fitnesses

As a result, we conclude that \(\Delta _t\ge 1\) for some \(t\in \{0,1,\dots , N-3\}\). Then \(f_{t+1}\le f_t-1\) by Claim 2(b), and hence \(f_{N-2}\le f_{t+1}\le f_t-1\le f_0-1\) by Claim 2(a). \(\square \)

Remark 5

We have actually shown that T in Lemma 1 can be chosen no larger than \(M^2N\times (N-2)\), i.e. \({\mathbb {P}}(X(M^2N(N-2))\in \mathrm{O}\,|\, X(0)=x)>0\) for any \(x\in \mathcal{M}^N\).
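The explicit path used in Lemma 1 is easy to follow on a computer. The sketch below (our own code, with arbitrary M and N) repeatedly replaces a worst node, preferring one carrying a maximal fitness, by the floor of its neighbours' average, and confirms that absorption occurs well within the bound above:

```python
import random

def absorb_along_path(x):
    """Follow the moves prescribed by the events A_t until d(X(t)) = 0."""
    N, steps = len(x), 0
    while True:
        d = [abs(x[i] - (x[i - 1] + x[(i + 1) % N]) / 2) for i in range(N)]
        if max(d) == 0:
            return x[0], steps
        worst = [i for i in range(N) if d[i] == max(d)]
        preferred = [i for i in worst if x[i] == max(x)] or worst  # jmath in S(X(t)) if possible
        i = preferred[0]
        x[i] = (x[i - 1] + x[(i + 1) % N]) // 2                    # floor of neighbours' average
        steps += 1

random.seed(0)
M, N = 10, 12
x = [random.randint(1, M) for _ in range(N)]
value, steps = absorb_along_path(x)
print(value, steps, steps <= M * M * N * (N - 2))                  # absorbed within M^2 N (N-2) steps
```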

Remark 6

It would be interesting to find the distribution of the limiting absorbing configuration, i.e. of \(\xi :=\lim _{t\rightarrow \infty } X_i(t)\); clearly it will depend on X(0). This is quite a hard problem, and we can present only results based on simulations. Figure 4 shows the histograms of the distribution of \(\xi \) for different values of M and N, starting from a random initial condition, i.e. \(X_i(0)\) are i.i.d. random variables uniformly distributed on \(\mathcal{M}\).

Fig. 4 Distribution of \(\xi \) based on simulations, for \((N,M)=(20,20)\), (20, 100), and (200, 10) respectively. Uniform random initial conditions
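A simulation of the kind behind Fig. 4 is straightforward to set up; below is our own sketch (the number of runs and the seed are arbitrary) recording the absorption value \(\xi \) over repeated runs from i.i.d. uniform initial conditions.

```python
import numpy as np

def absorption_value(N, M, rng):
    """Run the discrete chain from an i.i.d. uniform start until all fitnesses coincide."""
    x = rng.integers(1, M + 1, size=N)
    while True:
        d = np.abs(x - (np.roll(x, 1) + np.roll(x, -1)) / 2.0)
        if d.max() == 0:
            return int(x[0])                           # the chain is absorbed at this value
        worst = np.flatnonzero(d == d.max())
        x[rng.choice(worst)] = rng.integers(1, M + 1)  # replacement by a fresh uniform value

rng = np.random.default_rng(0)
N, M = 20, 20
xi_samples = [absorption_value(N, M, rng) for _ in range(500)]
print(np.bincount(xi_samples, minlength=M + 1)[1:])    # empirical histogram of xi on {1,...,M}
```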

3 Continuous Case

Throughout this section, we assume that \(\zeta \sim U[0,1]\), and \(X_i(t)\in [0,1]\) for all \(i\in S\) and \(t=0,1,2,\dots \). We also assume that X(0) is such that \(\jmath (X(0))\) is non-random.

Theorem 2

There exists a.s. a random variable \(\bar{X}\in [0,1]\) such that as \(t\rightarrow \infty \)

$$\begin{aligned}&(X_1(t),X_2(t),\dots ,X_{\jmath (X(t))-1}(t),X_{\jmath (X(t))+1}(t), \dots ,X_N(t))\\&\quad \rightarrow (\bar{X},\bar{X},\dots ,\bar{X})\in [0,1]^{N-1}\qquad \text {a.s.} \end{aligned}$$

The proof of this theorem consists of two parts. Firstly (see Lemma 8), we will show that the properly defined “spread” between the values \(X_1(t),\dots ,X_N(t)\) converges to zero. This does not, however, imply the desired result, as hypothetically we can have the situation best described by the “Dance of the Little Swans” from Tchaikovsky’s “Swan Lake”: while the mutual distances between the \(X_i\)’s decrease (some of them possibly staying at 0), their common location changes with time, and thus does not converge to a single point in [0, 1]. This can happen, for example, if the diameter of the configuration converges to zero too slowly.

The second part of the proof will show that not only do the distances between the \(X_i\)’s decrease, but they all (except the most recently changed one) converge to the same random limit. Note that a similar strategy was used in [6]; however, in our case both steps require much more work.

It turns out that it is much easier to work with the embedded process, observed only at those times when either the non-conformity of the node whose value is replaced has become smaller than it was at the previous embedded time, or at least the location of the “worst” node (i.e. the one where \(d_i\) is the largest) has changed, whichever comes first. Formally, let \(\nu _0=0\) and recursively define for \(k=0,1,2,\dots \)

$$\begin{aligned} \nu _{k+1}=\inf \left\{ t>\nu _{k}:\ \jmath (X(t))\ne \jmath (X(\nu _{k}))\text { or }d(X(t))<d(X(\nu _k))\right\} . \end{aligned}$$

Note that due to the continuity of \(\zeta \) each \(\jmath (X(t))\) is uniquely defined a.s., and that all \(\nu _k\) are finite a.s.
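For illustration, one step of this construction can be coded directly from the definition: keep resampling the current worst node until either \(\jmath \) moves or d drops below its value at \(\nu _k\). This rejection-sampling sketch is our own and is only practical while \(d(X(t))\) is not too small, since the acceptance window shrinks together with the configuration.

```python
import numpy as np

def d_values(x):
    """d_i(x) = |x_i - (x_{i-1} + x_{i+1})/2| on the cycle."""
    return np.abs(x - (np.roll(x, 1) + np.roll(x, -1)) / 2.0)

def advance_to_next_nu(x, rng):
    """Run the raw chain from nu_k to nu_{k+1}: the worst node keeps being resampled
    until either its location jmath changes or the maximal non-conformity d decreases."""
    d = d_values(x)
    j, dmax = int(np.argmax(d)), d.max()
    x = x.copy()
    while True:
        x[j] = rng.random()                    # zeta ~ U[0,1]
        d = d_values(x)
        if int(np.argmax(d)) != j or d.max() < dmax:
            return x

rng = np.random.default_rng(1)
x = rng.random(10)
print(d_values(x).max(), d_values(advance_to_next_nu(x, rng)).max())
```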

Example

(a) \(x=(\dots 0.5,\underline{0.6},0.5,0.3,\dots )\). The “worst” node is the second one (with the fitness of 0.6) and \(d=d_2(x)=0.1\); it is replaced, say, by 0.32. Now the configuration becomes

    $$\begin{aligned} x'=(\dots , 0.5,0.32,\underline{0.5},0.3,\dots )\end{aligned}$$

    and the worst node is the third one with \(d(x')=d_3(x')=0.19>0.1=d(x)\);

(b) x is the same as in (a), but \(x_2\) is replaced by 0.58. Now the configuration becomes

$$\begin{aligned}x'=(\dots ,0.5,\underline{0.58},0.5,0.3,\dots )\end{aligned}$$

    and the worst node is still the second one with \(d(x')=d_2(x')=0.08<0.1=d(x)\).

Now let \(\tilde{X}(s)=X(\nu _s)\), and let \({\tilde{\mathcal{{F}}}}_s=\sigma \left( \tilde{X}(1),\dots ,\tilde{X}(s)\right) \) be the filtration associated with this embedded process. Since throughout the time interval \([\nu _k,\nu _{k+1})\) the value \(\jmath \) remains constant at \(\jmath _{\nu _k}\) and only \(X_{\jmath _{\nu _k}}\) is updated, we have

$$\begin{aligned} X_i(t)=X_i(\nu _k)\quad \text {for all } i\ne \jmath (X(t)) \end{aligned}$$

for \(t\in [\nu _k,\nu _{k+1})\). Moreover, the process \(\tilde{X}\) evolves as a Markov process in which the “update” distribution is the original one restricted to a sub-range of [0, 1]; here it is important that a uniform distribution conditioned to lie in some subinterval is still uniform (this will be used later in Lemma 2). Hence Theorem 2 follows immediately from

Theorem 3

There exists a.s. a random variable \(\bar{X}\in [0,1]\) such that as \(s\rightarrow \infty \)

$$\begin{aligned} (\tilde{X}_1(s),\tilde{X}_2(s),\dots ,\tilde{X}_N(s))\rightarrow (\bar{X},\bar{X},\dots ,\bar{X})\in [0,1]^{N}\qquad \text {a.s.} \end{aligned}$$

(Moreover, this convergence happens exponentially fast: there is an \(s_0=s_0(\omega )<\infty \) and a non-random \(\gamma \in (0,1)\) such that \( \left| \tilde{X}_i(s)-\bar{X}\right| \le \gamma ^s \) for all \(i\in S\) and \(s\ge s_0\).)

Remark 7

In what follows, we assume that \(N\ge 5\). The cases \(N=3\) and \(N=4\) can be studied by somewhat easier arguments, and we leave this as an exercise.

We will use the Lyapunov function method, with a clever choice of the function. For \(x=(x_1,x_2,\dots ,x_N)\) define

$$\begin{aligned} h(x)&=2\cdot \sum _{i\in S} (x_i-x_{i+1})^2 + \sum _{i\in S} (x_i-x_{i+2})^2 =2 \sum _{i\in S} \left( 3x_i^2-2 x_i x_{i+1} - x_i x_{i+2}\right) . \end{aligned}$$

We start by showing that \(h(\tilde{X}(s))\) is a non-negative supermartingale (Lemma 2), hence it must converge a.s. Then we show that this limit is actually 0 (Lemma 8). Combined with the fact that \(h(\tilde{X}(s))\), as a measure of the spread of the configuration, is equivalent (up to constants depending on N only) to the square of \(\max _{i,j} |\tilde{X}_i(s)-\tilde{X}_j(s)|\) (see Lemma 3), this ensures that eventually all the \(\tilde{X}_i\) become very close to each other, thus establishing the first necessary ingredient of the proof of the main theorem.

Lemma 2

\(\xi (s)=h\left( \tilde{X}(s)\right) \) is a non-negative supermartingale.

Proof

The non-negativity of \(\xi (s)\) is obvious. To show that it is a supermartingale, assume that \(\tilde{X}(s)=(x_1,x_2,x_3,x_4,x_5,\dots )\) and w.l.o.g. that \(\jmath (\tilde{X}(s))=3\). Suppose that the allowed range (i.e., the set of values for which either d decreases or the location of the worst node changes) for the newly sampled point is \([a,b]\subseteq [0,1]\). Since the newly sampled point is uniformly distributed on [a, b] (a restriction of the uniform distribution to a subinterval is again uniform), we get

$$\begin{aligned} \Delta&:={{\mathbb {E}}} (\xi (s+1)-\xi (s) |{\tilde{\mathcal{{F}}}}_{s})= \int _{a}^{b} \left\{ 2(x_2-u)^2+2(u-x_4)^2+(x_1-u)^2+(u-x_5)^2\right. \nonumber \\&\qquad \left. -\left[ 2(x_2-x_3)^2+2(x_3-x_4)^2+(x_1-x_3)^2+(x_3-x_5)^2\right] \right\} \frac{\mathrm{d}u}{b-a} \nonumber \\&=2(a^2+b^2+ab)+(2x_3-a-b)(x_1+2x_2+2x_4+x_5) -6x_3^2. \end{aligned}$$
(3.1)
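The integral behind (3.1), as well as the factorized form of the drift used further below, can be double-checked symbolically; a small sympy sketch of ours:

```python
import sympy as sp

x1, x2, x3, x4, x5, u, a, b = sp.symbols('x1 x2 x3 x4 x5 u a b', real=True)

g = lambda v: 2*(x2 - v)**2 + 2*(v - x4)**2 + (x1 - v)**2 + (v - x5)**2
Delta = sp.integrate(g(u) - g(x3), (u, a, b)) / (b - a)
rhs = 2*(a**2 + b**2 + a*b) + (2*x3 - a - b)*(x1 + 2*x2 + 2*x4 + x5) - 6*x3**2
print(sp.simplify(Delta - rhs))                                       # 0, confirming (3.1)
print(sp.simplify(rhs.subs(b, x3)
                  - (x3 - a)*(x1 + 2*x2 - 4*x3 + 2*x4 + x5 - 2*a)))   # 0, the drift once b = x3
```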

Now we need to compute the appropriate a and b, and then show that \(\Delta \le 0\).

W.l.o.g. we can assume that \(x_3>\frac{x_2+x_4}{2}\); the case \(x_3<\frac{x_2+x_4}{2}\) is equivalent to \((1-x_3)>\frac{(1-x_2)+(1-x_4)}{2}\), and the substitution \(x_i\mapsto 1-x_i\) for all i yields identical calculations.

Suppose that the fitness at node 3 is replaced by some value \(u:=X_3(\nu _s+1)\), and let the new value of the non-conformity at node 3 be \(d_3'=d_3(x_1,x_2,u,x_4,x_5,\dots )=d_3(X(\nu _s+1))\).

  • If \(x_3\) is replaced by \(u>x_3\), then this value will be “rejected”, in the sense that d has only increased while the \(\arg \max _{i\in S} d_i\) is still at the same node (i.e., 3). Indeed, when \(x_3\) increases by some \(\delta >0\), so does \(d_3\), while \(d_2\) and \(d_4\) can potentially increase only by \(\delta /2\) and thus cannot overtake \(d_3\).

  • When \(u\in \left( \frac{x_2+x_4}{2},x_3\right) \), \(d_3'\) is definitely smaller than the original \(d_3\).

    Assume from now on that \(u\in \left( 0,\frac{x_2+x_4}{2}\right) \). When \(x_3\) is replaced by u, it might happen that while the new \(d_3\) is larger than the original one, the value of \(d_2\) or \(d_4\) overtakes \(d_3\).

  • When \(u\in \left( 0,\frac{x_2+x_4}{2}\right) \) the condition that \(d_3'<d_3\) is equivalent to

    $$\begin{aligned} \frac{x_2+x_4}{2}-u<x_3-\frac{x_2+x_4}{2} \Longleftrightarrow u>x_2+x_4-x_3=:Q_0. \end{aligned}$$
  • For \(d_2\) to overtake \(d_3\), we need

    $$\begin{aligned} \left| x_2 -\frac{x_1+u}{2} \right|>\frac{x_2+x_4}{2}-u \quad \Longleftrightarrow \quad {\left\{ \begin{array}{ll} &{}u>x_1-x_2+x_4=:Q_1\\ &{} \text {or}\\ &{}u>\frac{-x_1+3x_2+x_4}{3}=:Q_2 \end{array}\right. } \end{aligned}$$
  • For \(d_4\) to overtake \(d_3\), we need

    $$\begin{aligned} \left| x_4 -\frac{u+x_5}{2} \right|>\frac{x_2+x_4}{2}-u \quad \Longleftrightarrow \quad {\left\{ \begin{array}{ll} &{}u>x_2-x_4+x_5=:Q_3\\ &{}\text {or}\\ &{}u>\frac{x_2+3x_4-x_5}{3}=:Q_4 \end{array}\right. } \end{aligned}$$

    As a result, the condition for \(d_3\) to be overtaken by some other node, or \(d_3'<d_3\) is

    $$\begin{aligned} u&>\min _{j=0,1,2,3,4} Q_j. \end{aligned}$$

Consequently, we must set

$$\begin{aligned} a&=\max \left\{ 0, \min \{Q_0,Q_1,Q_2,Q_3,Q_4\}\right\} \\&=\max \left\{ 0, \min \left\{ x_2+x_4-x_3, x_1-x_2+x_4, \frac{-x_1+3x_2+x_4}{3}, x_2-x_4\right. \right. \\&\quad \left. \left. +x_5, \frac{x_2+3x_4-x_5}{3} \right\} \right\} , \\ b&=x_3. \end{aligned}$$

Note that we are guaranteed that \( a\le b\). This is trivial when \(a=0\); on the other hand, when \(a>0\) we have

$$\begin{aligned} a\le x_2+x_4-x_3= \frac{x_2+x_4}{2}-\left[ x_3-\frac{x_2+x_4}{2}\right]<\frac{x_2+x_4}{2}<x_3=b \end{aligned}$$

since \(x_3>\frac{x_2+x_4}{2}\).

By substituting \(b=x_3\) into the expression for the drift (3.1), we get

$$\begin{aligned} \Delta =(x_3-a)(x_1+2x_2-4x_3+2x_4+x_5-2a) \end{aligned}$$

and to establish \(\Delta \le 0\) it suffices to show

$$\begin{aligned} x_1+2x_2-4x_3+2x_4+x_5\le 2 a =2\max \{0,\min \{Q_0,Q_1,Q_2,Q_3,Q_4\}\} \end{aligned}$$
(3.2)

under the assumption that

$$\begin{aligned} x_3-\frac{x_2+x_4}{2}>\max \left\{ \left| x_2-\frac{x_1+x_3}{2}\right| , \left| x_4-\frac{x_3+x_5}{2}\right| \right\} \end{aligned}$$

that is, equivalently,

$$\begin{aligned} x_3>\max \{Q_1,Q_2,Q_3,Q_4\}. \end{aligned}$$
(3.3)

In order to show (3.2) we consider a number of cases. First, assume that \(x_2+x_4<x_3\). Then \(Q_0<0\) and \(a=0\). From (3.3) we get that \(2 x_3>Q_1+Q_3=x_1+x_5\), thus

$$\begin{aligned} x_1+2x_2-4x_3+2x_4+x_5=(x_1+x_5-2x_3)+ 2(x_2+x_4-x_3)<0=2a \end{aligned}$$

and (3.2) is fulfilled.

The next case is when \(\frac{x_2+x_4}{2}<x_3<x_2+x_4\). We need to verify that all of the following hold:

$$\begin{aligned} x_1+2x_2-4x_3+2x_4+x_5-2 Q_j\le 0 \quad \text { subject to }\\ Q_0\ge 0,\ x_3\ge Q_1\ge 0,\ x_3\ge Q_2\ge 0,\ x_3\ge Q_3\ge 0,\ x_3\ge Q_4\ge 0 \end{aligned}$$

and

$$\begin{aligned} x_1+2x_2-4x_3+2x_4+x_5\le 0\quad \text { subject to } \\ \quad Q_j\le 0,\ x_3\ge Q_1,\ x_3\ge Q_2,\ x_3\ge Q_3,\ x_3\ge Q_4 \end{aligned}$$

for \(j=0,1,2,3,4\). This can be done using the Linear Programming method, as each case amounts to maximizing a linear objective under linear constraints. Thus \(\Delta \le 0\). \(\square \)
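We do not reproduce the Linear Programming computation here, but the resulting inequality is easy to probe numerically; the following Monte Carlo sanity check (our own sketch) tests (3.2) on random admissible configurations satisfying (3.3):

```python
import random

random.seed(0)
tested = 0
for _ in range(500000):
    x1, x2, x3, x4, x5 = (random.random() for _ in range(5))
    if x3 <= (x2 + x4) / 2:                        # w.l.o.g. case treated in the proof
        continue
    Q = [x2 + x4 - x3,
         x1 - x2 + x4, (-x1 + 3 * x2 + x4) / 3,
         x2 - x4 + x5, (x2 + 3 * x4 - x5) / 3]
    if x3 <= max(Q[1:]):                           # condition (3.3): node 3 is the worst one
        continue
    a = max(0.0, min(Q))
    assert x1 + 2*x2 - 4*x3 + 2*x4 + x5 <= 2 * a + 1e-12    # inequality (3.2)
    tested += 1
print("inequality (3.2) verified on", tested, "admissible samples")
```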

The next statement shows that, as measures of the spread of a configuration \(x\in {\mathbb {R}}^N \), the quantities h(x), d(x), and \(\max _{i\in S} |x_i-x_{i-1}|\) are, in fact, equivalent.

Lemma 3

Let \(x=(x_1,\dots ,x_N)\) and \(\Delta _i(x):=x_i-x_{i-1}\), \(i\in S\). Then

$$\begin{aligned} \begin{array}{rcccl} d(x) &{}\le &{} \displaystyle \max _{i\in S} \left| \Delta _i\right| &{}\le &{} N d(x),\\ 2\, d(x)^2 &{}\le &{} h(x)&{}\le &{} 6\,N^3\, d(x)^2. \end{array} \end{aligned}$$

Proof

Note that \(\Delta _1+\dots +\Delta _N=0\) and

$$\begin{aligned} h(x)&=\sum _{i\in S} \left[ 2\Delta _i ^2+(\Delta _i+\Delta _{i+1})^2\right] , \\ d(x)&=\frac{1}{2}\, \max _{i\in S} \left| \Delta _{i+1}-\Delta _i\right| . \end{aligned}$$

Let j be such that \(d_j(x)=d(x)\), then by the triangle inequality

$$\begin{aligned} |\Delta _{j+1}|+|\Delta _j|\ge |\Delta _{j+1}-\Delta _j|=2d(x) \end{aligned}$$

so at least one of the two terms on the LHS \(\ge d(x)\), hence \(\max _{i\in S} |\Delta _i|\ge d(x)\).

Now we will show that \(\max _{i\in S} |\Delta _i|\le N d(x)\). Indeed, suppose that this is not the case, and w.l.o.g. \(\Delta _1>N d(x)\). For all i we have \(\left| \Delta _{i+1}-\Delta _i\right| \le 2d(x)\), hence by induction and the triangle inequality we get

$$\begin{aligned}&\Delta _2>\left( N-2\right) d(x),\\&\Delta _3>\left( N-4 \right) d(x),\\&\dots , \ \\&\Delta _{N-1}>\left( N-2(N-2) \right) d(x),\\&\Delta _N>\left( N-2(N-1) \right) d(x). \end{aligned}$$

As a result, \(\Delta _1+\Delta _2+\dots +\Delta _N>\left[ N^2-2(1+2+\dots +(N-1))\right] d(x)=N d(x) \ge 0\), which yields a contradiction, since the LHS is identically equal to 0.

Thus \(|\Delta _i|\le N d(x)\), and so \(|\Delta _i+\Delta _{i+1}|\le 2N d(x)\) for all \(i\in S\). Consequently, \(h(x)\le 2N (Nd(x))^2 +N (2Nd(x))^2=6N^3 d(x)^2\). On the other hand, \(h(x)\ge \displaystyle \max _{i\in S} 2\Delta _i^2\ge 2 d(x)^2\). \(\square \)
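These bounds are also easy to test numerically on random configurations; a small sketch of ours (the tolerances guard against floating-point rounding):

```python
import numpy as np

def d_max(x):
    return np.abs(x - (np.roll(x, 1) + np.roll(x, -1)) / 2.0).max()

def h(x):
    return 2 * ((x - np.roll(x, -1)) ** 2).sum() + ((x - np.roll(x, -2)) ** 2).sum()

rng = np.random.default_rng(0)
for _ in range(10000):
    N = int(rng.integers(5, 30))
    x = rng.random(N)
    dm = np.abs(x - np.roll(x, 1)).max()           # max_i |Delta_i|
    d = d_max(x)
    assert d <= dm + 1e-12 and dm <= N * d + 1e-12
    assert 2 * d**2 <= h(x) + 1e-12 and h(x) <= 6 * N**3 * d**2 + 1e-12
print("Lemma 3 inequalities verified on 10000 random configurations")
```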

The following four statements (Lemmas 4 and 5 and Corollaries 1 and 2) show that \(\xi (t)\) can actually decrease by a non-trivial factor with a positive (and bounded from below) probability.

Lemma 4

Suppose that \(X(t)=x=(x_1,x_2,x_3,x_4,x_5,\dots )\), and \(d_3(x)\ge \max \left\{ d_2(x),d_4(x)\right\} \). Let \(\mu =\frac{x_2+x_4}{2}\) and \(\delta =|x_3-\mu |=d_3(x)\). If \(x_3\) is replaced by some \(u\in [\mu -\delta /6,\mu +\delta /6]\) then \(\Delta _h:=h(X(t+1))-h(X(t))\le - \frac{5}{6} \delta ^2\). (Note that the Lebesgue measure of \([\mu -\delta /6,\mu +\delta /6] \bigcap [0,1]\) is always at least \(\delta /6\); also after this replacement \(d_3\) must decrease.)

Proof

Note that the change in h equals

$$\begin{aligned} \Delta _h=-2(x_3-u) (3u+A),\qquad \text { where }A= 3x_3-x_1-2x_2-2x_4-x_5. \end{aligned}$$

W.l.o.g. assume \(x_3>\mu \). Then

$$\begin{aligned} x_3-u\ge \mu +\delta -\left( \mu +\frac{\delta }{6}\right) =\frac{5}{6}\,\delta . \end{aligned}$$

At the same time, recalling that \(d_3(x)\ge \max \{d_2(x),d_4(x)\}\), we obtain that

$$\begin{aligned} \min _{x_1,\dots ,x_5\ge 0} A\qquad \text { subject to }\qquad x_3-\mu >\max \left\{ \left| x_2-\frac{x_1+x_3}{2}\right| ,\left| x_4-\frac{x_3+x_5}{2}\right| \right\} \end{aligned}$$

equals \(-3\mu +\delta \). Hence

$$\begin{aligned} 3u+A\ge 3\left( \mu -\frac{\delta }{6}\right) -3\mu +\delta =\frac{\delta }{2} \end{aligned}$$

and thus \(\Delta _h\le -2\, \frac{5\delta }{6} \cdot \frac{\delta }{2}\). \(\square \)

Lemma 5

Suppose that \(X(t)=x=(x_1,x_2,x_3,x_4,x_5,\dots )\), and \(d_3(x)=d(x)\). Let \(\mu =\frac{x_2+x_4}{2}\) and \(\delta =|x_3-\mu |=d_3(x)\). Given that \(x_3>\mu \), if \(x_3\) is replaced by some \(u\notin [\mu -3\delta ,x_3]\) then \(d_3(x')>d_3(x)\) and \(d_3(x')\) is still the largest of \(d_i(x')\), where \(x'=(x_1,x_2,u,x_4,x_5,\dots )\). The same conclusion holds if \(x_3<\mu \) and \(x_3\) is replaced by some \(u\notin [x_3,\mu +3\delta ]\).

Before presenting the proof of Lemma 5, we state the obvious

Corollary 1

Let \(\delta =d(\tilde{X}(s))\). If \(i=\jmath (\tilde{X}(s))\) then

$$\begin{aligned} \tilde{X}_i(s+1)\in [\tilde{X}_i(s)-4\delta ,\tilde{X}_i(s)+4\delta ] \end{aligned}$$

(and if \(i\ne \jmath (\tilde{X}(s))\) then trivially \(\tilde{X}_i(s+1)=\tilde{X}_i(s)\)). Hence we always have

$$\begin{aligned} \max _{i\in S} \left| \tilde{X}_i(s+1)-\tilde{X}_i(s)\right| \le 4 \delta . \end{aligned}$$

(Note that in Corollary 1 we have \(4\delta \) for the following reason: the newly accepted point can deviate from \(\mu \) by at most \(3\delta \) by Lemma 5, while \(|\tilde{X}_i(s)-\mu |=\delta \).)

The next implication of Lemma 5 requires a bit of work.

Corollary 2

Let \(\rho =1-\frac{5}{36\, N^3}<1\). Then

$$\begin{aligned} {\mathbb {P}}\left( \xi (s+1)\le \rho \xi (s)\,|\, {\tilde{\mathcal{{F}}}}_s \right) \ge \frac{1}{48}. \end{aligned}$$

Proof of Corollary 2

From Corollary 1 we know that, given \(x=\tilde{X}(s)\), the allowed range for the newly sampled coordinate of \(\tilde{X}(s+1)\) has length at most \(8 \delta \), where \(\delta =d(x)\). At the same time, if the newly sampled point falls into the interval \([\mu -\delta /6,\mu +\delta /6]\) (see Lemma 4), at least half of which lies in [0, 1], then \(\xi (s+1)-\xi (s)\le -\frac{5}{6} \delta ^2\); the probability of this event is no less than \(\frac{\delta /6}{8\delta }=\frac{1}{48}\). Since \(\xi (s)=h(x)\) and by Lemma 3 we have \(d(x)^2\ge \frac{h(x)}{6N^3}\), the inequality \(\xi (s+1)-\xi (s)\le -\frac{5}{6} \delta ^2\) implies \(\xi (s+1)-\xi (s)\le -\frac{5}{36N^3} \xi (s)\). \(\square \)

Proof of Lemma 5

By symmetry, it suffices to show just the first part of the statement. First, observe that

$$\begin{aligned} d_j(x')&=d_j(x)\le d_3(x)\text { for } j\in S\setminus \{2,3,4\}; \nonumber \\ d_2(x')&= \left| \left( \frac{x_1+x_3}{2}-x_2\right) +\frac{u-x_3}{2}\right| \le d_2(x)+\left| \frac{u-x_3}{2}\right| \le d_3(x)+\left| \frac{u-x_3}{2}\right| . \end{aligned}$$
(3.4)

If \(u>x_3>\mu \), then from (3.4)

$$\begin{aligned} d_3(x')&=u-\frac{x_2+x_4}{2}>x_3-\frac{x_2+x_4}{2}=d_3(x);\\ d_2(x')&\le d_3(x)+\left| \frac{u-x_3}{2}\right| = d_3(x')-(u-x_3)+\left| \frac{u-x_3}{2}\right| =d_3(x')-\left| \frac{u-x_3}{2}\right|<d_3(x');\\ d_4(x')&<d_3(x')\quad \text { (by the same argument as }d_2) \end{aligned}$$

so indeed \(d_3(x)<d_3(x')=\max _{i\in S} d_i(x')\).

On the other hand, if \(u<\mu -3\delta <x_3=\mu +\delta \), then \(d_j\) for \(j\in S\setminus \{2,3,4\}\) still remain unchanged, but

$$\begin{aligned} d_3(x')&=\mu -u>3\delta >d_3(x);\\ d_2(x')&\le d_3(x)+\left| \frac{u-x_3}{2}\right| =\delta + \frac{x_3-u}{2}=\delta + \frac{x_3-\mu }{2}+\frac{\mu -u}{2} = \frac{3\delta }{2}+\frac{\mu -u}{2}\\&< \frac{\mu -u}{2}+\frac{\mu -u}{2} =d_3(x') ;\\ d_4(x')&< d_3(x')\quad \text { (by the same argument as }d_2) \end{aligned}$$

hence \(d_3(x)<d_3(x')=\max _{i\in S} d_i(x')\) in this case as well. \(\square \)

At the same time, it turns out that \(\xi (t)\) cannot increase too much in one step, as follows from

Lemma 6

There is a non-random \(r>0\) such that for all s we have \(\xi (s+1)\le r \xi (s)\).

Proof

By Corollary 1 it follows that the worst outlier (w.l.o.g. \(x_3\)) can be replaced only by a point at most at the distance \(4\delta \) from \(x_3\) at time \(\nu _{s+1}\). Let the new value of the fitness at node 3 be \(x_3+v\), \(|v|\le 4\delta \). The change in the Lyapunov function is given by

$$\begin{aligned} \xi (s+1)-\xi (s)&=\left[ 2((x_3+v)-x_2)^2+2((x_3+v)-x_4)^2\right. \nonumber \\&\quad \left. +((x_3+v)-x_1)^2+((x_3+v)-x_5)^2\right] \nonumber \\&-\left[ 2(x_3-x_2)^2+2(x_3-x_4)^2+(x_3-x_1)^2+(x_3-x_5)^2\right] \nonumber \\&=(12x_3-2x_1-4x_2-4x_4-2x_5)\, v+6\,v^2 \end{aligned}$$
(3.5)

Since

$$\begin{aligned} \left| 12x_3-2x_1-4x_2-4x_4-2x_5\right| &=\left| 4\left( x_2-\frac{x_1+x_3}{2}\right) +4\left( x_4-\frac{x_5+x_3}{2}\right) \right. \\&\quad \left. +16\left( x_3-\frac{x_2+x_4}{2}\right) \right| \\&\le 4\delta +4\delta +16\delta =24\delta \le 36\delta \end{aligned}$$

from (3.5) and the fact that \(\delta =d(\tilde{X}(s))\le \sqrt{\frac{\xi (s)}{2}}\) by Lemma 3

$$\begin{aligned} |\xi (s+1)-\xi (s)|\le 36\delta \times 4\delta + 6\,(4\delta )^2= 240\delta ^2\le 120 \xi (s), \end{aligned}$$

so we can take \(r=121\). \(\square \)

Finally, we want to show that, roughly speaking, one does not have to wait for too long before \(\xi (t)\) increases or decreases by a substantial amount.

Lemma 7

Fix some \(k>1\) and \(s_0>0\). Let \(\tau _1=\inf \{s>0:\ \xi (s_0+s)\le \xi (s_0)/k\}\) and \(\tau _2=\inf \{s>0:\ \xi (s_0+s)\ge k\xi (s_0)\}\). Then \(\tau =\min (\tau _1,\tau _2)\), given \({\tilde{\mathcal{{F}}}}_{s_0}\), is stochastically smaller than some random variable with a finite mean, the distribution of which does not depend on anything except N and k.

Proof

Fix a positive integer L. For each \(t\ge s_0\) define

$$\begin{aligned} B_t=\left\{ \xi (t+L)\le \frac{\xi (t)}{k^2}\right\} . \end{aligned}$$

It suffices to show that \({\mathbb {P}}(B_t| {\tilde{\mathcal{{F}}}}_t)\ge p\) for some \(p>0\) uniformly in t, since for \(j=0,1,2,\dots \)

$$\begin{aligned} B_{s_0+jL}&\subseteq \{\xi (s_0+jL)<k\xi (s_0)\text { and } \xi (s_0+(j+1)L)<\xi (s_0)/k \} \cup \{\xi (s_0+jL)\ge k \xi (s_0)\} \\&\subseteq \{\tau _1\le (j+1)L\} \cup \{\tau _2\le jL\} \subseteq \{\tau \le (j+1)L\}. \end{aligned}$$

which, in turn, would imply that \(\tau \) is stochastically smaller than L multiplied by a geometric random variable with parameter \(p=p(N,k)\).

To show that \({\mathbb {P}}(B_t\,|\, \ {\tilde{\mathcal{{F}}}}_t)\ge p\), note that by Corollary 2,

$$\begin{aligned} {\mathbb {P}}(B_{m}^* \,|\, {\tilde{\mathcal{{F}}}}_{m-1})\ge \frac{1}{48}, \quad \text {where } B_{m}^*=\left\{ \xi (m)<\rho \xi (m-1)\right\} , \quad \rho =1-\frac{5}{36 N^3}. \end{aligned}$$

Let L be so large that \(\rho ^L<1/k^2\). Then, on one hand,

$$\begin{aligned} \bigcap _{m=1}^{L}B_{t+m}^*\subseteq B_t \text { whence } {\mathbb {P}}\left( B_t \,|\, {\tilde{\mathcal{{F}}}}_t\right) \ge {\mathbb {P}}\left( \bigcap _{m=1}^{L}B_{t+m}^* \,|\, {\tilde{\mathcal{{F}}}}_t \right) , \end{aligned}$$

while on the other hand

$$\begin{aligned} {\mathbb {P}}\left( \bigcap _{m=1}^{L}B_{t+m}^* \,|\, {\tilde{\mathcal{{F}}}}_t\right) \ge \frac{1}{48^L}=:p \end{aligned}$$

which depends on N and k only. \(\square \)

The proof of the next statement, which completes the first part of the proof of the main theorem, requires a bit more work than that of Lemma 2.4 in [6]. In fact, we will prove a stronger statement (Corollary 3) later; however, it is still useful to see a fairly quick proof of the following

Lemma 8

\(\xi (s)\rightarrow 0\) a.s. as \(s\rightarrow \infty \) (and as a result \(\Delta _i(\tilde{X}(s))\rightarrow 0\) a.s. and \(d(\tilde{X}(s))\rightarrow 0\) a.s. as \(s\rightarrow \infty \)).

Proof

From Lemma 2 it follows that \(\xi (s)\) converges a.s. to a non-negative limit, say \(\xi _\infty \). Let us show that \(\xi _\infty =0\). From Corollary 2 we have

$$\begin{aligned} {\mathbb {P}}\left( \xi (s+1)\le \rho \xi (s)\, |\, \mathcal{F}_s\right) \ge \frac{1}{48}. \end{aligned}$$
(3.6)

Fix an \(\varepsilon >0\) and a \(T\in {\mathbb {N}}\). Let \(\sigma _{\varepsilon ,T}=\inf \{s\ge T:\ \xi (s)\le \varepsilon \}\). Then (3.6) implies

$$\begin{aligned} {\mathbb {P}}(A_{s+1}\,|\, \mathcal{F}_s)&\ge \frac{1_{s<\sigma _{\varepsilon ,T}}}{48},\quad \text {where }A_{s+1}=\left\{ \xi (s+1)\le \xi (s)-(1-\rho )\varepsilon \right\} \end{aligned}$$

(Compare this with the inequality (2.18) in [6]). Since \(\xi (s)\) converges a.s., only finitely many of the events \(A_s\) can occur. By Lévy's extension of the Borel–Cantelli lemma, we get that \(\sum _{s=T}^\infty {\mathbb {P}}(A_{s+1}\,|\, \mathcal{F}_s)<\infty \) a.s., and hence \(\sum _{s=T}^\infty 1_{s<\sigma _{\varepsilon ,T}}<\infty \). This, in turn, implies that \(\sigma _{\varepsilon ,T}<\infty \) a.s. Consequently, since T is arbitrary,

$$\begin{aligned} \liminf _{s\rightarrow \infty }\xi (s)\le \varepsilon \quad \text {a.s.} \end{aligned}$$

Since \(\varepsilon >0\) is also arbitrary and \(\xi (s)\) converges, \(\lim _{s\rightarrow \infty } \xi (s)=\liminf _{s\rightarrow \infty }\xi (s)=0\) a.s. \(\square \)

The next general statement may be known, but since we could not find it in the literature, we present its fairly short proof here. We need it in order to show that \(\xi (s)\) converges to zero quickly.

Proposition 1

Suppose that \(\xi (s)\) is a positive bounded supermartingale with respect to a filtration \({\tilde{\mathcal{{F}}}}_s\). Suppose there is a constant \(r>1\) such that \(\xi (s+1)\le r \xi (s)\) a.s. and that for all k large enough the stopping times

$$\begin{aligned} \tau _s=\inf \{t>s:\ \xi (t)>k \xi (s)\text { or } \xi (t)<k^{-1} \,\xi (s)\} \end{aligned}$$

are stochastically bounded above by some finite-mean random variable \(\bar{\tau }>0\), which depends on k only (and is, in particular, independent of \({\tilde{\mathcal{{F}}}}_s\)). Let \(\mu ={\mathbb {E}}\bar{\tau }<\infty \). Then

$$\begin{aligned} \limsup _{s\rightarrow \infty } \frac{\ln \xi (s)}{s}\le -\frac{1}{4\mu }<0\qquad \text {a.s.} \end{aligned}$$

Proof

First, observe that by the Optional Stopping Theorem

$$\begin{aligned} {\mathbb {E}}(\xi (\tau _s)\,|\, {\tilde{\mathcal{{F}}}}_s) \le \xi (s) \end{aligned}$$
(3.7)

(where \(\tau _s<\infty \) a.s. by the stochastic dominance condition) while, on the other hand,

$$\begin{aligned} {\mathbb {E}}(\xi (\tau _s)\,|\, {\tilde{\mathcal{{F}}}}_s)&={\mathbb {E}}(\xi (\tau _s),\xi (\tau _s)>k \xi (s)\,|\, {\tilde{\mathcal{{F}}}}_s)+{\mathbb {E}}(\xi (\tau _s),\xi (\tau _s)<k^{-1}\ \xi (s)\,|\, {\tilde{\mathcal{{F}}}}_s)\nonumber \\&\ge {\mathbb {E}}(\xi (\tau _s),\xi (\tau _s)>k \xi (s)\,|\, {\tilde{\mathcal{{F}}}}_s) \ge k\xi (s) \cdot {\mathbb {P}}(\xi (\tau _s)> k \xi (s)\,|\, {\tilde{\mathcal{{F}}}}_s). \end{aligned}$$
(3.8)

From (3.7) and (3.8) we conclude

$$\begin{aligned} p:={\mathbb {P}}(\xi (\tau _s)>k \xi (s)\,|\, {\tilde{\mathcal{{F}}}}_s)<\frac{1}{k}. \end{aligned}$$
(3.9)

Now let us define a sequence of stopping times as follows: \(\eta _0=0\) and for \(n=1,2,\dots \),

$$\begin{aligned} \eta _{n}=\inf \left\{ s>\eta _{n-1}:\ \xi (s)>k \xi (\eta _{n-1}) \text { or } \xi (s)<k^{-1}\ \xi (\eta _{n-1}) \right\} \end{aligned}$$

and let

$$\begin{aligned} N_s=\max \{n:\ \eta _n\le s\}. \end{aligned}$$

From the definition of the stopping times \(\eta \), it follows

$$\begin{aligned} \xi (s)\le k \xi (\eta _{N_s}),\qquad \xi (\eta _{n+1})\le rk \xi (\eta _n). \end{aligned}$$
(3.10)

Consider now the sequence of random variables \(\xi (\eta _n)\). From (3.9) and (3.10) we obtain that \(\log _k \frac{\xi (\eta _n)}{ \xi (\eta _{n-1})}\) is stochastically bounded above by a random variable \(X_n\in \{-1,1+\log _k r\}\) such that

$$\begin{aligned} 1-{\mathbb {P}}(X_n=-1)={\mathbb {P}}(X_n=1+\log _k r)=\frac{1}{k} \end{aligned}$$

yielding

$$\begin{aligned} {\mathbb {E}}X_n =\frac{2+\frac{\ln r}{\ln k}}{k} -1 =:g(r,k); \end{aligned}$$

we can also assume that \(X_n\) are i.i.d. One can choose \(k>1\) so large that \(g(r,k)<-\frac{1}{2}\). Then, by the Strong Law applied to \(\sum _{i=1}^n X_i\), we get

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{\log _k \xi (\eta _n)}{n} \le \limsup _{n\rightarrow \infty } \frac{X_1+\dots +X_n}{n} < -\frac{1}{2} \qquad \text {a.s.} \end{aligned}$$

From the condition of the proposition we know that the differences \(\eta _n-\eta _{n-1}\), \(n=1,2,\dots ,\) are stochastically bounded by independent random variables with the distribution of \(\bar{\tau }\) with \({\mathbb {E}}\bar{\tau }=:\mu <\infty \). Then by the Strong Law for renewal processes (see e.g. [5], Theorem I.7.3) applied to the sum of independent copies of \(\bar{\tau }\), we get

$$\begin{aligned} \liminf _{s\rightarrow \infty }\frac{N_s}{s}\ge \frac{1}{\mu }\qquad \text {a.s.} \qquad \Longrightarrow \qquad s\le 2\mu N_s \text { for all large enough }s. \end{aligned}$$
(3.11)

Combining (3.10) and (3.11), we get

$$\begin{aligned} \limsup _{s\rightarrow \infty } \frac{\log _k \xi (s)}{s}&\le \limsup _{s\rightarrow \infty } \frac{\log _k \left( k\xi (\eta _{N_s}) \right) }{s} = \limsup _{s\rightarrow \infty } \frac{\log _k \xi (\eta _{N_s})}{s}\\&\le \limsup _{s\rightarrow \infty } \frac{\log _k \xi (\eta _{N_s})}{2\mu N_s} = \frac{1}{2\mu } \limsup _{n\rightarrow \infty } \frac{\log _k \xi (\eta _n)}{n} \le -\frac{1}{4\mu } \qquad \text {a.s.} \end{aligned}$$

since \(N_s\rightarrow \infty \) when \(s\rightarrow \infty \) a.s. \(\square \)

The next statement strengthens Lemma 8.

Corollary 3

\(\xi (s)\rightarrow 0\) exponentially fast as \(s\rightarrow \infty \).

Proof

The statement follows immediately from Proposition 1: the bound for r we have by Lemma 6; the other condition follows from Lemma 7. \(\square \)

Now we are ready to finish the proof of the main statement.

Proof of Theorem 3

According to Corollary 3 there exist \(a,b>0\), a.s. finite, such that \(\xi (s)\le ae^{-bs}\) for all s. If we take \(s_0\) such that \(ae^{-bs}\le \epsilon \) for all \(s\ge s_0\), then for \(s_0\le s <t\),

$$\begin{aligned} |\tilde{X}_i(t)-\tilde{X}_i(s)|&\le \sum _{k=s+1}^{t} 4\,d(\tilde{X}(k))\le \sum _{k=s+1}^{t} \sqrt{8\xi (k)} \nonumber \\&\le \sqrt{8\epsilon } \sum _{k=s+1}^{t} e^{-bk/2}\le \frac{\sqrt{8\epsilon }}{1-e^{-b/2}}, \end{aligned}$$
(3.12)

where we used Corollary 1 in the first inequality and Lemma 3 in the second inequality. We can thus conclude that \(\{\tilde{X}_i(t)\}_t\) is a Cauchy sequence in the a.s. sense; therefore the limit \(\tilde{X}_i(\infty ):=\lim _{t\rightarrow \infty } \tilde{X}_i(t)\) exists a.s. Moreover, by letting \(t\rightarrow \infty \) in (3.12), we get that \(|\tilde{X}_i(s)-\tilde{X}_i(\infty )|\le C e^{-bs/2}\) for some \(C>0\).

Furthermore, assuming w.l.o.g. that \(i<j\),

$$\begin{aligned} |\tilde{X}_i(\infty )-\tilde{X}_j(\infty ) |&=\lim _{t\rightarrow \infty } |\tilde{X}_i(t)-\tilde{X}_j(t)| \le \lim _{t\rightarrow \infty } \sum _{k=i+1}^{j} \left| \Delta _k( \tilde{X}(t) )\right| =0 \end{aligned}$$

by Lemma 8, which completes the proof. \(\square \)

4 Discussion and Open Problems

One may be interested in the speed of convergence, established in Theorem 3. In Lemma 6 we can take \(r=121\) and from the proof of Proposition 1, \(k=\ln r=\ln (121)=2\ln (11)\) will be sufficient. Then, for Lemma 7, find L such that

$$\begin{aligned} \left( 1-\frac{5}{36N^3}\right) ^L<\frac{1}{23}<\frac{1}{k^2} \end{aligned}$$

We can take, e.g.,

$$\begin{aligned} L\approx 7.2 N^3 \cdot \ln (23)\approx 22.6 N^3 \end{aligned}$$

This, in turn, will provide a bound on \(\mu ={\mathbb {E}}\bar{\tau }\le \frac{ L}{p}=L\cdot 48^L\) for Proposition 1, and hence the speed of the convergence for large s:

$$\begin{aligned} 2\, [d(\tilde{X}(s))]^2\le & {} h(\tilde{X}(s))=\xi (s)\le k^{-\frac{s}{4\mu }}\\\le & {} \exp \left\{ \displaystyle -\frac{s}{8\, L\, 48^L\, \ln (11)}\right\} \approx \exp \left\{ \displaystyle -\frac{s}{433 \cdot 10^{38 N^3}}\right\} \end{aligned}$$

This bound is, however, far from the optimal one. The simulations seem to indicate that, depending on N,

$$\begin{aligned} \xi (s)\sim e^{-\rho _N s}, \end{aligned}$$

where e.g. \(\rho _5\in (0.47,0.77)\), \(\rho _{10}\in (0.14,0.23)\), \(\rho _{20}\in (0.02,0.03)\), \(\rho _{40}\in (0.003,0.006)\), suggesting that (a) \(\rho _N\) can be, in fact, random, and (b) the average value of \(\rho _N\) decays roughly like \(5/N^{2}\). We leave the study of the properties of \(\rho _N\) for further research.
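For the interested reader, here is the kind of simulation we have in mind; this is a sketch of our own (not the code used for the figures or the estimates above) which samples the embedded chain directly, using the acceptance interval \([a,x_3]\) computed in the proof of Lemma 2 (and the symmetry \(x\mapsto 1-x\) for the opposite case), and fits the decay rate of \(\xi (s)=h(\tilde{X}(s))\); single-run estimates are of course rough.

```python
import numpy as np

def d_values(x):
    return np.abs(x - (np.roll(x, 1) + np.roll(x, -1)) / 2.0)

def h(x):
    return 2 * ((x - np.roll(x, -1)) ** 2).sum() + ((x - np.roll(x, -2)) ** 2).sum()

def embedded_step(x, rng):
    """One step of the embedded chain: resample the worst node uniformly on the interval
    of values for which jmath moves or d decreases (the interval [a, x3] from Lemma 2)."""
    N = len(x)
    j = int(np.argmax(d_values(x)))
    x = x.copy()
    flipped = x[j] < (x[j - 1] + x[(j + 1) % N]) / 2   # reduce to the case x_j > mu via x -> 1-x
    if flipped:
        x = 1.0 - x
    x1, x2, x3, x4, x5 = (x[(j + k) % N] for k in (-2, -1, 0, 1, 2))
    Q = [x2 + x4 - x3,
         x1 - x2 + x4, (-x1 + 3 * x2 + x4) / 3,
         x2 - x4 + x5, (x2 + 3 * x4 - x5) / 3]
    a = max(0.0, min(Q))
    x[j] = rng.uniform(a, x3)
    return 1.0 - x if flipped else x

def rho_estimate(N, steps, rng):
    x, logs = rng.random(N), []
    for _ in range(steps):
        x = embedded_step(x, rng)
        logs.append(np.log(h(x)))
    return -np.polyfit(np.arange(len(logs)), logs, 1)[0]   # decay rate of log xi(s)

rng = np.random.default_rng(0)
for N in (5, 10, 20):
    print(N, round(float(rho_estimate(N, 300, rng)), 3))    # rough single-run estimates of rho_N
```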

We believe that the convergence described by Theorems 2 and 3 holds for a much more general class of replacement distributions \(\zeta \), not just the uniform one; for example, for continuous distributions whose density is uniformly bounded away from zero. Unfortunately, our proof is based on the construction of a Lyapunov function which cannot be easily transferred to other cases (obviously, it will work for any \(\zeta \sim U[a,b]\), where \(a<b\)).

One can also attempt to generalize the theorems to more general graphs, as described in Remark 1; this should be done, however, with care, as the result does not hold for all distributions (see Remark 2).