1 Introduction

In this paper we study the many-server N-system shown in Fig. 1, with Poisson arrivals and exponential service times, under the first come first served and assign to the longest idle server policy (FCFS–ALIS), as the number of servers becomes large. Before describing the model in detail, we will first discuss our motivation for studying this system.

Fig. 1 The multi-server N-system

The N-system is one of the simplest special cases of skill-based routing in parallel server systems, as defined in [9, 15] and further studied in [4, 6, 7, 12, 13, 14, 17, 19, 20, 22, 23]. The general model has customers of types \(i=1,\ldots ,I\), servers of types \(j=1,\ldots ,J\), and a bipartite compatibility graph G, where \((i,j)\in G\) if customer type i can be served by server type j. Arrivals form a renewal process with rate \(\lambda \), where successive customer types are i.i.d. with probabilities \(\alpha _i\). There are a total of n servers, of which \(n \theta _j\) are of type j, and service times are generally distributed with rates \(\mu _{i,j}\). Assume the system is operated under the FCFS–ALIS policy, that is, servers take on the longest waiting compatible customer, and arriving customers are assigned to the longest idle compatible server. For this general system, necessary and sufficient conditions for stability (positive Harris recurrence for given \(\lambda \)), or for complete resource pooling (there exists a critical \(\lambda _0\) such that the system is stable for \(\lambda <\lambda _0\), and the queues of all customer types diverge for \(\lambda >\lambda _0\)), cannot be determined from first-moment information alone (as suggested by an example of Foss and Chernova [9], which is further discussed in [16]). In particular, under FCFS–ALIS, the calculation of the matching rates \(r_{i,j}\), which are the long-term average fractions of services performed by servers of type j on customers of type i, is in general intractable.

In the special case that service rates depend only on the server type, and not on the customer type, with Poisson arrivals and exponential service times, the system has a product form stationary distribution, as given in [2]. In that case matching rates can be computed from the stationary distribution.

The following conjecture was made in [4]. If the system is stable and has complete resource pooling for given \(\lambda ,\,n\), and we let both become large together, the behavior of the system simplifies: there will exist \(\beta _j\) such that servers of type j perform a fraction \(\beta _j\) of the services, and the matching rates \(r_{i,j}\) will converge to the rates for the FCFS infinite matching model with \(G,\alpha ,\beta \), as calculated in [1] (see also [5]). The conjecture is based on the following heuristic argument: in steady state the times that each server becomes available form a stationary process which is only mildly correlated with the other servers, and so servers become available approximately as a superposition of almost independent stationary processes, which in the many-server limit becomes a Poisson process, and server types are then i.i.d. with probabilities \(\beta _j\), while customer types arrive as an i.i.d. sequence with probabilities \(\alpha _i\). This corresponds exactly to the model of FCFS infinite matching. Under FCFS–ALIS it is also possible that while the system is stable, service by all the servers is not pooled. Instead it is decoupled: the bipartite compatibility graph breaks into two or more subgraphs, and when the system is operated under FCFS–ALIS the links connecting the subgraphs are only rarely used. The conjecture then is that under many-server scaling this decoupling is the same as in the FCFS infinite matching model, with the same matching rates.

In our current study of the many-server N-system we shall verify the conjectured many-server behavior for this simple parallel server system. To do so we start from the known stationary distribution of the N-system with many servers, as derived in [2], and study its behavior as \(n\rightarrow \infty \). As it turns out, the product form stationary distribution, even for this simple case, is far from simple, and the derivations of limits, which use summations over server permutations and asymptotic expansions of various expressions, are quite laborious. We feel that this emphasizes the difficulty in verifying the conjectured behavior of the general system, which remains intractable at this time.

We mention that the N-system with just two servers has been the subject of several papers, including [3, 10, 11, 19, 20]. In this paper, our focus is on the N-system with many servers under FCFS–ALIS and its limiting behavior.

The rest of the paper is structured as follows. In Sect. 2 we describe the model, and in Sect. 3 we use some heuristic arguments to obtain a guess at the limiting behavior, where we distinguish between pooled and decoupled modes. In Sect. 4 we verify the heuristic guess and obtain the stationary behavior under many-server scaling. In Sect. 5 we illustrate our results with some numerical examples. To improve the readability of the paper we have put all the proofs for Sect. 4 in the Appendix.

2 The model

In our N-system, customers of types \(c_1\) and \(c_2\) arrive as independent Poisson streams, with rates \(\lambda _{1},\lambda _{2}\). There are skill-based parallel servers, \(n_1\) servers of type \(s_1\) which are flexible and can serve both types, and \(n_2\) servers of type \(s_2\) which can only serve type \(c_1\) customers. In our notation, \(c_1\) customers and \(s_1\) servers are flexible, while \(c_2\) customers and \(s_2\) servers are inflexible. (\(s_2\) servers cannot serve \(c_2\) customers.) We assume service times are all independent exponential, with server-dependent rates. The service rate of an \(s_1\) server is \(\mu _1\); the service rate of an \(s_2\) server is \(\mu _2\). See Fig. 1. We let \(\lambda =\,\lambda _1+\,\lambda _2,\,n=n_1+n_2\). The service policy is FCFS–ALIS.

Fig. 2 State description under FCFS–ALIS

The system is Markovian. In [2, 3, 21] the following state description for the skill-based parallel server systems under the FCFS–ALIS policy was used: imagine the customers arranged in a single queue by order of arrival, and servers are attached to the customers which they serve, and the remaining idle servers are arranged by increasing idle time in front of the queue; see Fig. 2. The state is then \(\mathfrak {s}=(S_1,q_1,S_2,q_2,\ldots ,S_{n-i},q_{n-i},S_{n-i+1},\ldots ,S_n)\), where \(S_1,\ldots ,S_n\) is a permutation of the n servers; the first \(n-i\) servers are the ordered busy servers, and the last i servers are the ordered idle servers, and where \(q_j,\,j=1,\ldots , n-i\), are the queue lengths of the customers waiting for one of the servers \(S_1,\ldots , S_j\), and skipped (could not be served) by servers \(S_{j+1},\ldots ,S_n\). When service rates depend only on the servers, arrivals are Poisson, and services are exponential, this description is Markovian, as shown in [21]. The reason is as follows: given the permutation of servers, we know for each \(q_j\) exactly what types of customers may be present, and since those customers are in the order in which they arrived, the type of each of them is randomly distributed according to the initial frequencies of customer types, and independent of all others. Hence, each server with a queue in front will have to go through an independent sequence of trials as he scans the customers FCFS until finding a match, and the specific sequences of customer types in the queues are not relevant to the steady state of the scan. This yields Markovian transition probabilities.

For the special case of the N-system, in steady state, the following three random quantities are important: \(i_1=I_1(\mathfrak {s})\), the number of idle servers of type \(s_1\), \(i_2=I_2(\mathfrak {s})\), the number of idle servers of type \(s_2\), and \(k=K(\mathfrak {s}) \ge 0\), the number of servers of type \(s_2\) which follow the last server of type \(s_1\) in the sequence \(S_1,\ldots ,S_n\). An incoming \(c_2\) customer has to skip these k servers of type \(s_2\) and reach the last \(s_1\) server in order to be served. We let \(i=I(\mathfrak {s})=i_1+i_2\) be the total number of idle servers in steady state. Because of the structure of the N-system and the FCFS–ALIS policy, the following properties hold for \(i=0,\ldots ,n\) and \(k=0,\ldots ,n_2\):

  (i) There are no customers waiting for any server which precedes the last \(s_1\) server in the permutation. In other words, for all \(j < \min (n-k, n-i)\) we have \(q_j=0\). In particular, if there is an idle server of type \(s_1\) (meaning \(i > k\)), then there are no waiting customers at all.

  (ii) If there are any idle servers, then there are no type \(c_1\) customers waiting for service; in other words, if \(i>0\), then all the waiting customers are of type \(c_2\).

  (iii) If there are no idle servers (all servers are busy), then only the last queue can contain type \(c_1\) customers; in other words, if \(i=0\), then the last queue may contain customers of both types, but all the other waiting customers are of type \(c_2\).
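To make the state description concrete, the following minimal Python sketch reads \(I_1\), \(I_2\) and K off a state. It assumes a hypothetical encoding of \(S_1,\ldots ,S_n\) as a list of server types (1 for \(s_1\), 2 for \(s_2\)) in which the last i entries are the idle servers; the function name idle_stats is our own, and the queue lengths \(q_j\) are omitted because these three quantities do not depend on them.

```python
from typing import Sequence, Tuple

def idle_stats(server_types: Sequence[int], i: int) -> Tuple[int, int, int]:
    """Return (I_1, I_2, K) for an N-system state (hypothetical encoding).

    server_types lists the types (1 for s_1, 2 for s_2) of S_1, ..., S_n in
    the order of the state description; the last i entries are idle.
    K counts the s_2 servers that follow the last s_1 server.
    """
    n = len(server_types)
    if not 0 <= i <= n:
        raise ValueError("i must be between 0 and n")
    idle = server_types[n - i:]
    i1 = sum(1 for t in idle if t == 1)
    i2 = i - i1
    k = 0
    for t in reversed(server_types):   # scan back from S_n
        if t == 1:
            break
        k += 1
    return i1, i2, k

# S_1..S_5 have types 1, 2, 1, 2, 2 and the last three are idle.
i1, i2, k = idle_stats([1, 2, 1, 2, 2], i=3)
print(i1, i2, k)        # 1 2 2
print(i1 + i2 > k)      # True: an s_1 server is idle, as in property (i)
```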

Denote

$$\begin{aligned} \alpha = \frac{\lambda _1}{\lambda }, \quad \theta =\frac{n_1}{n}, \quad \rho =\frac{\lambda }{n_1\mu _1+n_2\mu _2}, \quad \delta =\frac{\lambda _2}{n_1 \mu _1}, \quad r = \frac{\lambda }{n}. \end{aligned}$$

Then a necessary and sufficient condition for stability is

$$\begin{aligned} \rho<1 , \quad \delta <1. \end{aligned}$$

Throughout the paper, we assume the above stability condition. For the stable system, define \(\beta \) as the long-term fraction of customers served by servers of type \(s_1\), and \(1-\beta \) as the long-term fraction of customers served by servers of type \(s_2\). Since type \(s_1\) servers are the only ones that can serve type \(c_2\) customers, we must have \(\beta \ge 1-\alpha \), or, equivalently, \(\alpha +\beta \ge 1\). The stable system under FCFS–ALIS may operate in two different modes. It may be that servers of both types share the service of customers of type \(c_1\), in which case \(\beta > 1-\alpha \) and we say that resource pooling occurs for large n. Or it may be that servers of type \(s_1\) serve almost exclusively customers of type \(c_2\), and almost all the service of customers of type \(c_1\) is done by servers of type \(s_2\), in which case \(\beta \approx 1-\alpha \) for large n, and we say that the system is decoupled.

Using the results of [1, 2] we can then write the exact stationary distribution of this system. We wish to show that, as the arrival rate and the number of servers increase, the system simplifies, and we get very precise many-server scaling limits, and in particular we find sharp conditions for pooled or decoupled modes of operation. We will investigate the behavior of the system when we fix the values of \(\alpha ,\theta ,\rho \), and let \(n \rightarrow \infty \). To be precise, we shall then have n, \(n_1=\lceil \theta n \rceil ,\,n_2 = n-n_1\), \(\lambda =\rho (\mu _1 n_1+\mu _2 n_2),\,\lambda _1=\alpha \lambda ,\,\lambda _2=(1-\alpha )\lambda \), all of which go to infinity. Average processing times \(1/\mu _1,1/\mu _2\) are fixed and not scaled.

3 Heuristic fluid calculations

In this section we use some heuristic arguments to guess at the fluid behavior of the many-server system. In particular, we calculate a guess for some key quantities. Using these quantities we give a heuristic description of how the system will behave under the FCFS–ALIS policy, in the many-server case, distinguishing between pooled and decoupled modes of operation. The main part of the paper, in Sect. 4, is the verification of these guesses.

We assume some fixed \(\rho<1,\,\delta <1\) so that the system under FCFS–ALIS is stable. We then expect that under many-server scaling there will almost always be some idle servers of both types available, and customers will almost never wait, so that they will enter service immediately upon arrival. At the same time, when a server completes a service there will almost never be any waiting customers, so, after almost every service completion, the server will experience some idle time. Because our policy is ALIS, when a server becomes idle, he always joins the end of the queue of idle servers. In a slight abuse of notation, we reuse \(I_1, I_2\) and K to denote, respectively, the stationary numbers of idle servers of type \(s_1\) and of type \(s_2\), and the stationary number of servers of type \(s_2\) which follow the last server of type \(s_1\) in \(\mathfrak {s}\).

When the system is stationary, the sample path of each server will consist of a sequence of cycles, each of which consists of a single service period followed by an idle period (which can be of length 0). We denote the generic idle periods between services of servers of types \(s_1\) and \(s_2\) by \(Y_1,Y_2\), and their means by \(T_1=E(Y_1)\), \(T_2=E(Y_2)\). We can bound the values of \(T_1,\,T_2\) as follows. Servers of type \(s_2\) can serve only customers of type \(c_1\), some of which may also be served by servers of type \(s_1\). Hence, the rate at which customers are assigned to each \(s_2\) server is no larger than \(\lambda _1/n_2\), so the average cycle length of an \(s_2\) server is no less than \(n_2/\lambda _1\); since the average service time is \(1/\mu _2\), we get \(T_2 \ge n_2/\lambda _1 - 1/\mu _2\). Servers of type \(s_1\) serve all customers of type \(c_2\) and may in addition serve some customers of type \(c_1\). Hence, the rate at which customers are assigned to each \(s_1\) server is no less than \(\lambda _2/n_1\), so the average cycle length of an \(s_1\) server is no larger than \(n_1/\lambda _2\); since the average service time is \(1/\mu _1\), we get \(T_1 \le n_1/\lambda _2 - 1/\mu _1\). Hence, the stationary expected idle times satisfy

$$\begin{aligned} T_1 = E(Y_1) \le \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1}, \qquad T_2 = E(Y_2) \ge \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}. \end{aligned}$$
(1)

We now distinguish three cases for the values of the parameters:

$$\begin{aligned} \text{ Case } \text{ I }\quad \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}> & {} \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1} \\ \text{ Case } \text{ II } \quad \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}< & {} \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1} \\ \text{ Case } \text{ III } \quad \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}= & {} \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1} \\ \end{aligned}$$

Case I

In this case, by (1) we will have \(T_2>T_1\), and the system will decouple. The reasoning is as follows: because our policy is ALIS, each server, on completion of service, joins the end of the queue of idle servers, and his idle period consists of waiting until all the servers ahead of him who are of his type, as well as all the other servers that can serve customers who are compatible with him, are assigned to customers, and he is then assigned to the next compatible customer.

At the end of his idle period, a server of type \(s_i\) has been idle for \(Y_i\), and he is then the longest idle server of his type. If we assume the idle times \(Y_i\) converge to their means \(T_i\) as the system becomes large, \(i=1,2\), then since \(T_2 > T_1\), we can say that most of the time the longest idle server will be of type \(s_2\). Therefore almost all the arriving customers of type \(c_1\) will be assigned to a server of type \(s_2\), and so servers of type \(s_1\) will serve almost exclusively customers of type \(c_2\).

This implies that in Case I the system under many-server scaling will behave like two separate M/M/s queues. Because servers of type \(s_2\) serve almost all customers of type \(c_1\), and servers of type \(s_1\) serve all customers of type \(c_2\) and almost none of the customers of type \(c_1\), we have, for large n,

$$\begin{aligned} \alpha + \beta \approx 1 \end{aligned}$$

and inequalities (1) will be close to equalities, and we will have (by Little’s law)

$$\begin{aligned} E(I_1) = \lambda _2 T_1 \approx n_1 - \frac{\lambda _2}{\mu _1}, \quad E(I_2) = \lambda _1 T_2 \approx n_2 - \frac{\lambda _1}{\mu _2}. \quad \end{aligned}$$

We can also estimate the value of K, the location of the first type \(s_1\) server. Service completions of customers of type \(c_1\) occur at rate \(\lambda _1\), and almost all of them are performed by servers of type \(s_2\); service completions of customers of type \(c_2\) occur at rate \(\lambda _2\), and all of them (and almost no others) are performed by servers of type \(s_1\). Hence servers of type \(s_2\) and \(s_1\) join the end of the queue of idle servers at the ratio \(\lambda _1/\lambda _2\), so \((I_2-K)/I_1 \approx \lambda _1/\lambda _2\) and

$$\begin{aligned} E(K) \approx E(I_2) - E(I_1) \frac{\lambda _1}{\lambda _2} = \lambda _1 \big ( T_2 - T_1 \big ) \approx \lambda _1 \left( \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} - \frac{n_1}{\lambda _2} + \frac{1}{ \mu _1}\right) . \end{aligned}$$
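As a rough numerical illustration of these Case I heuristics, the sketch below takes the bounds in (1) as equalities and evaluates the Little's-law estimates of \(E(I_1)\), \(E(I_2)\) together with the estimate of \(E(K)\). The function case1_fluid and the parameter values (n = 1000, \(\theta =0.5\), \(\alpha =0.3\), \(\rho =0.5\), \(\mu _1=\mu _2=1\), hence \(\lambda _1=150\), \(\lambda _2=350\)) are illustrative choices of ours.

```python
def case1_fluid(n1, n2, lam1, lam2, mu1, mu2):
    """Heuristic Case I (decoupled) estimates of E(I_1), E(I_2) and E(K).

    Uses the bounds in (1) as equalities, Little's law for the idle
    servers, and the ratio (I_2 - K)/I_1 ~ lambda_1/lambda_2.
    """
    T1 = n1 / lam2 - 1 / mu1          # mean idle period of an s_1 server
    T2 = n2 / lam1 - 1 / mu2          # mean idle period of an s_2 server
    if T2 <= T1:
        raise ValueError("parameters are not in Case I")
    EI1 = lam2 * T1                   # = n1 - lam2/mu1
    EI2 = lam1 * T2                   # = n2 - lam1/mu2
    EK = lam1 * (T2 - T1)             # = EI2 - EI1 * lam1/lam2
    return EI1, EI2, EK

# n = 1000, theta = 0.5, alpha = 0.3, rho = 0.5, mu_1 = mu_2 = 1
EI1, EI2, EK = case1_fluid(500, 500, 150.0, 350.0, 1.0, 1.0)
print(round(EI1, 3), round(EI2, 3), round(EK, 3))   # 150.0 350.0 285.714
```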

It is worthwhile to note that the condition of Case I that implies decoupling is not simply that \(\delta > \rho \), which is equivalent to \(\frac{\lambda _2}{n_1\mu _1}>\frac{\lambda _1}{n_2\mu _2}\) (the load of customers of type \(c_2\) on servers of type \(s_1\) is higher than the load of customers of type \(c_1\) on servers of type \(s_2\)). In fact, under FCFS–ALIS, servers of both types may share service of customers of type \(c_1\) even when \(\delta >\rho \). To explain, when \(\delta > \rho \), under decoupled service the load, and therefore the fraction of time busy, of type \(s_2\) servers is smaller than that of type \(s_1\) servers; but if \(\mu _1<\mu _2\), the idle periods of type \(s_2\) servers (\(Y_2\)) could be shorter than those of type \(s_1\) servers (\(Y_1\)). In that case, under FCFS–ALIS the work of \(c_1\) customers will be shared by both types of servers.
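The following sketch classifies a parameter set into Case I, II or III and exhibits the phenomenon just described: with \(\theta =0.5\), \(\mu _1=1\), \(\mu _2=4\), \(\rho =0.8\) and \(\alpha =0.775\) we have \(\delta =0.9>\rho \), yet the parameters fall in Case II. It works per server, assuming \(n_1=\theta n\) exactly (ignoring the ceiling), so that n cancels from both sides of the case conditions; the function name n_system_case is hypothetical.

```python
import math

def n_system_case(theta, alpha, rho, mu1, mu2):
    """Classify parameters into Case I, II or III of Sect. 3.

    Everything is per server: n_1/n = theta, n_2/n = 1 - theta and
    lambda/n = rho*(theta*mu1 + (1 - theta)*mu2), so n cancels.
    Returns the case label and delta.
    """
    lam = rho * (theta * mu1 + (1 - theta) * mu2)     # lambda / n
    lam1, lam2 = alpha * lam, (1 - alpha) * lam        # lambda_1/n, lambda_2/n
    delta = lam2 / (theta * mu1)
    if not (rho < 1 and delta < 1):
        raise ValueError("unstable: need rho < 1 and delta < 1")
    lower_t2 = (1 - theta) / lam1 - 1 / mu2   # n_2/lambda_1 - 1/mu_2
    upper_t1 = theta / lam2 - 1 / mu1         # n_1/lambda_2 - 1/mu_1
    if math.isclose(lower_t2, upper_t1):
        return "III", delta
    return ("I" if lower_t2 > upper_t1 else "II"), delta

# delta > rho, but the faster s_2 servers have shorter idle periods.
case, delta = n_system_case(theta=0.5, alpha=0.775, rho=0.8, mu1=1.0, mu2=4.0)
print(case, round(delta, 4))   # II 0.9
```

With equal service rates the two conditions coincide: Case I then holds exactly when \(\delta >\rho \).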

Fig. 3 FCFS–ALIS many-server system, queues of idle servers decoupled

The stationary behavior of the decoupled system is described in Fig. 3. In this figure we have, from left to right, a section of busy servers of both types serving all the customers in the system, followed by a section of more recently idled servers of mixed types, followed by a section of the oldest idle servers, all of which are of type \(s_2\). Servers that complete service join the queue of idle servers at its left end. Arriving customers of type \(c_1\) pick the longest idle server, which is of type \(s_2\); arriving customers of type \(c_2\) skip all the K servers of type \(s_2\), and pick the oldest idle server of type \(s_1\). Note that the idle servers of both types are mixed in the middle section, and \(I_2\ne I_1+K\).

The exact limiting behavior under many-server scaling for Case I is derived in Sect. 4.4, where the heuristic calculations are verified. Our main results for Case I are:

  • The probability that \(K=0\) converges to 0 as \(n\rightarrow \infty \), and so, asymptotically, every customer of type \(c_1\) is served by a server of type \(s_2\).

  • The two sets of servers and their customers behave like independent \({M/M/}n_1\) and \({M/M/}n_2\) queues.

Case II

In this case, we argue that \(T_1 \rightarrow T_2\) as \(n\rightarrow \infty \). Assume to the contrary that \(T_1 > T_2\) as \(n\rightarrow \infty \). Then, for large n, we should have that most of the time the longest idle server will be of type \(s_1\). But \(s_1\) servers can serve all customers, and so by ALIS \(s_1\) servers will serve almost all the customers in the system; the servers of type \(s_2\) would then receive almost no customers, and their idle periods would be longer than those of the \(s_1\) servers, which contradicts \(T_1 > T_2\). Now assume that \(T_2>T_1\) as \(n\rightarrow \infty \). In that case we already argued that the system will decouple and so the inequalities in (1) will hold as equalities, which, since we are in Case II, contradicts \(T_2>T_1\). Therefore, there is no decoupling in Case II, and we conclude that, for large n,

$$\begin{aligned} \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}< T_2 \approx T_1 < \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1}. \end{aligned}$$

Our first conclusion from \(T_2 > \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}\) is that servers of type \(s_2\) do not serve all the customers of type \(c_1\), so \(1-\beta < \alpha \), i.e., \(\alpha +\beta >1\), and from \(T_1 < \frac{n_1}{\lambda _2} - \frac{1}{\mu _1}\) we conclude that servers of type \(s_1\) serve some customers of type \(c_1\) as well as customers of type \(c_2\) (again, \(\beta > 1-\alpha \)).

The following is a heuristic description of the behavior of the system in Case II under many-server scaling. When n increases, the (random) number of idle servers becomes large, of order O(n), and successive servers join the queue of idle servers at short intervals (of expected length \(1/\lambda \), which is O(1 / n)). They will spend a time of O(1) to traverse the queue and will then reach the head of the queue of idle servers with short intervals between them. At this point they will need to wait for a compatible customer, and this waiting time does depend on the type of server, but because \(\lambda \) is large, once a server is at the head of the line his wait for a compatible customer will be short; hence, successive server arrivals to the idle queue are close to each other and so are their departures from the idle queue. So, as \(n\rightarrow \infty \), not only do we have \(T_1=T_2\), but also the idle times \(Y_1\) and \(Y_2\) have the same distribution, and K is of order O(1). This heuristic description will be verified in Sect. 4.

We denote by T the presumed common value of \(T_1\) and \(T_2\), that is, the average length of the idle time, common to all servers, and we now calculate it. The average cycle times will be \(1/\mu _1+T\) and \(1/\mu _2+T\). We defined \(\beta \) as the long-run fraction of services performed by \(s_1\) servers, and \(1-\beta \) as the fraction performed by \(s_2\) servers. The cycle rate of one type \(s_1\) server is \(1/(1/\mu _1+T)\); hence, the processing rate of all type \(s_1\) servers is \(n_1/(1/\mu _1+T)\), which should equal \(\lambda \beta \). Similarly, the flow rate out of all type \(s_2\) servers, \(n_2/(1/\mu _2+T)\), should equal \(\lambda (1-\beta )\). That is,

$$\begin{aligned} \lambda \beta = n_1/(1/\mu _1+T), \quad \lambda (1-\beta )=n_2/(1/\mu _2+T). \end{aligned}$$

Now we solve for T and \(\beta \) to obtain

$$\begin{aligned} \beta = \frac{n_1}{\lambda } \frac{1}{1/\mu _1+T}, \quad 1- \beta = \frac{n_2}{\lambda } \frac{1}{1/\mu _2+T} \end{aligned}$$
(2)

and a quadratic equation for T:

$$\begin{aligned} g(T)= \lambda \mu _1 \mu _2 T^2 +\big ( \lambda (\mu _1+\mu _2) - (n_1+n_2) \mu _1 \mu _2 \big ) T + \lambda -n_1 \mu _1 - n_2 \mu _2 =0. \end{aligned}$$

Here \(g(0)<0\) because \(\rho <1\), and the leading coefficient \(\lambda \mu _1\mu _2\) is positive, so the equation has one positive and one negative root. Solving for positive T we get

$$\begin{aligned} \begin{aligned} T&= \frac{1}{2} \left( \frac{n}{\lambda } -\frac{1}{\mu _1} - \frac{1}{\mu _2} + \sqrt{\frac{n^2}{\lambda ^2} - 2 \,\frac{n_1-n_2}{\lambda }\left( \frac{1}{\mu _1} - \frac{1}{\mu _2}\right) + \Big (\frac{1}{\mu _1} - \frac{1}{\mu _2}\Big )^2} \right) \\&=\frac{1}{2} \left( \frac{1}{\rho (\theta \mu _1 +(1-\theta )\mu _2)} -\frac{1}{\mu _1} - \frac{1}{\mu _2}\right. \\&\quad \left. +\,\sqrt{\frac{1}{\rho ^2(\theta \mu _1 +(1-\theta )\mu _2)^2} + \frac{2-4\theta }{\rho (\theta \mu _1 +(1-\theta )\mu _2)}\left( \frac{1}{\mu _1} - \frac{1}{\mu _2}\right) + \left( \frac{1}{\mu _1} - \frac{1}{\mu _2}\right) ^2} \right) . \end{aligned} \end{aligned}$$
(3)

Note: for the case of \(\mu _1 = \mu _2 = \mu \) we get \(T=\frac{1-\rho }{\rho }\frac{1}{\mu }\).

Fig. 4 FCFS–ALIS many-server system, queues of idle servers pooled

From T and Little’s law we can obtain \(m_{i}\), the approximate average number of idle servers in pool \(i, i=1,2\):

$$\begin{aligned} m_1 = T\lambda \beta = \frac{Tn_1}{T+1/\mu _1},\quad m_2 = T\lambda (1-\beta )=\frac{Tn_2}{T+1/\mu _2}. \end{aligned}$$
(4)
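A minimal sketch of the Case II fluid computation: it takes the positive root of the quadratic g(T) directly via the quadratic formula, then evaluates \(\beta \) from (2) and \(m_1, m_2\) from (4). The function pooled_fluid and the illustrative values \(n_1=n_2=50\), \(\mu _1=1\), \(\mu _2=4\), \(\lambda =200\) (so \(\rho =0.8\)) are our own choices.

```python
import math

def pooled_fluid(n1, n2, lam, mu1, mu2):
    """Heuristic Case II quantities: T from g(T) = 0, beta from (2), m_i from (4)."""
    a = lam * mu1 * mu2
    b = lam * (mu1 + mu2) - (n1 + n2) * mu1 * mu2
    c = lam - n1 * mu1 - n2 * mu2          # g(0) < 0 exactly when rho < 1
    if c >= 0:
        raise ValueError("need rho < 1")
    T = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # the positive root
    beta = n1 / (lam * (1 / mu1 + T))
    m1 = T * n1 / (T + 1 / mu1)
    m2 = T * n2 / (T + 1 / mu2)
    return T, beta, m1, m2

T, beta, m1, m2 = pooled_fluid(50, 50, 200.0, 1.0, 4.0)
print(round(T, 4), round(beta, 4))    # 0.0757 0.2324
```

With \(\lambda _1=155\), \(\lambda _2=45\) these parameters are in Case II, and \(T\approx 0.0757\) lies between \(n_2/\lambda _1-1/\mu _2\approx 0.0726\) and \(n_1/\lambda _2-1/\mu _1\approx 0.1111\). Two limiting checks are easy to run: with \(n_2=0\) the root reduces to \(n_1/\lambda -1/\mu _1\), and with \(\mu _1=\mu _2=\mu \) it reduces to \((1-\rho )/(\rho \mu )\).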

When \(T_1=T_2\), servers are pooled. Servers share the load, and both types of customers receive similar levels of service. The pooled behavior of the system for FCFS–ALIS under many-server scaling is our main interest in this paper. Figure 4 shows the analog of Fig. 3 for the pooled system. Note that the idle servers of both types are mixed, and \(I_2\ne I_1\).

Case III

This case lies on the boundary of the other two cases. As a sanity check, on the one hand, we see that setting \(T_1 = \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1}\) and \(T_2= \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2}\) would correspond to the values for Case I, and result in \(T_1=T_2\). On the other hand, if we substitute this common value \(T=T_1=T_2\) into equation (2) for Case II, we get

$$\begin{aligned} \beta = \frac{n_1}{\lambda } \frac{1}{1/\mu _1+T} = \frac{n_1}{\lambda } \frac{1}{1/\mu _1+T_1} = \frac{n_1}{\lambda } \frac{1}{1/\mu _1+n_1/\lambda _2 - 1/\mu _1} = \frac{\lambda _2}{\lambda } = 1 - \alpha , \end{aligned}$$
$$\begin{aligned} 1 - \beta = \frac{n_2}{\lambda } \frac{1}{1/\mu _2+T} = \frac{n_2}{\lambda } \frac{1}{1/\mu _2+T_2} = \frac{n_2}{\lambda } \frac{1}{1/\mu _2+n_2/\lambda _1 - 1/\mu _2} = \frac{\lambda _1}{\lambda } = \alpha , \end{aligned}$$

therefore, \(\alpha +\beta =1\).
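This consistency check can be reproduced in exact rational arithmetic. The sketch below picks hypothetical boundary parameters (\(n_1=n_2=50\), \(\mu _1=1\), \(\mu _2=2\), \(\lambda _2=40\), with \(\lambda _1=200/3\) solving the Case III equality, so that \(\rho =32/45\) and \(\delta =0.8\)) and confirms that substituting the common value \(T=T_1=T_2\) into (2) gives \(\beta =1-\alpha \).

```python
from fractions import Fraction as F

# Illustrative Case III parameters (exact arithmetic).
n1, n2 = 50, 50
mu1, mu2 = F(1), F(2)
lam2 = F(40)
lam1 = n2 / (n1 / lam2 - 1 / mu1 + 1 / mu2)   # solves the Case III equality
lam = lam1 + lam2

T = n1 / lam2 - 1 / mu1                       # common value T_1 = T_2
assert T == n2 / lam1 - 1 / mu2

beta = n1 / lam / (1 / mu1 + T)               # first equation of (2)
one_minus_beta = n2 / lam / (1 / mu2 + T)     # second equation of (2)
alpha = lam1 / lam
print(T, beta, one_minus_beta, alpha + beta)  # 1/4 3/8 5/8 1
```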

4 Many-server limit of the stationary distribution

In this section, we keep the stability assumption \(\rho<1,\,\delta <1\) and derive the many-server limit from the exact stationary distributions.

4.1 Exact stationary distributions

We first obtain the stationary distribution for each state \(\mathfrak {s}\). We note that the stationary probabilities depend mainly on the values of \(k,i_1, i_2\). Let \(\mu (S_j)\) denote the service rate of the server at position j.

Theorem 1

The stationary distribution of the state \(\mathfrak {s}\) of the FCFS–ALIS many-server N-system is given by

$$\begin{aligned} \pi (\mathfrak {s}) = \left\{ \begin{array}{ll} \displaystyle B \prod _{l=1}^{n-i_1-i_2} \left( \sum _{j=1}^l \mu (S_j) \right) ^{-1} \left( \frac{1}{\lambda }\right) ^{i_1+i_2-k} \left( \frac{1}{\lambda _1}\right) ^{k}, &{} \begin{array}{l} k=0,\ldots ,n_2,\; \\ i_1=1,\ldots ,n_1,\; \\ i_2=k,\ldots ,n_2, \end{array} \\ \displaystyle B \prod _{l=1}^{n-k-1} \left( \sum _{j=1}^l \mu (S_j) \right) ^{-1} \prod _{j=n-k}^{n-i_2} \frac{\lambda _2^{q_j}}{(\mu _1n_1+\mu _2(j-n_1))^{q_j+1}}\; \left( \frac{1}{\lambda _1}\right) ^{i_2}, \quad &{} \begin{array}{l} k=1,\ldots ,n_2,\; \\ i_1=0,\;\\ i_2=1,\ldots ,k, \end{array} \\ \displaystyle B \prod _{l=1}^{n-k-1} \left( \sum _{j=1}^l \mu (S_j) \right) ^{-1} \prod _{j=n-k}^{n-1} \frac{\lambda _2^{q_j}}{(\mu _1n_1+\mu _2(j-n_1))^{q_{j}+1}}\; \frac{\lambda ^{q_n}}{(\mu _1n_1+\mu _2n_2)^{q_n+1}}, &{} \begin{array}{l} k=0,\ldots ,n_2,\;\\ i_1=i_2=0, \end{array} \end{array}\right. \nonumber \\ \end{aligned}$$
(5)

where B is a normalizing constant.

Proof

This follows for all three parts of (5) by utilizing properties (i),(ii),(iii) in Sect. 2 and substituting into Equation (2.1), Theorem 2.1, in [2]. \(\square \)

Before we manipulate Eq. (5), we introduce a lemma to facilitate the calculation.

Lemma 1

Letting \(A_1,\ldots ,A_m\) denote a permutation of m given positive real numbers \(a_1,\ldots ,a_m\), we have

$$\begin{aligned} \sum _{(A_1,\ldots ,A_m)\in \mathcal {P}(a_1,\ldots ,a_m) } \prod _{l=1}^m \left( \sum _{j=1}^l A_j \right) ^{-1} =\left( \prod _{l=1}^m a_l \right) ^{-1} \end{aligned}$$

where \(\mathcal {P}(a_1,\ldots ,a_m)\) denotes the set of all the permutations of \(a_1,\ldots ,a_m\).
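Lemma 1 is easy to check by brute force for small m. The sketch below enumerates all m! orderings with exact fractions; it treats the \(a_l\) as labelled items (as for distinct servers, which may share a rate), and the function name lemma1_lhs is ours.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def lemma1_lhs(a):
    """Sum over all orderings (A_1..A_m) of prod_l (A_1 + ... + A_l)^(-1)."""
    total = Fraction(0)
    for perm in permutations(a):
        partial, term = Fraction(0), Fraction(1)
        for x in perm:
            partial += x          # running sum A_1 + ... + A_l
            term /= partial
        total += term
    return total

a = [Fraction(1), Fraction(2), Fraction(5, 2), Fraction(4)]
print(lemma1_lhs(a) == 1 / prod(a))   # True
```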

Now we can get the joint stationary distribution of \(K,\,I_1,\,I_2\). We denote by \(\pi (k,i_1,i_2)\) the stationary probability of \(K=k\), \(I_1=i_1\) and \(I_2=i_2\).

Theorem 2

The steady-state joint distribution of \(K,\,I_1,\,I_2\) is given by

$$\begin{aligned} \pi (k,i_1,i_2) {=} \left\{ \begin{array}{ll} \displaystyle B_1 {n_1\atopwithdelims ()i_1}{n_2\atopwithdelims ()i_2} \frac{i_1 i_2! (i_1+i_2-k-1)!}{(i_2-k)!}\mu _1^{i_1}\mu _2^{i_2} \left( \frac{1}{\lambda }\right) ^{i_1+i_2} \left( \frac{\lambda }{\lambda _1}\right) ^{k}, &{} \begin{array}{l} k=0,\ldots ,n_2,\; \\ i_1=1,\ldots ,n_1,\; \\ i_2=k,\ldots ,n_2, \end{array} \\ \displaystyle B_1 \frac{n_1\,n_2!}{(n_2-k)!}\mu _1\mu _2^{k} \prod _{j=n-k}^{n-i_2} \frac{1}{\mu _1n_1+\mu _2(j-n_1)-\lambda _2}\; \left( \frac{1}{\lambda _1}\right) ^{i_2}, \quad &{} \begin{array}{l} k=1,\ldots ,n_2,\; \\ i_1=0,\;\\ i_2=1,\ldots ,k, \end{array} \\ \displaystyle B_1 \frac{n_1\,n_2!}{(n_2-k)!}\mu _1\mu _2^{k} \prod _{j=n-k}^{n-1} \frac{1}{\mu _1n_1+\mu _2(j-n_1)-\lambda _2}\; \frac{1}{\mu _1n_1+\mu _2n_2-\lambda }, &{} \begin{array}{l} k=0,\ldots ,n_2,\;\\ i_1=i_2=0, \end{array} \end{array}\right. \nonumber \\ \end{aligned}$$
(6)

where \(B_1\) is a normalizing constant.
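One way to read the combinatorial factor \(i_1\, i_2!\,(i_1+i_2-k-1)!/(i_2-k)!\) in the first line of (6), offered as our own interpretation rather than the authors' derivation: it counts the orderings of \(i_1\) labelled idle \(s_1\) servers and \(i_2\) labelled idle \(s_2\) servers in which exactly k servers of type \(s_2\) follow the last \(s_1\) server, while the binomial coefficients choose which servers are idle and Lemma 1 absorbs the sum over orderings of the busy servers. The brute-force sketch below compares the count with direct enumeration; both helper names are hypothetical.

```python
from itertools import permutations
from math import factorial

def count_orderings(i1, i2, k):
    """Orderings of i1 labelled s_1 and i2 labelled s_2 idle servers in which
    exactly k servers of type s_2 follow the last s_1 server (brute force)."""
    servers = [("s1", j) for j in range(i1)] + [("s2", j) for j in range(i2)]
    count = 0
    for order in permutations(servers):
        tail = 0
        for kind, _ in reversed(order):
            if kind == "s1":
                break
            tail += 1
        if tail == k:
            count += 1
    return count

def factor_in_6(i1, i2, k):
    """i_1 * i_2!/(i_2 - k)! * (i_1 + i_2 - k - 1)!, valid for i1 >= 1, 0 <= k <= i2."""
    return i1 * factorial(i2) // factorial(i2 - k) * factorial(i1 + i2 - k - 1)

for k in range(4):
    print(k, count_orderings(2, 3, k), factor_in_6(2, 3, k))
# 0 48 48
# 1 36 36
# 2 24 24
# 3 12 12
```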

4.2 The distribution of \((I_1,I_2)\) given K

In this section we obtain the asymptotic distribution of \((I_1,I_2)\) conditional on \(K=k\), as \(n\rightarrow \infty \). We first show that, as \(n\rightarrow \infty \), the probability of no idle servers of type \(s_1\) goes to zero, and so the probability that customers need not wait goes to 1. Next we condition on \(K=k\) and show \(I_1/n {\mathop {\longrightarrow }\limits ^{p}} f_1,\,I_2/n {\mathop {\longrightarrow }\limits ^{p}} f_2\), where

$$\begin{aligned} f_1 = \frac{m_1}{n}= \frac{T \theta }{T+1/\mu _1}, \quad f_2 = \frac{m_2}{n}= \frac{T (1-\theta )}{T+1/\mu _2}, \end{aligned}$$

where T is given in (3). Finally, we condition on \(K=k\) and show that the scaled and centered values of \((I_1,I_2)\) converge in distribution to a bivariate normal distribution. Proofs of the following theorems can be found in the Appendix.

Theorem 3

When \(n\rightarrow \infty \), there exists an \(\epsilon >0\) such that

$$\begin{aligned} P(I_1=0) = o\left( \exp (-\epsilon n)\right) . \end{aligned}$$

From this theorem we see that when \(n\rightarrow \infty \), \(P(I_1>0)\rightarrow 1\). Therefore, \(P(K=k,I_1>0) - P(K=k) \rightarrow 0\) for any fixed \(k\ge 0\). From Eq. (6), given \(K=k\), the limiting stationary distribution as \(n\rightarrow \infty \) is

$$\begin{aligned}&P(I_1=i_1, I_2=i_2|K=k)\rightarrow P(I_1=i_1, I_2=i_2|K=k,I_1>0) \\&\quad = B_1 {n_1\atopwithdelims ()i_1}{n_2\atopwithdelims ()i_2}i_1(i_1+i_2-k-1)! \frac{i_2!}{(i_2-k)!} \mu _1^{i_1}\mu _2^{i_2}\lambda ^{-i_1-i_2+k}\lambda _1^{-k} \frac{1}{P(K=k)} . \end{aligned}$$

Theorem 4

Conditional on \(K=k\), \(\left( \frac{I_1}{n},\,\frac{I_2}{n}\right) \) converges to \((f_1,\,f_2)\) in probability for any \(k\ge 0\). That is, for any \(\epsilon >0\), when \(n\rightarrow \infty \), we have

$$\begin{aligned}&P\left( |I_1-f_1 n|\ge \epsilon n \text{ or } |I_2-f_2 n|\ge \epsilon n |K=k\right) \rightarrow 0. \end{aligned}$$

After showing the fluid limit result, we are now ready to show the central limit result.

Theorem 5

For any \(k\ge 0\), when \(n\rightarrow \infty \), we have

$$\begin{aligned} \left( \left. \frac{I_1- f_1 n }{\sqrt{n}} ,\frac{I_2-f_2 n}{\sqrt{n}} \right| K= k \right) \Rightarrow \mathcal {N} \left( 0, \left[ \begin{array}{ll} \sigma _1^2 &{} \rho \sigma _1 \sigma _2 \\ \rho \sigma _1 \sigma _2 &{} \sigma _2^2 \end{array} \right] \right) , \end{aligned}$$
(7)

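where (here \(\rho \) denotes the correlation coefficient in (7), not the traffic intensity defined in Sect. 2)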

$$\begin{aligned} \rho= & {} \left( \frac{(\theta -f_1)(1-\theta -f_2)f_1f_2}{\left( \theta f_2+f_1^2\right) \left( (1-\theta )f_1+f_2^2\right) }\right) ^{\frac{1}{2}},\\ \sigma _1= & {} \left( \frac{(\theta -f_1)f_1\left( (1-\theta )f_1+f_2^2\right) }{\theta f_2^2+(1-\theta )f_1^2}\right) ^{\frac{1}{2}},\\ \sigma _2= & {} \left( \frac{(1-\theta -f_2)f_2\left( \theta f_2+f_1^2\right) }{\theta f_2^2+(1-\theta )f_1^2}\right) ^{\frac{1}{2}}. \end{aligned}$$

Note that the centering constants \(f_1 n = m_1\) and \(f_2 n = m_2\) agree with the heuristic values (4) obtained in Sect. 3.

4.3 Case II: Pooled system

Now we consider Case II, where \(\frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} < \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1}\). First we show the limit distribution of K, the location of the first type \(s_1\) server.

Theorem 6

In Case II, for any \(k\ge 0\), as \(n\rightarrow \infty \),

$$\begin{aligned} P(K= k) \rightarrow \left( 1-\frac{1-\beta }{\alpha }\right) \left( \frac{1-\beta }{\alpha }\right) ^{k}. \end{aligned}$$
(8)

Theorem 6 shows that K converges in distribution to a geometric distribution in Case II, so \(P(K<\infty )=1\). Therefore, we can extend Theorems 4 and 5 into unconditional versions.

Theorem 7

In Case II, as \(n\rightarrow \infty \), K becomes asymptotically independent of \(I_1\) and \(I_2\), and \(\left( \frac{I_1-f_1n}{\sqrt{n}},\,\frac{I_2-f_2n}{\sqrt{n}}\right) \) converges in distribution to the bivariate normal distribution described in (7).

Consider the special case \(\mu _1=\mu _2=\mu \), where \(\rho =\frac{\lambda }{n\mu }\) denotes the traffic intensity (not the correlation \(\rho \) of Theorem 5). Then \(\theta =\beta \), \(f_1 = (1-\rho )\theta \) and \(f_2 = (1-\rho )(1-\theta )\). When \(n\rightarrow \infty \), \(\left( \frac{I_1-(1-\rho )n_1}{\sqrt{n}},\,\frac{I_2-(1-\rho )n_2}{\sqrt{n}}\right) \) converges in distribution to a bivariate normal distribution with mean (0, 0), variances

$$\begin{aligned} \left( \,\rho \theta (1-\rho (1-\theta )),\, \rho (1-\theta )(1-\rho \theta )\,\right) , \end{aligned}$$

and correlation

$$\begin{aligned} \frac{\rho \sqrt{\theta (1-\theta )}}{\sqrt{(1-\rho (1-\theta ))(1-\rho \theta )}}. \end{aligned}$$

The total number of idle servers, \(I_1+I_2\), has asymptotic mean \((1-\rho )n\) and variance

$$\begin{aligned} Var(I_1)+Var(I_2)+2 Cov(I_1,I_2) = \rho n. \end{aligned}$$
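The identity above holds because the covariance term equals \(\rho ^2\theta (1-\theta )n\). This exactly cancels the two negative cross terms \(-\rho ^2\theta (1-\theta )n\) contained in the variances. The Python sketch below checks this numerically for a few hypothetical parameter pairs. The function name is hypothetical, and the traffic intensity is called `load` to avoid a clash with the correlation \(\rho \).

```python
import math


def special_case_moments(load, theta):
    """Limit moments divided by n when mu_1 = mu_2 (special case of Case II).

    load  : traffic intensity lambda / (n mu), the rho of this special case
    theta : n_1 / n
    Returns Var(I_1)/n, Var(I_2)/n, Corr(I_1, I_2) and Var(I_1 + I_2)/n.
    """
    var1 = load * theta * (1 - load * (1 - theta))
    var2 = load * (1 - theta) * (1 - load * theta)
    corr = load * math.sqrt(theta * (1 - theta)) / math.sqrt(
        (1 - load * (1 - theta)) * (1 - load * theta))
    cov = corr * math.sqrt(var1 * var2)  # algebraically load**2 * theta * (1 - theta)
    total = var1 + var2 + 2 * cov
    return var1, var2, corr, total


for load, theta in [(0.5, 0.5), (0.8, 0.3), (0.2, 0.9)]:
    v1, v2, c, total = special_case_moments(load, theta)
    print(f"load={load}, theta={theta}: Var(I1+I2)/n = {total:.12f}")
```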

4.4 Case I: Decoupling to two independent systems

We now consider Case I, where \(\frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} > \frac{n_1}{\lambda _2} - \frac{1}{ \mu _1}\). We find that under many-server scaling the system decouples into two independent M/M/s service systems. We first show the following proposition:

Proposition 1

In Case I, as \(n\rightarrow \infty \), we have \(P(\alpha I_1 \ge (1-\alpha ) I_2) = o\left( \frac{1}{\sqrt{n}}\right) \).

We next obtain the conditional distribution \(K|(I_1,I_2)\).

Theorem 8

Given \(I_1=i_1n,I_2=i_2n\), where \(i_1\in (0,\theta ), i_2\in (0,1-\theta )\), and \(i_2>\frac{\alpha }{1-\alpha }i_1\), we have

$$\begin{aligned} \left. \frac{K - \left( i_2 - \frac{\alpha }{1-\alpha } i_1\right) n}{\sqrt{n}}\right| (I_1=i_1n,I_2=i_2n) \Rightarrow \mathcal {N}\left( 0,\frac{\alpha i_1}{(1-\alpha )^2}\right) , \text{ as } n\rightarrow \infty . \end{aligned}$$

Therefore, given \((1-\alpha )I_2 > \alpha I_1\), \(P(K=0|I_1,I_2)=o\left( \frac{1}{\sqrt{n}}\right) \). Now we have

$$\begin{aligned} P(K=0)<P(K=0|I_1,I_2)+ P((1-\alpha )I_2 \le \alpha I_1) = o\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

This means that the fraction of type \(c_1\) customers served by type \(s_1\) servers is \(o\left( \frac{1}{\sqrt{n}}\right) \), so the number of such services per unit time is \(o(\sqrt{n})\). This affects neither the fluid-scaled means nor the diffusion-scaled variances of the two independent decoupled systems.

Theorem 9

In Case I, as \(n\rightarrow \infty \),

$$\begin{aligned} \left( \frac{I_1-\left( n_1 - \frac{\lambda _2}{ \mu _1}\right) }{\sqrt{n}} ,\frac{I_2-\left( n_2 - \frac{\lambda _1}{ \mu _2}\right) }{\sqrt{n}}\right) \Rightarrow \mathcal {N} \left( 0, \left[ \begin{array}{ll} \frac{\lambda _2}{n\mu _1} &{} 0 \\ 0 &{} \frac{\lambda _1}{n\mu _2} \end{array} \right] \right) . \end{aligned}$$
(9)

This is exactly the many-server scaling limiting distribution of the number of idle servers in two independent M/M/s queues, one of which has arrival rate \(\lambda _2\), service rate \(\mu _1\), and \(n_1\) servers; the other has arrival rate \(\lambda _1\), service rate \(\mu _2\), and \(n_2\) servers.

Furthermore, K then equals \(I_2\) minus the number of idle servers of type \(s_2\) that are interleaved with the \(I_1\) idle servers of type \(s_1\). The following calculation obtains the mean and variance of K under many-server scaling. We denote by \(I_{2,1}\) the number of idle servers of type \(s_2\) that are interleaved with the \(I_1\) idle servers of type \(s_1\). Since type \(s_1\) servers join the idle servers at rate \(\lambda _2\) and type \(s_2\) servers join the idle servers at rate \(\lambda _1\), we have

$$\begin{aligned} I_{2,1} = \sum _{i=1}^{I_1} W_i, \end{aligned}$$

where \(W_i\) are i.i.d. random variables independent of \(I_1\), each of them having the distribution of the number of failures before the first success in a sequence of Bernoulli trials with probability of success \(\frac{\lambda _2}{\lambda _1+\lambda _2}\). We have

$$\begin{aligned} E(W_i)= & {} \frac{\lambda _1}{\lambda _2},\quad \\ \text{ Var }(W_i)= & {} \frac{\lambda _1(\lambda _1+\lambda _2)}{\lambda _2^2},\\ E(I_{2,1})= & {} E(I_1) \frac{\lambda _1}{\lambda _2} = \left( n_1 - \frac{\lambda _2}{ \mu _1}\right) \frac{\lambda _1}{\lambda _2}, \\ \text{ Var }(I_{2,1})= & {} E(I_1)\frac{\lambda _1(\lambda _1+\lambda _2)}{\lambda _2^2} + \text{ Var }(I_1)\left( \frac{\lambda _1}{\lambda _2}\right) ^2 \\= & {} \left( n_1 - \frac{\lambda _2}{ \mu _1}\right) \frac{\lambda _1(\lambda _1+\lambda _2)}{\lambda _2^2} + \frac{\lambda _2}{ \mu _1} \left( \frac{\lambda _1}{\lambda _2}\right) ^2. \end{aligned}$$
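The moments of \(W_i\) and \(I_{2,1}\) above can be checked numerically. In the Python sketch below (function names hypothetical), the mean and variance of \(W_i\) come from summing its probability mass function \(P(W_i=w)=p(1-p)^w\), \(p=\frac{\lambda _2}{\lambda _1+\lambda _2}\), truncated at a finite number of terms. The moments of \(I_{2,1}\) then follow from Wald's identity and the law of total variance for a random sum. The parameter values are illustrative only.

```python
def w_moments_by_summation(lam1, lam2, terms=2000):
    """Mean and variance of W from its pmf P(W = w) = p (1 - p)**w.

    p = lam2 / (lam1 + lam2); the series is truncated after `terms` terms,
    which must be increased if p is very small.
    """
    p = lam2 / (lam1 + lam2)
    mean = sum(w * p * (1 - p) ** w for w in range(terms))
    second = sum(w * w * p * (1 - p) ** w for w in range(terms))
    return mean, second - mean ** 2


def i21_moments(mean_i1, var_i1, lam1, lam2):
    """E and Var of I_{2,1} = sum_{i=1}^{I_1} W_i with W_i independent of I_1."""
    ew = lam1 / lam2
    varw = lam1 * (lam1 + lam2) / lam2 ** 2
    # Wald: E = E(I_1) E(W); total variance: E(I_1) Var(W) + Var(I_1) E(W)^2.
    return mean_i1 * ew, mean_i1 * varw + var_i1 * ew ** 2


# Hypothetical Case I values: n_1 = 100, lambda_1 = 40, lambda_2 = 60, mu_1 = 1,
# so E(I_1) = n_1 - lambda_2 / mu_1 = 40 and Var(I_1) = lambda_2 / mu_1 = 60.
print(w_moments_by_summation(40.0, 60.0))
print(i21_moments(40.0, 60.0, 40.0, 60.0))
```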

Furthermore, as \(n\rightarrow \infty \), centered and scaled \(I_{2,1}\) converges to a normal distribution, and is independent of \(I_2\).

It now follows that centered and scaled K also converges to a normal distribution, and centered and scaled \((I_1,I_2,K)\) converge to a multivariate normal distribution. The relevant parameters are

$$\begin{aligned} E(K)= & {} E(I_2)-E(I_{2,1}) = n_2 - \frac{\lambda _1}{ \mu _2} - \left( n_1 - \frac{\lambda _2}{ \mu _1}\right) \frac{\lambda _1}{\lambda _2}\\= & {} \lambda _1 \left( \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} - \frac{n_1}{\lambda _2} +\frac{1}{ \mu _1} \right) ,\\ \text{ Var }(K)= & {} \text{ Var }(I_2)+\text{ Var }(I_{2,1}) = \frac{\lambda _1}{ \mu _2} + \left( n_1 - \frac{\lambda _2}{ \mu _1}\right) \frac{\lambda _1(\lambda _1+\lambda _2)}{\lambda _2^2} + \frac{\lambda _2}{ \mu _1} \left( \frac{\lambda _1}{\lambda _2}\right) ^2\\= & {} n_1 \frac{\lambda _1 \lambda }{\lambda _2^2} + \lambda _1 \left( \frac{1}{\mu _2} - \frac{1}{\mu _1} \right) . \end{aligned}$$

K is correlated with both \(I_1\) and \(I_2\):

$$\begin{aligned} \text{ Cov }(I_2,K)= & {} \text{ Cov }(I_2,I_2-I_{2,1}) = \text{ Var }(I_2),\\ \text{ Cov }(I_1,K)= & {} \text{ Cov }(I_1, I_2-I_{2,1})=\text{ Cov }(I_1, - I_{2,1})= - \frac{\lambda _1}{\lambda _2} \text{ Var }(I_1). \end{aligned}$$
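The Case I calculations above can be collected into a single routine. The Python sketch below (function name hypothetical) returns the limiting means and covariance matrix of \((I_1,I_2,K)\), unscaled and of order n. It builds them from the M/M/s moments of Theorem 9 and the random-sum representation of \(I_{2,1}\), then compares E(K) and Var(K) with the simplified closed forms. The parameter values are illustrative and were chosen only to satisfy the Case I inequality.

```python
def case1_moments(n1, n2, lam1, lam2, mu1, mu2):
    """Limit mean vector and covariance matrix of (I_1, I_2, K) in Case I."""
    m1, v1 = n1 - lam2 / mu1, lam2 / mu1    # idle s_1 servers: M/M/n_1 with (lam2, mu1)
    m2, v2 = n2 - lam1 / mu2, lam1 / mu2    # idle s_2 servers: M/M/n_2 with (lam1, mu2)
    ew = lam1 / lam2                        # E(W_i)
    varw = lam1 * (lam1 + lam2) / lam2 ** 2 # Var(W_i)
    m21 = m1 * ew                           # E(I_{2,1})
    v21 = m1 * varw + v1 * ew ** 2          # Var(I_{2,1})
    mean = [m1, m2, m2 - m21]               # K = I_2 - I_{2,1}
    cov = [
        [v1, 0.0, -ew * v1],                # Cov(I_1, K) = -(lam1/lam2) Var(I_1)
        [0.0, v2, v2],                      # Cov(I_2, K) = Var(I_2)
        [-ew * v1, v2, v2 + v21],           # Var(K) = Var(I_2) + Var(I_{2,1})
    ]
    return mean, cov


# Hypothetical parameters: n2/lam1 - 1/mu2 = 1.25 > n1/lam2 - 1/mu1 = 0.8333...
n1, n2, lam1, lam2, mu1, mu2 = 100, 100, 40.0, 60.0, 1.2, 0.8
lam = lam1 + lam2
mean, cov = case1_moments(n1, n2, lam1, lam2, mu1, mu2)
print("E(K) =", mean[2], " closed form:",
      lam1 * (n2 / lam1 - 1 / mu2 - n1 / lam2 + 1 / mu1))
print("Var(K) =", cov[2][2], " closed form:",
      n1 * lam1 * lam / lam2 ** 2 + lam1 * (1 / mu2 - 1 / mu1))
```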

4.5 Case III: Slowly decoupling as system becomes large

As \(n\rightarrow \infty \), we have seen that when \(\frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} <\frac{n_1}{\lambda _2} -\frac{1}{ \mu _1} \) (Case II), then \(\frac{K}{n}\rightarrow 0\) in probability, and in fact \(K=O(1)\); when \(\frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} > \frac{n_1}{\lambda _2} -\frac{1}{ \mu _1} \) (Case I), then \(\frac{K}{n}\rightarrow \frac{\lambda _1}{n} \left( \frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} - \frac{n_1}{\lambda _2} +\frac{1}{ \mu _1} \right) > 0\) in probability, and in fact \(K=O(n)\). We now examine Case III, where \(\frac{n_2}{\lambda _1} - \frac{1}{ \mu _2} = \frac{n_1}{\lambda _2} -\frac{1}{ \mu _1} \). We will show that in this case, as n becomes large, with fluid scaling the queues decouple, but with diffusion scaling K has nontrivial behavior.

We first prove a monotonicity result on K as a function of \(\alpha \), which holds for all three cases, I, II, and III. To mark dependence on \(\alpha \) we use the notation \(K_\alpha \).

Proposition 2

Keep all the other parameters fixed and change \(\alpha \). If \(\alpha _1<\alpha _2\), then \(K_{\alpha _1}\) stochastically dominates \(K_{\alpha _2}\).

From the monotonicity and the previous statements for Cases I and II, we conclude:

Corollary 1

In Case III, as \(n\rightarrow \infty \), \(\frac{K}{n} \rightarrow 0\) in probability.

We can in fact derive more precise asymptotic results for \(I_1,I_2,K\) in Case III. We note first that the result of Theorem 5 on the limiting distribution of \( \left( \left. \frac{I_1- f_1 n }{\sqrt{n}} ,\frac{I_2-f_2 n}{\sqrt{n}} \right| K= k \right) \) as \(n\rightarrow \infty \), for any fixed k, is valid not just in Case II, but also in Cases I and III. In the following theorem we investigate the limit, for fixed k, as \(n\rightarrow \infty \), of \( \left( \left. \frac{I_1- f_{1,k} n }{\sqrt{n}} ,\frac{I_2-f_{2,k} n}{\sqrt{n}} \right| K= kn \right) \).

Theorem 10

For any \(k\in \left[ 0,1-\theta -\left[ \frac{r-\theta \mu _1}{\mu _2}\right] ^+\right) \), as \(n\rightarrow \infty \), we have

$$\begin{aligned} \left( \left. \frac{I_1- f_{1,k} n }{\sqrt{n}} ,\frac{I_2-f_{2,k} n}{\sqrt{n}} \right| K= k n \right) \Rightarrow \mathcal {N} \left( 0, \left[ \begin{array}{ll} \sigma _{1,k}^2 &{} \rho _k \sigma _{1,k} \sigma _{2,k} \\ \rho _k \sigma _{1,k} \sigma _{2,k} &{} \sigma _{2,k}^2 \end{array} \right] \right) , \end{aligned}$$
(10)

where

$$\begin{aligned} \rho _k= & {} \left( \frac{f_{1,k} (f_{2,k} - k) (\theta -f_{1,k}) (1-\theta - f_{2,k})}{(f_{1,k}^2 + (f_{2,k} - k) \theta ) ((f_{2,k} - k)^2 + f_{1,k} (1-\theta -k))}\right) ^{\frac{1}{2}},\\ \sigma _{1,k}= & {} \left( \frac{(\theta -f_{1,k})f_{1,k}((f_{2,k}-k)^2+f_{1,k}(1-\theta -k))}{ f_{1,k}^2(1-\theta -k)+(f_{2,k}-k)^2\theta }\right) ^{\frac{1}{2}},\\ \sigma _{2,k}= & {} \left( \frac{(1-\theta -f_{2,k})(f_{2,k}-k)(f_{1,k}^2+(f_{2,k}-k)\theta )}{f_{1,k}^2(1-\theta -k)+(f_{2,k}-k)^2\theta }\right) ^{\frac{1}{2}},\\ \end{aligned}$$

where \(f_{1,k}=\frac{T \theta }{T+1/\mu _1},\,f_{2,k}=\frac{T (1-\theta -k)}{T+1/\mu _2}+k\), and \(T>0\) solves

$$\begin{aligned} \frac{n_1}{\lambda } \frac{1}{1/\mu _1+T} + \frac{n_2-kn}{\lambda } \frac{1}{1/\mu _2+T} = 1. \end{aligned}$$
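The fluid quantities \(f_{1,k}\), \(f_{2,k}\) require solving the equation above for T numerically. For \(0\le kn<n_2\), its left-hand side is strictly decreasing in T and tends to 0, so a positive root exists if and only if the left-hand side exceeds 1 at T = 0. Bisection then finds that root. The Python sketch below is one way to do this. The function names are hypothetical, and bisection is our own choice of root finder; any bracketing method would do. The sketch then evaluates \(\rho _k,\sigma _{1,k},\sigma _{2,k}\) of Theorem 10.

```python
import math


def solve_T(n1, n2, lam, mu1, mu2, k, iters=200):
    """Bisection for the T > 0 solving the balance equation of Theorem 10."""
    n = n1 + n2
    if not 0 <= k * n < n2:
        raise ValueError("need 0 <= k n < n_2")

    def excess(T):  # left-hand side minus 1; strictly decreasing in T
        return (n1 / lam) / (1 / mu1 + T) + ((n2 - k * n) / lam) / (1 / mu2 + T) - 1

    if excess(0.0) <= 0:
        raise ValueError("no positive root for this k")
    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # grow the bracket until the sign changes
        hi *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theorem10_params(n1, n2, lam, mu1, mu2, k):
    """Return (T, f_{1,k}, f_{2,k}, rho_k, sigma_{1,k}, sigma_{2,k})."""
    theta = n1 / (n1 + n2)
    T = solve_T(n1, n2, lam, mu1, mu2, k)
    f1 = T * theta / (T + 1 / mu1)
    f2 = T * (1 - theta - k) / (T + 1 / mu2) + k
    g = f2 - k  # the recurring quantity f_{2,k} - k
    denom = f1 ** 2 * (1 - theta - k) + g ** 2 * theta
    rho_k = math.sqrt(f1 * g * (theta - f1) * (1 - theta - f2)
                      / ((f1 ** 2 + g * theta) * (g ** 2 + f1 * (1 - theta - k))))
    s1 = math.sqrt((theta - f1) * f1 * (g ** 2 + f1 * (1 - theta - k)) / denom)
    s2 = math.sqrt((1 - theta - f2) * g * (f1 ** 2 + g * theta) / denom)
    return T, f1, f2, rho_k, s1, s2


print(theorem10_params(100, 100, 100.0, 1.0, 1.0, 0.0))
print(theorem10_params(100, 100, 100.0, 1.0, 1.0, 0.1))
```

With \(n_1=n_2=100\), \(\lambda =100\), \(\mu _1=\mu _2=1\) and k = 0, the root is T = 1. The output then reproduces \(f_1=f_2=0.25\) and the Theorem 5 parameters, in line with \(f_{i,0}=f_i\). For k = 0.1 the root is T = 0.8.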

Note that \(f_{i,0}\) equals \(f_i\), defined in Sect. 4.2, for \(i=1,2\). So when \(k=0\), Theorem 10 agrees with Theorem 5. We can now use these results to obtain the centered and scaled limiting behavior of K in Case III.

Theorem 11

In Case III, as \(n\rightarrow \infty \), \(\frac{K}{\sqrt{n}}\) converges in distribution to a half-normal distribution (a centered normal distribution truncated to \([0,\infty )\)) with density function

$$\begin{aligned} \sqrt{ \frac{2}{\sigma _K^2\pi }}\exp \left( -\frac{x^2}{2\sigma _K^2}\right) , \forall x\ge 0, \end{aligned}$$

where \(\sigma _K^2= \alpha \left( \frac{\lambda }{n} \left( \frac{1}{\mu _2}-\frac{1}{\mu _1}\right) +\frac{\theta }{(1-\alpha )^2}\right) \).
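As a quick consistency check on Theorem 11, the Python sketch below (function names hypothetical) evaluates \(\sigma _K^2\) for the Case III example of Sect. 5 (\(\alpha =\theta =0.5\), \(\lambda /n=0.5\), \(\mu _1=\mu _2=1\)), which gives \(\sigma _K^2=1\). It also verifies, by trapezoidal integration, that the stated density integrates to 1 and has mean \(\sigma _K\sqrt{2/\pi }\). Incidentally, substituting \(\lambda _1=\alpha \lambda \), \(\lambda _2=(1-\alpha )\lambda \) and \(n_1=\theta n\) into the Case I expression for Var(K) in Sect. 4.4 and dividing by n gives exactly \(\sigma _K^2\).

```python
import math


def sigma_K_sq(alpha, theta, lam_over_n, mu1, mu2):
    """sigma_K^2 of Theorem 11; lam_over_n stands for lambda / n."""
    return alpha * (lam_over_n * (1 / mu2 - 1 / mu1) + theta / (1 - alpha) ** 2)


def half_normal_pdf(x, s2):
    """Limit density of K / sqrt(n) from Theorem 11 (zero for x < 0)."""
    if x < 0:
        return 0.0
    return math.sqrt(2 / (s2 * math.pi)) * math.exp(-x * x / (2 * s2))


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, steps)) + 0.5 * f(b))


# Case III example of Sect. 5: alpha = 0.5, theta = 0.5, lambda/n = 0.5, mu_1 = mu_2 = 1.
s2 = sigma_K_sq(0.5, 0.5, 0.5, 1.0, 1.0)
upper = 12 * math.sqrt(s2)  # the tail beyond 12 standard deviations is negligible
mass = trapezoid(lambda x: half_normal_pdf(x, s2), 0.0, upper)
mean = trapezoid(lambda x: x * half_normal_pdf(x, s2), 0.0, upper)
print(s2, mass, mean, math.sqrt(2 * s2 / math.pi))
```

For n = 200 this suggests \(E(K)\approx \sqrt{200}\sqrt{2/\pi }\approx 11.3\). This is a limit-based approximation only.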

The result of Theorem 11 in combination with Theorem  10 should in principle allow us to obtain the joint distribution of \((I_1,I_2)\). Its centered and scaled limit, however, is not a bivariate normal distribution, and its expression is too unwieldy to write down. Theorem 11 directly implies that \(P(K=0)\rightarrow 0\) as \(n\rightarrow \infty \), which means that the proportion of type \(c_1\) customers who are served by type \(s_1\) servers goes to 0. Therefore, we obtain the following fluid limit result:

Corollary 2

In Case III,

$$\begin{aligned} \frac{I_1-\left( n_1 -\frac{\lambda _2}{ \mu _1}\right) }{n} \rightarrow 0, \quad \frac{I_2-\left( n_2-\frac{\lambda _1}{\mu _2}\right) }{n} \rightarrow 0 \quad \text{ as } n\rightarrow \infty , \end{aligned}$$

which is the same as in Case I.

4.6 Comparison to the bipartite FCFS infinite matching model

The infinite matching model was defined and studied in [1, 5, 8], and is as follows. There is a set of customer types \(\mathcal {C}=\{c_1,\ldots ,c_I\}\) with a probability vector \(\mathbf {\alpha }=(\alpha _1,\ldots ,\alpha _I)\), a set of server types \(\mathcal {S}=\{s_1,\ldots ,s_J\}\) with a probability vector \(\mathbf {\beta }=(\beta _1,\ldots ,\beta _J)\), and a bipartite compatibility graph \(\mathcal {G}\subseteq \mathcal {C}\times \mathcal {S}\). There are two infinite sequences: \(C^1, C^2, \ldots \), where the \(C^m\) are i.i.d. drawn from \(\mathcal {C}\) with probabilities \(\mathbf {\alpha }\), and \(S^1, S^2, \ldots \), where the \(S^n\) are i.i.d. drawn from \(\mathcal {S}\) with probabilities \(\mathbf {\beta }\). The two sequences are matched according to the compatibility graph, using FCFS. That is, \(C^1\) is matched to the earliest \(S^n\) in the server sequence that is compatible with it, and thereafter \(C^m\) is matched to the earliest \(S^n\) in the server sequence that is compatible with it and was not matched to one of the customers \(C^1,\ldots ,C^{m-1}\). This model is much simpler than a parallel server queueing model: there are no arrival times, no busy or idle servers (only a sequence of server types), and no processing times, only ordered customer types and ordered server types matched in the FCFS manner. This model is tractable: under a condition of complete resource pooling the system reaches a steady state, and in particular it is possible to calculate the matching rate \(r_{c_i,s_j}\) for each compatible pair, that is, the frequency of matches between customer type \(c_i\) and server type \(s_j\).

In the special case of the infinite matching model corresponding to the N-system, there is an infinite sequence of customers of types \(c_1,c_2\), where the customer types are i.i.d., of type \(c_1\) with probability \(\alpha \) and \(c_2\) with probability \(1-\alpha \), and an independent infinite sequence of servers of types \(s_1,s_2\), where the server types are i.i.d., of type \(s_1\) with probability \(\beta \) and \(s_2\) with probability \(1-\beta \). The compatibility graph \(\mathcal {G}\) has arcs \(\{(c_1,s_1),(c_1,s_2),(c_2,s_1)\}\). The condition for complete resource pooling is then \(\alpha +\beta >1\), corresponding to Case II in our queueing model. When successive customers and servers are matched according to FCFS, the exact formula in [1] gives the matching rates \(r_{c_1,s_1}=\alpha +\beta -1,\,r_{c_1,s_2}=1-\beta ,\,r_{c_2,s_1}=1-\alpha \).
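The matching rates can also be estimated by simulating the FCFS infinite matching model directly. The Python sketch below (function name hypothetical) relies on the fact that in the N-model only servers of type \(s_2\) can be passed over. A \(c_2\) customer skips \(s_2\) servers to reach the next \(s_1\), and those skipped servers precede every server not yet examined, so a counter is enough to track them. With \(\alpha =0.7\) and \(\beta =0.6\) (so \(\alpha +\beta >1\)) the estimates should be close to 0.3, 0.4 and 0.3. If \(\alpha +\beta \le 1\), the skipped count grows without bound and the formulas above no longer apply.

```python
import random


def simulate_fcfs_n_matching(alpha, beta, num_customers, seed=0):
    """Monte Carlo estimate of matching rates in the FCFS infinite N-matching model.

    Customers are c1 w.p. alpha (compatible with s1 and s2) or c2 w.p. 1 - alpha
    (compatible with s1 only); servers are s1 w.p. beta and s2 w.p. 1 - beta.
    """
    rng = random.Random(seed)
    skipped_s2 = 0  # unmatched s2 servers that earlier c2 customers passed over
    counts = {("c1", "s1"): 0, ("c1", "s2"): 0, ("c2", "s1"): 0}

    def draw_server():
        return "s1" if rng.random() < beta else "s2"

    for _ in range(num_customers):
        if rng.random() < alpha:
            # c1 takes the earliest unmatched server; skipped s2s precede all fresh ones.
            if skipped_s2 > 0:
                skipped_s2 -= 1
                counts[("c1", "s2")] += 1
            else:
                counts[("c1", draw_server())] += 1
        else:
            # c2 scans forward to the first fresh s1, leaving the s2s it passes unmatched.
            while draw_server() == "s2":
                skipped_s2 += 1
            counts[("c2", "s1")] += 1
    return {pair: c / num_customers for pair, c in counts.items()}


rates = simulate_fcfs_n_matching(alpha=0.7, beta=0.6, num_customers=200_000, seed=1)
print(rates)  # compare with alpha + beta - 1, 1 - beta, 1 - alpha
```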

After n customers have arrived and been matched, some \(s_2\) servers may have been skipped by the customers and remain unmatched. We define \(K_n\) to be the number of unmatched \(s_2\) servers before the first unmatched \(s_1\) server after the first n customers have been matched. Then \((K_n)_{n=1}^\infty \) is a Markov chain. If \(K_n=0\), server \(S^{n+1}\) is of type \(s_1\); the new customer \(C^{n+1}\) is matched to \(S^{n+1}\), and a geometrically distributed number with parameter \(\beta \) is added to \(K_n\). If \(K_n>0\), a new customer \(C^{n+1}\) of type \(c_1\) reduces \(K_n\) by 1, and a new customer \(C^{n+1}\) of type \(c_2\) adds a geometrically distributed number with parameter \(\beta \) to \(K_n\). Here the geometric random variable counts the \(s_2\) servers preceding the next \(s_1\) server in the sequence, that is, the number of failures before the first success in Bernoulli trials with success probability \(\beta \). The steady-state distribution of this Markov chain is \(P(K_\infty = k) = \left( 1-\frac{1-\beta }{\alpha }\right) \left( \frac{1-\beta }{\alpha }\right) ^{k},\,k\ge 0\), which is exactly the limiting distribution of K in (8). This supports our intuition that when the large N-system is underloaded with resource pooling (Case II), the idle servers are replenished as an i.i.d. sequence of types \(s_1\) and \(s_2\) with probabilities \(\beta \) and \(1-\beta \), respectively.
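The stationary distribution of \((K_n)\) can be verified numerically. The Python sketch below (function name hypothetical) runs power iteration for the transition law just described on the truncated state space \(\{0,\ldots ,k_{\max }\}\). Mass that would jump above \(k_{\max }\) is dropped and the vector is renormalized. The truncation level and iteration count are arbitrary choices that are ample when \(\frac{1-\beta }{\alpha }\) is well below 1. The result is then compared with \(\left( 1-\frac{1-\beta }{\alpha }\right) \left( \frac{1-\beta }{\alpha }\right) ^{k}\).

```python
def k_chain_stationary(alpha, beta, kmax=60, iters=600):
    """Stationary law of the K_n chain by power iteration on {0, ..., kmax}."""
    geom = [beta * (1 - beta) ** g for g in range(kmax + 1)]  # P(G = g)
    pi = [1.0] + [0.0] * kmax
    for _ in range(iters):
        new = [0.0] * (kmax + 1)
        for m, mass in enumerate(pi):
            if m == 0:
                # Next server is s1: any customer takes it, then G more s2s lie ahead.
                for g in range(kmax + 1):
                    new[g] += mass * geom[g]
            else:
                new[m - 1] += alpha * mass  # a c1 customer takes a skipped s2
                for g in range(kmax + 1 - m):  # a c2 customer adds G
                    new[m + g] += (1 - alpha) * mass * geom[g]
        total = sum(new)  # renormalize after dropping mass above kmax
        pi = [x / total for x in new]
    return pi


alpha, beta = 0.7, 0.6
q = (1 - beta) / alpha
pi = k_chain_stationary(alpha, beta)
for k in range(5):
    print(k, round(pi[k], 6), round((1 - q) * q ** k, 6))
```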

In the infinite matching model, if complete resource pooling fails, then there is a subset of customer types whose total frequency is greater than or equal to the total frequency of all the compatible server types. In that case the infinite matching model does not reach steady state. However, there is then a unique decomposition of the model such that each component on its own is an infinite matching model with complete resource pooling. For the N-system this happens when \(\alpha + \beta \le 1\), and the model then decouples into two subsystems: one consisting of customers and servers of types \(c_1,s_2\), and the other of customers and servers of types \(c_2,s_1\). This is exactly the decomposition that we observe in Cases I and III.

5 Numerical examples

We test our results on an N-system with \(\lambda =100\), \(n_1=n_2 = 100\) and \(\mu _1 = \mu _2 = 1\), so that \(\rho =0.5\). In this example \(\beta =0.5\) and \(\theta \rho (1-\rho +\theta \rho )n=(1-\theta )\rho (1-\rho +(1-\theta )\rho )n=37.5\). We use the exact stationary distribution to calculate the expectation and variance of the number of idle servers in each pool; the results are listed in Table 1. When \(\alpha >0.5\) (Case II), the average number of idle servers in each pool is close to 50, with variance close to 37.5.

When \(\alpha <0.5\) (Case I), resource pooling disappears and \(s_1\) servers seldom serve \(c_1\) customers. The N-system then operates like two separate queues: \(s_1\) servers serve \(c_2\) customers, and \(s_2\) servers serve \(c_1\) customers. The utilization of the \(s_1\) server pool is \(\frac{(1-\alpha )\lambda }{n_1}\), and the utilization of the \(s_2\) server pool is \(\frac{\alpha \lambda }{n_2}\). When \(\alpha =0.4\), almost none of the services performed by \(s_1\) servers are for \(c_1\) customers. The number of idle \(s_1\) servers can then be approximated by a normal distribution with mean \(n_1-(1-\alpha )\lambda =40\) and variance \((1-\alpha )\lambda =60\), whereas the number of idle \(s_2\) servers can be approximated by a normal distribution with mean \(n_2-\alpha \lambda =60\) and variance \(\alpha \lambda =40\). When \(\alpha =0.5\) (Case III), the means are somewhat close to the fluid prediction 50, but we do not have an analytic approximation for the variances.
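The limit-based predictions quoted above can be reproduced from the formulas of Sect. 4. The Python sketch below (function name hypothetical) determines the case from the inequality separating Cases I, II and III. It then applies the Case II special-case formulas for \(\mu _1=\mu _2\), the M/M/s moments of Theorem 9 in Case I, or the fluid means of Corollary 2 in Case III. It does not reproduce the exact values in Table 1, which come from the exact stationary distribution, and it assumes \(0<\alpha <1\).

```python
def predict_idle_moments(lam, n1, n2, mu, alpha):
    """Limit-based (case, mean_I1, mean_I2, var_I1, var_I2) when mu_1 = mu_2 = mu.

    Variances are None in Case III, where only the fluid means are available.
    """
    lam1, lam2 = alpha * lam, (1 - alpha) * lam  # arrival rates of c1 and c2
    lhs, rhs = n2 / lam1 - 1 / mu, n1 / lam2 - 1 / mu
    if lhs < rhs:  # Case II: pooled system
        n = n1 + n2
        theta, load = n1 / n, lam / (n * mu)
        return ("II", (1 - load) * n1, (1 - load) * n2,
                load * theta * (1 - load * (1 - theta)) * n,
                load * (1 - theta) * (1 - load * theta) * n)
    if lhs > rhs:  # Case I: two independent M/M/s queues
        return ("I", n1 - lam2 / mu, n2 - lam1 / mu, lam2 / mu, lam1 / mu)
    return ("III", n1 - lam2 / mu, n2 - lam1 / mu, None, None)  # fluid means only


for alpha in (0.6, 0.5, 0.4):
    print(alpha, predict_idle_moments(100.0, 100, 100, 1.0, alpha))
```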

Table 1 The exact calculation