1 Introduction

Jackson networks (henceforth JN), to be formally introduced later on, are a well-established class of models in, e.g., production, telecommunication, and computer systems; for surveys see Kelly (1979) and Chen and Yao (2001). JN’s have the desirable property that the distribution of the stationary queue length vector is of product form, which allows for quick numerical evaluation of performance measures such as the mean queue length, mean sojourn times and the throughput at nodes. In this paper we consider JN’s with the following additional features:

  • simultaneous breakdown and repair of groups of servers (i.e., repair can be grouped). This allows one, in particular, (i) to model simultaneous breakdowns of groups of servers, and (ii) to model group repair strategies. For example, repairing several servers simultaneously may lead to more efficient repair actions and thus may reduce the repair time.

  • infinite supply, which aims to utilize the capacity of a server to the fullest. For example, in service center models it is typically assumed that an agent, when not answering a call, switches to low-priority work such as answering email and administrative duties.

As for classical JN’s, one can obtain for these extended JN’s the steady-state distribution of the queue-length vector at stable nodes in product form for different types of failure regimes (a precise definition of “stable node” will be provided later in the text), see Sommer et al. (2017). In addition, closed-form solutions for the long-run throughput of subnetworks and of the complete network are provided.

Design and analysis of stochastic networks are often challenged by the fact that the exact specifications of the network are not known. This is even more true for models including breakdowns, as there is typically only limited information on breakdowns available. Indeed, during usual operation breakdowns are to be avoided and typically only censored observations are available, which is in contrast to, for example, repairs (indeed, repair times are observable and can often be influenced by a decision maker).

Elaborating on the product-form results in Sommer et al. (2017), we will in this paper investigate the impact of the distribution of the time between breakdowns of the individual servers on the throughput of the network. More specifically, we will model the breakdown behavior through a parameterized distribution, and provide a robustness analysis of the system throughput with respect to the uncertain parameters. Our analysis shows how, for different breakdown and repair regimes, the corresponding risk profiles for system-oriented and customer-oriented performance metrics can be evaluated. It is worth noting that this efficient risk analysis step is only possible due to the simple closed-form solutions obtained for the performance measures. The framework provided in this paper allows one to combine robustness analysis and performance modeling in an efficient way.

Robustness analysis of stochastic models is a predominant research line in Georg Pflug’s work, see, for example, the monographs (Ermoliev et al. 2006; Pflug 2000). It stands next to his impressive work on stochastic optimization, the study of risk, and the investigation of how to deal with uncertainty in stochastic models.

The paper is organized as follows. Section 2 gives a brief introduction to the class of generalized JN’s. Robustness analysis is introduced in Sect. 3. The general approach to robustness analysis is presented in Sect. 4. We conclude the paper with a discussion of possible future research directions.

2 Jackson networks with breakdowns and repairs

We present a brief review of the theory of JN’s with breakdowns and repairs. For details we refer to Sommer et al. (2017). The network consists of J exponential single server nodes with service discipline “First-Come-First-Served” (FCFS); the node set is denoted by \( {\tilde{J}} = \{ 1 , \ldots , J \}\). At node i a Poisson stream with rate \( \lambda _i \ge 0\) arrives from the exterior node 0, and service times at node i are exponential with rate \( \mu _i \). All service times constitute an independent family of variables which are independent of the arrival streams. Standard customers are indistinguishable and follow the same rules. Routing is Markovian with routing probabilities \( r(i,j) \), collected in the routing matrix R.

Nodes in \( V \subseteq {\tilde{J}}\) have an infinite supply from which customers are put into an idling server. We denote \(W:= {\tilde{J}}{\setminus } V\) and require \( V \ne \emptyset \) (unless otherwise specified). Customers from the infinite supply have low priority, and (standard) customers arriving from the outside or from another server have high priority with preemptive-resume regime: Service of a low priority customer is interrupted as soon as a high priority customer arrives. Service of low priority customers is resumed only when the server idles again. When a low priority customer is served and fed into the network, he becomes a high priority customer and follows the rules for standard customers. Service times of the low priority customers are independent from the external arrival streams and the service times of high priority customers.

Let D denote the set of nodes that can break down. The breakdown-repair process \(Y=(Y(t): t\ge 0)\) is Markov on state space \(\mathcal {P}(D)\), where \(\mathcal {P}(D)\) denotes the power set of D. \(Y(t)=I\), for \(\emptyset \subseteq I\subseteq D\), indicates that (exactly) the nodes in I are broken down. The transition rates of Y out of \(I\subseteq D\) are given as follows:

  1. if \(I\subset H\subseteq D\), the nodes in \(H{\setminus } I\) break down with rate \(\alpha (I,H)\ge 0\),

  2. if \(\emptyset \subseteq K\subset I\), the nodes in \(I{\setminus } K\) are repaired with rate \(\beta (I,K)\ge 0\).

Rates \(\alpha (\cdot ,\cdot )\) and \( \beta (\cdot ,\cdot )\) are constructed from any pair of functions \(A, B: \mathcal {P}(D) \rightarrow [0,\infty ),\) subject to (i) \(A(\emptyset ) = B(\emptyset ) =1\), (ii) \(\forall ~ I\subset H\subseteq D:~ {A(H)}/{A(I)}<\infty \), and (iii) \(\forall ~\emptyset \subseteq K\subset I:~ {B(I)}/{B(K)}<\infty \) (where we set \(0/0=0\)).

With these functions we set for all subsets of down nodes \(I \subseteq D\)

$$\begin{aligned} \alpha (I, H)=\frac{A(H)}{A(I)},\ I\subset H\subseteq D, \text { and } \beta (I, K)=\frac{B(I)}{B(K)},\ \emptyset \subseteq K\subset I. \end{aligned}$$
(1)

Remark 1

With suitable functions A and B we can model, e.g., that nodes may break down in isolation or in groups, and repair may happen similarly. It is not required that nodes which are broken down are repaired simultaneously. A statistical procedure to check whether this form is justified is to determine in a first step all possible values \( A ( I ) = \alpha ( \emptyset , I ) \) and \(B ( I ) = \beta (I , \emptyset ) \), for all \( I\subseteq D \), and then to check (1) stepwise.

The availability process Y is an ergodic Markov process with stationary distribution

$$\begin{aligned} \pi (I)=\left( \sum _{K\subseteq D} \frac{A(K)}{B(K)}\right) ^{-1} \cdot \frac{A(I)}{B(I)}, \quad \forall I\subseteq D. \end{aligned}$$
(2)

From this, the stationary (time) point availability (PA) of a Jackson network with infinite supply and unreliable nodes (or subnetworks thereof) may be computed (similar to Sauer and Daduna (2003), p. 185) as \( \mathrm{{ PA }} (H):=\sum _{K\subseteq D{\setminus } H} \pi (K) \), for \( H\subseteq D\) and \(t\ge 0 \), where \(\pi (I)\) is the probability that exactly the nodes in \(I\subseteq D\) are under repair as given by (2).
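The stationary distribution (2) and the point availability can be evaluated by enumerating the power set of D. The following Python sketch is illustrative only: the function names and the choice of A and B in the usage lines are assumptions made for this example and are not taken from Sommer et al. (2017). It assumes \(B(I) > 0\) for all \(I \subseteq D\).

```python
from itertools import combinations


def powerset(nodes):
    """All subsets of `nodes`, returned as frozensets."""
    items = sorted(nodes)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def stationary_distribution(D, A, B):
    """pi(I) proportional to A(I)/B(I), normalised over all subsets I of D, cf. (2).

    Assumes B(I) > 0 for every subset I of D.
    """
    weights = {I: A(I) / B(I) for I in powerset(D)}
    total = sum(weights.values())
    return {I: w / total for I, w in weights.items()}


def point_availability(pi, H):
    """PA(H): sum of pi(K) over all K that contain no node of H."""
    H = frozenset(H)
    return sum(p for K, p in pi.items() if K.isdisjoint(H))


# Illustrative assumption: one unreliable node with A({2}) = tau and B({2}) = rho.
tau, rho = 0.5, 2.0
pi = stationary_distribution({2},
                             lambda I: tau ** len(I),
                             lambda I: rho ** len(I))
```

With a single unreliable node this choice gives \(\pi(\emptyset) = \rho/(\rho+\tau)\), the familiar availability of a two-state up/down process.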

The following regime is set in force whenever a node breaks down:

  • service at this node is interrupted, customers (of high as well as of low priority) are frozen there to wait for restart of the service, which is resumed at the point where it was paused,

  • no new customers are admitted to enter that node,

  • customers who select a broken down node to visit are rerouted according to one of the classical rules: stalling, skipping or blocking rs-rd, which will be defined below,

  • all these rules, if applicable, are valid for high and low priority customers.

Rerouting is a functional of Y and applies only to high priority customers, because on departure from a node with infinite supply low priority customers are transformed immediately to high priority, and only thereafter are rerouted. We distinguish the following rerouting schemes:

  • Stalling: Whenever a node breaks down the service system is frozen: All arrival processes are interrupted and service everywhere in the network is stopped until all broken down nodes are repaired again. Stalling is applied, e.g., in the automotive industry to decrease variability of the flow of materials. Indeed, stalling prevents servers from sending parts to a server that is broken down and thereby prevents inventory from piling up.

  • Skipping: If as next destination of a customer a down node is selected, the customer jumps to this node, spends no time there, and immediately performs the next jump according to routing regime R until he arrives at a node in up status or leaves the network. Skipping is applied, e.g., in production networks where skipping a production step yields a product of lower but sufficient quality.

  • Blocking rs-rd: Broken down stations are blocked. A customer whose next destination is down stays at his present node to obtain immediately another service there. After the repeated service (rs) the customer chooses his next destination anew according to R (random destination (rd)). Blocking rs-rd is applied, e.g., in communication networks where packets are rerouted in case a link is not available.

Throughout this article, it is assumed that all nodes in W are stable, i.e., the traffic rate \(\eta _i\), following from the general traffic equations for Jackson networks with infinite supply but no breakdowns and repairs, is smaller than the service rate \(\mu _i\) for every node i in W, i.e., \(\eta _i < \mu _i\). Without breakdowns, i.e., \(D = \emptyset \), the traffic equations of Jackson networks with infinite supply are

$$\begin{aligned} \eta _i= \lambda _i + \sum _{j\in W} \eta _j r(j,i) + \sum _{j\in V} \mu _j r(j,i), \quad i\in {\tilde{J}}. \end{aligned}$$
(3)

By assumption, under blocking rs-rd the following reversibility constraints hold:

$$\begin{aligned} \eta _i r(i,j)&=\eta _j r(j,i)\quad \forall i,j\in W, \end{aligned}$$
(4)
$$\begin{aligned} \eta _i r(i,j)&=\mu _j r(j,i)\quad \forall i\in W, j\in V, \end{aligned}$$
(5)

and in case of skipping the following rate stability constraints hold:

$$\begin{aligned} \eta _i=\mu _i, \quad \forall i\in V\cap D. \end{aligned}$$
(6)

These constraints ensure that the solution \(\eta _i\), \(i\in {\tilde{J}}\), of (3) is also the solution of the traffic equations for unreliable Jackson networks under any breakdown scenario; for more details see Sommer et al. (2017). Under these assumptions, we give an overview of the performance characteristics studied in the following. For an overview of other performance characteristics see Sommer et al. (2017).

Under stalling the stationary throughput at nodes \(i\in W\) (no infinite supply) is

$$\begin{aligned} \eta _i\cdot \pi (\emptyset ) . \end{aligned}$$

Under blocking rs-rd and skipping the stationary throughput at a node \(i\in W\) is

$$\begin{aligned} \eta _i\cdot \sum _{I\subseteq D, i\notin I}\pi (I) . \end{aligned}$$
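The two throughput expressions differ only in which availability states are counted. A minimal Python sketch, assuming the stationary distribution \(\pi\) is available as a dictionary keyed by the set of down nodes (this representation and all numbers are hypothetical):

```python
def throughput_stalling(eta_i, pi):
    """eta_i * pi(empty set): the node only works while the whole network is up."""
    return eta_i * pi[frozenset()]


def throughput_blocking_or_skipping(eta_i, pi, i):
    """eta_i times the probability that node i itself is up."""
    return eta_i * sum(p for down, p in pi.items() if i not in down)


# Hypothetical availability distribution on D = {2, 3}; values are made up.
pi = {
    frozenset(): 0.60,
    frozenset({2}): 0.20,
    frozenset({3}): 0.15,
    frozenset({2, 3}): 0.05,
}
stalling = throughput_stalling(1.5, pi)
blocking = throughput_blocking_or_skipping(1.5, pi, 3)
```

Since \(\pi(\emptyset)\) is one of the summands in the second expression, the throughput under stalling never exceeds the throughput under blocking rs-rd or skipping for the same \(\eta_i\).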

To simplify the presentation, we will in the following only consider individual breakdowns and repairs, where for a server \( i \in D \) the breakdown rate will be denoted by \( \tau _i \) and the repair rate by \( \rho _i\). We conclude this section with a short discussion on the robustness of JN’s as modeling class for stochastic networks.

Remark 2

Suppose that the physical layout of the network, i.e., the number of nodes, and the topology are known, as well as mean service times, mean inter-arrival times of customers at the network and routing decisions. Provided that this rough information constitutes the only data available in advance, arguments exploiting entropy properties lead to the use of so-called product-form models as conservative first-order models. Indeed, if mean service times and mean inter-arrival times are given, the exponential distribution is known to maximize the entropy over all distributions with support \( \mathbb {R}_+ := [0,\infty ) \) and these means, see, for example, Park and Bera (2009), Lisman and van Zuylen (1972). Furthermore, for given arrival and service rates at the stations, a product-form solution maximizes the entropy of the stationary queue length distribution (Ferdinand 1970; Walstra 1985). Therefore, product-form solutions are conservative and robust models with respect to model uncertainty in the service time and inter-arrival time distributions. Hence, working with a model with exponentially distributed service times and inter-arrival times that has a product-form solution for the stationary queue length distribution provides a robust model for performance analysis.

3 Robustness analysis

As pointed out in the introduction, breakdown rates are hard to estimate from historical data. Therefore, one is typically confronted with uncertainty about the true value of the parameters defining the distributions of the time between breakdowns, in our case the failure rate. This is known as parameter uncertainty in the literature, see, for example, Haverkort and Meeuwissen (1992) for a discussion on integration of parameter uncertainty into queueing models and Henderson (2003) for a discussion on parameter uncertainty from a broader perspective. In the following, the focus is on robustness analysis of our queueing model with respect to uncertainty about the breakdown rates.

In modeling parameter uncertainty, the choice of the distribution is of importance and one typically chooses a particular distribution based on the (possibly incomplete) knowledge that is available. For example, if the mean and the variance are known, and if, in addition, we know that the parameter may take values in \( \mathbb {R}\), the most conservative distribution is the normal distribution, where “most conservative” refers to the fact that this distribution maximizes the entropy. On the other hand, when, due to expert knowledge, it is known that the parameter falls into an interval, say, [a, b], then the uniform distribution on [a, b] is the entropy maximizing distribution; see, for example, Kullback (1959). Alternatively, there may be statistical knowledge available on the parameter, say \( \theta \), based on measurements. Then, the distribution of the statistic used for estimating \( \theta \) is a natural candidate for the distribution of \( \theta \).

Formally, we assume that the breakdown rate \( \tau \) is a random variable defined on some underlying probability space, and that the probability density function of \( \tau \), denoted by \( f_\tau \), is known. We let \( h ( \tau ) \) denote some reward function. Think, for example, of h as the stationary throughput at node i. Provided that h is invertible and the inverse is differentiable with respect to the throughput,

$$\begin{aligned} g(y ) = f_\tau ( h^{-1}(y ) ) \left| \frac{d}{dy} \left( h^{-1}(y ) \right) \right| \end{aligned}$$
(7)

yields the density of the stationary throughput. Based on the distributional assumptions or statistical information comprised in \( f_{\tau }\), one may, as is common practice in applied probability, take the expected value of \( \tau \), denoted by \( \mu _\tau = \int y f_\tau ( y ) dy \), as a noise-free approximation of \( \tau \) and subsequently \( h ( \mu _{\tau } ) \) as output for the throughput. Since \( h ( \mu _{\tau } ) \) is typically not close to \( \mathbb {E} [ h ( \tau ) ] \), simply using \( \mu _\tau \) instead of \( \tau \) falls short of bringing to light the risk incurred by the uncertainty about \( \tau \). To analyze the impact \( \tau \) has on \( h ( \tau ) \), we consider the value at risk of \( h ( \tau )\), denoted in short by \( \mathrm{VaR } ( \alpha ) \), where \( \mathrm{VaR } ( \alpha )= q \) if and only if

$$\begin{aligned} G^{ -1} ( \alpha ) = q , \end{aligned}$$

where \( G ( \cdot ) \) denotes the cumulative distribution function of \(h(\tau )\), which is, for ease of presentation, assumed to be continuous and invertible. The potential misspecification at an \( \alpha \) probability level is thus \( h ( \mu _\tau ) - \mathrm{VaR } ( \alpha ) \). Note that for the throughput we want to hedge against the risk of low values, so we use the \( \alpha \)-quantile, whereas for cost functions one would measure the risk through the \( (1- \alpha ) \)-quantile.

In the following, we make the reasonable assumption that the “true” breakdown rate at node i, that is, \( \tau _i \), is not revealed to us and we therefore assume that \( \tau _i \) follows a given distribution \( F_i \). Instances of the throughput can be easily obtained by sampling the \( \tau _i \)’s according to their assumed distribution and evaluating the realization of the stationary throughput. By creating a sufficient number of samples, the density and the cumulative distribution function of the throughput can be estimated and evaluated for further robustness analysis. We would like to point out that a similar robustness analysis can be performed for other performance measures of the queueing network as given in Sommer et al. (2017).
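The sampling procedure just described can be sketched in a few lines of Python. The reward function, the sampling distribution and all numerical values below are assumptions chosen purely for illustration; the helper name `empirical_var` is hypothetical.

```python
import random


def empirical_var(reward, sample_tau, alpha, n_samples=100_000, seed=0):
    """Empirical alpha-quantile of reward(tau), with tau drawn by sample_tau(rng)."""
    rng = random.Random(seed)
    values = sorted(reward(sample_tau(rng)) for _ in range(n_samples))
    index = min(int(alpha * n_samples), n_samples - 1)
    return values[index]


# Illustrative assumption: a throughput of the form eta * rho / (rho + tau),
# with tau uniform on [0.1, 1.0]; the numbers are made up.
eta, rho = 1.6, 2.0
estimate = empirical_var(lambda tau: eta * rho / (rho + tau),
                         lambda rng: rng.uniform(0.1, 1.0),
                         alpha=0.1)
```

Because this reward is decreasing in \(\tau\), its \(\alpha\)-quantile equals the reward evaluated at the \((1-\alpha)\)-quantile of \(\tau\), which gives an exact value against which the estimate can be compared.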

We illustrate the application of the above results to robustness analysis with the help of the following examples.

3.1 A tandem system

Fig. 1 The two-way-tandem network as analyzed in Sect. 3.1

Consider the network with node set \({\tilde{J}}=\{1,2,3 \}\) given in Fig. 1. The network is a two-way tandem of three nodes. The infinite supply is depicted by a dashed arrow pointing to server 2, and the node that is prone to failure is depicted as a grey circle. Note that by incorporating node 0, the linear topology is transformed into a ring. To summarize, \(V=\{2\} \), \(W=\{1,3\}\), and \(D=V\), i.e., the infinite supply node is prone to failure. Routing is given by

$$\begin{aligned} r(1,2)&=a, \quad r(1,0)=1-a, \quad r(2,3)=b, \\ r(2,1)&=1-b, \quad r(3,0)=c, \quad r(3,2)=1-c , \end{aligned}$$

and

$$\begin{aligned} r(0,1)= \frac{\lambda _1}{\lambda _1+\lambda _3} , r(0,3)= \frac{\lambda _3}{\lambda _1+\lambda _3}, \end{aligned}$$

for \( 0< a , b < 1 \) and \( \lambda _i > 0 \), \( i=1,3\).

For ease of analysis, we parameterize the model and set \( \lambda _1 = ( 1 -a ) t \), for \( t > 0 \), and \( \lambda _3 = a t \), and \( b = 1 - c \). Regarding the service rates it holds that

$$\begin{aligned} \mu _1> t , \, \mu _2 = t \frac{a}{c} \quad \text { and } \quad \mu _3 > t \frac{a}{c} . \end{aligned}$$
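The parameterization can be checked numerically against the traffic equations (3) and the constraints (5). The following Python sketch uses made-up values for a, c and t and a hypothetical function name; it is meant as a plausibility check, not as part of the original analysis.

```python
def tandem_traffic(a, c, t):
    """Traffic rates of the two-way tandem with V = {2}, W = {1, 3}, b = 1 - c."""
    b = 1.0 - c
    lam1, lam3 = (1.0 - a) * t, a * t
    mu2 = t * a / c
    eta1 = lam1 + mu2 * (1.0 - b)        # inflow from node 2 via r(2,1) = 1 - b
    eta3 = lam3 + mu2 * b                # inflow from node 2 via r(2,3) = b
    eta2 = eta1 * a + eta3 * (1.0 - c)   # r(1,2) = a, r(3,2) = 1 - c
    return {"eta1": eta1, "eta2": eta2, "eta3": eta3, "mu2": mu2, "b": b}


# Made-up parameter values, for illustration only.
a, c, t = 0.4, 0.5, 2.0
rates = tandem_traffic(a, c, t)

# Constraint (5) for the pairs (i, j) = (1, 2) and (3, 2).
balance_12 = rates["eta1"] * a - rates["mu2"] * (1.0 - rates["b"])
balance_32 = rates["eta3"] * (1.0 - c) - rates["mu2"] * rates["b"]
```

For these values one obtains \(\eta_1 = t\) and \(\eta_3 = ta/c\), which explains the stability conditions \(\mu_1 > t\) and \(\mu_3 > ta/c\); moreover \(\eta_2 = \mu_2\), in line with (6).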

In this model, we assume for a breakdown scenario that the rate with which a breakdown of server \(i=2\) occurs is given by \(\tau _{2}\), and we denote the corresponding repair rate by \(\rho _{2}\). Then,

$$\begin{aligned} \pi ( \emptyset ) = \frac{\rho _2}{\rho _2 + \tau _2} \end{aligned}$$

and the throughput at node 3 under stalling for a fixed value of \(\tau _2\) is given by

$$\begin{aligned} h ( \tau _2 ) = \frac{\eta _3 \rho _2}{ \rho _2 + \tau _2} , \end{aligned}$$

and by computation

$$\begin{aligned} h^{-1} ( y ) = \frac{\eta _3 \rho _2}{y} - \rho _2 \quad \hbox { and } \quad \frac{d}{d y } h^{-1} ( y ) = - \frac{\eta _3 \rho _2}{y^2} . \end{aligned}$$

Let h denote the stationary throughput at node \(i=3\) and model \(\tau _{2}\) as being random. In the following, two distributions for \(\tau _2\) are elaborated, the uniform and exponential distribution, respectively:

  • Assume that \( \tau _2 \) is uniformly distributed on [a, b], with \( 0< a< b < \infty \). Then,

    $$\begin{aligned} g ( y ) = \frac{1}{b-a} \frac{\eta _3 \rho _2}{y^2} \, , \quad \hbox { for } \, \frac{\eta _3\rho _2}{ \rho _2 + b} \le y \le \frac{\eta _3 \rho _2}{ \rho _2 + a} , \end{aligned}$$

    and zero otherwise. It holds that \(\mu _{\tau _2} = (a+b)/2\). The value at risk, i.e., the \( \alpha \)-quantile of the stationary throughput, is also easily computable to be

    $$\begin{aligned} \mathrm{VaR} ( \alpha ) = \frac{\eta _3 \rho _2}{ \rho _2 + b - (b-a) \alpha } , \end{aligned}$$

    for \( \alpha \in [ 0 , 1 ]\). In words, for \( \alpha \cdot 100 \%\) of the possible breakdown rates the actual throughput of the system will fall below \(\mathrm{VaR} ( \alpha ) \). Observe that for \( \alpha = 1 \) we have \( \mathrm{VaR} ( \alpha ) = \eta _3 \rho _2 / ( \rho _2 + a ) \), which is the right bound of the support of \( h(\tau _2) \), and for \( \alpha = 0 \) we have \( \mathrm{VaR} ( \alpha ) = \eta _3 \rho _2 / ( \rho _2 + b) \), which is the left bound of the support of \( h(\tau _2) \).

    Suppose \(b = k \cdot a\) with \( k >1\), then \(\tau _2 \) is uniformly distributed on [a, ka], and \( \tau _2 \) becomes rather uncertain for large values of k. Then the above analysis uncovers the risk one is exposed to when expecting the throughput to be of order \(h ( \mu _{\tau _2})\) without taking the stochasticity into account. In particular, with chance \( \alpha \) the realized throughput \( h (\tau _2) \) is at least

    $$\begin{aligned} \left( 1 - \frac{\frac{2\rho _2}{a}+k+1}{\frac{2\rho _2}{a} + 2 (1-\alpha ) k +2 \alpha } \right) \cdot 100\% \end{aligned}$$
    (8)

    smaller than \(h( \mu _{\tau _2})\). For example, let \(\alpha = 0.1\), then with probability 0.1 the actual throughput \(h( { \tau _2}) \) is at least approximately \(44.4\%\) smaller than \(h( \mu _{ \tau _2} )\) for relatively large k.

  • Let \( \tau _2 \) be exponentially-\( \lambda \)-distributed so that \(\mu _{ \tau _2}=1/\lambda \). Then the density for the throughput via (7) equals

    $$\begin{aligned} g ( y ) = \frac{\lambda \eta _3 \rho _2}{y^2} \exp \left( - \lambda \rho _2 \left( \frac{\eta _3 }{y} - 1 \right) \right) , \end{aligned}$$

    for \( y \in ( 0 , \eta _3 ]\). It can be shown that the cumulative distribution function corresponding to the density g is given by

    $$\begin{aligned} G(y) = \exp \left( -\frac{\lambda \eta _3\rho _2}{y} + \lambda \rho _2 \right) , \quad \text{ for } y \in ( 0 , \eta _3 ], \end{aligned}$$
    (9)

    which leads to

    $$\begin{aligned} \mathrm{VaR}(\alpha ) = \frac{\lambda \rho _2 \eta _3}{\lambda \rho _2 - \ln (\alpha )}, \end{aligned}$$
    (10)

    for \(\alpha \in (0,1)\). Note that the “naive” throughput is given by

    $$\begin{aligned} h(1/\lambda ) = \frac{\lambda \rho _2\eta _3 }{\lambda \rho _2 + 1} \end{aligned}$$

    and the difference between \( h ( 1 / \lambda ) \) and \( \mathrm{VaR}(\alpha ) \) expresses the model risk at probability \( \alpha \).
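The closed-form expressions for the uniform and the exponential case are easy to evaluate numerically. The Python sketch below uses hypothetical function names and made-up parameter values; it recomputes the relative shortfall directly from \(h\) and \(\mathrm{VaR}\) and compares it with expression (8).

```python
import math


def throughput(tau2, eta3, rho2):
    """h(tau2): throughput at node 3 under stalling."""
    return eta3 * rho2 / (rho2 + tau2)


def var_uniform(alpha, a, b, eta3, rho2):
    """alpha-quantile of h(tau2) for tau2 uniform on [a, b]."""
    return eta3 * rho2 / (rho2 + b - (b - a) * alpha)


def var_exponential(alpha, lam, eta3, rho2):
    """alpha-quantile of h(tau2) for tau2 exponential with parameter lam, cf. (10)."""
    return lam * rho2 * eta3 / (lam * rho2 - math.log(alpha))


def relative_shortfall_uniform(alpha, a, k, rho2):
    """Expression (8) as a fraction (not in percent), for tau2 uniform on [a, k*a]."""
    x = 2.0 * rho2 / a
    return 1.0 - (x + k + 1.0) / (x + 2.0 * (1.0 - alpha) * k + 2.0 * alpha)


# Made-up parameter values, for illustration only.
eta3, rho2, a, k, alpha = 1.6, 2.0, 0.5, 4.0, 0.1
b = k * a
naive = throughput((a + b) / 2.0, eta3, rho2)
shortfall = 1.0 - var_uniform(alpha, a, b, eta3, rho2) / naive
```

For \(\alpha = 0.1\) and \(k \rightarrow \infty\) expression (8) tends to \(1 - 1/1.8 \approx 0.444\), which is the \(44.4\%\) quoted in the text.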

Remark 3

Consider the uniform model for \( \tau _2 \) and consider the reasonable case that the breakdown rate is small, i.e., assume that \( \tau _2 \) is close to zero. More specifically, let \(\tau _2 \sim U(0,\;2\cdot \epsilon )\), and assume that \(\rho _2 = c \cdot \epsilon \), for \(c > 1\). It then holds for the relative error that

$$\begin{aligned} \frac{h( \mathbb {E} [\tau _2] ) - \mathrm{VaR}(\alpha )}{ h( \mathbb {E}[ \tau _2])} = 1 - \frac{1+c}{2(1-\alpha ) + c}, \end{aligned}$$

which implies that \( \mathrm{VaR}(\alpha ) \) is \( (1 - \frac{1+c}{2(1-\alpha ) + c}) \cdot 100 \) percent smaller than \(h( \mathbb {E} [\tau _2])\) for \( \alpha \le 1/2 \), and \( \mathrm{VaR}(\alpha ) \) is \( (\frac{1+c}{2(1-\alpha ) + c}-1 ) \cdot 100 \) percent larger than \(h( \mathbb {E} [\tau _2])\) for \( \alpha \ge 1/2 \). This reasoning allows for a quick assessment of the impact of the postulated model on the breakdown rate.

3.2 A star-like system

Fig. 2 The star-like network as analyzed in Sect. 3.2

Consider a network with node set \({\tilde{J}}=\{1,2, \dots , 6 \}\), \(V=\{2, 3 ,4 \}\), \(W=\{1,5, 6 \}\), and \(D=V\), i.e., all infinite supply nodes are prone to failure, depicted by grey circles in Fig. 2. Jobs arrive from outside with rate \( \lambda \) at the central node 1. From node 1 they go with probability r / 5 to any of the nodes 2 to 6, for \( r \in ( 0 , 1 ) \). After finishing service at node \( i = 2 , \ldots , 6 \), jobs are sent back to the central node 1. Being served there, they either leave the system with probability \(1- r \), or are sent back to one of the servers in the set \( \{ 2 ,\ldots , 6 \}\) according to the routing scheme described above. Let h denote the stationary throughput at node \( i=2 \) and model \( \tau _2 \) as being random. Then, the throughput at node 2 under skipping for a fixed value of \(\tau _2\) is

$$\begin{aligned} h ( \tau _2 )= & {} \eta _2 \pi ( \emptyset ) \left( 1 + \frac{\tau _3}{\rho _3} + \frac{\tau _4}{\rho _4} + \frac{\tau _3 + \tau _4}{ 2 \min ( \rho _3 , \rho _4 ) } \right) , \end{aligned}$$

with \( \pi (\emptyset ) \) given as

$$\begin{aligned} \pi ( \emptyset )= & {} \left( 1 + \sum _{i=2}^4 \frac{\tau _i}{\rho _i} + \frac{\tau _2 + \tau _3}{2 \min ( \rho _2 , \rho _3 )} + \frac{\tau _2 + \tau _4}{2 \min ( \rho _2 , \rho _4 )}\right. \nonumber \\&\left. +\, \frac{\tau _3 + \tau _4}{2 \min ( \rho _3 , \rho _4 )} +\frac{\tau _2+ \tau _3 + \tau _4}{3 \min ( \rho _2 , \rho _3 , \rho _4 )} \right) ^{-1}. \end{aligned}$$

Letting

$$\begin{aligned} a_1= & {} 1 + \frac{\tau _3}{\rho _3} + \frac{\tau _4}{\rho _4} + \frac{\tau _3 + \tau _4}{ 2 \min ( \rho _3 , \rho _4 ) } + \frac{\tau _4}{ 2 \min ( \rho _2 , \rho _4 ) } \\&+\, \frac{\tau _3}{ 2 \min ( \rho _2 , \rho _3 ) } + \frac{\tau _3 + \tau _4 }{ 3 \min ( \rho _2 , \rho _3 , \rho _4 ) } , \\ a_2= & {} \frac{1}{ \rho _2} + \frac{1}{ 2 \min ( \rho _2 , \rho _4 ) } + \frac{1}{ 2 \min ( \rho _2 , \rho _3 ) } + \frac{1}{ 3 \min ( \rho _2 , \rho _3 , \rho _4 ) } , \end{aligned}$$

and

$$\begin{aligned} a_3 = \eta _2 \left( 1 + \frac{\tau _3}{\rho _3} + \frac{\tau _4}{\rho _4} + \frac{\tau _3 + \tau _4}{ 2 \min ( \rho _3 , \rho _4 ) } \right) , \end{aligned}$$

we may write for a constant \(\tau _2\)

$$\begin{aligned} h ( \tau _2 ) = \frac{a_3}{ a_1 + a_2 \tau _2 } . \end{aligned}$$

Hence,

$$\begin{aligned} h^{-1} ( y ) = \frac{a_3}{a_2 y} - \frac{a_1}{a_2} \quad \hbox { and } \quad \frac{d}{d y } h^{-1} ( y ) = - \frac{a_3 }{a_2 y^2} . \end{aligned}$$

In the following we show that for uniformly and exponentially distributed \(\tau _2\), closed form expressions for the value at risk can be obtained:

  • Assume that \( \tau _2 \) is uniformly distributed on [a, b], with \( 0< a< b < \infty \). Then,

    $$\begin{aligned} g ( y ) = \frac{1}{b-a} \frac{a_3 }{a_2 y^2} \, , \quad \hbox { for } \, \frac{a_3}{ a_1 + a_2 b} \le y \le \frac{a_3}{ a_1 + a_2 a} \end{aligned}$$

    and zero otherwise. The value at risk, i.e., the \(\alpha \) quantile of the stationary throughput, is also easily computable to be

    $$\begin{aligned} \mathrm{VaR} ( \alpha ) = \frac{a_3}{ a_1 + a_2 b - \alpha ( b-a ) a_2 } , \end{aligned}$$

    for \( \alpha \in [ 0 , 1 ] \).

  • For \(\tau _2\) exponentially distributed with parameter \(\lambda \) it holds that

    $$\begin{aligned} g(y) = \lambda \exp \left( -\lambda \left( \frac{a_3}{a_2y}-\frac{a_1}{a_2}\right) \right) \frac{a_3}{a_2y^2}, \end{aligned}$$
    (11)

    for \(y \in (0,\frac{a_3}{a_1}]\), so that

    $$\begin{aligned} G(y) = \exp \left( -\lambda \left( \frac{a_3}{a_2y}-\frac{a_1}{a_2}\right) \right) , \end{aligned}$$
    (12)

    and thus

    $$\begin{aligned} \mathrm{VaR} ( \alpha ) = \frac{\lambda a_3}{ \lambda a_1 - a_2 \ln (\alpha )} , \end{aligned}$$

    for \(\alpha \in (0,1)\).
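The reduction of \(h(\tau_2)\) to the form \(a_3/(a_1 + a_2 \tau_2)\) can be verified numerically. In the Python sketch below the function names and all rates are hypothetical; the direct evaluation uses \(\min(\rho_2, \rho_3, \rho_4)\) in the last term of \(\pi(\emptyset)\), consistent with the definitions of \(a_1\) and \(a_2\).

```python
import math


def star_coefficients(eta2, tau3, tau4, rho2, rho3, rho4):
    """Coefficients a1, a2, a3 with h(tau2) = a3 / (a1 + a2 * tau2)."""
    m23, m24, m34 = min(rho2, rho3), min(rho2, rho4), min(rho3, rho4)
    m234 = min(rho2, rho3, rho4)
    up2 = 1 + tau3 / rho3 + tau4 / rho4 + (tau3 + tau4) / (2 * m34)
    a1 = up2 + tau4 / (2 * m24) + tau3 / (2 * m23) + (tau3 + tau4) / (3 * m234)
    a2 = 1 / rho2 + 1 / (2 * m24) + 1 / (2 * m23) + 1 / (3 * m234)
    return a1, a2, eta2 * up2


def throughput_direct(tau2, eta2, tau3, tau4, rho2, rho3, rho4):
    """h(tau2) = eta2 * pi(empty set) * (...), evaluated without a1, a2, a3."""
    m23, m24, m34 = min(rho2, rho3), min(rho2, rho4), min(rho3, rho4)
    m234 = min(rho2, rho3, rho4)
    inv_pi0 = (1 + tau2 / rho2 + tau3 / rho3 + tau4 / rho4
               + (tau2 + tau3) / (2 * m23) + (tau2 + tau4) / (2 * m24)
               + (tau3 + tau4) / (2 * m34) + (tau2 + tau3 + tau4) / (3 * m234))
    up2 = 1 + tau3 / rho3 + tau4 / rho4 + (tau3 + tau4) / (2 * m34)
    return eta2 * up2 / inv_pi0


def var_uniform(alpha, a, b, a1, a2, a3):
    """alpha-quantile of h(tau2) for tau2 uniform on [a, b]."""
    return a3 / (a1 + a2 * b - alpha * (b - a) * a2)


def var_exponential(alpha, lam, a1, a2, a3):
    """alpha-quantile of h(tau2) for tau2 exponential with parameter lam."""
    return lam * a3 / (lam * a1 - a2 * math.log(alpha))


# Made-up rates, for illustration only.
params = dict(eta2=1.0, tau3=0.2, tau4=0.3, rho2=1.0, rho3=2.0, rho4=1.5)
a1, a2, a3 = star_coefficients(**params)
```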

As we have shown in this section, for uniform and exponential distributions, \( \mathrm{VaR} ( \alpha ) \) can be computed explicitly, which is due to the simplicity of both distributions. In the following we study the more challenging problem when the distribution of \( \tau _2 \) is of general form.

4 The general approach

In this section we provide a general approach to approximately computing \( \mathrm{VaR} ( \alpha ) \). Revisit the two-way network from the example in Sect. 3.1. Let \(\tau _2\) be normally distributed with mean \(\mu \) and standard deviation \(\sigma \) but conditioned on the interval \([\gamma _l,\gamma _r]\), where \(0 \le \gamma _l < \gamma _r\). The rationale behind the conditioning is that negative values as well as unrealistically large values for \( \tau _2 \) are avoided. Then, following (7), the throughput at node 3 under stalling has density g for

$$\begin{aligned} \frac{\eta _3 \rho _2}{\rho _2 + \gamma _r} \le y \le \frac{\eta _3 \rho _2}{\rho _2 + \gamma _l} \end{aligned}$$
(13)

given by

$$\begin{aligned} g ( y ) = \frac{\eta _3 \rho _2 }{ \varDelta _\varPhi \sigma \sqrt{2\pi } y^2 } \exp \left( - \frac{ \left( \frac{\eta _3 \rho _2}{y} - \rho _2 - \mu \right) ^2 }{ 2 \sigma ^2 } \right) , \end{aligned}$$
(14)

where

$$\begin{aligned} \varDelta _\varPhi = \varPhi \left( \frac{\gamma _r-\mu }{\sigma } \right) - \varPhi \left( \frac{\gamma _l-\mu }{\sigma } \right) \end{aligned}$$

and \(\varPhi (\cdot )\) is the standard normal cumulative distribution function. To obtain the VaR we need to find the inverse of the cumulative distribution of the throughput given by

$$\begin{aligned} G(y) = \int _{\frac{\eta _3 \rho _2}{\rho _2 + \gamma _r}}^y g ( t ) dt. \end{aligned}$$
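As a numerical reference value for this case, one can exploit that the throughput is decreasing in \(\tau_2\), so that its \(\alpha\)-quantile is the throughput evaluated at the \((1-\alpha)\)-quantile of the truncated normal distribution. The Python sketch below takes this route, which is a swapped-in shortcut and not the series method discussed in this section; function names and parameter values are assumptions. It also integrates the density (14) by a simple midpoint rule to confirm that \(G(\mathrm{VaR}(\alpha)) \approx \alpha\).

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def var_truncated_normal(alpha, mu, sigma, gamma_l, gamma_r, eta3, rho2):
    """alpha-quantile of eta3*rho2/(rho2 + tau2), tau2 normal truncated to [gamma_l, gamma_r]."""
    p_l = _STD.cdf((gamma_l - mu) / sigma)
    p_r = _STD.cdf((gamma_r - mu) / sigma)
    # (1 - alpha)-quantile of the truncated normal distribution of tau2
    tau_q = mu + sigma * _STD.inv_cdf(p_l + (1.0 - alpha) * (p_r - p_l))
    return eta3 * rho2 / (rho2 + tau_q)


def density(y, mu, sigma, gamma_l, gamma_r, eta3, rho2):
    """Density (14) of the throughput; zero outside the range (13)."""
    lower = eta3 * rho2 / (rho2 + gamma_r)
    upper = eta3 * rho2 / (rho2 + gamma_l)
    if not lower <= y <= upper:
        return 0.0
    delta = _STD.cdf((gamma_r - mu) / sigma) - _STD.cdf((gamma_l - mu) / sigma)
    tau = eta3 * rho2 / y - rho2
    scale = eta3 * rho2 / (delta * sigma * math.sqrt(2.0 * math.pi) * y * y)
    return scale * math.exp(-((tau - mu) ** 2) / (2.0 * sigma ** 2))


def cdf_by_quadrature(y, mu, sigma, gamma_l, gamma_r, eta3, rho2, steps=20_000):
    """G(y) via the composite midpoint rule on [lower bound of (13), y]."""
    lower = eta3 * rho2 / (rho2 + gamma_r)
    width = (y - lower) / steps
    return width * sum(
        density(lower + (j + 0.5) * width, mu, sigma, gamma_l, gamma_r, eta3, rho2)
        for j in range(steps))


# Made-up parameter values: mu, sigma, gamma_l, gamma_r, eta3, rho2.
params = (0.5, 0.2, 0.1, 1.0, 1.6, 2.0)
q = var_truncated_normal(0.1, *params)
```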

Computing the inverse of a general function can usually only be performed numerically. However, in case the function of interest is analytic and can thus be written as a power series, a power series representation of the inverse can be obtained. This result is well-known in analysis, see, e.g., Dettman (2012). However, computing the actual elements of the power series is a challenging task. A first result can be found in Whittaker’s pioneering paper (Whittaker 1951). In particular, Whittaker provided an explicit expression for the elements of the power series of the inverse in terms of the elements of the power series of the original function. Unfortunately, the computation of the elements is rather demanding. An alternative approach, which suffers from the same computational burden, is Lagrange’s inversion formula (Abramowitz and Stegun 1992). Dominici (2003) introduced a method for numerical inversion that is very well suited to computing the VaR of a transformation of an exponentially distributed random variable. In the following we will present this approach.

For an infinitely often differentiable mapping f define the nested derivative \( \mathcal{D}^n [ f] (x ) \) by the recursion

$$\begin{aligned} \mathcal{D}^0 [ f] (x ) =1 \end{aligned}$$

and

$$\begin{aligned} \mathcal{D}^n [ f] (x ) = \frac{d}{d x } \Big ( f ( x ) \mathcal{D}^{n-1} [ f] (x ) \Big ) , \end{aligned}$$

for \( n \ge 1 \). Let

$$\begin{aligned} h ( x ) = \int _a^x \frac{1}{f ( t ) } dt \end{aligned}$$
(15)

with \( f ( a ) \not = 0 , \infty \). Then according to Theorem 4.1 in Dominici (2003) the inverse of h(x) is given by

$$\begin{aligned} h^{-1} ( y ) = a + f ( a ) \sum _{n \ge 1 } \mathcal{D}^{n-1} [ f] ( a ) \frac{y^n}{n!} , \end{aligned}$$
(16)

where \( | y | < \epsilon \) for some \( \epsilon > 0 \). The elements of the series can be easily evaluated by means of standard computer algebra tools. We refer to Dominici (2003) for details.

Example 1

Consider the exponential mapping \( e^x \) and apply the method of nested derivatives. Note that

$$\begin{aligned} h ( x ) := e^x -1 = \int _0^x e^t d t . \end{aligned}$$

Let \( f ( x ) = e^{-x }\), then

$$\begin{aligned} \mathcal{D}^n[f](x) = (-1)^n n! e^{-nx} \end{aligned}$$

so that \(\mathcal{D}^n[f](0) = (-1)^n n!\). Since f is analytic we obtain the inverse of h(x) as

$$\begin{aligned} h^{-1} ( y ) = \sum _{n=1}^\infty (-1)^{n-1} \frac{y^n}{n} , \end{aligned}$$

which is easily recognizable as the series expansion of \( \ln ( y +1 ) \) around 0.
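The recursion for the nested derivatives can be carried out on truncated Taylor series, without symbolic differentiation. The following Python sketch is one possible implementation under that assumption (it is not the procedure of Dominici (2003), and the function names are hypothetical); it reproduces the coefficients \((-1)^{n-1}/n\) of the example exactly using rational arithmetic.

```python
from fractions import Fraction
from math import factorial


def nested_derivatives(f_coeffs, N):
    """Return [D^0[f](a), ..., D^{N-1}[f](a)].

    f_coeffs[k] is the k-th Taylor coefficient of f around a, i.e.
    f(x) = sum_k f_coeffs[k] * (x - a)**k; at least N coefficients are needed.
    """
    d = [Fraction(1)] + [Fraction(0)] * (N - 1)   # series of D^0[f] = 1
    values = [d[0]]
    for _ in range(1, N):
        m = len(d)
        # truncated Cauchy product f * D^{n-1}[f]
        prod = [sum(f_coeffs[i] * d[k - i] for i in range(k + 1)) for k in range(m)]
        # termwise derivative; one order of the truncated series is lost
        d = [(k + 1) * prod[k + 1] for k in range(m - 1)]
        values.append(d[0])
    return values


def inverse_series_coefficients(f_coeffs, N):
    """Coefficients c_1, ..., c_N with h^{-1}(y) = a + sum_n c_n * y**n, cf. (16)."""
    nested = nested_derivatives(f_coeffs, N)
    return [f_coeffs[0] * nested[n - 1] / factorial(n) for n in range(1, N + 1)]


# Example 1: f(x) = exp(-x) around a = 0 has Taylor coefficients (-1)^k / k!.
N = 5
f_coeffs = [Fraction((-1) ** k, factorial(k)) for k in range(N)]
coeffs = inverse_series_coefficients(f_coeffs, N)
approx_log = float(sum(c * Fraction(1, 10) ** n for n, c in enumerate(coeffs, start=1)))
```

Each differentiation shortens the truncated series by one term, which is why N Taylor coefficients of f suffice for the first N terms of the inverse.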

As illustrated in the above example, the method of nested derivatives allows for a direct analysis of the function under the integral. This is particularly useful in VaR computations as the analysis can directly be applied to the density and computation of the cumulative distribution function can thus be avoided.

In the following we present the main result on nested derivatives, where we write \( {\bar{g}} ( t ) = 1 / g ( t )\).

Theorem 1

Let G(y) be a cumulative distribution function on \( B= [ b_l , b_r ] \), where \( B = \mathbb {R} \) is not excluded. Suppose that there exists g(t), for \( t \in B \), such that

  (i) for \( y \in B \) it holds that

    $$\begin{aligned} G ( y ) = \int _{b_l }^y g ( t ) dt , \end{aligned}$$

  (ii) G is analytic on the interior of B as a mapping in y,

  (iii) there is an \( a \in B \) such that \( g(a) \ne 0 \).

Let

$$\begin{aligned} c_a = \int _{b_l}^a g ( t ) dt = G(a), \end{aligned}$$

then

$$\begin{aligned} \mathrm{VaR} ( \alpha ) = a + {\bar{g}} ( a ) \sum _{n \ge 1 } \mathcal{D}^{n-1} [ {\bar{g}}] ( a ) \frac{(\alpha - c_a ) ^n}{n!} , \end{aligned}$$

for \( \alpha \) sufficiently close to \( c_a \).

Proof

For \( x \ge a \), write

$$\begin{aligned} G ( x ) = c_a + \int _a^x g ( t ) \, dt \end{aligned}$$

and let

$$\begin{aligned} G_{c_a} ( x ) = G ( x ) - c_a = \int _a^x \frac{1}{ {\bar{g}} ( t ) } dt . \end{aligned}$$

We now apply the nested derivatives method to \( {\bar{g}} ( t ) \). From (16) [see also Dominici (2003)] it then follows that

$$\begin{aligned} G_{c_a}^{-1} ( y ) = a + {\bar{g}} ( a ) \sum _{n \ge 1 } \mathcal{D}^{n-1} [ {\bar{g}}] ( a ) \frac{y^n}{n!} , \end{aligned}$$

for |y| sufficiently small. Noting that \( \mathrm{VaR} ( \alpha ) = G^{-1}_{c_a} ( \alpha - c_a ) \) concludes the proof. \(\square \)

Note that the advantage of Theorem 1 lies in the fact that the elements of the series expansion have to be computed once for a, yielding a polynomial approximation of VaR on an entire interval. For ease of reference define for \(N \in \mathbb {N}\)

$$\begin{aligned} \mathrm{VaR} (N, \alpha ) = a + {\bar{g}} ( a ) \sum _{n = 1 }^N \mathcal{D}^{n-1} [ {\bar{g}}] ( a ) \frac{(\alpha - c_a ) ^n}{n!} , \end{aligned}$$

so that \(\lim _{N\rightarrow \infty } \mathrm{VaR} (N, \alpha ) = \mathrm{VaR} (\alpha )\).
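The truncated series can be sketched in a few lines of Python. The example below is an illustration that is not taken from the paper: it assumes an exponential distribution with rate `lam`, for which \( g(t) = \lambda e^{-\lambda t} \), \( {\bar{g}} ( t ) = e^{\lambda t} / \lambda \) and the exact quantile \( -\ln (1-\alpha ) / \lambda \) are available in closed form, and it takes \( a = 0 \) so that \( c_a = 0 \). The helper names are hypothetical, and the nested derivatives are obtained from truncated Taylor expansions rather than from a computer algebra system.

```python
from math import factorial, log


def nested_derivatives_at(f_taylor, count):
    """[D^0[f](a), ..., D^(count-1)[f](a)] from Taylor coefficients of f at a."""
    f = list(f_taylor[:count])
    d = [1.0] + [0.0] * (count - 1)  # D^0[f] = 1
    values = [d[0]]
    for _ in range(1, count):
        m = len(d)
        # Truncated product f * D^(n-1)[f], then term-by-term derivative.
        prod = [sum(f[i] * d[k - i] for i in range(k + 1)) for k in range(m)]
        d = [(k + 1) * prod[k + 1] for k in range(m - 1)]
        values.append(d[0])
    return values


def var_series(gbar_taylor, a, c_a, alpha, n_terms):
    """VaR(N, alpha) with N = n_terms, given Taylor coefficients of gbar at a."""
    nested = nested_derivatives_at(gbar_taylor, n_terms)
    gbar_a = gbar_taylor[0]
    return a + gbar_a * sum(
        nested[n - 1] * (alpha - c_a) ** n / factorial(n)
        for n in range(1, n_terms + 1)
    )


lam = 2.0       # assumed rate of the illustrative exponential distribution
n_terms = 10
# Taylor coefficients of gbar(t) = exp(lam * t) / lam around a = 0.
gbar_taylor = [lam ** (k - 1) / factorial(k) for k in range(n_terms)]

for alpha in (0.05, 0.1, 0.3):
    approx = var_series(gbar_taylor, a=0.0, c_a=0.0, alpha=alpha, n_terms=n_terms)
    exact = -log(1.0 - alpha) / lam
    print(alpha, approx, exact)
```

The Taylor coefficients are computed once for the chosen a; afterwards each evaluation of VaR(N, \(\alpha \)) is a polynomial evaluation in \( \alpha - c_a \).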

Example 2

Reconsider the tandem network from Sect. 3.1. Let \(\tau _2\) be normally distributed with mean \(\mu \) and standard deviation \(\sigma \), truncated to the interval \([\gamma _l,\gamma _r]\), where \(0 \le \gamma _l < \gamma _r\). See (14) for the density g(y) of the throughput. To approximate the VaR by Theorem 1, we choose \(a = \frac{\eta _3 \rho _2}{\rho _2 + \gamma _r}\), so that \(c_a = 0\) (this is allowed since \(g(\frac{\eta _3 \rho _2}{\rho _2 + \gamma _r}) \ne 0\)), and compute the series using the computer algebra algorithm provided in Dominici (2003). With the notation \(\bar{g}(y) = 1/g(y)\) it holds that

$$\begin{aligned} \bar{g}(y) = \frac{ \varDelta _\varPhi \sigma \sqrt{2\pi } y^2 }{\eta _3 \rho _2 } \exp \left( \frac{ \left( \frac{\eta _3 \rho _2}{y} - \rho _2 - \mu \right) ^2 }{ 2 \sigma ^2 } \right) , \end{aligned}$$

for \(\frac{\eta _3 \rho _2}{\rho _2 + \gamma _r} \le y \le \frac{\eta _3 \rho _2}{\rho _2 + \gamma _l}\).

It follows from Maple calculations that \(\mathrm{VaR} (1, \alpha )\), i.e., a VaR series approximation based on 1 term, equals

$$\begin{aligned} \mathrm{VaR} (1, \alpha )&= a + {\bar{g}} ( a ) \alpha \\&= \frac{\rho _2 \eta _3 \left( \varDelta _{\varPhi } \sigma \sqrt{2\pi } \alpha \exp \left( \frac{(\gamma _r-\mu )^2}{2\sigma ^2} \right) + \rho _2+\gamma _r \right) }{\left( \rho _2+\gamma _r \right) ^{2}}. \end{aligned}$$

For the VaR approximation of order 2 we have to add the term

$$\begin{aligned}&{\bar{g}} ( a ) \mathcal{D}^{1} [ {\bar{g}}] ( a ) \frac{\alpha ^2}{2}\\&\quad = \frac{- \eta _3 \rho _2 (\gamma _r^2 + (\rho _2 - \mu )\gamma _r -\rho _2 \mu - 2\sigma ^2) \alpha ^2 \varDelta _\varPhi ^2 2\pi \exp \left( \frac{(\gamma _r-\mu )^2}{\sigma ^2}\right) }{2 (\rho _2 + \gamma _r)^3}. \end{aligned}$$
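The two closed-form expressions above can be cross-checked numerically. The following Python sketch uses the numerical values from the caption of Fig. 3 and compares the closed forms with a direct evaluation of \( a + {\bar{g}} ( a ) \alpha \) and of \( {\bar{g}} ( a ) \mathcal{D}^{1} [ {\bar{g}}] ( a ) \alpha ^2 / 2 \), where \( \mathcal{D}^{1} [ {\bar{g}}] = {\bar{g}}' \) is approximated by a central difference. It assumes that \( \varDelta _\varPhi \) is the normalising constant \( \varPhi ( (\gamma _r - \mu )/\sigma ) - \varPhi ( (\gamma _l - \mu )/\sigma ) \) of the truncated normal distribution; this constant is defined together with (14) and is not restated here. All function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

# Numerical values from the caption of Fig. 3.
rho2, eta3, mu, sigma = 1.1, 1.5, 1.0, 1.5
gamma_l, gamma_r = 0.1, 2.75


def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# ASSUMPTION: Delta_Phi is the truncated-normal normalising constant.
delta_phi = (std_normal_cdf((gamma_r - mu) / sigma)
             - std_normal_cdf((gamma_l - mu) / sigma))

a = eta3 * rho2 / (rho2 + gamma_r)  # expansion point with c_a = 0


def gbar(y):
    """Reciprocal density 1/g(y) as displayed in Example 2."""
    u = eta3 * rho2 / y - rho2 - mu
    return (delta_phi * sigma * sqrt(2.0 * pi) * y ** 2 / (eta3 * rho2)
            * exp(u ** 2 / (2.0 * sigma ** 2)))


def first_order(alpha):
    """Closed form of VaR(1, alpha)."""
    e = exp((gamma_r - mu) ** 2 / (2.0 * sigma ** 2))
    return (rho2 * eta3
            * (delta_phi * sigma * sqrt(2.0 * pi) * alpha * e + rho2 + gamma_r)
            / (rho2 + gamma_r) ** 2)


def second_order_term(alpha):
    """Closed form of the additional term of order 2."""
    p2 = gamma_r ** 2 + (rho2 - mu) * gamma_r - rho2 * mu - 2.0 * sigma ** 2
    e2 = exp((gamma_r - mu) ** 2 / sigma ** 2)
    return (-eta3 * rho2 * p2 * alpha ** 2 * delta_phi ** 2 * 2.0 * pi * e2
            / (2.0 * (rho2 + gamma_r) ** 3))


def second_order_term_numeric(alpha, step=1e-6):
    """gbar(a) * gbar'(a) * alpha^2 / 2 with a central-difference derivative."""
    dgbar = (gbar(a + step) - gbar(a - step)) / (2.0 * step)
    return gbar(a) * dgbar * alpha ** 2 / 2.0


alpha = 0.05
print(first_order(alpha), a + gbar(a) * alpha)
print(second_order_term(alpha), second_order_term_numeric(alpha))
```

The agreement of the two pairs of values does not depend on the assumed value of \( \varDelta _\varPhi \), since it enters both sides as the same constant factor.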

In general, for the n-th term it holds

$$\begin{aligned} \bar{g}(a) \mathcal{D}^{n-1} [ {\bar{g}}] ( a ) \frac{\alpha ^n}{n!} = \frac{(-1)^{n+1}\sigma ^2 \eta _3 \rho _2 P(n)}{n!(\rho _2 + \gamma _r)} \left( \frac{ \alpha \varDelta _\varPhi \sqrt{2\pi } \exp \left( \frac{(\gamma _r-\mu )^2}{2\sigma ^2}\right) }{\sigma (\rho _2 + \gamma _r)} \right) ^n, \end{aligned}$$

where P(n) is a homogeneous polynomial of degree \(2(n-1)\) in variables \(\gamma _r\), \(\sigma \), \(\rho _2\) and \(\mu \). In particular for P(n) with \(n=1,2,3,4\) it holds

$$\begin{aligned} P(1)&= 1 \\ P(2)&= \gamma _r^2+(\rho _2-\mu )\gamma _r-2\sigma ^2-\rho _2 \mu \\ P(3)&= 2\gamma _r^4+(4\rho _2-4\mu )\gamma _r^3+(-5\sigma ^2+2\rho _2^2-8\rho _2\mu +2\mu ^2)\gamma _r^2+(-4\sigma ^2\rho _2\\&\quad +\,6\sigma ^2\mu -4\rho _2^2\mu +4\rho _2\mu ^2)\gamma _r+6\sigma ^4+\sigma ^2\rho _2^2\\&\quad +\,6\sigma ^2\rho _2\mu +2\rho _2^2\mu ^2 \\ P(4)&= 6\gamma _r^6+(18\rho _2-18\mu )\gamma _r^5+(-15\sigma ^2+\,18\rho _2^2-54\rho _2\mu +18\mu ^2)\gamma _r^4\\&\quad +(-23\sigma ^2\rho _2 +\,37\sigma ^2\mu +6\rho _2^3-54\rho _2^2\mu +54\rho _2\mu ^2-6\mu ^3)\gamma _r^3\\&\quad +(28\sigma ^4-\sigma ^2\rho _2^2+\,67\sigma ^2\rho _2\mu -22\sigma ^2\mu ^2-18\rho _2^3\mu +54\rho _2^2\mu ^2-18\rho _2\mu ^3)\gamma _r^2\\&\quad +\,(20\sigma ^4\rho _2-36\sigma ^4\mu +7\sigma ^2\rho _2^3+23\sigma ^2\rho _2^2\mu -44\sigma ^2\rho _2\mu ^2\\&\quad +\,18\rho _2^3\mu ^2-18\rho _2^2\mu ^3)\gamma _r-24\sigma ^6-8\sigma ^4\rho _2^2-36\sigma ^4\rho _2\mu -7\sigma ^2\rho _2^3\mu \\&\quad -\,22\sigma ^2\rho _2^2\mu ^2-6\rho _2^3\mu ^3. \end{aligned}$$

Fig. 3 Plot of VaR(\(\alpha \)), VaR(\(N,\alpha \)), with \(N=1,2,5,10\), and \(c_a=0\). Numerical values: \(\rho _2 = 1.1\), \(\eta _3 = 1.5\), \(\mu = 1\), \(\sigma = 1.5\) and \([\gamma _l,\gamma _r] = [0.1, 2.75]\) so that \(E[\tau _2] = 1.3259\) and \(\text{ VaR }(\tau _2) = 0.52134\)

Figure 3 provides a numerical example which illustrates that the VaR series with only a few terms already yields an accurate approximation of VaR(\(\alpha \)) for \(\alpha \) between 0 and 0.1. For the tandem network parameters we take \(a=3/5\), \(b=1/2\), \(c=1/2\), \(t=15/8\), \(\mu _1=\mu _3=2\), so that \(\lambda _1 = 9/8\), \(\lambda _3=3/4\) and \(\mu _2=3/2\) (here a denotes the tandem network parameter from Sect. 3.1, not the expansion point a of Theorem 1). From the traffic equations it follows that \(\eta _3 = 1.5\) for this parameter setting. For the parameters describing the uncertainty, we let \(\rho _2 = 1.1\), \(\mu = 1\), \(\sigma = 1.5\) and \([\gamma _l,\gamma _r] = [0.1, 2.75]\) so that \(\mu _{\tau _2} = E[\tau _2] = 1.3259\) and \(\text{ VaR }(\tau _2) = 0.52134\). Furthermore, the example illustrates that significant risk is ignored when taking \(h( \mu _{\tau _2} )\) as the measure for the throughput. Specifically, \(h(\mu _{\tau _2}) \approx 0.68\), whereas with probability 0.2 the actual throughput is smaller than approximately 0.52 (a difference of at least \(23.5\%\)) and with probability 0.1 it is smaller than approximately 0.48 (a difference of at least \(29.4\%\)).
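The quoted percentages are the relative shortfalls with respect to \(h(\mu _{\tau _2})\); with the rounded values given above,

$$\begin{aligned} \frac{0.68 - 0.52}{0.68} \approx 0.235 , \qquad \frac{0.68 - 0.48}{0.68} \approx 0.294 . \end{aligned}$$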

If one is interested in VaR(\(\alpha \)) for \(\alpha \) around 0.3, Fig. 3 shows that the series yields poor approximations, even when 10 terms are used. The approximation of VaR(\(\alpha \)) for \(\alpha \) around 0.3 can be improved by choosing a in condition (iii) of Theorem 1 greater than \(b_l = \frac{\eta _3 \rho _2}{\rho _2 + \gamma _r}\) such that \(c_a\) lies near 0.3. The downside is that this approach requires a search for an appropriate a and the numerical evaluation of \(c_a\). Once this has been done, however, the series for VaR(\(\alpha \)) from Theorem 1 provides an accurate and efficient approximation of VaR(\(\alpha \)) for \(\alpha \) in a relatively large interval around \(c_a\). As an example, Fig. 4 shows for the same instance as in Fig. 3 that choosing a such that \(c_a \approx 0.3\) leads to accurate approximations of VaR(\(\alpha \)) for \(\alpha \in (0.15, 0.45)\), even for a small number of series terms.
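The search for an expansion point a with \(c_a\) near a target level can be sketched as follows. The Python code below is one possible approach and not necessarily the one used for Fig. 4: it evaluates \(c_a = G(a)\) by composite Simpson quadrature of the density \(g = 1/\bar{g}\) from Example 2 and finds a by bisection, which is valid because G is increasing. As before, \( \varDelta _\varPhi \) is assumed to be the normalising constant \( \varPhi ( (\gamma _r - \mu )/\sigma ) - \varPhi ( (\gamma _l - \mu )/\sigma ) \) of the truncated normal distribution, and all function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

# Numerical values from the captions of Fig. 3 and Fig. 4.
rho2, eta3, mu, sigma = 1.1, 1.5, 1.0, 1.5
gamma_l, gamma_r = 0.1, 2.75
b_l = eta3 * rho2 / (rho2 + gamma_r)  # left end of the support of g
b_r = eta3 * rho2 / (rho2 + gamma_l)  # right end of the support of g


def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# ASSUMPTION: Delta_Phi is the truncated-normal normalising constant.
delta_phi = (std_normal_cdf((gamma_r - mu) / sigma)
             - std_normal_cdf((gamma_l - mu) / sigma))


def g(y):
    """Density of the throughput, taken as the reciprocal of gbar in Example 2."""
    u = eta3 * rho2 / y - rho2 - mu
    return (eta3 * rho2 / (delta_phi * sigma * sqrt(2.0 * pi) * y ** 2)
            * exp(-u ** 2 / (2.0 * sigma ** 2)))


def cdf(y, panels=400):
    """G(y) by composite Simpson quadrature of g over [b_l, y] (panels even)."""
    if y <= b_l:
        return 0.0
    h = (y - b_l) / panels
    total = g(b_l) + g(y)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * g(b_l + i * h)
    return total * h / 3.0


def find_expansion_point(target, tol=1e-10):
    """Bisection for a in [b_l, b_r] such that G(a) = target."""
    lo, hi = b_l, b_r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = find_expansion_point(0.3)
c_a = cdf(a)
print(a, c_a)
```

The resulting pair \((a, c_a)\) can then be used in the series of Theorem 1, together with the nested derivatives of \(\bar{g}\) evaluated at the new point a.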

Fig. 4 Plot of VaR(\(\alpha \)), VaR(\(N,\alpha \)), with \(N=1,2,5,10\), and \(c_a \approx 0.3\). Numerical values: \(\rho _2 = 1.1\), \(\eta _3 = 1.5\), \(\mu = 1\), \(\sigma = 1.5\) and \([\gamma _l,\gamma _r] = [0.1, 2.75]\) so that \(E[\tau _2] = 1.3259\) and \(\text{ VaR }(\tau _2) = 0.52134\)

5 Conclusion

In this paper we have argued for the importance of robustness analysis under parameter uncertainty in queuing models. For generalized Jackson networks we have provided a framework for numerically evaluating the value at risk incurred by parameter uncertainty. Future research includes uncertainty analysis of multiple parameters and further development of our risk analysis framework. An additional topic of further research is to provide a numerically efficient bound for the remainder of the series approximation of the value at risk.