Delay asymptotics and bounds for multitask parallel jobs

Wang, Weina; Harchol-Balter, Mor; Jiang, Haotian; Scheller-Wolf, Alan; Srikant, R.

doi:10.1007/s11134-018-09597-5

Published: 16 January 2019

Queueing Systems, Volume 91, pages 207–239 (2019)

Abstract

We study delay of jobs that consist of multiple parallel tasks, which is a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, tight analysis is still largely unknown since analyzing job delay requires characterizing the complicated correlation among task delays, which is hard to do. We first consider an asymptotic regime where the number of servers, n, goes to infinity, and the number of tasks in a job, $k^{(n)}$, is allowed to increase with n. We establish the asymptotic independence of any $k^{(n)}$ queues under the condition $k^{(n)}= o(n^{1/4})$. This greatly generalizes the asymptotic independence type of results in the literature, where asymptotic independence is shown only for a fixed constant number of queues. As a consequence of our independence result, the job delay converges to the maximum of independent task delays. We next consider the non-asymptotic regime. Here, we prove that independence yields a stochastic upper bound on job delay for any n and any $k^{(n)}$ with $k^{(n)}\le n$. The key component of our proof is a new technique we develop, called “Poisson oversampling.” Our approach converts the job delay problem into a corresponding balls-and-bins problem. However, in contrast with typical balls-and-bins problems where there is a negative correlation among bins, we prove that our variant exhibits positive correlation.

Notes

We thank Prof. Alexander Stolyar for suggesting this possible approach.

References

Baccelli, F.: Two parallel queues created by arrivals with two demands: the M/G/2 symmetrical case. Technical Report RR-0426, INRIA (1985)
Baccelli, F., Makowski, A.M.: Simple computable bounds for the fork–join queue. Technical Report RR-0394, INRIA (1985)
Baccelli, F., Makowski, A.M., Shwartz, A.: The fork–join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Probab. 21, 629–660 (1989)
Bramson, M., Lu, Y., Prabhakar, B.: Asymptotic independence of queues under randomized load balancing. Queueing Syst. 71(3), 247–292 (2012)
Chen, Y., Alspaugh, S., Katz, R.: Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. Proc. VLDB Endow. 5(12), 1802–1813 (2012)
Cox, J.T.: An alternate proof of a correlation inequality of Harris. Ann. Probab. 12(1), 272–273 (1984)
DasGupta, A.: Asymptotic Theory of Statistics and Probability. Springer, Berlin (2008)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the USENIX Conference Operating Systems Design and Implementation (OSDI), San Francisco, CA, pp. 10–10 (2004)
Esary, J.D., Proschan, F., Walkup, D.W.: Association of random variables, with applications. Ann. Math. Stat. 38(5), 1466–1474 (1967)
Farhat, F., Tootaghaj, D., He, Y., Sivasubramaniam, A., Kandemir, M., Das, C.: Stochastic modeling and optimization of stragglers. IEEE Trans. Cloud Comput. (2016) (to be published)
Flatto, L., Hahn, S.: Two parallel queues created by arrivals with two demands I. SIAM J. Appl. Math. 44(5), 1041–1053 (1984)
Fortuin, C.M., Kasteleyn, P.W., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22(2), 89–103 (1971)
Gardner, K., Harchol-Balter, M., Scheller-Wolf, A.: A better model for job redundancy: decoupling server slowdown and job size. In: IEEE International Symposium Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), London, United Kingdom, pp. 1–10 (2016)
Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Van Houdt, B.: A better model for job redundancy: decoupling server slowdown and job size. IEEE/ACM Trans. Netw. 25(6), 3353–3367 (2017a)
Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Velednitsky, M., Zbarsky, S.: Redundancy-d: the power of d choices for redundancy. Oper. Res. 65(4), 1078–1094 (2017b)
Graham, C.: Chaoticity on path space for a queueing network with selection of the shortest queue among several. J. Appl. Probab. 37(1), 198–211 (2000)
Graham, C., Méléard, S.: Propagation of chaos for a fully connected loss network with alternate routing. Stoch. Proc. Appl. 44(1), 159–180 (1993)
Harchol-Balter, M.: Performance Modeling and Design of Computer Systems: Queueing Theory in Action, 1st edn. Cambridge University Press, New York (2013)
Harris, T.E.: A correlation inequality for Markov processes in partially ordered state spaces. Ann. Probab. 5(3), 451–454 (1977)
Joag-Dev, K., Proschan, F.: Negative association of random variables with applications. Ann. Stat. 11(1), 286–295 (1983)
Joshi, G., Liu, Y., Soljanin, E.: Coding for fast content download. In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, pp. 326–333 (2012)
Joshi, G., Soljanin, E., Wornell, G.: Efficient redundancy techniques for latency reduction in cloud systems. ACM Trans. Model. Perform. Eval. Comput. Syst. 2(2), 12:1–12:30 (2017)
Ko, S.S., Serfozo, R.F.: Sojourn times in G/M/1 fork–join networks. Nav. Res. Log. 55(5), 432–443 (2008)
Kumar, A., Shorey, R.: Performance analysis and scheduling of stochastic fork-join jobs in a multicomputer system. IEEE Trans. Parallel Distrib. Syst. 4(10), 1147–1164 (1993)
Lee, K., Shah, N.B., Huang, L., Ramchandran, K.: The MDS queue: analysing the latency performance of erasure codes. IEEE Trans. Inf. Theory 63(5), 2822–2842 (2017)
Li, B., Ramamoorthy, A., Srikant, R.: Mean-field-analysis of coding versus replication in cloud storage systems. In: Proceedings IEEE International Conference on Computer Communications (INFOCOM), San Francisco, CA, pp. 1–9 (2016)
Liggett, T.M.: Interacting Particle Systems. Springer, Berlin (2005)
Lin, M., Zhang, L., Wierman, A., Tan, J.: Joint optimization of overlapping phases in MapReduce. Perform. Eval. 70(10), 720–735 (2013)
Lu, H., Pang, G.: Heavy-traffic limits for an infinite-server fork–join queueing system with dependent and disruptive services. Queueing Syst. 85(1), 67–115 (2017)
Lui, J.C., Muntz, R.R., Towsley, D.: Computing performance bounds of fork–join parallel programs under a multiprocessing environment. IEEE Trans. Parallel Distrib. Syst. 9(3), 295–311 (1998)
Melamed, B., Whitt, W.: On arrivals that see time averages. Oper. Res. 38(1), 156–172 (1990)
Meyn, S.P., Tweedie, R.L.: Stability of Markovian processes I: criteria for discrete-time chains. Adv. Appl. Probab. 24(3), 542–574 (1992)
Meyn, S.P., Tweedie, R.L.: Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. 25(3), 518–548 (1993)
Moseley, B., Dasgupta, A., Kumar, R., Sarlós, T.: On scheduling in map-reduce and flow-shops. In: Proceedings of the Annual ACM Symposium Parallelism in Algorithms and Architectures (SPAA), San Jose, CA, pp. 289–298 (2011)
Nelson, R., Tantawi, A.N.: Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput. 37(6), 739–743 (1988)
Nelson, R., Towsley, D., Tantawi, A.N.: Performance analysis of parallel processing systems. IEEE Trans. Softw. Eng. 14(4), 532–540 (1988)
Rizk, A., Poloczek, F., Ciucu, F.: Stochastic bounds in fork–join queueing systems under full and partial mapping. Queueing Syst. 83(3), 261–291 (2016)
Royden, H.L., Fitzpatrick, P.M.: Real Analysis, 4th edn. Pearson, London (2010)
Shah, N.B., Lee, K., Ramchandran, K.: When do redundant requests reduce latency? In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, pp. 731–738 (2013)
Shah, V., Bouillard, A., Baccelli, F.: Delay comparison of delivery and coding policies in data clusters. In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, pp. 397–404 (2017)
Sun, Y., Koksal, C.E., Shroff, N.B.: Near delay-optimal scheduling of batch jobs in multi-server systems. The Ohio State University. Technical Report (2017)
Tan, J., Meng, X., Zhang, L.: Delay tails in MapReduce scheduling. In: Proceedings of the ACM SIGMETRICS/PERFORMANCE Jt. International Conference on Measurement and Modeling of Computer Systems, London, United Kingdom, pp. 5–16 (2012)
Thomasian, A.: Analysis of fork/join and related queueing systems. ACM Comput. Surv. 47(2), 17:1–17:71 (2014)
Varki, E.: Response time analysis of parallel computer and storage systems. IEEE Trans. Parallel Distrib. Syst. 12(11), 1146–1161 (2001)
Vianna, E., Comarela, G., Pontes, T., Almeida, J., Almeida, V., Wilkinson, K., Kuno, H., Dayal, U.: Analytical performance models for MapReduce workloads. Int. J. Parallel Prog. 41(4), 495–525 (2013)
Vulimiri, A., Michel, O., Godfrey, P.B., Shenker, S.: More is less: reducing latency via redundancy. In: Proceedings of the ACM Workshop Hot Topics in Networks (HotNets), Redmond, WA, pp. 13–18 (2012)
Wang, W., Zhu, K., Ying, L., Tan, J., Zhang, L.: MapTask scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. IEEE/ACM Trans. Netw. 24, 190–203 (2016)
Xia, C.H., Liu, Z., Towsley, D., Lelarge, M.: Scalability of fork/join queueing networks with blocking. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, San Diego, CA, pp. 133–144 (2007)
Xiang, Y., Lan, T., Aggarwal, V., Chen, Y.F.R.: Joint latency and cost optimization for erasure-coded data center storage. IEEE/ACM Trans. Netw. 24(4), 2443–2457 (2016)
Xie, Q., Lu, Y.: Priority algorithm for near-data scheduling: throughput and heavy-traffic optimality. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Hong Kong, China, pp. 963–972 (2015)
Xie, Q., Dong, X., Lu, Y., Srikant, R.: Power of d choices for large-scale bin packing: a loss model. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Portland, OR, pp. 321–334 (2015)
Ying, L., Srikant, R., Kang, X.: The power of slightly more than one sample in randomized load balancing. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Kowloon, Hong Kong, pp. 1131–1139 (2015)
Zheng, Y., Shroff, N.B., Sinha, P.: A new analytical technique for designing provably efficient MapReduce schedulers. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Turin, Italy, pp. 1600–1608 (2013)

Acknowledgements

This work was supported in part by National Science Foundation Grants CPS ECCS-1739189, ECCS-1609370, XPS-1629444, and CMMI-1538204, the US Army Research Office (ARO Grant No. W911NF-16-1-0259), the US Office of Naval Research (ONR Grant No. N00014-15-1-2169), DTRA under the Grant Number HDTRA1-16-0017, and a 2018 Faculty Award from Microsoft. Additionally, Haotian Jiang was supported in part by the Department of Physics at Tsinghua University.

Author information

Weina Wang
Present address: Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA

Authors and Affiliations

Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Weina Wang & R. Srikant
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
Mor Harchol-Balter
Department of Physics, Tsinghua University, Beijing, China
Haotian Jiang
Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, USA
Alan Scheller-Wolf

Corresponding author

Correspondence to Weina Wang.

Appendices

Appendix A: Proof of Lemma 3

Lemma 3 (Restated)

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})},\hat{\pi }^{(k^{(n)})}\Bigr )=O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ). \quad {(17)\, (\mathrm{Restated})} \end{aligned}$$

Proof

This proof has a similar flavor to the proofs of Lemmas 1 and 2. Recall that $\left( \widetilde{\varvec{W}}^{(n,k^{(n)})}(t),t\ge 0\right) $, the workload processes of the first $k^{(n)}$ queues in the system $\widetilde{\mathcal {S}}^{(n)}$, are $k^{(n)}$ independent M/G/1 queues each with arrival rate $\widetilde{\lambda }^{(n)}$ and service time distribution G. We couple this with $\left( \hat{\varvec{W}}^{(k^{(n)})}(t),t\ge 0\right) $, where $\hat{\varvec{W}}^{(k^{(n)})}(t)=\bigl (\hat{W}_1(t),\dots ,\hat{W}_{k^{(n)}}(t)\bigr )$ is the workload vector of $k^{(n)}$ independent M/G/1 queues each with arrival rate $\lambda $ and service time distribution G. Then, $\hat{\pi }^{(k^{(n)})}$ is its stationary distribution. We will prove the bound on $d_{TV}\bigl (\widetilde{\pi }^{(n,k^{(n)})},\hat{\pi }^{(k^{(n)})}\bigr )$ by showing that $\left( \widetilde{\varvec{W}}^{(n,k^{(n)})}(t),t\ge 0\right) $ and $\left( \hat{\varvec{W}}^{(k^{(n)})}(t),t\ge 0\right) $ are close.

Now we specify the coupling. All the queues start from empty, i.e., $\hat{W}_i(0)=\widetilde{W}^{(n)}_i(0)=0$ for all $i=1,2,\dots ,k^{(n)}$. When there is a task arrival to some queue of $\hat{\varvec{W}}^{(k^{(n)})}$, we let a task arrive to the corresponding queue of $\widetilde{\varvec{W}}^{(n,k^{(n)})}$ with probability $\frac{\widetilde{\lambda }^{(n)}}{\lambda }$, and let these two tasks require the same service time. So with probability $1-\frac{\widetilde{\lambda }^{(n)}}{\lambda }$ there is no task arrival to $\widetilde{\varvec{W}}^{(n,k^{(n)})}$.
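The coupling above can be sketched in code for a single pair of queues. This is only an illustrative simulation under stated assumptions: the function name `simulate_coupled_workloads`, the `service_sampler` callback, and the unit-rate draining of workload are choices made for the sketch and are not taken from the paper.

```python
import random

def simulate_coupled_workloads(lam, lam_tilde, horizon, service_sampler, seed=0):
    """Simulate one coupled pair of queues on [0, horizon], both starting empty.

    Arrivals to the 'hat' queue form a Poisson process of rate lam. Each such
    arrival is copied to the 'tilde' queue with probability lam_tilde / lam and
    then carries the same service time. Returns both workloads at `horizon`
    and a flag telling whether some arrival was not copied.
    """
    rng = random.Random(seed)
    keep_prob = lam_tilde / lam
    t = 0.0
    w_hat = 0.0
    w_tilde = 0.0
    dropped = False
    while True:
        gap = rng.expovariate(lam)
        if t + gap > horizon:
            # No further arrival before the horizon: drain and stop.
            remaining = horizon - t
            w_hat = max(w_hat - remaining, 0.0)
            w_tilde = max(w_tilde - remaining, 0.0)
            return w_hat, w_tilde, dropped
        t += gap
        # Work drains at unit rate between arrivals (assumed work-conserving server).
        w_hat = max(w_hat - gap, 0.0)
        w_tilde = max(w_tilde - gap, 0.0)
        service = service_sampler(rng)
        w_hat += service
        if rng.random() < keep_prob:
            w_tilde += service
        else:
            dropped = True
```

For example, `simulate_coupled_workloads(0.5, 0.45, 50.0, lambda rng: rng.expovariate(1.0))` runs the pair with exponential service times. As long as no arrival is dropped, the two workloads stay identical, which is the property the coupling is designed to give.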

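We pick a time $\tau ^{(n)}=O\Bigl (\frac{n^{1/2}}{k^{(n)}}\Bigr )$. Let $\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}}$ and $\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}$ denote the distributions of $\widetilde{\varvec{W}}^{(n,k^{(n)})}(\tau ^{(n)})$ and $\hat{\varvec{W}}^{(k^{(n)})}(\tau ^{(n)})$, respectively. Then, by the triangle inequality,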

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})},\hat{\pi }^{(k^{(n)})}\Bigr )\le & {} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\Bigr )\\&+\,d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\widetilde{\pi }^{(n,k^{(n)})}\Bigr )+d_{TV}\Bigl (\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}\Bigr ). \end{aligned}$$

Noting Lemma 2, we have

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }_{\tau ^{(n)}}^{(n,k^{(n)})},\widetilde{\pi }^{(n,k^{(n)})}\Bigr )&=O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ), \qquad (33)\\ d_{TV}\Bigl (\hat{\pi }_{\tau ^{(n)}}^{(k^{(n)})},\hat{\pi }^{(k^{(n)})}\Bigr )&=O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ). \qquad (34) \end{aligned}$$

Next we bound $d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\Bigr )$ using arguments similar to those in the proof of Lemma 1. By the coupling, $\hat{\varvec{W}}^{(k^{(n)})}(t)$ and $\widetilde{\varvec{W}}^{(n,k^{(n)})}(t)$ are different for some $t\in [0,\tau ^{(n)}]$ only when some task arrives to $\hat{\varvec{W}}^{(k^{(n)})}$ but not to $\widetilde{\varvec{W}}^{(n,k^{(n)})}$. We denote this event by $\mathcal {E}$. Then,

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\Bigr )\le \mathbb {P}(\mathcal {E}). \end{aligned}$$

So the remainder of this proof is dedicated to bounding $\mathbb {P}(\mathcal {E})$.

Consider the time interval $[0,\tau ^{(n)}]$. Let A be the number of task arrivals to $\hat{\varvec{W}}^{(k^{(n)})}$ during this time interval. Then,

$$\begin{aligned} \mathbb {P}(\mathcal {E})&=\sum _{j=0}^{\infty }\mathbb {P}(A=j)\mathbb {P}(\mathcal {E}\mid A=j)\\&\le \sum _{j=0}^{\infty }\frac{(\lambda k^{(n)}\tau ^{(n)})^je^{-\lambda k^{(n)}\tau ^{(n)}}}{j!}\,j\biggl (1-\frac{\widetilde{\lambda }^{(n)}}{\lambda }\biggr )\\&= k^{(n)}\tau ^{(n)}(\lambda -\widetilde{\lambda }^{(n)}), \qquad (35) \end{aligned}$$

where we have used a union bound for (35). By definition,
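The last equality in (35) is the mean of a Poisson random variable scaled by the per-arrival drop probability. The sketch below checks it numerically with a truncated series. The function names and the truncation length are illustrative assumptions. For comparison it also evaluates $1-e^{-k^{(n)}\tau ^{(n)}(\lambda -\widetilde{\lambda }^{(n)})}$, which is the exact probability of at least one dropped arrival when each arrival is dropped independently, as in the coupling (a standard Poisson thinning fact, not a formula stated in the proof).

```python
import math

def union_bound_series(lam, lam_tilde, k, tau, terms=400):
    """Truncated evaluation of the series on the second line of (35)."""
    mean = lam * k * tau
    drop_prob = 1.0 - lam_tilde / lam
    total = 0.0
    pmf = math.exp(-mean)  # P(A = 0) for A ~ Poisson(mean)
    for j in range(terms):
        total += pmf * j * drop_prob
        pmf *= mean / (j + 1)  # P(A = j + 1) obtained from P(A = j)
    return total

def union_bound_closed_form(lam, lam_tilde, k, tau):
    """Right-hand side of (35)."""
    return k * tau * (lam - lam_tilde)

def exact_drop_probability(lam, lam_tilde, k, tau):
    """P(at least one dropped arrival), assuming independent thinning."""
    return 1.0 - math.exp(-k * tau * (lam - lam_tilde))
```

With `lam=0.7`, `lam_tilde=0.65`, `k=5`, `tau=3.0`, the series and the closed form agree up to truncation and rounding error, and the exact probability lies below both, as a union bound requires.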

$$\begin{aligned} \widetilde{\lambda }^{(n)}&=\frac{\varLambda ^{(n)}}{ k^{(n)}}\Biggl (1-\frac{\left( {\begin{array}{c}n-k^{(n)}\\ k^{(n)}\end{array}}\right) }{\left( {\begin{array}{c}n\\ k^{(n)}\end{array}}\right) }\Biggr )\\&\ge \frac{\varLambda ^{(n)}}{ k^{(n)}}\Biggl (1-\biggl (1-\frac{k^{(n)}}{n}\biggr )^{k^{(n)}}\Biggr )\\&=\frac{\varLambda ^{(n)}}{ k^{(n)}}\biggl (\frac{(k^{(n)})^2}{n}+O\biggl (\frac{(k^{(n)})^4}{n^2}\biggr )\biggr )\\&=\lambda +O\biggl (\frac{(k^{(n)})^2}{n}\biggr ). \end{aligned}$$

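Therefore, substituting $\lambda -\widetilde{\lambda }^{(n)}=O\bigl ((k^{(n)})^2/n\bigr )$ and $\tau ^{(n)}=O\bigl (n^{1/2}/k^{(n)}\bigr )$ into (35),

$$\begin{aligned} \mathbb {P}(\mathcal {E})&\le k^{(n)}\tau ^{(n)}(\lambda -\widetilde{\lambda }^{(n)})=O\biggl (n^{1/2}\cdot \frac{(k^{(n)})^2}{n}\biggr )= O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ), \end{aligned}$$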

which completes the proof. $\square $
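The expansion of $\widetilde{\lambda }^{(n)}$ used above can be checked numerically. The sketch below evaluates the definition exactly with binomial coefficients, compares it with the lower bound $\frac{\varLambda ^{(n)}}{k^{(n)}}\bigl (1-(1-k^{(n)}/n)^{k^{(n)}}\bigr )$, and evaluates the resulting bound on $\mathbb {P}(\mathcal {E})$. It assumes $\varLambda ^{(n)}=n\lambda /k^{(n)}$ as in Corollary 1 and, purely for illustration, takes $\tau ^{(n)}=n^{1/2}/k^{(n)}$ with constant 1. All function names are hypothetical.

```python
import math

def lambda_tilde(n, k, lam):
    """Exact value of the per-queue rate, assuming Lambda = n * lam / k."""
    big_lambda = n * lam / k
    ratio = math.comb(n - k, k) / math.comb(n, k)
    return big_lambda / k * (1.0 - ratio)

def lambda_tilde_lower_bound(n, k, lam):
    """Lower bound obtained by replacing the binomial ratio with (1 - k/n)**k."""
    big_lambda = n * lam / k
    return big_lambda / k * (1.0 - (1.0 - k / n) ** k)

def coupling_failure_bound(n, k, lam):
    """k * tau * (lam - lambda_tilde) with the illustrative choice tau = sqrt(n) / k."""
    tau = math.sqrt(n) / k
    return k * tau * (lam - lambda_tilde(n, k, lam))
```

For `n=10000`, `k=5`, `lam=0.6`, the gap `lam - lambda_tilde(n, k, lam)` is of order `lam * k**2 / n`, and `coupling_failure_bound` is of order `(k / n**0.25)**2`, in line with the statement of Lemma 3.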

Appendix B: Proof of Corollary 1

Corollary 1 (Restated)

Consider an n-server system in the limited fork–join model with $k^{(n)}=o(n^{1/4})$, job arrival rate $\varLambda ^{(n)}=n\lambda /k^{(n)}$, and exponentially distributed service times with mean $1/\mu $. Then, the steady-state job delay, $T^{(n)}$, converges as:

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\tau \ge 0}\left| \mathbb {P}\bigl (T^{(n)}\le \tau \bigr )-\left( 1-e^{-(\mu -\lambda )\tau }\right) ^{k^{(n)}}\right| =0. \quad {(8)\,(\mathrm{Restated})} \end{aligned}$$

Specifically, if $k^{(n)}\rightarrow \infty $ as $n\rightarrow \infty $, then

$$\begin{aligned} \frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\Rightarrow 1,\quad \text {as }n\rightarrow \infty , \quad {(9)\, (\mathrm{Restated})} \end{aligned}$$

where $H_{k^{(n)}}$ is the $k^{(n)}$-th harmonic number, and further,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\bigl [T^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}=1. \quad {(10)\,(\mathrm{Restated})} \end{aligned}$$

Proof

When the service times are exponentially distributed, each queue is an M/M/1 queue, and thus the cdf of the task delay at each queue, F, is given by

$$\begin{aligned} F(\tau )=1-e^{-(\mu -\lambda )\tau }. \end{aligned}$$

Then, the convergence in (8) directly follows from Theorem 1.

To prove the weak convergence of $\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}$ in (9), we first note that

$$\begin{aligned} \frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\Rightarrow 1,\quad \text {as }n\rightarrow \infty , \end{aligned}$$

which is a direct implication of a standard result in the asymptotic theory of extremes (see, for example, Theorem 8.12 in [7]). Combining this with (8) yields (9).
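The concentration of $\hat{T}^{(n)}$ around $H_{k^{(n)}}/(\mu -\lambda )$ can be seen from the closed form of the cdf. Assuming $\hat{T}^{(n)}$ is the maximum of $k$ independent task delays with cdf $F$, we have $\mathbb {P}\bigl (\hat{T}^{(n)}\le c\,H_k/(\mu -\lambda )\bigr )=(1-e^{-cH_k})^k$, so the rate $\mu -\lambda $ cancels. The sketch below evaluates this expression. The function names are illustrative.

```python
import math

def harmonic(k):
    """k-th harmonic number H_k."""
    return sum(1.0 / i for i in range(1, k + 1))

def cdf_scaled_max(c, k):
    """P(T_hat <= c * H_k / (mu - lambda)) for the maximum of k i.i.d.
    exponential task delays with rate mu - lambda (the rate cancels)."""
    return (1.0 - math.exp(-c * harmonic(k))) ** k
```

Evaluating `cdf_scaled_max(0.8, k)` and `cdf_scaled_max(1.2, k)` for growing `k` shows the first value tending to 0 and the second to 1, which is the weak convergence to 1. The convergence is slow in `k`, since the mass concentrates only at a logarithmic rate.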

To prove the convergence of the expectation in (10), we actually need the stochastic dominance shown in Theorem 2. The expectation in (10) can be written as

$$\begin{aligned} \frac{\mathbb {E}\bigl [T^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}&=\int _{0}^{\infty }\mathbb {P}\biggl (\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\mathrm{d}\tau . \end{aligned}$$

By Theorem 2, for any $\tau \ge 0$,

$$\begin{aligned} \mathbb {P}\biggl (\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\le \mathbb {P}\biggl (\frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr ). \end{aligned}$$

Since

$$\begin{aligned} \frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\Rightarrow 1,\quad \text {as }n\rightarrow \infty , \end{aligned}$$

and

$$\begin{aligned} \frac{\mathbb {E}\bigl [\hat{T}^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}=\int _{0}^{\infty }\mathbb {P}\biggl (\frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\mathrm{d}\tau =1, \end{aligned}$$

by the General Lebesgue Dominated Convergence Theorem (see, for example, Theorem 19 in [38]), we can take the limit inside the integral and, using (9), get

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}\bigl [T^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}&=\int _{0}^{\infty }\lim _{n\rightarrow \infty }\mathbb {P}\biggl (\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\mathrm{d}\tau \\&=\int _{0}^1 1 \mathrm{d}\tau \\&=1, \end{aligned}$$

which completes the proof. $\square $
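The identity $\mathbb {E}\bigl [\hat{T}^{(n)}\bigr ]=H_{k^{(n)}}/(\mu -\lambda )$ used in the dominated convergence step can be checked by integrating the tail probability numerically. The sketch below does this with a midpoint rule. The step size and the integration cutoff are illustrative assumptions, as are the function names.

```python
import math

def harmonic(k):
    """k-th harmonic number H_k."""
    return sum(1.0 / i for i in range(1, k + 1))

def expected_max_numeric(k, rate, step=0.01):
    """Midpoint-rule integral of P(T_hat > t) = 1 - (1 - exp(-rate * t))**k.

    `rate` plays the role of mu - lambda. The cutoff is chosen so that the
    neglected tail, at most k * exp(-rate * upper), is negligible.
    """
    upper = 40.0 * (1.0 + math.log(k)) / rate
    n_steps = int(upper / step)
    total = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * step
        total += 1.0 - (1.0 - math.exp(-rate * t)) ** k
    return total * step
```

For instance, `expected_max_numeric(50, 0.5)` should agree closely with `harmonic(50) / 0.5`, so the normalized expectation of $\hat{T}^{(n)}$ equals 1 for every $k$, not only in the limit.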

Appendix C: Non-independence result for $k^{(n)}=\varvec{\varTheta }(n)$

Theorem 3

Consider an n-server system in the limited fork–join model with $k^{(n)}=\varTheta (n)$, job arrival rate $\varLambda ^{(n)}=n\lambda /k^{(n)}$, and exponentially distributed service times with rate $\mu $. Let $\pi ^{(n,2)}$ denote the joint distribution of the steady-state queue lengths for any two queues in the n-server system. Let $\hat{\pi }^{(2)}$ denote the joint distribution of the steady-state queue lengths of two independent M/M/1 queues, each with load $\rho $. Then, there exist $\epsilon >0$ and $n_0>0$ such that, for any $n>n_0$, $d_{TV}\bigl (\pi ^{(n,2)}, \hat{\pi }^{(2)}\bigr )>\epsilon $.

Proof

We assume that $k^{(n)}= p n$ for a constant p with $0<p\le 1$. Then the job arrival rate is given by $\varLambda ^{(n)}= \lambda /p$, which is a constant, so we rewrite $\varLambda ^{(n)}$ as $\varLambda $ for conciseness.

Let $\epsilon =\frac{p\lambda (1-\rho )^4}{2(11\varLambda +8\mu )}$. We will specify $n_0$ later. Suppose, for the sake of contradiction, that $d_{TV}\bigl (\pi ^{(n,2)},\hat{\pi }^{(2)}\bigr )\le \epsilon $ for some $n>n_0$. We will show that this assumption contradicts the balance equations of the first two queues in the limited fork–join system with n servers.

We first write out the balance equations for the Markov chain formed by the queue lengths of the first two queues. Consider a job arrival to this n-server system. Let $p_0^{(n)}$ be the probability that no task arrives to the first two queues, and $p_1^{(n)}$ be the probability that exactly one task arrives to the first two queues. Let $p_2^{(n)}=1-p_0^{(n)}-p_1^{(n)}$ be the probability that two tasks arrive to the first two queues. Writing $k$ for $k^{(n)}$, we can compute these probabilities as follows:

$$\begin{aligned} p_0^{(n)}&= \left( {\begin{array}{c}n-2\\ k\end{array}}\right) \Bigm / \left( {\begin{array}{c}n\\ k\end{array}}\right) \rightarrow p_0:=(1-p)^2 \quad \text {as }n\rightarrow \infty ,\\ p_1^{(n)}&= \frac{2\left( {\begin{array}{c}n-2\\ k-1\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } \rightarrow p_1:=2p(1-p) \quad \text {as }n \rightarrow \infty ,\\ p_2^{(n)}&= \frac{\left( {\begin{array}{c}n-2\\ k-2\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } \rightarrow p_2:=p^2 \quad \text {as }n \rightarrow \infty . \end{aligned}$$

Recall that the joint distribution of the steady-state queue lengths of the first two queues is $\pi ^{(n,2)}$. Then, the balance equation of the first two queues for the state (1, 1) can be written as

$$\begin{aligned} 0&=\pi ^{(n,2)}(1,1) \cdot (p_1^{(n)}\varLambda +p_2^{(n)}\varLambda +2\mu )\\&\quad -\biggl (\frac{1}{2}\pi ^{(n,2)}(0,1)p_1^{(n)}\varLambda +\frac{1}{2}\pi ^{(n,2)}(1,0)p_1^{(n)}\varLambda \\&\quad +\pi ^{(n,2)}(0,0)p_2^{(n)}\varLambda +\pi ^{(n,2)}(1,2)\mu +\pi ^{(n,2)}(2,1)\mu \biggr ). \qquad (36) \end{aligned}$$

Let the right-hand side of (36) be denoted by $\mathcal {R}(\pi ^{(n,2)})$. Let

$$\begin{aligned} a_1&=(p_1^{(n)}-p_1)\varLambda \biggl (\pi ^{(n,2)}(1,1)-\frac{1}{2}\pi ^{(n,2)}(0,1)-\frac{1}{2}\pi ^{(n,2)}(1,0)\biggr )\\&\quad +(p_2^{(n)}-p_2)\varLambda \Bigl (\pi ^{(n,2)}(1,1)-\pi ^{(n,2)}(0,0)\Bigr ),\\ a_2&=(\pi ^{(n,2)}(1,1)-\hat{\pi }^{(2)}(1,1))(p_1\varLambda +p_2\varLambda +2\mu )\\&\quad -\frac{1}{2}(\pi ^{(n,2)}(0,1)-\hat{\pi }^{(2)}(0,1)+\pi ^{(n,2)}(1,0)-\hat{\pi }^{(2)}(1,0))p_1\varLambda \\&\quad -(\pi ^{(n,2)}(0,0)-\hat{\pi }^{(2)}(0,0))p_2\varLambda \\&\quad -(\pi ^{(n,2)}(1,2)-\hat{\pi }^{(2)}(1,2)+\pi ^{(n,2)}(2,1)-\hat{\pi }^{(2)}(2,1))\mu . \end{aligned}$$

Then since $\hat{\pi }^{(2)}(q_1,q_2)=(1-\rho )^2\rho ^{q_1+q_2}$ for any $(q_1,q_2)\in \mathbb {Z}_+^2$,

$$\begin{aligned} \mathcal {R}(\pi ^{(n,2)})&=a_1+a_2+\hat{\pi }^{(2)}(1,1) \cdot (p_1\varLambda +p_2\varLambda +2\mu )\\&\quad -\biggl (\frac{1}{2}\hat{\pi }^{(2)}(0,1)p_1\varLambda +\frac{1}{2}\hat{\pi }^{(2)}(1,0)p_1\varLambda \\&\quad +\hat{\pi }^{(2)}(0,0)p_2\varLambda +\hat{\pi }^{(2)}(1,2)\mu +\hat{\pi }^{(2)}(2,1)\mu \biggr )\\&=a_1+a_2-p \lambda (1-\rho )^4. \qquad (37) \end{aligned}$$

We choose $n_0$ such that, for any $n> n_0$, $|p_1^{(n)}-p_1|\le \epsilon $ and $|p_2^{(n)}-p_2|\le \epsilon $. Then, it is not hard to see that $|a_1|\le 3\varLambda \epsilon $. By the assumption that $d_{TV}\bigl (\pi ^{(n,2)},\hat{\pi }^{(2)}\bigr )\le \epsilon $, we have that $|a_2|\le 8(\varLambda +\mu ) \epsilon $. By the choice of $\epsilon $, $|a_1+a_2|\le (11\varLambda +8\mu )\epsilon =\frac{1}{2}p\lambda (1-\rho )^4$. Therefore, $\mathcal {R}(\pi ^{(n,2)})\le -\frac{1}{2}p\lambda (1-\rho )^4<0$ by (37), which contradicts the balance equation (36). This completes the proof of Theorem 3. $\square $
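Two computations in this proof lend themselves to an exact check: the finite-$n$ probabilities $p_0^{(n)},p_1^{(n)},p_2^{(n)}$ and the constant term $-p\lambda (1-\rho )^4$ in (37). The sketch below uses exact rational arithmetic for both. It assumes $\rho =\lambda /\mu $ and $\varLambda =\lambda /p$, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def arrival_split_probs(n, k):
    """Exact p_0^{(n)}, p_1^{(n)}, p_2^{(n)} for 2 <= k <= n."""
    total = comb(n, k)
    p0 = Fraction(comb(n - 2, k), total)
    p1 = Fraction(2 * comb(n - 2, k - 1), total)
    p2 = Fraction(comb(n - 2, k - 2), total)
    return p0, p1, p2

def product_form_residual(p, lam, mu):
    """Right-hand side of (36) evaluated at the product-form distribution
    with the limiting probabilities p_1 = 2p(1-p) and p_2 = p^2."""
    rho = lam / mu          # assumed definition of the load
    big_lambda = lam / p    # job arrival rate when k = p * n
    p1 = 2 * p * (1 - p)
    p2 = p * p

    def pi_hat(q1, q2):
        return (1 - rho) ** 2 * rho ** (q1 + q2)

    outflow = pi_hat(1, 1) * (p1 * big_lambda + p2 * big_lambda + 2 * mu)
    inflow = (pi_hat(0, 1) * p1 * big_lambda / 2
              + pi_hat(1, 0) * p1 * big_lambda / 2
              + pi_hat(0, 0) * p2 * big_lambda
              + pi_hat(1, 2) * mu
              + pi_hat(2, 1) * mu)
    return outflow - inflow
```

With `p = Fraction(1, 4)`, `lam = Fraction(1, 2)`, `mu = Fraction(1)`, the residual equals `-p * lam * (1 - lam / mu) ** 4` exactly, so the product-form distribution violates the balance equation by a fixed negative amount. This is the gap that the perturbation terms $a_1$ and $a_2$ cannot close.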

Appendix D: Definition and some properties of association

Definition 1

(Association [9]) We say that random variables $X_1$, $X_2,\dots ,X_m$ are associated if, for all (entrywise) nondecreasing functions f and g,

$$\begin{aligned}&\mathbb {E}[f(X_1,X_2,\dots ,X_m)g(X_1,X_2,\dots ,X_m)] \\&\quad \ge \mathbb {E}[f(X_1,X_2,\dots ,X_m)]\mathbb {E}[g(X_1,X_2,\dots ,X_m)]. \qquad (38) \end{aligned}$$

Lemma 4

[9] Associated random variables have the following properties:

(P1) Nondecreasing functions of associated random variables are associated.
(P2) If two sets of associated random variables are independent of one another, then their union is a set of associated random variables.
(P3) If a sequence of random vectors $\varvec{X}(u)\Rightarrow \varvec{X}$ as $u\rightarrow \infty $ and, for each u, the entries of $\varvec{X}(u)$ are associated, then the entries of $\varvec{X}$ are associated.
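Inequality (38) says that any two nondecreasing functions of associated random variables have nonnegative covariance. The sketch below checks this by exact enumeration in the simplest setting, independent coordinates with finite supports, relying on the standard fact that independent random variables are associated (this fact is not restated in the appendix). The helper names and the example distributions are illustrative.

```python
from itertools import product

def expectation(func, marginals):
    """E[func(X)] for independent coordinates with finite marginals.

    `marginals` is a list of dicts mapping each value to its probability.
    """
    total = 0.0
    for combo in product(*[m.items() for m in marginals]):
        values = tuple(v for v, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        total += prob * func(values)
    return total

def covariance(f, g, marginals):
    """E[f(X) g(X)] - E[f(X)] E[g(X)], the quantity bounded below in (38)."""
    joint = expectation(lambda x: f(x) * g(x), marginals)
    return joint - expectation(f, marginals) * expectation(g, marginals)
```

For example, with `marginals = [{0: 0.3, 1: 0.7}, {0: 0.5, 1: 0.2, 2: 0.3}]`, the call `covariance(max, sum, marginals)` is nonnegative, since both `max` and `sum` are entrywise nondecreasing.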

About this article

Wang, W., Harchol-Balter, M., Jiang, H. et al. Delay asymptotics and bounds for multitask parallel jobs. Queueing Syst 91, 207–239 (2019). https://doi.org/10.1007/s11134-018-09597-5

Received: 10 November 2018
Published: 16 January 2019
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s11134-018-09597-5
