Delay asymptotics and bounds for multitask parallel jobs

Queueing Systems

Abstract

We study delay of jobs that consist of multiple parallel tasks, which is a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, tight analysis is still largely unknown since analyzing job delay requires characterizing the complicated correlation among task delays, which is hard to do. We first consider an asymptotic regime where the number of servers, n, goes to infinity, and the number of tasks in a job, \(k^{(n)}\), is allowed to increase with n. We establish the asymptotic independence of any \(k^{(n)}\) queues under the condition \(k^{(n)}= o(n^{1/4})\). This greatly generalizes the asymptotic independence type of results in the literature, where asymptotic independence is shown only for a fixed constant number of queues. As a consequence of our independence result, the job delay converges to the maximum of independent task delays. We next consider the non-asymptotic regime. Here, we prove that independence yields a stochastic upper bound on job delay for any n and any \(k^{(n)}\) with \(k^{(n)}\le n\). The key component of our proof is a new technique we develop, called “Poisson oversampling.” Our approach converts the job delay problem into a corresponding balls-and-bins problem. However, in contrast with typical balls-and-bins problems where there is a negative correlation among bins, we prove that our variant exhibits positive correlation.

Notes

  1. We thank Prof. Alexander Stolyar for suggesting this possible approach.

References

  1. Baccelli, F.: Two parallel queues created by arrivals with two demands: the M/G/2 symmetrical case. Technical Report RR-0426, INRIA (1985)

  2. Baccelli, F., Makowski, A.M.: Simple computable bounds for the fork–join queue. Technical Report RR-0394, INRIA (1985)

  3. Baccelli, F., Makowski, A.M., Shwartz, A.: The fork–join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Probab. 21, 629–660 (1989)

  4. Bramson, M., Lu, Y., Prabhakar, B.: Asymptotic independence of queues under randomized load balancing. Queueing Syst. 71(3), 247–292 (2012)

  5. Chen, Y., Alspaugh, S., Katz, R.: Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. Proc. VLDB Endow. 5(12), 1802–1813 (2012)

  6. Cox, J.T.: An alternate proof of a correlation inequality of Harris. Ann. Probab. 12(1), 272–273 (1984)

  7. DasGupta, A.: Asymptotic Theory of Statistics and Probability. Springer, Berlin (2008)

  8. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the USENIX Conference Operating Systems Design and Implementation (OSDI), San Francisco, CA, pp. 10–10 (2004)

  9. Esary, J.D., Proschan, F., Walkup, D.W.: Association of random variables, with applications. Ann. Math. Stat. 38(5), 1466–1474 (1967)

  10. Farhat, F., Tootaghaj, D., He, Y., Sivasubramaniam, A., Kandemir, M., Das, C.: Stochastic modeling and optimization of stragglers. IEEE Trans. Cloud Comput. (2016) (to be published)

  11. Flatto, L., Hahn, S.: Two parallel queues created by arrivals with two demands I. SIAM J. Appl. Math. 44(5), 1041–1053 (1984)

  12. Fortuin, C.M., Kasteleyn, P.W., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22(2), 89–103 (1971)

  13. Gardner, K., Harchol-Balter, M., Scheller-Wolf, A.: A better model for job redundancy: decoupling server slowdown and job size. In: IEEE International Symposium Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), London, United Kingdom, pp. 1–10 (2016)

  14. Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Van Houdt, B.: A better model for job redundancy: decoupling server slowdown and job size. IEEE/ACM Trans. Netw. 25(6), 3353–3367 (2017a)

  15. Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Velednitsky, M., Zbarsky, S.: Redundancy-d: the power of d choices for redundancy. Oper. Res. 65(4), 1078–1094 (2017b)

  16. Graham, C.: Chaoticity on path space for a queueing network with selection of the shortest queue among several. J. Appl. Probab. 37(1), 198–211 (2000)

  17. Graham, C., Méléard, S.: Propagation of chaos for a fully connected loss network with alternate routing. Stoch. Proc. Appl. 44(1), 159–180 (1993)

  18. Harchol-Balter, M.: Performance Modeling and Design of Computer Systems: Queueing Theory in Action, 1st edn. Cambridge University Press, New York (2013)

  19. Harris, T.E.: A correlation inequality for Markov processes in partially ordered state spaces. Ann. Probab. 5(3), 451–454 (1977)

  20. Joag-Dev, K., Proschan, F.: Negative association of random variables with applications. Ann. Stat. 11(1), 286–295 (1983)

  21. Joshi, G., Liu, Y., Soljanin, E.: Coding for fast content download. In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, pp. 326–333 (2012)

  22. Joshi, G., Soljanin, E., Wornell, G.: Efficient redundancy techniques for latency reduction in cloud systems. ACM Trans. Model. Perform. Eval. Comput. Syst. 2(2), 12:1–12:30 (2017)

  23. Ko, S.S., Serfozo, R.F.: Sojourn times in G/M/1 fork–join networks. Nav. Res. Log. 55(5), 432–443 (2008)

  24. Kumar, A., Shorey, R.: Performance analysis and scheduling of stochastic fork-join jobs in a multicomputer system. IEEE Trans. Parallel Distrib. Syst. 4(10), 1147–1164 (1993)

  25. Lee, K., Shah, N.B., Huang, L., Ramchandran, K.: The MDS queue: analysing the latency performance of erasure codes. IEEE Trans. Inf. Theory 63(5), 2822–2842 (2017)

  26. Li, B., Ramamoorthy, A., Srikant, R.: Mean-field-analysis of coding versus replication in cloud storage systems. In: Proceedings IEEE International Conference on Computer Communications (INFOCOM), San Francisco, CA, pp. 1–9 (2016)

  27. Liggett, T.M.: Interacting Particle Systems. Springer, Berlin (2005)

  28. Lin, M., Zhang, L., Wierman, A., Tan, J.: Joint optimization of overlapping phases in MapReduce. Perform. Eval. 70(10), 720–735 (2013)

  29. Lu, H., Pang, G.: Heavy-traffic limits for an infinite-server fork–join queueing system with dependent and disruptive services. Queueing Syst. 85(1), 67–115 (2017)

  30. Lui, J.C., Muntz, R.R., Towsley, D.: Computing performance bounds of fork–join parallel programs under a multiprocessing environment. IEEE Trans. Parallel Distrib. Syst. 9(3), 295–311 (1998)

  31. Melamed, B., Whitt, W.: On arrivals that see time averages. Oper. Res. 38(1), 156–172 (1990)

  32. Meyn, S.P., Tweedie, R.L.: Stability of Markovian processes I: criteria for discrete-time chains. Adv. Appl. Probab. 24(3), 542–574 (1992)

  33. Meyn, S.P., Tweedie, R.L.: Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. 25(3), 518–548 (1993)

  34. Moseley, B., Dasgupta, A., Kumar, R., Sarlós, T.: On scheduling in map-reduce and flow-shops. In: Proceedings of the Annual ACM Symposium Parallelism in Algorithms and Architectures (SPAA), San Jose, CA, pp. 289–298 (2011)

  35. Nelson, R., Tantawi, A.N.: Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput. 37(6), 739–743 (1988)

  36. Nelson, R., Towsley, D., Tantawi, A.N.: Performance analysis of parallel processing systems. IEEE Trans. Softw. Eng. 14(4), 532–540 (1988)

  37. Rizk, A., Poloczek, F., Ciucu, F.: Stochastic bounds in fork–join queueing systems under full and partial mapping. Queueing Syst. 83(3), 261–291 (2016)

  38. Royden, H.L., Fitzpatrick, P.M.: Real Analysis, 4th edn. Pearson, London (2010)

  39. Shah, N.B., Lee, K., Ramchandran, K.: When do redundant requests reduce latency? In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, pp. 731–738 (2013)

  40. Shah, V., Bouillard, A., Baccelli, F.: Delay comparison of delivery and coding policies in data clusters. In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, pp. 397–404 (2017)

  41. Sun, Y., Koksal, C.E., Shroff, N.B.: Near delay-optimal scheduling of batch jobs in multi-server systems. The Ohio State University. Technical Report (2017)

  42. Tan, J., Meng, X., Zhang, L.: Delay tails in MapReduce scheduling. In: Proceedings of the ACM SIGMETRICS/PERFORMANCE Jt. International Conference on Measurement and Modeling of Computer Systems, London, United Kingdom, pp. 5–16 (2012)

  43. Thomasian, A.: Analysis of fork/join and related queueing systems. ACM Comput. Surv. 47(2), 17:1–17:71 (2014)

  44. Varki, E.: Response time analysis of parallel computer and storage systems. IEEE Trans. Parallel Distrib. Syst. 12(11), 1146–1161 (2001)

  45. Vianna, E., Comarela, G., Pontes, T., Almeida, J., Almeida, V., Wilkinson, K., Kuno, H., Dayal, U.: Analytical performance models for MapReduce workloads. Int. J. Parallel Prog. 41(4), 495–525 (2013)

  46. Vulimiri, A., Michel, O., Godfrey, P.B., Shenker, S.: More is less: reducing latency via redundancy. In: Proceedings of the ACM Workshop Hot Topics in Networks (HotNets), Redmond, WA, pp. 13–18 (2012)

  47. Wang, W., Zhu, K., Ying, L., Tan, J., Zhang, L.: MapTask scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. IEEE/ACM Trans. Netw. 24, 190–203 (2016)

  48. Xia, C.H., Liu, Z., Towsley, D., Lelarge, M.: Scalability of fork/join queueing networks with blocking. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, San Diego, CA, pp. 133–144 (2007)

  49. Xiang, Y., Lan, T., Aggarwal, V., Chen, Y.F.R.: Joint latency and cost optimization for erasure-coded data center storage. IEEE/ACM Trans. Netw. 24(4), 2443–2457 (2016)

  50. Xie, Q., Lu, Y.: Priority algorithm for near-data scheduling: throughput and heavy-traffic optimality. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Hong Kong, China, pp. 963–972 (2015)

  51. Xie, Q., Dong, X., Lu, Y., Srikant, R.: Power of d choices for large-scale bin packing: a loss model. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Portland, OR, pp. 321–334 (2015)

  52. Ying, L., Srikant, R., Kang, X.: The power of slightly more than one sample in randomized load balancing. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Kowloon, Hong Kong, pp. 1131–1139 (2015)

  53. Zheng, Y., Shroff, N.B., Sinha, P.: A new analytical technique for designing provably efficient MapReduce schedulers. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Turin, Italy, pp. 1600–1608 (2013)

Acknowledgements

This work was supported in part by National Science Foundation Grants CPS ECCS-1739189, ECCS-1609370, XPS-1629444, and CMMI-1538204, the US Army Research Office (ARO Grant No. W911NF-16-1-0259), the US Office of Naval Research (ONR Grant No. N00014-15-1-2169), DTRA under the Grant Number HDTRA1-16-0017, and a 2018 Faculty Award from Microsoft. Additionally, Haotian Jiang was supported in part by the Department of Physics at Tsinghua University.

Author information

Corresponding author

Correspondence to Weina Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Lemma 3

Lemma 3 (Restated)

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})},\hat{\pi }^{(k^{(n)})}\Bigr )=O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ). \quad {(17)\, (\mathrm{Restated})} \end{aligned}$$

Proof

This proof has a similar flavor to the proofs of Lemmas 1 and 2. Recall that \(\left( \widetilde{\varvec{W}}^{(n,k^{(n)})}(t),t\ge 0\right) \), the workload processes of the first \(k^{(n)}\) queues in the system \(\widetilde{\mathcal {S}}^{(n)}\), are \(k^{(n)}\) independent M/G/1 queues each with arrival rate \(\widetilde{\lambda }^{(n)}\) and service time distribution G. We couple this with \(\left( \hat{\varvec{W}}^{(k^{(n)})}(t),t\ge 0\right) \), where \(\hat{\varvec{W}}^{(k^{(n)})}(t)=\bigl (\hat{W}_1(t),\dots ,\hat{W}_{k^{(n)}}(t)\bigr )\) is the workload vector of \(k^{(n)}\) independent M/G/1 queues each with arrival rate \(\lambda \) and service time distribution G. Then, \(\hat{\pi }^{(k^{(n)})}\) is its stationary distribution. We will prove the bound on \(d_{TV}\bigl (\widetilde{\pi }^{(n,k^{(n)})},\hat{\pi }^{(k^{(n)})}\bigr )\) by showing that \(\left( \widetilde{\varvec{W}}^{(n,k^{(n)})}(t),t\ge 0\right) \) and \(\left( \hat{\varvec{W}}^{(k^{(n)})}(t),t\ge 0\right) \) are close.

Now we specify the coupling. All the queues start from empty, i.e., \(\hat{W}_i(0)=\widetilde{W}^{(n)}_i(0)=0\) for all \(i=1,2,\dots ,k^{(n)}\). When there is a task arrival to some queue of \(\hat{\varvec{W}}^{(k^{(n)})}\), we let a task arrive to the corresponding queue of \(\widetilde{\varvec{W}}^{(n,k^{(n)})}\) with probability \(\frac{\widetilde{\lambda }^{(n)}}{\lambda }\), and let these two tasks require the same service time. So with probability \(1-\frac{\widetilde{\lambda }^{(n)}}{\lambda }\) there is no task arrival to \(\widetilde{\varvec{W}}^{(n,k^{(n)})}\).
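The coupling can be illustrated with a small simulation for a single pair of corresponding queues. This is only a sketch under stated assumptions: the names `coupled_arrivals` and `workload_at`, the service-time sampler passed in by the caller, and any parameter values are illustrative choices and do not appear in the paper.

```python
import random


def coupled_arrivals(lam, lam_tilde, tau, service_sampler, rng):
    """Arrivals on [0, tau] for one 'hat' queue (rate lam) and its coupled
    'tilde' queue, obtained by keeping each arrival with prob. lam_tilde/lam.

    Both queues see the same service time for every shared arrival.
    All names here are hypothetical, chosen for this illustration.
    """
    t = 0.0
    hat, tilde = [], []
    while True:
        t += rng.expovariate(lam)          # next Poisson(lam) arrival epoch
        if t > tau:
            break
        s = service_sampler(rng)           # common service requirement
        hat.append((t, s))
        if rng.random() < lam_tilde / lam:  # thinning step of the coupling
            tilde.append((t, s))
    return hat, tilde


def workload_at(arrivals, tau):
    """Workload at time tau of a work-conserving single server that starts
    empty and receives the given (time, service) arrivals."""
    w, last = 0.0, 0.0
    for t, s in arrivals:
        w = max(w - (t - last), 0.0) + s   # drain since last arrival, add work
        last = t
    return max(w - (tau - last), 0.0)
```

For instance, calling `coupled_arrivals(0.5, 0.45, 20.0, lambda r: r.expovariate(1.0), random.Random(0))` returns two arrival lists; whenever no arrival is dropped, the two workload paths coincide on the whole interval, which is the property the proof exploits.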

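We pick a time \(\tau ^{(n)}=O\Bigl (\frac{n^{1/2}}{k^{(n)}}\Bigr )\). Let \(\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}}\) and \(\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\) denote the distributions of \(\widetilde{\varvec{W}}^{(n,k^{(n)})}(\tau ^{(n)})\) and \(\hat{\varvec{W}}^{(k^{(n)})}(\tau ^{(n)})\), respectively. Then, by the triangle inequality,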

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})},\hat{\pi }^{(k^{(n)})}\Bigr )\le & {} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\Bigr )\\&+\,d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\widetilde{\pi }^{(n,k^{(n)})}\Bigr )+d_{TV}\Bigl (\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}\Bigr ). \end{aligned}$$

Noting Lemma 2, we have

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }_{\tau ^{(n)}}^{(n,k^{(n)})},\widetilde{\pi }^{(n,k^{(n)})}\Bigr )&=O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ),\end{aligned}$$
(33)
$$\begin{aligned} d_{TV}\Bigl (\hat{\pi }_{\tau ^{(n)}}^{(k^{(n)})},\hat{\pi }^{(k^{(n)})}\Bigr )&=O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ). \end{aligned}$$
(34)

Next we bound \(d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\Bigr )\) using arguments similar to those in the proof of Lemma 1. By the coupling, \(\hat{\varvec{W}}^{(k^{(n)})}(t)\) and \(\widetilde{\varvec{W}}^{(n,k^{(n)})}(t)\) are different for some \(t\in [0,\tau ^{(n)}]\) only when some task arrives to \(\hat{\varvec{W}}^{(k^{(n)})}\) but not to \(\widetilde{\varvec{W}}^{(n,k^{(n)})}\). We denote this event by \(\mathcal {E}\). Then,

$$\begin{aligned} d_{TV}\Bigl (\widetilde{\pi }^{(n,k^{(n)})}_{\tau ^{(n)}},\hat{\pi }^{(k^{(n)})}_{\tau ^{(n)}}\Bigr )\le \mathbb {P}(\mathcal {E}). \end{aligned}$$

So the remainder of this proof is dedicated to bounding \(\mathbb {P}(\mathcal {E})\).

Consider the time interval \([0,\tau ^{(n)}]\). Let A be the number of task arrivals to \(\hat{\varvec{W}}^{(k^{(n)})}\) during this time interval. Then,

$$\begin{aligned} \mathbb {P}(\mathcal {E})&=\sum _{j=0}^{\infty }\mathbb {P}(A=j)\mathbb {P}(\mathcal {E}\mid A=j)\nonumber \\&\le \sum _{j=0}^{\infty }\frac{(\lambda k^{(n)}\tau ^{(n)})^je^{-k^{(n)}\lambda \tau ^{(n)}}}{j!}j\biggl (1-\frac{\widetilde{\lambda }^{(n)}}{\lambda }\biggr ) \nonumber \\&= k^{(n)}\tau ^{(n)}(\lambda -\widetilde{\lambda }^{(n)}), \end{aligned}$$
(35)

where we have used a union bound for (35). By definition,

$$\begin{aligned} \widetilde{\lambda }^{(n)}&=\frac{\varLambda ^{(n)}}{ k^{(n)}}\Biggl (1-\frac{\left( {\begin{array}{c}n-k^{(n)}\\ k^{(n)}\end{array}}\right) }{\left( {\begin{array}{c}n\\ k^{(n)}\end{array}}\right) }\Biggr )\\&\ge \frac{\varLambda ^{(n)}}{ k^{(n)}}\Biggl (1-\biggl (1-\frac{k^{(n)}}{n}\biggr )^{k^{(n)}}\Biggr )\\&=\frac{\varLambda ^{(n)}}{ k^{(n)}}\biggl (\frac{(k^{(n)})^2}{n}+O\biggl (\frac{(k^{(n)})^4}{n^2}\biggr )\biggr )\\&=\lambda +O\biggl (\frac{(k^{(n)})^2}{n}\biggr ). \end{aligned}$$
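The inequality in the second line and the expansion in the third line can be spelled out as follows, writing \(k\) for \(k^{(n)}\) to lighten notation. This is a sketch of the standard argument rather than a quotation from the paper, and it assumes \(2k\le n\), which holds for all large n when \(k=o(n^{1/4})\).

$$\begin{aligned} \frac{\binom{n-k}{k}}{\binom{n}{k}}&=\prod _{i=0}^{k-1}\frac{n-k-i}{n-i}=\prod _{i=0}^{k-1}\biggl (1-\frac{k}{n-i}\biggr )\le \biggl (1-\frac{k}{n}\biggr )^{k},\\ \frac{k^2}{n}-\binom{k}{2}\frac{k^2}{n^2}&\le 1-\biggl (1-\frac{k}{n}\biggr )^{k}\le \frac{k^2}{n}. \end{aligned}$$

The second display uses \(1-kx\le (1-x)^k\le 1-kx+\binom{k}{2}x^2\) with \(x=k/n\). Multiplying by \(\varLambda ^{(n)}/k=n\lambda /k^2\) turns the leading term into \(\lambda \) and the \(O(k^4/n^2)\) correction into \(O(k^2/n)\).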

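Therefore, \(\lambda -\widetilde{\lambda }^{(n)}=O\bigl ((k^{(n)})^2/n\bigr )\). Substituting this and \(\tau ^{(n)}=O\bigl (n^{1/2}/k^{(n)}\bigr )\) into (35) gives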

$$\begin{aligned} \mathbb {P}(\mathcal {E})&= O\Biggl (\biggl (\frac{k^{(n)}}{n^{1/4}}\biggr )^2\Biggr ), \end{aligned}$$

which completes the proof. \(\square \)
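The quantities in this proof are easy to evaluate numerically. The sketch below is an illustration under stated assumptions: it takes \(\varLambda ^{(n)}=n\lambda /k^{(n)}\) as in Corollary 1, fixes \(\tau ^{(n)}=n^{1/2}/k^{(n)}\) with constant 1, and uses the Poisson thinning property to note that the dropped arrivals form a Poisson process, so that \(\mathbb {P}(\mathcal {E})\) equals one minus the probability of no dropped arrival. The function names are hypothetical.

```python
import math
from fractions import Fraction


def lam_tilde(n, k, lam):
    """Per-queue arrival rate in the tilde system, following the displayed
    definition with Lambda = n * lam / k (assumption taken from Corollary 1)."""
    miss = Fraction(math.comb(n - k, k), math.comb(n, k))  # exact ratio
    return (n * lam / k) / k * float(1 - miss)


def coupling_failure_bounds(n, k, lam):
    """Return (exact, union_bound, scale) for tau = sqrt(n) / k.

    union_bound is k * tau * (lam - lam_tilde), the right-hand side of (35);
    exact is 1 - exp(-union_bound), the probability that a Poisson count
    with that mean is nonzero; scale is (k / n^{1/4})^2.
    """
    tau = math.sqrt(n) / k
    mean_dropped = k * tau * (lam - lam_tilde(n, k, lam))
    exact = -math.expm1(-mean_dropped)
    scale = (k / n ** 0.25) ** 2
    return exact, mean_dropped, scale
```

Evaluating `coupling_failure_bounds` along a sequence such as \(k^{(n)}=\lfloor n^{1/5}\rfloor \) shows both the exact probability and the union bound shrinking in proportion to the scale \((k^{(n)}/n^{1/4})^2\).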

Appendix B: Proof of Corollary 1

Corollary 1 (Restated)

Consider an n-server system in the limited fork–join model with \(k^{(n)}=o(n^{1/4})\), job arrival rate \(\varLambda ^{(n)}=n\lambda /k^{(n)}\), and exponentially distributed service times with mean \(1/\mu \). Then, the steady-state job delay, \(T^{(n)}\), converges as:

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\tau \ge 0}\left| \mathbb {P}\bigl (T^{(n)}\le \tau \bigr )-\left( 1-e^{-(\mu -\lambda )\tau }\right) ^{k^{(n)}}\right| =0. \quad {(8)\,(\mathrm{Restated})} \end{aligned}$$

Specifically, if \(k^{(n)}\rightarrow \infty \) as \(n\rightarrow \infty \), then

$$\begin{aligned} \frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\Rightarrow 1,\quad \text {as }n\rightarrow \infty , \quad {(9)\, (\mathrm{Restated})} \end{aligned}$$

where \(H_{k^{(n)}}\) is the \(k^{(n)}\)-th harmonic number, and further,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\bigl [T^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}=1. \quad {(10)\,(\mathrm{Restated})} \end{aligned}$$

Proof

When the service times are exponentially distributed, each queue is an M/M/1 queue, and thus the cdf of the task delay at each queue, F, is given by

$$\begin{aligned} F(\tau )=1-e^{-(\mu -\lambda )\tau }. \end{aligned}$$

Then, the convergence in (8) directly follows from Theorem 1.

To prove the weak convergence of \(\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\) in (9), we first note that

$$\begin{aligned} \frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\Rightarrow 1,\quad \text {as }n\rightarrow \infty , \end{aligned}$$

which is a direct implication of a standard result in the asymptotic theory of extremes (see, for example, Theorem 8.12 in [7]). Combining this with (8) yields (9).

To prove the convergence of the expectation in (10), we actually need the stochastic dominance shown in Theorem 2. The expectation in (10) can be written as

$$\begin{aligned} \frac{\mathbb {E}\bigl [T^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}&=\int _{0}^{\infty }\mathbb {P}\biggl (\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\mathrm{d}\tau . \end{aligned}$$

By Theorem 2, for any \(\tau \ge 0\),

$$\begin{aligned} \mathbb {P}\biggl (\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\le \mathbb {P}\biggl (\frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr ). \end{aligned}$$

Since

$$\begin{aligned} \frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}\Rightarrow 1,\quad \text {as }n\rightarrow \infty , \end{aligned}$$

and

$$\begin{aligned} \frac{\mathbb {E}\bigl [\hat{T}^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}=\int _{0}^{\infty }\mathbb {P}\biggl (\frac{\hat{T}^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\mathrm{d}\tau =1, \end{aligned}$$

by the General Lebesgue Dominated Convergence Theorem (see, for example, Theorem 19 in [38]), we can take the limit inside the integral and, using (9), get

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}\bigl [T^{(n)}\bigr ]}{H_{k^{(n)}}/(\mu -\lambda )}&=\int _{0}^{\infty }\lim _{n\rightarrow \infty }\mathbb {P}\biggl (\frac{T^{(n)}}{H_{k^{(n)}}/(\mu -\lambda )}> \tau \biggr )\mathrm{d}\tau \\&=\int _{0}^1 1 \mathrm{d}\tau \\&=1, \end{aligned}$$

which completes the proof. \(\square \)
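Two facts used above about \(\hat{T}^{(n)}\), the maximum of \(k^{(n)}\) independent exponential task delays with rate \(\mu -\lambda \), can be checked directly. The sketch below is illustrative only: the function names are hypothetical, `theta` stands for \(\mu -\lambda \), and the expectation is obtained by integrating the tail \(1-(1-e^{-\theta \tau })^k\) term by term after a binomial expansion.

```python
from fractions import Fraction
from math import comb, exp


def harmonic(k):
    """k-th harmonic number H_k as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, k + 1))


def expected_max_exponentials(k, theta):
    """E[max of k i.i.d. Exp(theta)] via termwise integration of the tail:
    sum_{j=1}^{k} (-1)^{j+1} C(k, j) / (j * theta)."""
    total = sum(Fraction((-1) ** (j + 1) * comb(k, j), j)
                for j in range(1, k + 1))
    return total / theta


def normalized_cdf(x, k):
    """P(max / (H_k / theta) <= x) for k i.i.d. Exp(theta); theta cancels."""
    h_k = sum(1.0 / j for j in range(1, k + 1))
    return (1.0 - exp(-x * h_k)) ** k
```

The first function reproduces \(H_{k}/(\mu -\lambda )\) exactly, and `normalized_cdf` moves toward a step at \(x=1\) as k grows, which is the weak convergence to 1 invoked in the proof.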

Appendix C: Non-independence result for \(k^{(n)}=\varvec{\varTheta }(n)\)

Theorem 3

Consider an n-server system in the limited fork–join model with \(k^{(n)}=\varTheta (n)\), job arrival rate \(\varLambda ^{(n)}=n\lambda /k^{(n)}\), and exponentially distributed service times with rate \(\mu \). Let \(\pi ^{(n,2)}\) denote the joint distribution of the steady-state queue lengths for any two queues in the n-server system. Let \(\hat{\pi }^{(2)}\) denote the joint distribution of the steady-state queue lengths of two independent M/M/1 queues, each with load \(\rho \). Then, there exist \(\epsilon >0\) and \(n_0>0\) such that, for any \(n>n_0\), \(d_{TV}\bigl (\pi ^{(n,2)}, \hat{\pi }^{(2)}\bigr )>\epsilon \).

Proof

We assume that \(k^{(n)}= p n\) for a constant p with \(0<p\le 1\). Then the job arrival rate is given by \(\varLambda ^{(n)}= \lambda /p\), which is a constant, so we rewrite \(\varLambda ^{(n)}\) as \(\varLambda \) for conciseness.

Let \(\epsilon =\frac{p\lambda (1-\rho )^4}{2(11\varLambda +8\mu )}\). We will specify \(n_0\) later. Suppose, for the sake of contradiction, that \(d_{TV}\bigl (\pi ^{(n,2)},\hat{\pi }^{(2)}\bigr )\le \epsilon \) for all \(n>n_0\). We will show that this assumption contradicts the balance equations of the first two queues in the limited fork–join system with n servers.

We first write out the balance equations for the Markov chain formed by the queue lengths of the first two queues. Consider a job arrival to this n-server system. Let \(p_0^{(n)}\) be the probability that no task arrives to the first two queues, and \(p_1^{(n)}\) be the probability that exactly one task arrives to the first two queues. Let \(p_2^{(n)}=1-p_0^{(n)}-p_1^{(n)}\) be the probability that two tasks arrive to the first two queues. We can compute these probabilities as follows:

$$\begin{aligned} p_0^{(n)}&= \left( {\begin{array}{c}n-2\\ k\end{array}}\right) \Bigm / \left( {\begin{array}{c}n\\ k\end{array}}\right) \rightarrow p_0:=(1-p)^2 \quad \text {as }n\rightarrow \infty ,\\ p_1^{(n)}&= \frac{2\left( {\begin{array}{c}n-2\\ k-1\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } \rightarrow p_1:=2p(1-p) \quad \text {as }n \rightarrow \infty ,\\ p_2^{(n)}&= \frac{\left( {\begin{array}{c}n-2\\ k-2\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } \rightarrow p_2:=p^2 \quad \text {as }n \rightarrow \infty . \end{aligned}$$

Recall that the joint distribution of the steady-state queue lengths of the first two queues is \(\pi ^{(n,2)}\). Then, the balance equation of the first two queues for the state (1, 1) can be written as

$$\begin{aligned} 0&=\pi ^{(n,2)}(1,1) \cdot (p_1^{(n)}\varLambda +p_2^{(n)}\varLambda +2\mu )\nonumber \\&\quad -\biggl (\frac{1}{2}\pi ^{(n,2)}(0,1)p_1^{(n)}\varLambda +\frac{1}{2}\pi ^{(n,2)}(1,0)p_1^{(n)}\varLambda \nonumber \\&\quad +\pi ^{(n,2)}(0,0)p_2^{(n)}\varLambda +\pi ^{(n,2)}(1,2)\mu +\pi ^{(n,2)}(2,1)\mu \biggr ). \end{aligned}$$
(36)

Let the right-hand side of (36) be denoted by \(\mathcal {R}(\pi ^{(n,2)})\). Let

$$\begin{aligned} a_1&=(p_1^{(n)}-p_1)\varLambda \biggl (\pi ^{(n,2)}(1,1)-\frac{1}{2}\pi ^{(n,2)}(0,1)-\frac{1}{2}\pi ^{(n,2)}(1,0)\biggr )\\&\quad +(p_2^{(n)}-p_2)\varLambda \bigl (\pi ^{(n,2)}(1,1)-\pi ^{(n,2)}(0,0)\bigr ),\\ a_2&=(\pi ^{(n,2)}(1,1)-\hat{\pi }^{(2)}(1,1))(p_1\varLambda +p_2\varLambda +2\mu )\\&\quad -\frac{1}{2}(\pi ^{(n,2)}(0,1)-\hat{\pi }^{(2)}(0,1)+\pi ^{(n,2)}(1,0)-\hat{\pi }^{(2)}(1,0))p_1\varLambda \\&\quad -(\pi ^{(n,2)}(0,0)-\hat{\pi }^{(2)}(0,0))p_2\varLambda \\&\quad -(\pi ^{(n,2)}(1,2)-\hat{\pi }^{(2)}(1,2)+\pi ^{(n,2)}(2,1)-\hat{\pi }^{(2)}(2,1))\mu . \end{aligned}$$

Here \(a_1\) collects the error from replacing \(p_1^{(n)},p_2^{(n)}\) by their limits \(p_1,p_2\) in (36), and \(a_2\) collects the error from replacing \(\pi ^{(n,2)}\) by \(\hat{\pi }^{(2)}\). Then since \(\hat{\pi }^{(2)}(q_1,q_2)=(1-\rho )^2\rho ^{q_1+q_2}\) for any \((q_1,q_2)\in \mathbb {Z}_+^2\),

$$\begin{aligned} \mathcal {R}(\pi ^{(n,2)})&=a_1+a_2+\hat{\pi }^{(2)}(1,1) \cdot (p_1\varLambda +p_2\varLambda +2\mu )\nonumber \\&\quad -\biggl (\frac{1}{2}\hat{\pi }^{(2)}(0,1)p_1\varLambda +\frac{1}{2}\hat{\pi }^{(2)}(1,0)p_1\varLambda \nonumber \\&\quad +\hat{\pi }^{(2)}(0,0)p_2\varLambda +\hat{\pi }^{(2)}(1,2)\mu +\hat{\pi }^{(2)}(2,1)\mu \biggr )\nonumber \\&=a_1+a_2-p \lambda (1-\rho )^4. \end{aligned}$$
(37)

We choose \(n_0\) such that, for any \(n> n_0\), \(|p_1^{(n)}-p_1|\le \epsilon \) and \(|p_2^{(n)}-p_2|\le \epsilon \). Then, since each parenthesized combination of probabilities in \(a_1\) has absolute value at most 1, \(|a_1|\le 3\varLambda \epsilon \). By the assumption that \(d_{TV}\bigl (\pi ^{(n,2)},\hat{\pi }^{(2)}\bigr )\le \epsilon \), we have that \(|a_2|\le 8(\varLambda +\mu ) \epsilon \). By the choice of \(\epsilon \), \(|a_1+a_2|\le (11\varLambda +8\mu )\epsilon =\frac{1}{2}p\lambda (1-\rho )^4\). Therefore, \(\mathcal {R}(\pi ^{(n,2)})\le -\frac{1}{2}p\lambda (1-\rho )^4<0\) by (37), which contradicts the balance equation (36). This completes the proof of Theorem 3. \(\square \)
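The two computational ingredients of this proof, the limits of \(p_0^{(n)},p_1^{(n)},p_2^{(n)}\) and the constant \(-p\lambda (1-\rho )^4\) in (37), can be verified with exact rational arithmetic. The following is a minimal sketch; the function names are hypothetical, and \(\rho =\lambda /\mu \) is assumed as the load of each M/M/1 queue.

```python
from fractions import Fraction
from math import comb


def task_hit_probs(n, k):
    """Probabilities that a job with k tasks on n servers places
    0, 1, or 2 tasks on the first two queues (requires 2 <= k <= n)."""
    total = comb(n, k)
    p0 = Fraction(comb(n - 2, k), total)
    p1 = Fraction(2 * comb(n - 2, k - 1), total)
    p2 = Fraction(comb(n - 2, k - 2), total)
    return p0, p1, p2


def product_form(rho, q1, q2):
    """Stationary probability of (q1, q2) for two independent M/M/1 queues."""
    return (1 - rho) ** 2 * rho ** (q1 + q2)


def balance_residual(p, lam, mu):
    """Right-hand side of (36) at state (1, 1), evaluated with the limiting
    probabilities p1 = 2p(1-p), p2 = p^2 and the product-form distribution."""
    rho = lam / mu            # assumed definition of the load
    big_lam = lam / p         # job arrival rate Lambda = lambda / p
    p1, p2 = 2 * p * (1 - p), p * p
    outflow = product_form(rho, 1, 1) * (p1 * big_lam + p2 * big_lam + 2 * mu)
    inflow = (product_form(rho, 0, 1) * p1 * big_lam / 2
              + product_form(rho, 1, 0) * p1 * big_lam / 2
              + product_form(rho, 0, 0) * p2 * big_lam
              + product_form(rho, 1, 2) * mu
              + product_form(rho, 2, 1) * mu)
    return outflow - inflow
```

With `Fraction` inputs the residual comes out exactly equal to \(-p\lambda (1-\rho )^4\), so the product-form distribution fails the balance equation at state (1, 1) by a fixed margin, which is what drives the contradiction.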

Appendix D: Definition and some properties of association

Definition 1

(Association [9]) We say random variables \(X_1\), \(X_2,\dots ,X_m\) are associated if, for all (entrywise) nondecreasing functions f and g,

$$\begin{aligned}&\mathbb {E}[f(X_1,X_2,\dots ,X_m)g(X_1,X_2,\dots ,X_m)] \nonumber \\&\quad \ge \mathbb {E}[f(X_1,X_2,\dots ,X_m)]\mathbb {E}[g(X_1,X_2,\dots ,X_m)]. \end{aligned}$$
(38)

Lemma 4

[9] Associated random variables have the following properties:

  • (P1) Nondecreasing functions of associated random variables are associated.

  • (P2) If two sets of associated random variables are independent of one another, then their union is a set of associated random variables.

  • (P3) If a sequence of random vectors \(\varvec{X}(u)\Rightarrow \varvec{X}\) as \(u\rightarrow \infty \) and, for each u, the entries of \(\varvec{X}(u)\) are associated, then the entries of \(\varvec{X}\) are associated.
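To make the definition concrete, inequality (38) can be checked by brute force for a pair of binary random variables. The sketch below is illustrative and rests on a simplifying assumption: it tests only 0/1-valued nondecreasing functions f and g, which is a necessary condition for association and is used here as the test family for this toy case. The names `POINTS`, `monotone_binary_functions`, and `passes_association_check` are hypothetical.

```python
from fractions import Fraction
from itertools import product

POINTS = list(product((0, 1), repeat=2))  # all outcomes of (X1, X2)


def is_nondecreasing(f):
    """True if f (a dict on POINTS) is entrywise nondecreasing."""
    return all(f[x] <= f[y]
               for x in POINTS for y in POINTS
               if all(a <= b for a, b in zip(x, y)))


def monotone_binary_functions():
    """Yield every 0/1-valued nondecreasing function on POINTS."""
    for values in product((0, 1), repeat=len(POINTS)):
        f = dict(zip(POINTS, values))
        if is_nondecreasing(f):
            yield f


def passes_association_check(pmf):
    """Check E[fg] >= E[f]E[g] for all pairs of test functions.
    pmf maps outcomes to probabilities; missing outcomes have mass 0."""
    funcs = list(monotone_binary_functions())

    def expect(h):
        return sum(pmf.get(x, 0) * h(x) for x in POINTS)

    for f in funcs:
        for g in funcs:
            if expect(lambda x: f[x] * g[x]) < expect(lambda x: f[x]) * expect(lambda x: g[x]):
                return False
    return True
```

Two independent fair bits pass the check, as does the comonotone pair \((X,X)\), while the pair \((X,1-X)\) fails it with \(f(x_1,x_2)=x_1\) and \(g(x_1,x_2)=x_2\).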

Cite this article

Wang, W., Harchol-Balter, M., Jiang, H. et al. Delay asymptotics and bounds for multitask parallel jobs. Queueing Syst 91, 207–239 (2019). https://doi.org/10.1007/s11134-018-09597-5

  • DOI: https://doi.org/10.1007/s11134-018-09597-5
