Mean-field Macro Computation in Large-scale Cloud Service Systems with Resource Management and Job Scheduling

  • Feifei Yang
  • Yanping Jiang
  • Quanlin Li


Service computing is an emerging distributed computing mode in cloud service systems and has become an active research direction in both academia and industry. Cloud service systems display characteristics such as stochasticity, large scale, loose coupling, concurrency, non-homogeneity and heterogeneity, which make the study of their load balancing both interesting and challenging. Using resource management and job scheduling, this paper proposes an integrated, real-time and dynamic control mechanism for load balancing in large-scale cloud service systems, combining supermarket models with work stealing models and with the scheduling of public reserved resources. To this end, the paper provides a novel stochastic model with weak interactions by means of nonlinear Markov processes. To overcome the theoretical difficulties arising from state explosion in high-dimensional stochastic systems, the paper applies mean-field theory to develop a macro computational technique in terms of an infinite-dimensional system of mean-field equations. Furthermore, the paper proves the asymptotic independence of the large-scale cloud service system and shows how to compute the fixed point by means of an infinite-dimensional system of nonlinear equations. Based on the fixed point, the paper provides effective numerical computation for the performance analysis of this system with high approximation precision. We hope that the methodology and results given in this paper can be applied to the study of more general large-scale cloud service systems.
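The paper's own mean-field equations are not reproduced in this abstract, so the following is only a hedged sketch of the general technique on a simpler, well-known case: the classical supermarket model with arrival rate lam per server, unit service rate and d random choices, and without work stealing or reserved resources. In that model, with s[k] the fraction of servers holding at least k jobs, the mean-field equations are ds[k]/dt = lam (s[k-1]^d - s[k]^d) - (s[k] - s[k+1]), and for d >= 2 the fixed point is s[k] = lam^((d^k - 1)/(d - 1)). The function names, the truncation level K, and the Euler step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mean_field_rhs(s, lam, d):
    """Right-hand side of the truncated classical supermarket mean-field ODE.

    s[k] is the fraction of servers with at least k jobs; s[0] is fixed at 1.
    Truncation assumption: no queue is longer than K = len(s) - 1.
    """
    s_next = np.append(s[2:], 0.0)  # s[k+1] for k = 1..K, with s[K+1] = 0
    ds = np.zeros_like(s)
    ds[1:] = lam * (s[:-1] ** d - s[1:] ** d) - (s[1:] - s_next)
    return ds

def fixed_point_by_integration(lam, d, K=20, dt=0.05, steps=40_000):
    """Approximate the fixed point by forward-Euler integration from an empty system."""
    s = np.zeros(K + 1)
    s[0] = 1.0
    for _ in range(steps):
        s = s + dt * mean_field_rhs(s, lam, d)
    return s

def closed_form_fixed_point(lam, d, K=20):
    """Known fixed point of the classical model (requires d >= 2)."""
    k = np.arange(K + 1)
    return lam ** ((d ** k - 1) / (d - 1))

if __name__ == "__main__":
    lam, d = 0.7, 2
    s_num = fixed_point_by_integration(lam, d)
    s_cf = closed_form_fixed_point(lam, d)
    print("max abs difference:", np.max(np.abs(s_num - s_cf)))
    # Mean number of jobs per server is the sum of the tail fractions s[1], s[2], ...
    print("mean queue length:", s_cf[1:].sum())
```

Comparing the integrated solution with the closed form is a simple sanity check on the truncation level and step size. A model with work stealing and reserved resources would need additional terms in the right-hand side, and its fixed point would generally have to be found numerically.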


Large-scale cloud service system, resource management, job scheduling, supermarket model, work stealing model, scheduling of public reserved resource





The authors are grateful to the editor and two anonymous referees for their constructive comments and suggestions, which helped the authors to improve the presentation of this manuscript. In addition, Yanping Jiang was supported by the National Natural Science Foundation of China under grant Nos. 71871048 and 71571040; and Quanlin Li was supported by the National Natural Science Foundation of China under grant Nos. 71671158 and 71471160, and by the Natural Science Foundation of Hebei Province under grant No. G2017203277.



Copyright information

© Systems Engineering Society of China and Springer-Verlag GmbH Germany 2019

Authors and Affiliations

  1. School of Business Administration, Northeastern University, Shenyang, China
  2. School of Economics and Management, Beijing University of Technology, Beijing, China
