
International Journal of Parallel Programming, Volume 46, Issue 2, pp 173–197

Scheduling Parallel Computations by Work Stealing: A Survey

  • Jixiang Yang
  • Qingbi He

Abstract

Work stealing has proven to be an efficient technique for scheduling parallel computations, and has gained popularity as the load-balancing technology of choice for multiprocessor and multicore systems in both industry and academia. This survey reviews work-stealing scheduling from three perspectives: scheduling algorithms, optimization of algorithm implementations, and processor-architecture-oriented optimization. Future research trends and recommendations, driven by theory, emerging applications and motifs, architectures, and heterogeneous platforms, are also provided.
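To make the surveyed idea concrete: in a work-stealing scheduler, each worker keeps its own double-ended queue of tasks, operates on it LIFO for locality, and, when it runs dry, steals the oldest task from a randomly chosen victim. The following is a minimal illustrative sketch only, not the algorithm of any particular system covered by the survey; the `Worker`, `Scheduler`, and `make_task` names are invented for the example, coarse locks stand in for the lock-free deques that production runtimes (e.g. Cilk) use, and termination detection is deliberately simplistic.

```python
import collections
import random
import threading

class Worker:
    """One worker thread with its own double-ended task queue (deque)."""

    def __init__(self, scheduler, wid):
        self.scheduler = scheduler
        self.wid = wid
        self.deque = collections.deque()
        self.lock = threading.Lock()  # sketch only: real runtimes use lock-free deques

    def push(self, task):
        with self.lock:
            self.deque.append(task)   # owner pushes at the bottom

    def pop(self):
        with self.lock:               # owner pops newest first (LIFO): good cache locality
            return self.deque.pop() if self.deque else None

    def steal(self):
        with self.lock:               # thief takes the oldest task (FIFO): typically
            return self.deque.popleft() if self.deque else None  # the largest subtree

    def run(self):
        while True:
            task = self.pop()
            if task is None:          # local deque empty: try a random victim
                victim = self.scheduler.random_victim(self.wid)
                task = victim.steal() if victim is not None else None
            if task is None:
                if self.scheduler.all_empty():
                    break             # crude termination check, adequate for this sketch
                continue
            task(self)

class Scheduler:
    def __init__(self, nworkers):
        self.workers = [Worker(self, i) for i in range(nworkers)]

    def random_victim(self, wid):
        others = [w for w in self.workers if w.wid != wid]
        return random.choice(others) if others else None

    def all_empty(self):
        return all(not w.deque for w in self.workers)

    def run(self, root_task):
        self.workers[0].push(root_task)  # seed one worker; the rest begin by stealing
        threads = [threading.Thread(target=w.run) for w in self.workers]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

# Usage: traverse a binary "computation tree"; each node spawns its children
# into the executing worker's own deque, where idle workers can steal them.
visited = []
visited_lock = threading.Lock()

def make_task(depth):
    def task(worker):
        with visited_lock:
            visited.append(depth)
        if depth > 0:
            worker.push(make_task(depth - 1))
            worker.push(make_task(depth - 1))
    return task

Scheduler(4).run(make_task(3))
print(len(visited))  # a binary tree of depth 3 has 2**4 - 1 = 15 nodes
```

The LIFO-pop/FIFO-steal asymmetry is the design point the survey's cited systems share: the owner works depth-first on fresh, cache-warm tasks, while thieves remove the oldest entries, which tend to represent the largest remaining subcomputations and thus amortize the cost of the steal.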

Keywords

Work stealing · Load balancing · Task parallelism · Task scheduling · Optimization · Multicore/manycore


Acknowledgements

This work was supported by the Chongqing Natural Science Foundation (No. KJ1400316) and partially supported by the National Natural Science Foundation of China (No. 11401061).


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. School of Mathematics and Statistics, Chongqing Jiaotong University, Chongqing, China
  2. School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, China
