Workstealing and Nested Parallelism in SMP Systems

  • Larry Meadows
  • Simon J. Pennycook
  • Alex Duran
  • Terry Wilmarth
  • Jim Cownie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)


We present a workstealing scheduler and show its use in two separate areas: (1) to enable hierarchical parallelism and per-core load balancing in stencil codes, and (2) to reduce overhead in per-thread load balancing in particle codes.
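The abstract does not describe how the scheduler works internally, so the following is only a generic sketch of the work-stealing idea, not the authors' implementation. It assumes one double-ended queue per worker: a worker takes the newest task from its own queue and, when that queue is empty, steals the oldest task from a randomly chosen victim. The class `WorkStealingPool` and all of its method names are hypothetical, chosen for illustration. Because CPython threads share the global interpreter lock, the sketch shows the scheduling logic rather than any real speedup.

```python
import random
import threading
import time
from collections import deque


class WorkStealingPool:
    """Toy work-stealing scheduler (hypothetical): one deque per worker."""

    def __init__(self, num_workers):
        self.queues = [deque() for _ in range(num_workers)]
        self.locks = [threading.Lock() for _ in range(num_workers)]
        self.pending = 0                    # tasks submitted but not yet finished
        self.pending_lock = threading.Lock()
        self.executed = [0] * num_workers   # tasks run by each worker

    def submit(self, worker_id, task):
        """Push a zero-argument callable onto one worker's deque."""
        with self.pending_lock:
            self.pending += 1
        with self.locks[worker_id]:
            self.queues[worker_id].append(task)

    def _pop_local(self, worker_id):
        # The owner takes the most recently pushed task (tail of its deque).
        with self.locks[worker_id]:
            if self.queues[worker_id]:
                return self.queues[worker_id].pop()
        return None

    def _steal(self, thief_id):
        # A thief takes the oldest task (head) from a random non-empty victim.
        victims = [v for v in range(len(self.queues)) if v != thief_id]
        random.shuffle(victims)
        for victim in victims:
            with self.locks[victim]:
                if self.queues[victim]:
                    return self.queues[victim].popleft()
        return None

    def _worker(self, worker_id):
        while True:
            task = self._pop_local(worker_id)
            if task is None:
                task = self._steal(worker_id)
            if task is None:
                with self.pending_lock:
                    if self.pending == 0:
                        return              # nothing queued, nothing running
                time.sleep(0.0001)          # back off briefly, then retry
                continue
            try:
                task()
            finally:
                self.executed[worker_id] += 1
                with self.pending_lock:
                    self.pending -= 1

    def run(self):
        """Start all workers and block until every submitted task is done."""
        threads = [threading.Thread(target=self._worker, args=(i,))
                   for i in range(len(self.queues))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


# Deliberately unbalanced start: every task is queued on worker 0,
# so the other workers can only make progress by stealing.
pool = WorkStealingPool(num_workers=4)
results = []
for i in range(100):
    pool.submit(0, lambda i=i: results.append(i * i))
pool.run()
```

After `pool.run()` returns, `pool.executed` records how many tasks each worker ran. The split varies from run to run, since it depends on thread timing and on which victims the thieves happen to pick.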


Keywords: Stencil, Nested parallelism, Runtime support



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Larry Meadows (1)
  • Simon J. Pennycook (1)
  • Alex Duran (1)
  • Terry Wilmarth (1)
  • Jim Cownie (1)
  1. Intel Corporation, Hillsboro, USA
