A New Parallel Research Kernel to Expand Research on Dynamic Load-Balancing Capabilities

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Abstract

The Parallel Research Kernels (PRK) are a tool to study parallel architectures and runtime systems from an application perspective. They provide paper-and-pencil specifications and reference implementations of elementary operations covering a broad range of parallel application patterns. Most of the current PRK are trivially statically load-balanced. In a prior study we described a novel PRK that requires dynamic load balancing, and demonstrated its effectiveness to assess automatic dynamic load balancing capabilities of runtimes. While useful, it did not fully represent the problem of greatest interest to researchers of extreme-scale computing systems, namely the occurrence of localized, discrete, transient disturbances (system noise). For that purpose we introduce a new PRK, inspired by Adaptive Mesh Refinement (AMR) applications, which provides a proxy for the most detrimental property of noise, namely abrupt and discrete change of local system load. We give a detailed specification of the new PRK, highlighting the challenges and corresponding design choices that make it compact, arbitrarily scalable and self-verifying. We also present an implementation of the AMR PRK in MPI, with application-specific load balancing, as well as one in Adaptive MPI that leverages the MPI version, but adds runtime-orchestrated dynamic load balancing, along with a set of performance results. These show that for applications that can be load balanced statically, but experience occasional local changes in computational load, automatic dynamic load balancing typically does not offer an advantage.

Notes

  1. Bi-linear interpolation is a standard technique in which two 1D linear interpolations are combined to compute interpolations in 2D. See, e.g., [18], Appendix D. Taking advantage of the regular structure of BG and RGs, we can implement it with just three floating point operations per RG point.
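As an illustration of the technique named in the note, the following is a minimal sketch of textbook bi-linear interpolation from a coarse grid onto a grid refined by an integer factor. It is not the paper's reference implementation: the function names `lerp`, `bilinear` and `refine`, the list-of-lists grid layout and the refinement factor `r` are all assumptions made for this example. The general form shown here uses more floating point operations per point than the three-operation variant mentioned above, which relies on the regular structure of the grids and is not reproduced here.

```python
def lerp(a, b, t):
    """1D linear interpolation: returns a at t=0 and b at t=1."""
    return a + t * (b - a)


def bilinear(f00, f10, f01, f11, tx, ty):
    """Interpolate inside one cell from its four corner values.

    fXY is the value at corner (x=X, y=Y); tx and ty are the
    fractional offsets within the cell, each in [0, 1].
    """
    bottom = lerp(f00, f10, tx)    # 1D interpolation along x at y=0
    top = lerp(f01, f11, tx)       # 1D interpolation along x at y=1
    return lerp(bottom, top, ty)   # combine the two along y


def refine(coarse, r):
    """Interpolate coarse[j][i] onto a grid r times finer per direction.

    Hypothetical helper: assumes a rectangular grid with at least
    two points in each direction and an integer r >= 1.
    """
    ny, nx = len(coarse), len(coarse[0])
    fine = []
    for J in range((ny - 1) * r + 1):
        j = min(J // r, ny - 2)    # index of the enclosing coarse cell
        ty = J / r - j             # fractional offset inside that cell
        row = []
        for I in range((nx - 1) * r + 1):
            i = min(I // r, nx - 2)
            tx = I / r - i
            row.append(bilinear(coarse[j][i], coarse[j][i + 1],
                                coarse[j + 1][i], coarse[j + 1][i + 1],
                                tx, ty))
        fine.append(row)
    return fine
```

For example, `refine([[0.0, 1.0], [2.0, 3.0]], 2)` produces a 3 by 3 grid. Because the corner values sample a function that is linear in both coordinates, the interpolated values reproduce that function exactly at every refined point.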

References

  1. Barker, K., Chernikov, A., Chrisochoides, N., Pingali, K.: A load balancing framework for adaptive and asynchronous applications. IEEE Trans. Parallel Distrib. Syst. 15(2), 183–192 (2004)

  2. Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)

  3. Bell, J., Rendleman, C.: CCSE application suite

  4. Bell, J.B., Day, M.S., Grcar, J.F., Lijewski, M.J., Driscoll, J.F., Filatyev, S.A.: Numerical simulation of a laboratory-scale turbulent slot flame. Proc. Combust. Inst. 31(1), 1299–1307 (2007)

  5. Berger, M.J., Oliger, J.: Adaptive mesh refinement for hyperbolic partial differential equations. J. Comput. Phys. 53(3), 484–512 (1984)

  6. Bienia, C., Li, K.: PARSEC 2.0: a new benchmark suite for chip-multiprocessors. In: Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation (2009)

  7. Chamberlain, B.L., Choi, S.-E., Deitz, S.J., Navarro, A.: User-defined parallel zippered iterators in Chapel. In: Proceedings of Fifth Conference on Partitioned Global Address Space Programming Models, pp. 1–11 (2011)

  8. Chandra, S., Parashar, M.: ARMaDA: an adaptive application-sensitive partitioning framework for SAMR applications. In: IASTED PDCS, pp. 441–446 (2002)

  9. Colella, P., Graves, D., Ligocki, T., Martin, D., Modiano, D., Serafini, D., Van Straalen, B.: Chombo software package for AMR applications-design document (2003)

  10. Devine, K.D., Boman, E.G., Heaphy, R.T., Hendrickson, B.A., Teresco, J.D., Faik, J., Flaherty, J.E., Gervasio, L.G.: New challenges in dynamic load balancing. Appl. Numer. Math. 52(2), 133–152 (2005)

  11. Dinan, J., Krishnamoorthy, S., Larkins, D.B., Nieplocha, J., Sadayappan, P.: Scioto: a framework for global-view task parallelism. In: 37th International Conference on Parallel Processing, ICPP 2008, pp. 586–593. IEEE (2008)

  12. Garcia, A.L., Bell, J.B., Crutchfield, W.Y., Alder, B.J.: Adaptive mesh and algorithm refinement using direct simulation Monte Carlo. J. Comput. Phys. 154(1), 134–155 (1999)

  13. Georganas, E., Van der Wijngaart, R.F., Mattson, T.G.: Design and implementation of a Parallel Research Kernel for assessing dynamic load-balancing capabilities. In: 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 73–82. IEEE (2016)

  14. Hornung, R.D., Kohn, S.R.: Managing application complexity in the SAMRAI object-oriented framework. Concurr. Comput.: Pract. Exp. 14(5), 347–368 (2002)

  15. Hornung, R.D., Trangenstein, J.A.: Adaptive mesh refinement and multilevel iteration for flow in porous media. J. Comput. Phys. 136(2), 522–545 (1997)

  16. Hornung, R.D., Wissink, A.M., Kohn, S.R.: Managing complex data and geometry in parallel structured AMR applications. Eng. Comput. 22(3–4), 181–195 (2006)

  17. Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++, vol. 28. ACM (1993)

  18. Kirkland, E.J.: Linear image approximations. In: Kirkland, E.J. (ed.) Advanced Computing in Electron Microscopy, pp. 19–39. Springer, Heidelberg (1998)

  19. Koenig, G., Kale, L.V., et al.: Optimizing distributed application performance using dynamic grid topology-aware load balancing. In: IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007, pp. 1–10. IEEE (2007)

  20. MacNeice, P., Olson, K.M., Mobarry, C., de Fainchtein, R., Packer, C.: PARAMESH: a parallel adaptive mesh refinement community toolkit. Comput. Phys. Commun. 126(3), 330–354 (2000)

  21. Meng, Q., Berzins, M., Schmidt, J.: Using hybrid parallelism to improve memory use in the Uintah framework. In: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, p. 24. ACM (2011)

  22. Nelson, J., Holt, B., Myers, B., Briggs, P., Ceze, L., Kahan, S., Oskin, M.: Latency-tolerant software distributed shared memory. In: 2015 USENIX Annual Technical Conference (USENIX ATC 15), Santa Clara, CA. USENIX Association, July 2015

  23. Olivier, S., Huan, J., Liu, J., Prins, J., Dinan, J., Sadayappan, P., Tseng, C.-W.: UTS: an unbalanced tree search benchmark. In: Almási, G., Caşcaval, C., Wu, P. (eds.) LCPC 2006. LNCS, vol. 4382, pp. 235–250. Springer, Heidelberg (2007). doi:10.1007/978-3-540-72521-3_18

  24. Parashar, M., Browne, J.C.: Systems engineering for high performance computing software: the HDDA/DAGH infrastructure for implementation of parallel structured adaptive mesh. In: Baden, S.B., Chrisochoides, N.P., Gannon, D.B., Norman, M.L. (eds.) Structured Adaptive Mesh Refinement (SAMR). IMA, vol. 117, pp. 1–18. Springer, New York (2000). doi:10.1007/978-1-4612-1252-2_1

  25. Paudel, J., Amaral, J.N.: Hybrid parallel task placement in irregular applications. J. Parallel Distrib. Comput. 76, 94–105 (2015)

  26. Sohn, A., Biswas, R., Simon, H.D.: A dynamic load balancing framework for unstructured adaptive computations on distributed-memory multiprocessors. In: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 189–192. ACM (1996)

  27. Tabbal, A., Anderson, M., Brodowicz, M., Kaiser, H., Sterling, T.: Preliminary design examination of the ParalleX system from a software and hardware perspective. ACM SIGMETRICS Perform. Eval. Rev. 38(4), 81–87 (2011)

  28. Van der Wijngaart, R.F., et al.: Comparing runtime systems with exascale ambitions using the parallel research kernels. In: Kunkel, J.M., Balaji, P., Dongarra, J. (eds.) ISC High Performance 2016. LNCS, vol. 9697, pp. 321–339. Springer, Cham (2016). doi:10.1007/978-3-319-41321-1_17

  29. Van der Wijngaart, R.F., Mattson, T.G.: The parallel research kernels: a tool for architecture and programming system investigation. In: Proceedings of the IEEE High Performance Extreme Computing Conference, HPEC 2014. IEEE Computer Society (2014)

  30. Wissink, A., Kosovic, B., Berger, M., Chand, K., Chow, F.K.: Adaptive Cartesian methods for modeling airborne dispersion. Adv. Comput. Infrast. Parallel Distrib. Adapt. Appl. 79 (2010)

  31. Wissink, A.M., Hornung, R.D., Kohn, S.R., Smith, S.S., Elliott, N.: Large scale parallel structured AMR calculations using the SAMRAI framework. In: ACM/IEEE 2001 Conference on Supercomputing, p. 22. IEEE (2001)

  32. Wissink, A.M., Potsdam, M., Sankaran, V., Sitaraman, J., Mavriplis, D.: A dual-mesh unstructured adaptive Cartesian computational fluid dynamics approach for hover prediction. J. Am. Helicopter Soc. 61(1), 1–19 (2016)

  33. Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ACM SIGARCH Computer Architecture News, vol. 23, pp. 24–36. ACM (1995)

Acknowledgement

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

*Other names and brands may be claimed as property of others.

Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

Author information

Corresponding author

Correspondence to Rob F. Van der Wijngaart.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Van der Wijngaart, R.F., Georganas, E., Mattson, T.G., Wissink, A. (2017). A New Parallel Research Kernel to Expand Research on Dynamic Load-Balancing Capabilities. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_14

  • DOI: https://doi.org/10.1007/978-3-319-58667-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58666-3

  • Online ISBN: 978-3-319-58667-0

  • eBook Packages: Computer Science, Computer Science (R0)
