A New Parallel Research Kernel to Expand Research on Dynamic Load-Balancing Capabilities

Van der Wijngaart, Rob F.; Georganas, Evangelos; Mattson, Timothy G.; Wissink, Andrew

doi:10.1007/978-3-319-58667-0_14

Conference paper
First Online: 12 May 2017

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Abstract

The Parallel Research Kernels (PRK) are a tool to study parallel architectures and runtime systems from an application perspective. They provide paper-and-pencil specifications and reference implementations of elementary operations covering a broad range of parallel application patterns. Most of the current PRK are trivially statically load-balanced. In a prior study we described a novel PRK that requires dynamic load balancing, and demonstrated its effectiveness in assessing the automatic dynamic load-balancing capabilities of runtimes. While useful, it did not fully represent the problem of greatest interest to researchers of extreme-scale computing systems, namely the occurrence of localized, discrete, transient disturbances (system noise). For that purpose we introduce a new PRK, inspired by Adaptive Mesh Refinement (AMR) applications, which provides a proxy for the most detrimental property of noise, namely abrupt and discrete change of local system load. We give a detailed specification of the new PRK, highlighting the challenges and corresponding design choices that make it compact, arbitrarily scalable and self-verifying. We also present an implementation of the AMR PRK in MPI, with application-specific load balancing, as well as one in Adaptive MPI that leverages the MPI version but adds runtime-orchestrated dynamic load balancing, along with a set of performance results. These show that for applications that can be load-balanced statically, but experience occasional local changes in computational load, automatic dynamic load balancing typically does not offer an advantage.

Notes

1. Bi-linear interpolation is a standard technique in which two 1D linear interpolations are combined to compute interpolations in 2D. See, e.g., [18], Appendix D. Taking advantage of the regular structure of BG and RGs, we can implement it with just three floating-point operations per RG point.
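The combination of two 1D interpolations described in this note can be sketched as follows. This is a minimal, generic illustration and not the authors' implementation: the function names `lerp`, `bilinear` and `refine`, the list-of-rows grid layout, and the integer refinement factor `r` are all assumptions made for the example, and the sketch makes no attempt to reproduce the three-operation optimization mentioned above.

```python
def lerp(a, b, t):
    """1D linear interpolation: returns a at t=0 and b at t=1."""
    return a + t * (b - a)


def bilinear(f00, f10, f01, f11, tx, ty):
    """Bilinear interpolation inside one cell.

    fXY is the value at the cell corner x=X, y=Y; tx and ty are the
    fractional positions (0..1) of the target point inside the cell.
    """
    bottom = lerp(f00, f10, tx)   # interpolate along x on the lower edge
    top = lerp(f01, f11, tx)      # interpolate along x on the upper edge
    return lerp(bottom, top, ty)  # interpolate along y between the edges


def refine(coarse, r):
    """Interpolate a coarse grid onto a grid r times finer (hypothetical helper).

    coarse is a list of rows, coarse[j][i], with at least 2 points per
    dimension; the result has (n - 1) * r + 1 points per dimension.
    """
    ny, nx = len(coarse), len(coarse[0])
    fine = []
    for J in range((ny - 1) * r + 1):
        j = min(J // r, ny - 2)       # index of the enclosing coarse cell
        ty = (J - j * r) / r
        row = []
        for I in range((nx - 1) * r + 1):
            i = min(I // r, nx - 2)
            tx = (I - i * r) / r
            row.append(bilinear(coarse[j][i], coarse[j][i + 1],
                                coarse[j + 1][i], coarse[j + 1][i + 1],
                                tx, ty))
        fine.append(row)
    return fine


if __name__ == "__main__":
    # Coarse samples of f(x, y) = x + 2y on a 2x2 grid, refined by a factor of 2.
    for row in refine([[0.0, 1.0], [2.0, 3.0]], 2):
        print(row)
```

Because bilinear interpolation reproduces any function of the form a + bx + cy exactly, the refined values in the usage example lie on the same plane as the coarse samples, which makes such a routine easy to check.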

References

1. Barker, K., Chernikov, A., Chrisochoides, N., Pingali, K.: A load balancing framework for adaptive and asynchronous applications. IEEE Trans. Parallel Distrib. Syst. 15(2), 183–192 (2004)
2. Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)
3. Bell, J., Rendleman, C.: CCSE application suite
4. Bell, J.B., Day, M.S., Grcar, J.F., Lijewski, M.J., Driscoll, J.F., Filatyev, S.A.: Numerical simulation of a laboratory-scale turbulent slot flame. Proc. Combust. Inst. 31(1), 1299–1307 (2007)
5. Berger, M.J., Oliger, J.: Adaptive mesh refinement for hyperbolic partial differential equations. J. Comput. Phys. 53(3), 484–512 (1984)
6. Bienia, C., Li, K.: PARSEC 2.0: a new benchmark suite for chip-multiprocessors. In: Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation (2009)
7. Chamberlain, B.L., Choi, S.-E., Deitz, S.J., Navarro, A.: User-defined parallel zippered iterators in Chapel. In: Proceedings of Fifth Conference on Partitioned Global Address Space Programming Models, pp. 1–11 (2011)
8. Chandra, S., Parashar, M.: ARMaDA: an adaptive application-sensitive partitioning framework for SAMR applications. In: IASTED PDCS, pp. 441–446 (2002)
9. Colella, P., Graves, D., Ligocki, T., Martin, D., Modiano, D., Serafini, D., Van Straalen, B.: Chombo software package for AMR applications - design document (2003)
10. Devine, K.D., Boman, E.G., Heaphy, R.T., Hendrickson, B.A., Teresco, J.D., Faik, J., Flaherty, J.E., Gervasio, L.G.: New challenges in dynamic load balancing. Appl. Numer. Math. 52(2), 133–152 (2005)
11. Dinan, J., Krishnamoorthy, S., Larkins, D.B., Nieplocha, J., Sadayappan, P.: Scioto: a framework for global-view task parallelism. In: 37th International Conference on Parallel Processing, ICPP 2008, pp. 586–593. IEEE (2008)
12. Garcia, A.L., Bell, J.B., Crutchfield, W.Y., Alder, B.J.: Adaptive mesh and algorithm refinement using direct simulation Monte Carlo. J. Comput. Phys. 154(1), 134–155 (1999)
13. Georganas, E., Van der Wijngaart, R.F., Mattson, T.G.: Design and implementation of a Parallel Research Kernel for assessing dynamic load-balancing capabilities. In: 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 73–82. IEEE (2016)
14. Hornung, R.D., Kohn, S.R.: Managing application complexity in the SAMRAI object-oriented framework. Concurr. Comput.: Pract. Exp. 14(5), 347–368 (2002)
15. Hornung, R.D., Trangenstein, J.A.: Adaptive mesh refinement and multilevel iteration for flow in porous media. J. Comput. Phys. 136(2), 522–545 (1997)
16. Hornung, R.D., Wissink, A.M., Kohn, S.R.: Managing complex data and geometry in parallel structured AMR applications. Eng. Comput. 22(3–4), 181–195 (2006)
17. Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++, vol. 28. ACM (1993)
18. Kirkland, E.J.: Linear image approximations. In: Kirkland, E.J. (ed.) Advanced Computing in Electron Microscopy, pp. 19–39. Springer, Heidelberg (1998)
19. Koenig, G., Kale, L.V., et al.: Optimizing distributed application performance using dynamic grid topology-aware load balancing. In: IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007, pp. 1–10. IEEE (2007)
20. MacNeice, P., Olson, K.M., Mobarry, C., de Fainchtein, R., Packer, C.: PARAMESH: a parallel adaptive mesh refinement community toolkit. Comput. Phys. Commun. 126(3), 330–354 (2000)
21. Meng, Q., Berzins, M., Schmidt, J.: Using hybrid parallelism to improve memory use in the Uintah framework. In: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, p. 24. ACM (2011)
22. Nelson, J., Holt, B., Myers, B., Briggs, P., Ceze, L., Kahan, S., Oskin, M.: Latency-tolerant software distributed shared memory. In: 2015 USENIX Annual Technical Conference (USENIX ATC 15), Santa Clara, CA. USENIX Association, July 2015
23. Olivier, S., Huan, J., Liu, J., Prins, J., Dinan, J., Sadayappan, P., Tseng, C.-W.: UTS: an unbalanced tree search benchmark. In: Almási, G., Caşcaval, C., Wu, P. (eds.) LCPC 2006. LNCS, vol. 4382, pp. 235–250. Springer, Heidelberg (2007). doi:10.1007/978-3-540-72521-3_18
24. Parashar, M., Browne, J.C.: Systems engineering for high performance computing software: the HDDA/DAGH infrastructure for implementation of parallel structured adaptive mesh. In: Baden, S.B., Chrisochoides, N.P., Gannon, D.B., Norman, M.L. (eds.) Structured Adaptive Mesh Refinement (SAMR). IMA, vol. 117, pp. 1–18. Springer, New York (2000). doi:10.1007/978-1-4612-1252-2_1
25. Paudel, J., Amaral, J.N.: Hybrid parallel task placement in irregular applications. J. Parallel Distrib. Comput. 76, 94–105 (2015)
26. Sohn, A., Biswas, R., Simon, H.D.: A dynamic load balancing framework for unstructured adaptive computations on distributed-memory multiprocessors. In: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 189–192. ACM (1996)
27. Tabbal, A., Anderson, M., Brodowicz, M., Kaiser, H., Sterling, T.: Preliminary design examination of the ParalleX system from a software and hardware perspective. ACM SIGMETRICS Perform. Eval. Rev. 38(4), 81–87 (2011)
28. Van der Wijngaart, R.F., et al.: Comparing runtime systems with exascale ambitions using the parallel research kernels. In: Kunkel, J.M., Balaji, P., Dongarra, J. (eds.) ISC High Performance 2016. LNCS, vol. 9697, pp. 321–339. Springer, Cham (2016). doi:10.1007/978-3-319-41321-1_17
29. Van der Wijngaart, R.F., Mattson, T.G.: The parallel research kernels: a tool for architecture and programming system investigation. In: Proceedings of the IEEE High Performance Extreme Computing Conference, HPEC 2014. IEEE Computer Society (2014)
30. Wissink, A., Kosovic, B., Berger, M., Chand, K., Chow, F.K.: Adaptive Cartesian methods for modeling airborne dispersion. Adv. Comput. Infrast. Parallel Distrib. Adapt. Appl. 79 (2010)
31. Wissink, A.M., Hornung, R.D., Kohn, S.R., Smith, S.S., Elliott, N.: Large scale parallel structured AMR calculations using the SAMRAI framework. In: ACM/IEEE 2001 Conference on Supercomputing, p. 22. IEEE (2001)
32. Wissink, A.M., Potsdam, M., Sankaran, V., Sitaraman, J., Mavriplis, D.: A dual-mesh unstructured adaptive Cartesian computational fluid dynamics approach for hover prediction. J. Am. Helicopter Soc. 61(1), 1–19 (2016)
33. Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ACM SIGARCH Computer Architecture News, vol. 23, pp. 24–36. ACM (1995)

Acknowledgement

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

*Other names and brands may be claimed as property of others.

Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

Author information

Authors and Affiliations

Parallel Computing Lab, Intel Corp, Santa Clara, CA, USA
Rob F. Van der Wijngaart, Evangelos Georganas & Timothy G. Mattson
US Army Aviation Development Directorate - AFDD (AMRDEC), Moffett Field, Sunnyvale, CA, USA
Andrew Wissink

Corresponding author

Correspondence to Rob F. Van der Wijngaart.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Tokyo Institute of Technology, Tokyo, Japan
Rio Yokota
Argonne National Laboratory, Argonne, IL, USA
Pavan Balaji
KAUST, Thuwal, Saudi Arabia
David Keyes

Cite this paper

Van der Wijngaart, R.F., Georganas, E., Mattson, T.G., Wissink, A. (2017). A New Parallel Research Kernel to Expand Research on Dynamic Load-Balancing Capabilities. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_14

DOI: https://doi.org/10.1007/978-3-319-58667-0_14
Published: 12 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58666-3
Online ISBN: 978-3-319-58667-0
eBook Packages: Computer Science, Computer Science (R0)
