Abstract
The Parallel Research Kernels (PRK) are a tool for studying parallel architectures and runtime systems from an application perspective. They provide paper-and-pencil specifications and reference implementations of elementary operations covering a broad range of parallel application patterns. Most current PRK are trivially statically load-balanced. In a prior study we described a novel PRK that requires dynamic load balancing, and demonstrated its effectiveness for assessing the automatic dynamic load-balancing capabilities of runtimes. While useful, it did not fully represent the problem of greatest interest to researchers of extreme-scale computing systems, namely the occurrence of localized, discrete, transient disturbances (system noise). For that purpose we introduce a new PRK, inspired by Adaptive Mesh Refinement (AMR) applications, which provides a proxy for the most detrimental property of noise, namely abrupt, discrete changes in local system load. We give a detailed specification of the new PRK, highlighting the challenges and the corresponding design choices that make it compact, arbitrarily scalable, and self-verifying. We also present two implementations of the AMR PRK, along with a set of performance results: one in MPI with application-specific load balancing, and one in Adaptive MPI that builds on the MPI version but adds runtime-orchestrated dynamic load balancing. The results show that for applications that can be load-balanced statically, but experience occasional local changes in computational load, automatic dynamic load balancing typically offers no advantage.
Notes
1. Bilinear interpolation is a standard technique in which two 1D linear interpolations are combined to interpolate in 2D; see, e.g., [18], Appendix D. Taking advantage of the regular structure of the background grid (BG) and the refinement grids (RGs), we can implement it with just three floating-point operations per RG point.
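The separable structure described in the note can be illustrated with a short sketch. The function name `refine_bilinear`, the unit coarse-grid spacing, and the integer refinement factor `r` are assumptions made for illustration only; this is not the PRK reference implementation, and it is not claimed to reproduce its exact three-operation scheme. A first pass interpolates linearly along x on the coarse rows only; a second pass interpolates linearly along y between those rows, which costs one subtraction, one multiplication and one addition per fine point.

```python
def refine_bilinear(coarse, r):
    """Hypothetical sketch: bilinear refinement of a 2D grid by integer factor r.

    coarse: list of ny rows, each holding nx floats sampled at unit spacing.
    Returns a ((ny-1)*r + 1) x ((nx-1)*r + 1) grid whose every r-th point
    coincides with a coarse point.
    """
    ny, nx = len(coarse), len(coarse[0])
    fx = (nx - 1) * r + 1  # number of fine points per row

    # Pass 1: 1D linear interpolation along x, on the coarse rows only.
    rows = []
    for j in range(ny):
        row = []
        for i in range(nx - 1):
            a, b = coarse[j][i], coarse[j][i + 1]
            d = (b - a) / r  # increment per fine step along x
            for k in range(r):
                row.append(a + k * d)
        row.append(coarse[j][nx - 1])  # last coarse point closes the row
        rows.append(row)

    # Pass 2: 1D linear interpolation along y between interpolated rows.
    fine = []
    for j in range(ny - 1):
        lo, hi = rows[j], rows[j + 1]
        for k in range(r):
            t = k / r  # fractional distance from row j toward row j+1
            fine.append([lo[i] + t * (hi[i] - lo[i]) for i in range(fx)])
    fine.append(rows[ny - 1])  # last coarse row closes the grid
    return fine


# Usage: refine a 3x3 sample of f(x, y) = 1 + 2x + 3y by a factor of 4.
coarse = [[1 + 2 * i + 3 * j for i in range(3)] for j in range(3)]
fine = refine_bilinear(coarse, 4)
```

A useful property of this scheme is that it reproduces any function of the form a + bx + cy + dxy exactly, so the refined values of such a field are known analytically and can be checked without a separate reference computation.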
References
[1] Barker, K., Chernikov, A., Chrisochoides, N., Pingali, K.: A load balancing framework for adaptive and asynchronous applications. IEEE Trans. Parallel Distrib. Syst. 15(2), 183–192 (2004)
[2] Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)
[3] Bell, J., Rendleman, C.: CCSE application suite
[4] Bell, J.B., Day, M.S., Grcar, J.F., Lijewski, M.J., Driscoll, J.F., Filatyev, S.A.: Numerical simulation of a laboratory-scale turbulent slot flame. Proc. Combust. Inst. 31(1), 1299–1307 (2007)
[5] Berger, M.J., Oliger, J.: Adaptive mesh refinement for hyperbolic partial differential equations. J. Comput. Phys. 53(3), 484–512 (1984)
[6] Bienia, C., Li, K.: PARSEC 2.0: a new benchmark suite for chip-multiprocessors. In: Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation (2009)
[7] Chamberlain, B.L., Choi, S.-E., Deitz, S.J., Navarro, A.: User-defined parallel zippered iterators in Chapel. In: Proceedings of the Fifth Conference on Partitioned Global Address Space Programming Models, pp. 1–11 (2011)
[8] Chandra, S., Parashar, M.: ARMaDA: an adaptive application-sensitive partitioning framework for SAMR applications. In: IASTED PDCS, pp. 441–446 (2002)
[9] Colella, P., Graves, D., Ligocki, T., Martin, D., Modiano, D., Serafini, D., Van Straalen, B.: Chombo software package for AMR applications - design document (2003)
[10] Devine, K.D., Boman, E.G., Heaphy, R.T., Hendrickson, B.A., Teresco, J.D., Faik, J., Flaherty, J.E., Gervasio, L.G.: New challenges in dynamic load balancing. Appl. Numer. Math. 52(2), 133–152 (2005)
[11] Dinan, J., Krishnamoorthy, S., Larkins, D.B., Nieplocha, J., Sadayappan, P.: Scioto: a framework for global-view task parallelism. In: 37th International Conference on Parallel Processing, ICPP 2008, pp. 586–593. IEEE (2008)
[12] Garcia, A.L., Bell, J.B., Crutchfield, W.Y., Alder, B.J.: Adaptive mesh and algorithm refinement using direct simulation Monte Carlo. J. Comput. Phys. 154(1), 134–155 (1999)
[13] Georganas, E., Van der Wijngaart, R.F., Mattson, T.G.: Design and implementation of a Parallel Research Kernel for assessing dynamic load-balancing capabilities. In: 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 73–82. IEEE (2016)
[14] Hornung, R.D., Kohn, S.R.: Managing application complexity in the SAMRAI object-oriented framework. Concurr. Comput.: Pract. Exp. 14(5), 347–368 (2002)
[15] Hornung, R.D., Trangenstein, J.A.: Adaptive mesh refinement and multilevel iteration for flow in porous media. J. Comput. Phys. 136(2), 522–545 (1997)
[16] Hornung, R.D., Wissink, A.M., Kohn, S.R.: Managing complex data and geometry in parallel structured AMR applications. Eng. Comput. 22(3–4), 181–195 (2006)
[17] Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++, vol. 28. ACM (1993)
[18] Kirkland, E.J.: Linear image approximations. In: Kirkland, E.J. (ed.) Advanced Computing in Electron Microscopy, pp. 19–39. Springer, Heidelberg (1998)
[19] Koenig, G., Kale, L.V., et al.: Optimizing distributed application performance using dynamic grid topology-aware load balancing. In: IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007, pp. 1–10. IEEE (2007)
[20] MacNeice, P., Olson, K.M., Mobarry, C., de Fainchtein, R., Packer, C.: PARAMESH: a parallel adaptive mesh refinement community toolkit. Comput. Phys. Commun. 126(3), 330–354 (2000)
[21] Meng, Q., Berzins, M., Schmidt, J.: Using hybrid parallelism to improve memory use in the Uintah framework. In: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, p. 24. ACM (2011)
[22] Nelson, J., Holt, B., Myers, B., Briggs, P., Ceze, L., Kahan, S., Oskin, M.: Latency-tolerant software distributed shared memory. In: 2015 USENIX Annual Technical Conference (USENIX ATC 15), Santa Clara, CA. USENIX Association, July 2015
[23] Olivier, S., Huan, J., Liu, J., Prins, J., Dinan, J., Sadayappan, P., Tseng, C.-W.: UTS: an unbalanced tree search benchmark. In: Almási, G., Caşcaval, C., Wu, P. (eds.) LCPC 2006. LNCS, vol. 4382, pp. 235–250. Springer, Heidelberg (2007). doi:10.1007/978-3-540-72521-3_18
[24] Parashar, M., Browne, J.C.: Systems engineering for high performance computing software: the HDDA/DAGH infrastructure for implementation of parallel structured adaptive mesh. In: Baden, S.B., Chrisochoides, N.P., Gannon, D.B., Norman, M.L. (eds.) Structured Adaptive Mesh Refinement (SAMR). IMA, vol. 117, pp. 1–18. Springer, New York (2000). doi:10.1007/978-1-4612-1252-2_1
[25] Paudel, J., Amaral, J.N.: Hybrid parallel task placement in irregular applications. J. Parallel Distrib. Comput. 76, 94–105 (2015)
[26] Sohn, A., Biswas, R., Simon, H.D.: A dynamic load balancing framework for unstructured adaptive computations on distributed-memory multiprocessors. In: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 189–192. ACM (1996)
[27] Tabbal, A., Anderson, M., Brodowicz, M., Kaiser, H., Sterling, T.: Preliminary design examination of the ParalleX system from a software and hardware perspective. ACM SIGMETRICS Perform. Eval. Rev. 38(4), 81–87 (2011)
[28] Van der Wijngaart, R.F., et al.: Comparing runtime systems with exascale ambitions using the parallel research kernels. In: Kunkel, J.M., Balaji, P., Dongarra, J. (eds.) ISC High Performance 2016. LNCS, vol. 9697, pp. 321–339. Springer, Cham (2016). doi:10.1007/978-3-319-41321-1_17
[29] Van der Wijngaart, R.F., Mattson, T.G.: The parallel research kernels: a tool for architecture and programming system investigation. In: Proceedings of the IEEE High Performance Extreme Computing Conference, HPEC 2014. IEEE Computer Society (2014)
[30] Wissink, A., Kosovic, B., Berger, M., Chand, K., Chow, F.K.: Adaptive Cartesian methods for modeling airborne dispersion. Adv. Comput. Infrast. Parallel Distrib. Adapt. Appl. 79 (2010)
[31] Wissink, A.M., Hornung, R.D., Kohn, S.R., Smith, S.S., Elliott, N.: Large scale parallel structured AMR calculations using the SAMRAI framework. In: ACM/IEEE 2001 Conference on Supercomputing, p. 22. IEEE (2001)
[32] Wissink, A.M., Potsdam, M., Sankaran, V., Sitaraman, J., Mavriplis, D.: A dual-mesh unstructured adaptive Cartesian computational fluid dynamics approach for hover prediction. J. Am. Helicopter Soc. 61(1), 1–19 (2016)
[33] Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ACM SIGARCH Computer Architecture News, vol. 23, pp. 24–36. ACM (1995)
Acknowledgement
This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
*Other names and brands may be claimed as the property of others.
Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Van der Wijngaart, R.F., Georganas, E., Mattson, T.G., Wissink, A. (2017). A New Parallel Research Kernel to Expand Research on Dynamic Load-Balancing Capabilities. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol. 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_14
DOI: https://doi.org/10.1007/978-3-319-58667-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58666-3
Online ISBN: 978-3-319-58667-0
eBook Packages: Computer Science (R0)