Abstract
Time-dependent deterministic discrete ordinates transport codes are an important class of application that poses significant challenges for large, many-core systems. One such challenge is the large memory capacity needed by the solve step, which requires a scalable solution so that there is enough node-level memory to store all the data. In our previous work, we demonstrated the first implementation to show a significant performance benefit for single-node solves using GPUs. In this paper we extend that work to large problems and demonstrate the scalability of our solution on two petascale GPU-based supercomputers: Titan at Oak Ridge and Piz Daint at CSCS. Our results show that our improved node-level parallelism scheme scales just as well across large systems as previous approaches when using the tried and tested KBA domain decomposition technique. We validate our results against an improved performance model which predicts the runtime of the main ‘sweep’ routine when running on different hardware, including CPUs and GPUs.
Acknowledgements
This work has been financially supported by A.W.E. The authors would like to thank the University of Bristol High Performance Computing Group and Intel Parallel Computing Center; and Maria Grazia Giuffreda of CSCS at the Swiss National Supercomputing Centre for access to Piz Daint. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Deakin, T., McIntosh-Smith, S., Gaudin, W. (2016). Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_22
DOI: https://doi.org/10.1007/978-3-319-41321-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41320-4
Online ISBN: 978-3-319-41321-1
eBook Packages: Computer Science, Computer Science (R0)