Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale

Deakin, Tom; McIntosh-Smith, Simon; Gaudin, Wayne

doi:10.1007/978-3-319-41321-1_22

Tom Deakin¹⁶,
Simon McIntosh-Smith¹⁶ &
Wayne Gaudin¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9697))

Included in the following conference series:

International Conference on High Performance Computing

2640 Accesses
8 Citations

Abstract

Time-dependent deterministic discrete ordinates transport codes are an important class of application which provide significant challenges for large, many-core systems. One such challenge is the large memory capacity needed by the solve step, which requires us to have a scalable solution in order to have enough node-level memory to store all the data. In our previous work, we demonstrated the first implementation which showed a significant performance benefit for single node solves using GPUs. In this paper we extend our work to large problems and demonstrate the scalability of our solution on two Petascale GPU-based supercomputers: Titan at Oak Ridge and Piz Daint at CSCS. Our results show that our improved node-level parallelism scheme scales just as well across large systems as previous approaches when using the tried and tested KBA domain decomposition technique. We validate our results against an improved performance model which predicts the runtime of the main ‘sweep’ routine when running on different hardware, including CPUs or GPUs.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Adams, M.P., Adams, M.L., Hawkins, W.D., Smith, T., Rauchwerger, L., Amato, N.M., Bailey, T.S., Falgout, R.D.: Provably optimal parallel transport sweeps on regular grids. In: International Conference on Mathematics, Computational Methods and Reactor Physics, pp. 2535–2553 (2013)
Google Scholar
Adams, M.P., Adams, M.L., Mcgraw, C.N., Till, A.T., Bailey, T.S.: Provably optimal parallel transport sweeps with non-contiguous partitions. In: Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method, pp. 1–19. No. ANS MC2015, American Nuclear Society, Nashville, Tennessee (2015)
Google Scholar
Bailey, T.S., Falgout, R.D.: Analysis of massively parallel discrete-ordinates transport sweep algorithms with collisions. In: International Conference on Mathematics, Computational Methods, and Reactor Physics, pp. 1–15. American Nuclear Society, New York, USA (2009)
Google Scholar
Baker, C., Davidson, G., Evans, T.M., Hamilton, S., Jarrell, J., Joubert, W.: High performance radiation transport simulations: preparing for TITAN. In: 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–10 (2012)
Google Scholar
Baker, R., Koch, K.: An Sn algorithm for the massively parallel CM-200 computer. Nucl. Sci. Eng. 128, 312–320 (1998)
Article Google Scholar
Baker, R.S.: An Sn algorithm for modern architectures. In: Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method. No. ANS MC2015, American Nuclear Society, Nashville, TN (2015)
Google Scholar
Baker, R., McGhee, J., Koch, K., Morel, J.: Two Sn algorithms for the massively parallel CM-200 computer. Submitted to Nuclear Science and Engineering (1996)
Google Scholar
Deakin, T., McIntosh-Smith, S.: GPU-STREAM: benchmarking the achievable memory bandwidth of Graphics Processing Units (poster). In: Supercomputing, Austin, Texas (2015)
Google Scholar
Deakin, T., McIntosh-Smith, S., Gaudin, W.: Expressing parallelism on many-core for deterministic discrete ordinates transport. In: Workshop on Representative Applications at IEEE Cluster, Chicago (2015)
Google Scholar
Deakin, T., McIntosh-Smith, S., Martineau, M., Gaudin, W.: An improved parallelism scheme for deterministic discrete ordinates transport. Int. J. High Perform. Comput. Appl. (Spec. Issue) (2015, in press)
Google Scholar
Evans, T.M., Joubert, W., Hamilton, S.P., Johnson, S.R., Turner, J.A., Davidson, G.G., Pandya, T.M.: Three-Dimensional Discrete Ordinates Reactor Assembly Calculations on GPUs. In: Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method. No. ANS MC2015, American Nuclear Society, Nashville, Tennessee (2015)
Google Scholar
Evans, T.M., Stafford, A.S., Slaybaugh, R.N., Clarno, K.T.: Denovo: a new three-dimensional parallel discrete ordinates code in SCALE. Nucl. Technol. 171, 171–200 (2010)
Google Scholar
Freed, J., Gupta, S., Tiwari, D.: An analysis of network congestion in the Titan supercomputers interconnect (poster). In: Supercomuting, pp. 1–2 (2015)
Google Scholar
Hoisie, A., Lubeck, O., Wasserman, H.: Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications (2000)
Google Scholar
Koch, K., Baker, R., Alcouffe, R.: Solution of the first-order form of three-dimensional discrete ordinates equations on a massively parallel machine. Trans. Am. Nucl. Soc. 65, 198–199 (1992)
Google Scholar
Lewis, E., Miller, W.J.: Computational Methods of Neutron Transport. American Nuclear Society, La Grange Park (1993)
MATH Google Scholar
McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. In: IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pp. 19–25, December 1995
Google Scholar
Munshi, A.: The OpenCL Specification, Version 1.1 (2011)
Google Scholar
Pedretti, K., Vaughan, C., Barrett, R., Devine, K., Hemmert, K.S.: Using the Cray Gemini performance counters
Google Scholar
Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A., Jarvis, S.A.: On the acceleration of wavefront applications using distributed many-core architectures. Comput. J. 55(2), 138–153 (2012)
Article Google Scholar
Rabenseifner, R., Schulz, G.: B_eff v3.6. https://fs.hlrs.de/projects/par/mpi/b_eff/
Strohmaier, E., Simon, H., Dongarra, J., Meuer, M.: Top 500, November 2015. http://www.top500.org
Villa, O., Johnson, D.R., OConnor, M., Bolotin, E., Nellans, D., Luitjens, J., Sakharnykh, N., Wang, P., Micikevicius, P., Scudiero, A., Keckler, S.W., Dally, W.J.: Scaling the power wall: a path to exascale. In: Supercomputing (2014)
Google Scholar
Zerr, R.J., Baker, R.S.: SNAP: SN (discrete ordinates) application proxy - proxy description. Technical report, LA-UR-13-21070, Los Alamos National Labratory (2013)
Google Scholar

Download references

Acknowledgements

This work has been financially supported by A.W.E. The authors would like to thank the University of Bristol High Performance Computing Group and Intel Parallel Computing Center; and Maria Grazia Giuffreda of CSCS at the Swiss National Supercomputing Centre for access to Piz Daint. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information

Authors and Affiliations

Department of Computer Science, University of Bristol, Bristol, UK
Tom Deakin & Simon McIntosh-Smith
High Performance Computing, UK Atomic Weapons Establishment, Aldermaston, UK
Wayne Gaudin

Authors

Tom Deakin
View author publications
You can also search for this author in PubMed Google Scholar
Simon McIntosh-Smith
View author publications
You can also search for this author in PubMed Google Scholar
Wayne Gaudin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tom Deakin .

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum, Hamburg, Germany
Julian M. Kunkel
Argonne National Laboratory, Lemont, Illinois, USA
Pavan Balaji
University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Deakin, T., McIntosh-Smith, S., Gaudin, W. (2016). Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_22

Download citation

DOI: https://doi.org/10.1007/978-3-319-41321-1_22
Published: 15 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41320-4
Online ISBN: 978-3-319-41321-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics