Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9697)

Abstract

Time-dependent deterministic discrete ordinates transport codes are an important class of application that poses significant challenges for large, many-core systems. One such challenge is the large memory capacity needed by the solve step, which requires a scalable solution so that there is enough node-level memory to store all the data. In our previous work, we demonstrated the first implementation to show a significant performance benefit for single-node solves using GPUs. In this paper we extend that work to large problems and demonstrate the scalability of our solution on two petascale GPU-based supercomputers: Titan at Oak Ridge and Piz Daint at CSCS. Our results show that our improved node-level parallelism scheme scales just as well across large systems as previous approaches when using the tried and tested KBA domain decomposition technique. We validate our results against an improved performance model which predicts the runtime of the main ‘sweep’ routine on different hardware, including CPUs and GPUs.

Acknowledgements

This work has been financially supported by A.W.E. The authors would like to thank the University of Bristol High Performance Computing Group and Intel Parallel Computing Center; and Maria Grazia Giuffreda of CSCS at the Swiss National Supercomputing Centre for access to Piz Daint. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information

Corresponding author

Correspondence to Tom Deakin.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Deakin, T., McIntosh-Smith, S., Gaudin, W. (2016). Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_22

  • DOI: https://doi.org/10.1007/978-3-319-41321-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41320-4

  • Online ISBN: 978-3-319-41321-1

  • eBook Packages: Computer Science, Computer Science (R0)
