Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 207)

Abstract

A multi-scale finite element method code, msFEM, is tested on Jaguar and Nebulae, two petaflops computers that ranked #1 and #2 on the June 2010 Top500 list at the time of the tests. The flat MPI version of msFEM is scaled from 20K up to 200K CPU cores on Jaguar, delivering 70% parallel efficiency at 200K cores with a finite element model of eight million degrees of freedom. GPU versions, in both double precision and mixed precision and coded through MPI+OpenMP+CUDA hybrid programming, are tested on 900 GPU nodes on Jaguar and 1500 GPU nodes on Nebulae, achieving over 90% parallel efficiency on both systems. The mixed-precision GPU version delivers a further 1.5 times speedup over the fully double-precision version at no significant implementation cost. The large-scale tests indicate that msFEM runs efficiently on petaflops computers and has high potential for domain applications at extreme scale.
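The parallel efficiency figures quoted in the abstract can be read against the usual definition for a fixed problem size: measured speedup over a baseline run divided by the ideal speedup implied by the increase in core count. The following is a minimal sketch of that calculation, assuming strong scaling (the abstract does not state the scaling mode explicitly). The function name and the timings are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def parallel_efficiency(base_cores, base_time, cores, time):
    """Efficiency of a run relative to a baseline run of the same problem.

    Assumes a fixed problem size (strong scaling). Times may be in any
    unit as long as both runs use the same one.
    """
    if base_cores <= 0 or cores <= 0:
        raise ValueError("core counts must be positive")
    if base_time <= 0 or time <= 0:
        raise ValueError("run times must be positive")
    speedup = base_time / time          # how much faster the larger run is
    ideal_speedup = cores / base_cores  # perfect scaling would give this
    return speedup / ideal_speedup


# Hypothetical wall-clock times (seconds), NOT taken from the paper.
BASE_CORES, BASE_TIME = 20_000, 100.0
BIG_CORES, BIG_TIME = 200_000, 14.0

efficiency = parallel_efficiency(BASE_CORES, BASE_TIME, BIG_CORES, BIG_TIME)
print(f"speedup: {BASE_TIME / BIG_TIME:.2f}x, efficiency: {efficiency:.1%}")
```

With a tenfold increase in core count, an efficiency near 70% corresponds to a speedup of roughly seven over the baseline run.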




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ren, J., Wang, C., Wang, Y., Tian, R. (2013). Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: Zhang, Y., Li, K., Xiao, Z. (eds) High Performance Computing. HPC 2012. Communications in Computer and Information Science, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41591-3_14

  • DOI: https://doi.org/10.1007/978-3-642-41591-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41590-6

  • Online ISBN: 978-3-642-41591-3

  • eBook Packages: Computer Science, Computer Science (R0)
