Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture

Ren, Jiangyong; Wang, ChaoWei; Wang, Yingrui; Tian, Rong

doi:10.1007/978-3-642-41591-3_14

Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture

Jiangyong Ren⁴,
ChaoWei Wang⁴,
Yingrui Wang⁴ &
…
Rong Tian⁴

Conference paper

882 Accesses
1 Citations

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 207))

Abstract

A multi-scale finite element method code, msFEM, is tested on Jaguar and Nebulae, two petaflops computers that were listed as #1 and #2 on the Top500 list of June 2010 at the time of the tests. The flat MPI version of msFEM is scaled from 20K up to 200K CPU cores on Jaguar, delivering 70% parallel efficiency at the 200K cores with a finite element model of eight millions of degrees of freedom. GPU versions, in both double precision and mixed precision coded through MPI+OpenMP+CUDA hybrid programming, 900 GPU nodes on Jaguar and 1500 GPU nodes on Nebulae, achieving remarkable 90⁺% parallel efficiency on the systems. The mixed-precision GPU version delivers further 1.5 times of speedup over the fully double precision version with no significant implementational cost. The large-scale tests support that the msFEM runs efficiently on petaflops computers and is highly potential for domain applications at extreme-scale.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee, Office of Science, DOE (2010)
Google Scholar
http://www.top500.org
Sankaran, R.: Porting S3D turbulent combustion software to accelerator based systems. Titan Summit. August 15-17, JICS Auditorium, Building 5100, ORNL, USA (2011)
Google Scholar
Archibald, R.: Progress Towards Accelerating CAM-SE on Hybrid Multi-Core Systems. Titan Summit. August 15-17, JICS Auditorium, Building 5100, ORNL, USA (2011)
Google Scholar
Joubert, W.: Porting the Denovo Radiation Transport Code to Titan: Lessons Learned. Titan Summit. August 15-17, JICS Auditorium, Building 5100, ORNL, USA (2011)
Google Scholar
Tharrington, A.: LAMMPS: Code Transformations in preparing for Titan. Titan Summit. August 15-17, JICS Auditorium, Building 5100, ORNL, USA (2011)
Google Scholar
http://ees.lanl.gov/pflotran/
Eisenbach, M.: Preparing WL-LSMS for First Principles Thermodynamics Calculations on Accelerator and Multicore Architectures. Titan Summit. August 15-17, JICS Auditorium, Building 5100, ORNL, USA (2011)
Google Scholar
Olson, G.B.: Designing a new material world. Science 288(5468), 993–998 (2000)
Article Google Scholar
Olson, G.B.: Computational design of hierarchically structured materials. Science 277(5330), 1237–1242 (1997)
Article Google Scholar
McVeigh, C., Liu, W.K.: Multiresolution continuum modeling of micro-void assisted dynamic adiabatic shear band propagation. Journal of the Mechanics and Physics of Solid 58(2), 187–205 (2010)
Article MathSciNet MATH Google Scholar
McVeigh, C., Vernerey, F., Liu, W.K., Brinson, C.: Multiresolution analysis for material design. Computer Methods in Applied Mechanics and Engineering 195, 5053–5076 (2006)
Article MathSciNet MATH Google Scholar
McVeigh, C., Vernerey, F.J., Liu, W.K., Moran, B., Olson, G.B.: An Interactive microvoid shear localization mechanism in high strength steels. Journal of the Mechanics and Physics of Solids 55(2), 224–225 (2007)
Article Google Scholar
McVeigh, C.: Ph.D. thesis, Northwestern University (2007)
Google Scholar
McVeigh, C., Liu, W.K.: Linking microstructure and properties through a predictive multiresolution continuum. Computer Methods in Applied Mechanics and Engineering 197, 3268–3290 (2008)
Article MathSciNet MATH Google Scholar
McVeigh, C., Liu, W.K.: Multiresolution modeling of ductile reinforced brittle composites. Journal of the Mechanics and Physics of Solids 57, 244–267 (2009)
Article MATH Google Scholar
Tian, R., Moran, B., Liu, W.K., Olson, G.B.: Multiscale fracture simulator. Dynamic Microstructure Design Consortium (ONR Contract: N00014-05-C-0241) Base Final Report (2008)
Google Scholar
Tian, R., Liu, W.K., Chan, S., Olson, G.B., Tang, S., Wang, J.S., Jou, H.J., Gong, J.D., Moran, B.: Linking Microstructures to Fracture Toughness—predictive 3D process zone simulations. The D 3-D Annual PI Review, Evanston, IL, March 23-25 (2009)
Google Scholar
Tian, R., Chan, S., Tang, S., Kopacz, A.M., Wang, J.-S., Jou, H.-J., Siad, L., Lindgren, L.-E., Olson, G., Liu, W.K.: A multi-resolution continuum simulation of the ductile fracture process. Journal of the Mechanics and Physics of Solids 58(10), 1681–1700 (2010)
Article MATH Google Scholar
http://www.olcf.ornl.gov/event/cray-technical-workshop-on-xk6-programming/
Aifantis, E.C.: On the role of gradients in the localization of deformation and fracture. International Journal of Engineering Science 30(10), 1279–1299 (1992)
Article MATH Google Scholar
Hill, R.: Elastic properties of reinforced solids: some theoretical principles. Journal of the Mechanics and Physics of Solids 11(5), 357–372 (1963)
Article MATH Google Scholar
Hill, R.: On constitutive macro-variables for heterogeneous solids at finite strain. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 326(1565), 131–147 (1972)
Article MATH Google Scholar
Tian, R., Yagawa, G.: Generalized node and high-performance elements. International Journal for Numerical Methods in Engineering 64, 2039–2071 (2005)
Article MATH Google Scholar
Tian, R., Yagawa, G., Terasaka, H.: Linear dependence problems of partition of unity based generalized FEMs. Computer Methods in Applied Mechanics and Engineering 195, 4768–4782 (2006)
Article MathSciNet MATH Google Scholar
Tian, R.: A PU-based 4-node quadratic tetrahedron and linear dependence elimination in three dimensions. International Journal of Computational Methods 3, 545–562 (2006)
Article MathSciNet MATH Google Scholar
Tian, R., Matsubara, H., Yagawa, G.: Advanced 4-node tetrahedrons. International Journal for Numerical Methods in Engineering 68, 1209–1231 (2006)
Article MATH Google Scholar
Tian, R., Yagawa, G.: Allman’s triangle, rotational dof and partition of unity. International Journal for Numerical Methods in Engineering 69, 837–858 (2006)
Article Google Scholar
http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
Wilkinson, J.H.: Rounding Errors in Algebraic Processes. Prentice-Hall (1963)
Google Scholar
Moler, C.B.: Iterative refinement in floating point. J. ACM 14(2), 316–321 (1967)
Article MATH Google Scholar
Jankowski, M., Woniakowski, H.: Iterative refinement implies numerical stability. Journal BIT Numerical Mathematics 17(3), 303–311 (1977)
Article MATH Google Scholar
Higham, N.J.: Accuracy and stability of numerical algorithms. Society for Industrial and Applied Mathematics, Philadelphia (2002)
Book MATH Google Scholar
Demmel, J.W.: Applied Numerical Linear Algebra. SIAM Press (1997)
Google Scholar
Demmel, J., Hida, Y., Kahan, W., Li, X.S., Mukherjee, S., Riedy, E.J.: Error bounds from extra precise iterative refinement. Technical Report No. UCB/CSD-04-1344, LAPACK Working Note 165 (February 2005)
Google Scholar
Langou, J., Langou, J., Luszczek, P., Kurzak, J., Buttari, A., Dongarra, J.: Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems). In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (2006)
Google Scholar
Kurzak, J., Dongarra, J.: Implementation of mixed precision in solving systems of linear equations on the Cell processor. Concurrency and Computation: Practice and Experience 19(10), 1371–1385 (2007)
Article Google Scholar
Buttari, A., Dongarra, J., Langou, J., Langou, J., Luszczek, P., Kurzak, J.: Mixed precision iterative refinement techniques for the solution of dense linear systems. Int. J. High Perform. Comput. Appl. 21, 457–466 (2007)
Article Google Scholar
Buttari, A., Dongarra, J., Kurzak, J., Luszczek, P., Tomov, S.: Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy. ACM Transactions on Mathematical Software (TOMS) 34(4) (2008)
Google Scholar
Taiji, M., Narumi, T., Ohno, Y., Futatsugi, N., Suenaga, A., Takada, N., Konagaya, A.: Protein Explorer: A Petaflops Special-Purpose Computer System for Molecular Dynamics Simulations. In: Proc. Supercomputing (2003)
Google Scholar
Göddeke, D., Strzodka, R., Turek, S.: Accelerating double precision FEM simulations with GPUs. In: Proceedings of ASIM 2005 - 18th Symposium on Simulation Technique (2005)
Google Scholar
Strzodka, R., Göddeke, D.: Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components. In: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), pp. 259–268 (2006)
Google Scholar
Strzodka, R., Göddeke, D.: Mixed precision methods for convergent iterative schemes. In: Proceedings of the 2006 Workshop on Edge Computing Using New Commodity Architectures, p. D–59–60 (2006)
Google Scholar
Göddeke, D., Strzodka, R., Turek, S.: Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations. International Journal of Parallel, Emergent and Distributed Systems (IJPEDS), Special Issue: Applied Parallel Computing 22(4), 221–256 (2007)
Article MATH Google Scholar
Göddeke, D., Strzodka, R.: Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (part 2: Double precision GPUs). Technical report, Technical University Dortmund (2008)
Google Scholar
Anderson, E., Bai, Z., Bischof, C., Blackford, L.S., Demmel, J.W., Dongarra, J.J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide. SIAM, http://www.netlib.org/lapack/
Li, X.S., Demmel, J.W., Bailey, D.H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kang, S.Y., Kapur, A., Martin, M.C., Thompson, B.J., Tung, T., Yoo, D.J.: Design, implementation and testing of extended and mixed precision BLAS. ACM Transactions on Mathematical Software (TOMS) 28(2) (2002)
Google Scholar
Göddeke, D., Strzodka, R., Turek, S.: Performance and accuracy of hardware-oriented native-,emulated- and mixed-precision solvers in FEM simulations. International Journal of Parallel, Emer-gent and Distributed Systems, Special Issue: Applied Parallel Computing 22(4), 221–256 (2007)
Article MATH Google Scholar
Göddeke, D., Wobker, H., Strzodka, R., Mohd-Yusof, J., McCormick, P., Turek, S.: Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU. Accepted for Publication in the International Journal of Computational Science and Engineering (2008)
Google Scholar
Strzodka, R., Göddeke, D.: Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components. In: FCCM 2006: Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 259–270 (2006)
Google Scholar
Kurzak, J., Dongarra, J.J.: Implementation of mixed precision in solving systems of linear equations on the CELL processor. Concurrency and Computation: Practice and Experience 19(10), 1371–1385 (2007)
Article Google Scholar
Tian, R.: Co-design thinking towards exascale computing. Information Technology Letter 70(3), 50–63 (2012)
Google Scholar
Liu, J., Wang, C., Ren, J., Tian, R.: A mixed precision explicit finite element algorithm on heterogeneous architecture and its CUDA implementation. Computer Science 39(6), 293–296 (2012)
Google Scholar
Liu, J.: A mixed precision GPU acceleration algorithm and its application to FEM. MS thesis of Graduate School of Chinese Academy of Sciences (2011)
Google Scholar

Download references

Author information

Authors and Affiliations

HPC Research Center, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
Jiangyong Ren, ChaoWei Wang, Yingrui Wang & Rong Tian

Authors

Jiangyong Ren
View author publications
You can also search for this author in PubMed Google Scholar
ChaoWei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yingrui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Rong Tian
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Chinese Academy of Sciences, 4# South Fourth Street, Zhong Guan Cun, 100190, Beijing, P.R. China
Yunquan Zhang
National Supercomputing Center in Changsha, College of Information Science and Engineering, Hunan University, #2, South Lushan Road, Yuelu District, 410082, Changsha, P.R. China
Kenli Li
College of Information Science and Engineering, Hunan University, #2, South Lushan Road, Yuelu District, 410082, Changsha, P.R. China
Zheng Xiao

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ren, J., Wang, C., Wang, Y., Tian, R. (2013). Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: Zhang, Y., Li, K., Xiao, Z. (eds) High Performance Computing. HPC 2012. Communications in Computer and Information Science, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41591-3_14

Download citation

DOI: https://doi.org/10.1007/978-3-642-41591-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41590-6
Online ISBN: 978-3-642-41591-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics