Abstract
A multi-scale finite element method code, msFEM, is tested on Jaguar and Nebulae, two petaflops computers that ranked #1 and #2 on the June 2010 Top500 list at the time of the tests. The flat MPI version of msFEM is scaled from 20K to 200K CPU cores on Jaguar, delivering 70% parallel efficiency at 200K cores with a finite element model of eight million degrees of freedom. GPU versions, in both double precision and mixed precision and coded through MPI+OpenMP+CUDA hybrid programming, are tested on 900 GPU nodes on Jaguar and 1500 GPU nodes on Nebulae, achieving over 90% parallel efficiency on both systems. The mixed-precision GPU version delivers a further 1.5x speedup over the fully double-precision version at no significant implementation cost. The large-scale tests indicate that msFEM runs efficiently on petaflops computers and has strong potential for domain applications at extreme scale.
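To make the efficiency figure concrete, the following minimal sketch shows how parallel efficiency relative to a baseline run is commonly computed, and what speedup the reported 70% would imply. It assumes strong scaling (a fixed problem size) with the 20K-core run as the baseline; the abstract does not state which scaling mode was used, and the function names are hypothetical, not part of msFEM.

```python
def parallel_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Strong-scaling efficiency of a run relative to a baseline run.

    Assumption: both runs solve the same fixed-size problem, so the
    ideal speedup equals the ratio of core counts.
    """
    speedup = t_base / t_scaled
    ideal_speedup = cores_scaled / cores_base
    return speedup / ideal_speedup


def implied_speedup(efficiency, cores_base, cores_scaled):
    """Speedup over the baseline implied by a given parallel efficiency."""
    return efficiency * cores_scaled / cores_base


if __name__ == "__main__":
    # Figures taken from the abstract: 70% efficiency going from 20K to 200K cores.
    s = implied_speedup(0.70, 20_000, 200_000)
    print(f"implied speedup over the 20K-core run: {s:.1f}x")
```

Under these assumptions, a tenfold increase in cores at 70% efficiency corresponds to roughly a sevenfold reduction in wall-clock time relative to the 20K-core run.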
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ren, J., Wang, C., Wang, Y., Tian, R. (2013). Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: Zhang, Y., Li, K., Xiao, Z. (eds) High Performance Computing. HPC 2012. Communications in Computer and Information Science, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41591-3_14
DOI: https://doi.org/10.1007/978-3-642-41591-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41590-6
Online ISBN: 978-3-642-41591-3
eBook Packages: Computer Science, Computer Science (R0)