Abstract
Efficient code development for multibody simulation is considered. A solver is developed for dynamic stress-strain simulation of bodies in complex mechanisms. The mathematical formulation of a stress-strain solver based on the discrete element method is presented. The main aspects of the computational algorithm are examined to reveal opportunities for increasing performance. In a parallel implementation, the computational algorithm is limited in scalability and maximum speedup. Further optimization is performed using different vector instruction sets: SSE, AVX, AVX2 and FMA, IMCI for Intel Xeon Phi coprocessors (KNC), and AVX512 for 2nd generation Intel Xeon Phi processors (KNL). Some advanced techniques for packing matrix and vector data into 512-bit SIMD registers are developed and explained. OpenMP is used for the parallel implementation. For heterogeneous computing hardware, such as GPUs and FPGAs, OpenCL is considered as a universal and open standard. The vectorized parallel version of the solver is tested on Intel Xeon E5, MIC KNC and KNL architectures. The OpenCL version is tested on the NVIDIA Tesla architecture. The speedup results obtained are compared with those of compiler autovectorization. Directions for future research are summarized in the conclusion.
Acknowledgements
This work was performed with the financial support of the Russian Foundation for Basic Research (projects 16-47-340385, 16-07-00534, 15-01-04577, 15-07-06254) and of the Administration of Volgograd region.
All experiments were conducted on a computational cluster of Volgograd State Technical University. The cluster was assembled from equipment acquired during the implementation of the Strategic University development program, the Program of the engineering training for industry, and the Development program of the flagship university.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Getmanskiy, V., Andreev, A.E., Alekseev, S., Gorobtsov, A.S., Egunov, V., Kharkov, E. (2017). Optimization and Parallelization of CAE Software Stress-Strain Solver for Heterogeneous Computing Hardware. In: Kravets, A., Shcherbakov, M., Kultsova, M., Groumpos, P. (eds) Creativity in Intelligent Technologies and Data Science. CIT&DS 2017. Communications in Computer and Information Science, vol 754. Springer, Cham. https://doi.org/10.1007/978-3-319-65551-2_41
DOI: https://doi.org/10.1007/978-3-319-65551-2_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65550-5
Online ISBN: 978-3-319-65551-2
eBook Packages: Computer Science, Computer Science (R0)