Optimization and Parallelization of CAE Software Stress-Strain Solver for Heterogeneous Computing Hardware

Getmanskiy, Victor; Andreev, Andrey E.; Alekseev, Sergey; Gorobtsov, Alexander S.; Egunov, Vitaly; Kharkov, Egor

doi:10.1007/978-3-319-65551-2_41

Conference paper
First Online: 17 August 2017

Part of the book series: Communications in Computer and Information Science (CCIS, volume 754)

Abstract

This paper considers the development of efficient code for multibody simulation. The solver performs dynamic stress-strain simulation of bodies in complex mechanisms. A mathematical formulation of the stress-strain solver based on the discrete element method is presented. The main aspects of the computational algorithm are examined to reveal opportunities for increasing performance. In a parallel implementation, the algorithm has limited scalability and a limited maximum speedup. Further optimization is performed using different vector instruction sets: SSE, AVX, AVX2, FMA, IMCI for Intel Xeon Phi coprocessors (KNC), and AVX-512 for 2nd-generation Intel Xeon Phi processors (KNL). Techniques for packing matrix and vector data into 512-bit SIMD registers are developed and explained. OpenMP is used for the parallel implementation. For heterogeneous computing hardware such as GPUs and FPGAs, OpenCL is considered as a universal, open standard. The vectorized parallel version of the solver is tested on Intel Xeon E5, MIC KNC, and KNL architectures; the OpenCL version is tested on the NVIDIA Tesla architecture. The achieved speedups are compared with those obtained through compiler autovectorization. Prospects for future research are summarized in the conclusion.
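
The abstract does not describe the packing scheme itself, so the following is only a generic sketch of the idea, not the authors' code. It assumes double precision, 3x3 matrices, and a structure-of-arrays layout in which the same matrix or vector component of 8 independent elements is stored contiguously (8 doubles fill one 512-bit register). The names `Batch3x3`, `batch_matvec`, `all_matvec`, and `BATCH` are hypothetical. Portable OpenMP pragmas are used instead of instruction-set-specific intrinsics, so whether the loop is actually vectorized depends on the compiler and its flags.

```c
#include <stddef.h>

/* Illustrative assumption: 8 doubles = 512 bits, one SIMD register. */
enum { BATCH = 8 };

/* Hypothetical structure-of-arrays batch of 8 matrix-vector pairs.
   Each row of m, x, and y holds one component for all 8 elements,
   so a single 512-bit load can fetch that component for the whole batch. */
typedef struct {
    double m[9][BATCH];  /* m[3*r + c][i]: entry (r, c) of matrix i */
    double x[3][BATCH];  /* x[c][i]: component c of input vector i  */
    double y[3][BATCH];  /* y[r][i]: component r of output vector i */
} Batch3x3;

/* Compute y_i = M_i * x_i for every element i of one batch. */
void batch_matvec(Batch3x3 *b)
{
    for (int r = 0; r < 3; ++r) {
        /* The loop over i touches contiguous memory, which makes it
           a natural candidate for vectorization. */
        #pragma omp simd
        for (int i = 0; i < BATCH; ++i) {
            b->y[r][i] = b->m[3 * r + 0][i] * b->x[0][i]
                       + b->m[3 * r + 1][i] * b->x[1][i]
                       + b->m[3 * r + 2][i] * b->x[2][i];
        }
    }
}

/* Process many batches; with OpenMP enabled, batches are
   distributed across threads. */
void all_matvec(Batch3x3 *batches, size_t n_batches)
{
    #pragma omp parallel for
    for (long j = 0; j < (long)n_batches; ++j) {
        batch_matvec(&batches[j]);
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same results; with OpenMP enabled (for example `-fopenmp` in GCC or Clang), thread-level parallelism across batches combines with SIMD parallelism within each batch.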

Acknowledgements

This work was performed with the financial support of the Russian Foundation for Basic Research (projects Nos. 16-47-340385, 16-07-00534, 15-01-04577, and 15-07-06254) and of the Administration of Volgograd region.

All experiments were conducted on a computational cluster of Volgograd State Technical University. The cluster was assembled from equipment acquired in the course of implementing the Strategic University development program, the Program of the engineering training for industry, and the Development program of the flagship university.

Author information

Authors and Affiliations

Volgograd State Technical University, Volgograd, Russia
Victor Getmanskiy, Andrey E. Andreev, Sergey Alekseev, Alexander S. Gorobtsov, Vitaly Egunov & Egor Kharkov

Corresponding author

Correspondence to Andrey E. Andreev.

Editor information

Editors and Affiliations

Volgograd State Technical University, Volgograd, Russia
Alla Kravets
Volgograd State Technical University, Volgograd, Russia
Maxim Shcherbakov
Volgograd State Technical University, Volgograd, Russia
Marina Kultsova
University of Patras, Patras, Greece
Peter Groumpos

Cite this paper

Getmanskiy, V., Andreev, A.E., Alekseev, S., Gorobtsov, A.S., Egunov, V., Kharkov, E. (2017). Optimization and Parallelization of CAE Software Stress-Strain Solver for Heterogeneous Computing Hardware. In: Kravets, A., Shcherbakov, M., Kultsova, M., Groumpos, P. (eds) Creativity in Intelligent Technologies and Data Science. CIT&DS 2017. Communications in Computer and Information Science, vol 754. Springer, Cham. https://doi.org/10.1007/978-3-319-65551-2_41

DOI: https://doi.org/10.1007/978-3-319-65551-2_41
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65550-5
Online ISBN: 978-3-319-65551-2
eBook Packages: Computer Science, Computer Science (R0)
