The widespread use of supercomputer technologies in many areas of life, together with the growing demand for high-performance computing, makes it important to improve the performance of computational codes on supercomputers with modern architectures. Vectorization of program code is a low-level optimization that, applied to relatively small and localized parts of a program, can speed up computational codes several-fold. Modern Intel microprocessors support the AVX-512 instruction set, whose features make it possible to vectorize almost any code written in predicated form. Following a set of simple restrictions during program development, combined with vectorization tools that target the AVX-512 instruction set, can significantly speed up the resulting program. The article discusses approaches to the vectorization of flat loops, a special kind of program context whose successful vectorization can improve the performance of supercomputer applications even for code that optimizing compilers are unable to vectorize.
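As a rough illustration of what vectorizing a loop "in predicated form" can look like, the sketch below rewrites a loop with an if/else body using AVX-512 mask registers. It is not taken from the article: the loop body, the function names `branchy_scalar` and `branchy_predicated`, and the handling of the loop tail are assumptions chosen for the example. The intrinsics are used only when the compiler defines `__AVX512F__` (for example, with `-mavx512f`); otherwise a scalar fallback computes both branches and selects the result by a predicate.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Reference loop with a data-dependent branch in its body. */
void branchy_scalar(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0.0f)
            c[i] = a[i] * b[i];
        else
            c[i] = a[i] + b[i];
    }
}

/* Hypothetical predicated version of the same loop (illustrative only). */
void branchy_predicated(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
#ifdef __AVX512F__
    const __m512 zero = _mm512_setzero_ps();
    for (size_t i = 0; i < n; i += 16) {
        /* Mask off lanes past the end of the arrays in the last iteration. */
        size_t left = n - i;
        __mmask16 tail = (left >= 16) ? (__mmask16)0xFFFF
                                      : (__mmask16)((1u << left) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(tail, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(tail, b + i);

        /* The branch condition becomes a 16-bit predicate, one bit per lane. */
        __mmask16 pos = _mm512_cmp_ps_mask(va, zero, _CMP_GT_OQ);

        /* "else" branch in all lanes, then "then" branch only where pos is set. */
        __m512 vc = _mm512_add_ps(va, vb);
        vc = _mm512_mask_mul_ps(vc, pos, va, vb);

        _mm512_mask_storeu_ps(c + i, tail, vc);
    }
#else
    /* Scalar fallback: same predicated structure, one element at a time. */
    for (size_t i = 0; i < n; i++) {
        int   pos  = a[i] > 0.0f;
        float sum  = a[i] + b[i];
        float prod = a[i] * b[i];
        c[i] = pos ? prod : sum;
    }
#endif
}
```

Both functions are expected to produce identical results for the same inputs; the predicated version simply replaces control flow with masks, which is the property that lets a loop body with branches be executed 16 single-precision elements at a time.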
The supercomputer MVS-10P, located at the JSCC RAS, was used for calculations during the research.
The work has been done at the JSCC RAS as part of the state assignment for the topic 0065-2019-0016 (reg. no. AAAA-A19-119011590098-8).
(Submitted by A. M. Elizarov)
Savin, G.I., Shabanov, B.M., Rybakov, A.A. et al. Vectorization of Flat Loops of Arbitrary Structure Using Instructions AVX-512. Lobachevskii J Math 41, 2575–2592 (2020). https://doi.org/10.1134/S1995080220120331
- flat loop
- predicated execution
- intrinsic function