Abstract
Processor vendors have been expanding the Single Instruction Multiple Data (SIMD) extensions in their General Purpose Processors (GPPs). These extensions have their own instruction set architecture and are equipped with Special Purpose Instructions (SPIs) to exploit Data-Level Parallelism (DLP). In addition to these extensions, GPPs have been equipped with multicore technology, so that each processor has multiple cores that execute programs by exploiting Thread-Level Parallelism (TLP). To exploit these two technologies, SIMD and multicore, many parallel programming models have been proposed, such as the Intrinsic Programming Model (IPM), Compiler's Automatic Vectorization (CAV), and OpenMP. The performance gain from DLP depends on the number of data elements that can be processed in parallel by SIMD instructions, while the performance gain from TLP depends on the number of cores and on program dependencies. In this paper, to increase the performance of multimedia kernels, we exploit both DLP and TLP using the IPM, CAV, and OpenMP programming models. Our experimental results show that combining DLP and TLP improves performance significantly compared to using DLP or TLP alone. In addition, the ICC, GCC, and LLVM compilers are evaluated in terms of automatic vectorization. The results show that ICC and GCC are better able to vectorize the kernels than LLVM. Finally, although IPM is more efficient than CAV, it is tedious and error-prone, so more attention is needed to develop and extend auto-vectorization.
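To make the three programming models concrete, the following is a minimal C sketch, not the authors' code. It assumes a simple element-wise addition of two float arrays as a stand-in kernel; the function names (add_cav, add_ipm, add_ipm_omp) and the chunk size of 4096 elements are hypothetical choices made for illustration. The first function is a plain loop left to the compiler's auto-vectorizer (CAV), the second uses AVX intrinsics (IPM) when the compiler targets AVX and falls back to scalar code otherwise, and the third combines DLP and TLP by letting OpenMP distribute chunks across cores while each chunk is processed with the intrinsic version.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>   /* AVX intrinsics, only when the target supports them */
#endif

/* CAV: a plain loop. The restrict qualifiers tell the compiler the arrays
 * do not overlap, which makes automatic vectorization easier. */
void add_cav(const float *restrict a, const float *restrict b,
             float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* IPM: explicit AVX intrinsics, 8 floats per instruction, followed by a
 * scalar loop for the remaining elements. Without AVX, only the scalar
 * loop is compiled. */
void add_ipm(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned load of 8 floats */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}

/* DLP + TLP: OpenMP assigns chunks of the arrays to threads (TLP), and
 * each chunk is processed with the intrinsic version (DLP). The chunk
 * size is an arbitrary illustrative value. If OpenMP is not enabled,
 * the pragma is ignored and the loop runs sequentially. */
void add_ipm_omp(const float *a, const float *b, float *c, size_t n)
{
    const size_t chunk = 4096;
    const long nchunks = (long)((n + chunk - 1) / chunk);

    #pragma omp parallel for schedule(static)
    for (long k = 0; k < nchunks; k++) {
        size_t start = (size_t)k * chunk;
        size_t len = (n - start < chunk) ? (n - start) : chunk;
        add_ipm(a + start, b + start, c + start, len);
    }
}
```

With GCC or Clang, such a file would typically be built with options along the lines of -O3 -mavx -fopenmp; without -mavx the intrinsic path is skipped, and without -fopenmp the parallel pragma is ignored, so the three functions still produce identical results. The sketch also hints at the trade-off noted in the abstract: the intrinsic version requires manual handling of vector width and leftover elements, which the auto-vectorized loop leaves to the compiler.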
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Moradifar, M., Shahbahrami, A., Nematpour, M., Amiri, H. (2019). Performance Improvement of Multimedia Kernels Using Data- and Thread- Level Parallelism on CPU Platform. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_35
DOI: https://doi.org/10.1007/978-3-030-33495-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33494-9
Online ISBN: 978-3-030-33495-6
eBook Packages: Computer Science, Computer Science (R0)