Performance Improvement of Multimedia Kernels Using Data- and Thread- Level Parallelism on CPU Platform

Moradifar, Maryam; Shahbahrami, Asadollah; Nematpour, Mina; Amiri, Hossein

doi:10.1007/978-3-030-33495-6_35


Part of the book series: Communications in Computer and Information Science (CCIS, volume 891)

Included in the following conference series:

International Congress on High-Performance Computing and Big Data Analysis


Abstract

Processor vendors have been expanding Single Instruction Multiple Data (SIMD) extensions in their General Purpose Processors (GPPs). These extensions have their own instruction set architecture and are equipped with Special Purpose Instructions (SPIs) to exploit Data-Level Parallelism (DLP). In addition to these extensions, GPPs have been equipped with multicore technologies, so that each processor has multiple cores that process a program by exploiting Thread-Level Parallelism (TLP). To exploit these two technologies, SIMD and multicore, many parallel programming models have been proposed, such as the Intrinsic Programming Model (IPM), Compiler’s Automatic Vectorization (CAV), and OpenMP. The performance gain from DLP depends on the number of data elements that can be processed in parallel by SIMD instructions, while the performance gain from TLP depends on the number of cores and on the dependencies in the program. In this paper, we exploit both DLP and TLP, using the parallel programming models IPM, CAV, and OpenMP, to increase the performance of multimedia kernels. Our experimental results show that combining DLP and TLP improves performance significantly compared to using either DLP or TLP alone. In addition, the ICC, GCC, and LLVM compilers are evaluated in terms of automatic vectorization. The results show that ICC and GCC are able to vectorize more of the kernels than LLVM. Finally, although IPM is more efficient than CAV, it is tedious and error-prone, so more attention is needed to develop and extend auto-vectorization.
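The three programming models named in the abstract can be contrasted on a small example. The following C sketch is illustrative only and is not taken from the paper: the element-wise vector addition, the function names `add_cav_tlp` and `add_ipm_tlp`, and the choice of 128-bit SSE intrinsics are all assumptions made for the sake of a portable example (the kernels and SIMD width used by the authors may differ). The first function leaves vectorization to the compiler (CAV) and splits iterations across cores with OpenMP (TLP); the second uses intrinsics (IPM) for the DLP part and OpenMP for the TLP part.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* CAV + TLP (hypothetical example kernel): a plain loop the compiler may
 * auto-vectorize at a suitable optimization level; OpenMP distributes the
 * iterations across threads when OpenMP support is enabled. */
void add_cav_tlp(const float *restrict a, const float *restrict b,
                 float *restrict c, size_t n)
{
    assert(a && b && c);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* IPM + TLP (hypothetical example kernel): each iteration processes four
 * floats with explicit SSE intrinsics; OpenMP distributes the 4-element
 * blocks across threads. Falls back to the CAV version without SSE. */
void add_ipm_tlp(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
#if defined(__SSE__)
    long blocks = (long)(n / 4);
    #pragma omp parallel for
    for (long k = 0; k < blocks; k++) {
        __m128 va = _mm_loadu_ps(a + 4 * k); /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + 4 * k);
        _mm_storeu_ps(c + 4 * k, _mm_add_ps(va, vb));
    }
    /* Scalar tail for the remaining n % 4 elements. */
    for (size_t i = (size_t)blocks * 4; i < n; i++) {
        c[i] = a[i] + b[i];
    }
#else
    add_cav_tlp(a, b, c, n);
#endif
}
```

With GCC or Clang, a build along the lines of `-O3 -fopenmp` would typically enable both auto-vectorization and the OpenMP pragmas; without OpenMP support the pragmas are ignored and both functions run on a single thread. The intrinsic version makes the four-wide data parallelism explicit at the cost of a manual tail loop and an ISA-specific code path, which hints at why the abstract calls IPM tedious and error-prone.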



Author information

Authors and Affiliations

Department of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran
Maryam Moradifar, Asadollah Shahbahrami, Mina Nematpour & Hossein Amiri


Corresponding author

Correspondence to Asadollah Shahbahrami.

Editor information

Editors and Affiliations

University of Calabria, Rende, Italy
Lucio Grandinetti
Kharazmi University, Tehran, Iran
Seyedeh Leili Mirtaheri
University of Calabria, Rende, Italy
Reza Shahbazian


About this paper

Cite this paper

Moradifar, M., Shahbahrami, A., Nematpour, M., Amiri, H. (2019). Performance Improvement of Multimedia Kernels Using Data- and Thread- Level Parallelism on CPU Platform. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_35


DOI: https://doi.org/10.1007/978-3-030-33495-6_35
Published: 20 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33494-9
Online ISBN: 978-3-030-33495-6
eBook Packages: Computer Science, Computer Science (R0)
