Performance Improvement of Multimedia Kernels Using Data- and Thread- Level Parallelism on CPU Platform

  • Conference paper
High-Performance Computing and Big Data Analysis (TopHPC 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 891)

Abstract

Processor vendors have been expanding Single Instruction Multiple Data (SIMD) extensions in their General Purpose Processors (GPPs). These extensions have their own instruction set architecture and are equipped with Special Purpose Instructions (SPIs) to exploit Data-Level Parallelism (DLP). In addition to these extensions, GPPs have been equipped with multicore technology, so that each processor has multiple cores that process a program by exploiting Thread-Level Parallelism (TLP). To exploit these technologies, SIMD and multicore, many parallel programming models have been proposed, such as the Intrinsic Programming Model (IPM), Compiler’s Automatic Vectorization (CAV), and OpenMP. Performance improvement using DLP depends on the number of data elements that can be processed in parallel by SIMD instructions, while performance improvement using TLP depends on the number of cores and on program dependencies. To increase the performance of multimedia kernels, in this paper we exploit both DLP and TLP using parallel programming models such as IPM, CAV, and OpenMP. Our experimental results show that combining DLP and TLP improves performance significantly compared to using DLP or TLP individually. In addition, various compilers such as ICC, GCC, and LLVM are evaluated in terms of automatic vectorization. The results show that the ICC and GCC compilers are better able to vectorize the kernels than the LLVM compiler. Moreover, although IPM is more efficient than CAV, it is tedious and error-prone, so more attention is needed to develop and extend auto-vectorization.
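
To make the combination of DLP and TLP described in the abstract concrete, the following is a minimal sketch in C. It is not taken from the paper: the element-wise addition kernel and the function names add_scalar and add_parallel_simd are hypothetical, chosen only for illustration. The sketch relies on the standard OpenMP "parallel for simd" construct, which asks the compiler to split the loop across threads (TLP) and to vectorize each thread's share of the iterations (DLP).

```c
#include <stddef.h>

/* Scalar baseline: one element per iteration on a single thread.
 * (Hypothetical kernel for illustration; not one of the paper's kernels.) */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* TLP + DLP: "parallel for" distributes iterations across cores, and
 * "simd" requests vectorization of each thread's chunk.
 * If OpenMP is not enabled at compile time, the pragma is ignored and the
 * loop still computes the same result sequentially. */
void add_parallel_simd(const float *a, const float *b, float *out, size_t n)
{
    #pragma omp parallel for simd
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++)
        out[i] = a[i] + b[i];
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. An intrinsics-based (IPM) variant would replace the loop body with explicit SIMD intrinsics instead of leaving vectorization to the compiler; whether either version is actually faster depends on the kernel, the data size, and the machine.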

Author information

Corresponding author

Correspondence to Asadollah Shahbahrami.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Moradifar, M., Shahbahrami, A., Nematpour, M., Amiri, H. (2019). Performance Improvement of Multimedia Kernels Using Data- and Thread- Level Parallelism on CPU Platform. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_35

  • DOI: https://doi.org/10.1007/978-3-030-33495-6_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33494-9

  • Online ISBN: 978-3-030-33495-6

  • eBook Packages: Computer Science, Computer Science (R0)
