Optimizing image spatial filtering on single CPU core

Zekri, Ahmed Sherif

doi:10.1007/s11042-016-4266-5

Published: 23 December 2016

Volume 77, pages 251–281, (2018)

Multimedia Tools and Applications

Ahmed Sherif Zekri^1,2

Abstract

Computing is increasingly delivered as a service on cloud resources. Users reserve virtual machines with the minimum number of processing cores needed to run their applications in order to save money. Optimizing user applications at the level of a single core of a physical machine is therefore highly desirable, both to users, who reduce cost, and to cloud providers, who reduce power consumption. In this paper, we show how to exploit all the processing resources available in a single physical CPU core to optimize the performance of 2D spatial filtering, a basic kernel in important image and multimedia applications such as image enhancement, edge detection, image segmentation, and image analysis. We propose a novel computational procedure that restructures the conventional image filtering operation. We then demonstrate the merits of combining hand-optimized source-code restructuring, automatic compiler optimizations including vectorization, and hand-optimized threading to squeeze the most performance out of a single CPU core. Our intensive performance evaluations used Sobel filters on a variety of image sizes and were measured with the Linux Perf tool on a single core of a quad-core Intel Core i7 processor. They show that our source-code restructuring combined with compiler auto-vectorization, using Intel AVX vector instructions, performs 1.3X better than the non-restructured auto-vectorized version of the CImg library for computing the image gradient. Moreover, using OpenMP directives, we studied different image partitioning strategies to better exploit the two hardware threads inside a CPU core, which boosted performance to 2.6X. Compared with the conventional CImg implementation, we obtained an average improvement of 5.0X for image sizes ranging from 0.5 MPixel to 8 MPixel. Comparing our best-optimized code to the conventional non-optimized serial code, without threading, resulted in a significant improvement of 23X.
The overall results show that significant performance gains in important image processing applications can be obtained by applying source-code restructuring before employing any automatic compiler optimizations, so as to exploit the instruction-level (ILP), data-level (DLP), and thread-level (TLP) parallelism available inside a single core of a multi-core CPU.
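The abstract does not spell out the restructuring procedure itself. As a rough illustration of what restructuring a conventional filtering loop can look like, the sketch below contrasts a direct 3×3 horizontal Sobel pass with a two-pass version that relies on the well-known separability of the Sobel mask (a vertical `[1 2 1]` smoothing followed by a horizontal `[-1 0 1]` difference). This is an assumed stand-in technique, not the paper's method. The `Image` type and the names `sobel_x_direct` and `sobel_x_separable` are hypothetical and introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major grayscale image; pixel (x, y) is data[y * width + x].
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<int> data;
    Image(std::size_t w, std::size_t h) : width(w), height(h), data(w * h, 0) {}
    int& at(std::size_t x, std::size_t y) { return data[y * width + x]; }
    int at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// Conventional form: apply the full 3x3 horizontal Sobel mask (as a
// correlation, without flipping the mask) at every interior pixel.
// Border pixels are left at zero.
Image sobel_x_direct(const Image& in) {
    assert(in.data.size() == in.width * in.height);
    static const int mask[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    Image out(in.width, in.height);
    for (std::size_t y = 1; y + 1 < in.height; ++y) {
        for (std::size_t x = 1; x + 1 < in.width; ++x) {
            int sum = 0;
            for (std::size_t j = 0; j < 3; ++j)
                for (std::size_t i = 0; i < 3; ++i)
                    sum += mask[j][i] * in.at(x + i - 1, y + j - 1);
            out.at(x, y) = sum;
        }
    }
    return out;
}

// Restructured form (assumption: separability, not the paper's procedure).
// Pass 1 smooths vertically with [1 2 1]; pass 2 takes the horizontal
// difference [-1 0 1]. Both inner loops walk memory with unit stride,
// which is the kind of loop compilers tend to auto-vectorize.
Image sobel_x_separable(const Image& in) {
    assert(in.data.size() == in.width * in.height);
    Image smooth(in.width, in.height);
    Image out(in.width, in.height);
    for (std::size_t y = 1; y + 1 < in.height; ++y)
        for (std::size_t x = 0; x < in.width; ++x)
            smooth.at(x, y) = in.at(x, y - 1) + 2 * in.at(x, y) + in.at(x, y + 1);
    for (std::size_t y = 1; y + 1 < in.height; ++y)
        for (std::size_t x = 1; x + 1 < in.width; ++x)
            out.at(x, y) = smooth.at(x + 1, y) - smooth.at(x - 1, y);
    return out;
}
```

Both functions produce the same interior values, so the two-pass form can be checked against the direct one on any test image. Whether it is actually faster depends on the compiler, flags, and image size, and would need to be measured.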

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Faculty of Science, Beirut Arab University, P.O. Box 115020, Riad El Solh, Beirut, 11072809, Lebanon
Ahmed Sherif Zekri
Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, Alexandria, Egypt
Ahmed Sherif Zekri

Corresponding author

Correspondence to Ahmed Sherif Zekri.

About this article

Cite this article

Zekri, A.S. Optimizing image spatial filtering on single CPU core. Multimed Tools Appl 77, 251–281 (2018). https://doi.org/10.1007/s11042-016-4266-5

Received: 31 May 2016
Revised: 07 December 2016
Accepted: 13 December 2016
Published: 23 December 2016
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11042-016-4266-5
