Optimizing image processing on multi-core CPUs with Intel parallel programming technologies

Kim, Cheong Ghil; Kim, Jeom Goo; Lee, Do Hyeon

doi:10.1007/s11042-011-0906-y

Published: 09 November 2011

Volume 68, pages 237–251 (2014)

Multimedia Tools and Applications

Cheong Ghil Kim¹,
Jeom Goo Kim¹ &
Do Hyeon Lee²

Abstract

The rapid advance of computer hardware and the popularity of multimedia applications have made multi-core processors with sub-word parallelism instructions a dominant market trend in desktop PCs as well as high-end mobile devices. This paper presents an efficient parallel implementation of the 2D convolution algorithm, which demands high-performance computing power, on multi-core desktop PCs. 2D convolution is a representative computation-intensive algorithm in image and signal processing applications and is accompanied by heavy memory access, although its computational complexity is relatively low. The purpose of this study is to explore the effectiveness of exploiting the Streaming SIMD (Single Instruction, Multiple Data) Extensions (SSE) technology and the TBB (Threading Building Blocks) run-time library on Intel multi-core processors. Combining them lets us use the hardware features of a multi-core processor concurrently, for both data-level and task-level parallelism. For the performance evaluation, we implemented a 3 × 3 kernel-based convolution algorithm using SSE2 and TBB in different combinations and compared their processing speeds. The experimental results show that both technologies have a significant effect on performance and that the processing speed improves greatly when the two are used together; for example, speedups of 6.2, 6.1, and 1.4 times over implementations using either technology alone are reported for the 256 × 256, 512 × 512, and 1024 × 1024 data sets, respectively.
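The data-level half of this approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 16-bit pixel type, and the choice to leave border pixels untouched are all assumptions made for illustration. The SSE2 path computes eight 16-bit output pixels per iteration and falls back to scalar code on targets without SSE2. The outer row loop is left serial here; it is the natural place where a task-level library such as TBB could distribute bands of rows across cores.

```cpp
#include <cstdint>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 integer intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: one output pixel of a 3x3 kernel applied without
// flipping (correlation form). Assumes (x, y) is an interior pixel.
static inline std::int16_t pixel3x3(const std::vector<std::int16_t>& in,
                                    int w, int x, int y,
                                    const std::int16_t k[9]) {
    int acc = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            acc += k[(dy + 1) * 3 + (dx + 1)] * in[(y + dy) * w + (x + dx)];
    return static_cast<std::int16_t>(acc);
}

// Scalar reference over interior pixels; border pixels of `out` are untouched.
void conv3x3_scalar(const std::vector<std::int16_t>& in,
                    std::vector<std::int16_t>& out,
                    int w, int h, const std::int16_t k[9]) {
    for (int y = 1; y + 1 < h; ++y)
        for (int x = 1; x + 1 < w; ++x)
            out[y * w + x] = pixel3x3(in, w, x, y, k);
}

// SIMD variant: eight 16-bit pixels per step where SSE2 is available.
// Assumes intermediate sums fit in 16 bits (small pixel and kernel values).
void conv3x3_simd(const std::vector<std::int16_t>& in,
                  std::vector<std::int16_t>& out,
                  int w, int h, const std::int16_t k[9]) {
    for (int y = 1; y + 1 < h; ++y) {  // candidate loop for task-level splitting
        int x = 1;
#ifdef HAVE_SSE2
        // Outputs x..x+7 must stay inside the interior, so x + 8 <= w - 1.
        for (; x + 8 <= w - 1; x += 8) {
            __m128i acc = _mm_setzero_si128();
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const std::int16_t* p = &in[(y + dy) * w + (x + dx)];
                    // Unaligned load of eight neighbouring pixels.
                    __m128i pix = _mm_loadu_si128(
                        reinterpret_cast<const __m128i*>(p));
                    // Broadcast one kernel coefficient to all eight lanes.
                    __m128i coef = _mm_set1_epi16(k[(dy + 1) * 3 + (dx + 1)]);
                    acc = _mm_add_epi16(acc, _mm_mullo_epi16(pix, coef));
                }
            _mm_storeu_si128(reinterpret_cast<__m128i*>(&out[y * w + x]), acc);
        }
#endif
        // Scalar tail for the remaining interior pixels of the row.
        for (; x + 1 < w; ++x)
            out[y * w + x] = pixel3x3(in, w, x, y, k);
    }
}
```

Both functions take a row-major image of width `w` and height `h` and a nine-element kernel, and they should produce identical results as long as no intermediate sum overflows 16 bits. Unaligned loads are used for simplicity; an implementation tuned for speed would likely align rows and reuse loaded registers across neighbouring kernel taps.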

Acknowledgements

Funding for this paper was provided by Namseoul University.

Author information

Authors and Affiliations

Department of Computer Science, Namseoul University, 21 Maeju-ri, Seonghwan-eup, Seobuk-gu, Cheonan-city, Choongnam, 331-707, Korea
Cheong Ghil Kim & Jeom Goo Kim
IT Convergence Technology Research & Education Center, Namseoul University, 21 Maeju-ri, Seonghwan-eup, Seobuk-gu, Cheonan-city, Choongnam, 331-707, Korea
Do Hyeon Lee

Corresponding author

Correspondence to Do Hyeon Lee.

About this article

Cite this article

Kim, C.G., Kim, J.G. & Lee, D.H. Optimizing image processing on multi-core CPUs with Intel parallel programming technologies. Multimed Tools Appl 68, 237–251 (2014). https://doi.org/10.1007/s11042-011-0906-y

Published: 09 November 2011
Issue Date: January 2014
DOI: https://doi.org/10.1007/s11042-011-0906-y
