Optimizing image processing on multi-core CPUs with Intel parallel programming technologies

Multimedia Tools and Applications

Abstract

Rapid advances in computer hardware and the popularity of multimedia applications have made multi-core processors with sub-word parallelism instructions a dominant market trend in desktop PCs as well as high-end mobile devices. This paper presents an efficient parallel implementation of the 2D convolution algorithm, which demands high-performance computing power, on multi-core desktop PCs. 2D convolution is a representative computation-intensive algorithm in image and signal processing applications: it is accompanied by heavy memory access, although its computational complexity is relatively low. The purpose of this study is to explore the effectiveness of exploiting the Streaming SIMD (Single Instruction, Multiple Data) Extensions (SSE) technology and the TBB (Threading Building Blocks) run-time library on Intel multi-core processors. Using both, we can take advantage of the hardware features of a multi-core processor concurrently for data-level and task-level parallelism. For the performance evaluation, we implemented a 3 × 3 kernel-based convolution algorithm using SSE2 and TBB in different combinations and compared their processing speeds. The experimental results show that both technologies have a significant effect on performance and that processing speed improves greatly when the two are used together; for example, speedups of 6.2, 6.1, and 1.4 times over the implementation using either one alone are reported for the 256 × 256, 512 × 512, and 1024 × 1024 data sets, respectively.
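The task-level half of the approach described above can be illustrated with a minimal sketch. It is not the paper's implementation: it uses `std::thread` from the C++ standard library as a stand-in for TBB, scalar arithmetic instead of SSE2, and assumes a row-major `float` image whose border pixels are copied unchanged. The names `convolve_rows` and `convolve3x3_parallel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Convolve rows [row_begin, row_end) of a row-major width x height image
// with a 3x3 kernel. Border pixels are copied unchanged (sketch assumption).
void convolve_rows(const std::vector<float>& src, std::vector<float>& dst,
                   std::size_t width, std::size_t height,
                   const float* kernel,
                   std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (y == 0 || x == 0 || y + 1 == height || x + 1 == width) {
                dst[y * width + x] = src[y * width + x];
                continue;
            }
            float acc = 0.0f;
            // True convolution: the kernel is flipped relative to the image.
            for (std::size_t ky = 0; ky < 3; ++ky)
                for (std::size_t kx = 0; kx < 3; ++kx)
                    acc += kernel[ky * 3 + kx] *
                           src[(y + 1 - ky) * width + (x + 1 - kx)];
            dst[y * width + x] = acc;
        }
    }
}

// Split the image into horizontal bands and convolve each band on its own
// thread. Bands write to disjoint rows of dst, so no locking is needed.
std::vector<float> convolve3x3_parallel(const std::vector<float>& src,
                                        std::size_t width, std::size_t height,
                                        const float* kernel,
                                        unsigned num_threads) {
    assert(src.size() == width * height);
    std::vector<float> dst(src.size());
    if (num_threads == 0) num_threads = 1;
    const std::size_t band = (height + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < height; begin += band) {
        const std::size_t end = std::min(begin + band, height);
        workers.emplace_back([&, begin, end] {
            convolve_rows(src, dst, width, height, kernel, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    return dst;
}
```

Each band reads only `src` and writes only its own rows of `dst`, which is what makes the decomposition safe to run concurrently. In a TBB version the band loop would typically be handed to the library's scheduler rather than to hand-made threads, and the inner loop is where SSE2 would process several pixels per instruction.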

Acknowledgements

Funding for this paper was provided by Namseoul University.

Author information

Corresponding author

Correspondence to Do Hyeon Lee.

About this article

Cite this article

Kim, C.G., Kim, J.G. & Lee, D.H. Optimizing image processing on multi-core CPUs with Intel parallel programming technologies. Multimed Tools Appl 68, 237–251 (2014). https://doi.org/10.1007/s11042-011-0906-y

  • DOI: https://doi.org/10.1007/s11042-011-0906-y