Abstract
GPGPU (General Purpose computing on Graphics Processing Units) enables massive parallelism by exploiting the Single Instruction Multiple Data (SIMD) architecture of the many cores found on modern graphics cards. A parameter called the local work group size controls how many work items are executed concurrently on a single compute unit. Although this parameter is critical to performance, there is no deterministic way to tune it, which leaves developers to manual trial and error. This paper applies amortised optimisation to determine the best local work group size for the GPGPU implementations of OpenCV's template matching feature. The empirical evaluation shows that an optimised local work group size can outperform the default value, with large effect sizes.
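A minimal sketch of the amortised idea follows. It assumes that each production call to a kernel can be run with a different candidate local work group size and timed, so every call doubles as one fitness evaluation and no separate offline tuning phase is needed. The class `AmortisedTuner`, the candidate list, and the simulated cost model `fake_kernel` are hypothetical. The plus-or-minus-one-step hill climber is a simple stand-in chosen for illustration, not necessarily the search used in the paper.

```python
import random
from typing import Callable, Sequence


class AmortisedTuner:
    """Hypothetical tuner that spreads a local-work-group-size search across
    production calls: each call to run() executes the real workload once,
    with either the incumbent size or one neighbouring candidate."""

    def __init__(self, candidates: Sequence[int], default: int, seed: int = 0):
        self.candidates = sorted(candidates)  # `default` must be one of these
        self.rng = random.Random(seed)
        self.best = default                   # start from the library default
        self.best_cost = None                 # incumbent not yet measured

    def _neighbour(self) -> int:
        # One hill-climbing step: the next smaller or larger candidate,
        # clamped at both ends of the sorted list.
        i = self.candidates.index(self.best)
        j = min(max(i + self.rng.choice((-1, 1)), 0), len(self.candidates) - 1)
        return self.candidates[j]

    def run(self, measure: Callable[[int], float]) -> int:
        """Serve one production request. `measure` executes the workload with
        the given local size and returns its elapsed time in seconds."""
        trial = self.best if self.best_cost is None else self._neighbour()
        cost = measure(trial)
        if self.best_cost is None or cost < self.best_cost:
            self.best, self.best_cost = trial, cost  # keep strictly faster sizes
        return trial


# Simulated workload with a made-up cost model in which 64 is fastest.
# This is illustrative only, not measured data.
def fake_kernel(local_size: int) -> float:
    return abs(local_size - 64) / 64 + 1.0


tuner = AmortisedTuner([8, 16, 32, 64, 128, 256], default=16)
for _ in range(50):  # 50 "production" requests
    tuner.run(fake_kernel)
print(tuner.best, tuner.best_cost)
```

On real hardware a single timing per candidate is noisy. A production version would likely repeat measurements, or compare candidates with a statistical test, before switching away from the incumbent.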
Notes
1. By contrast with the local work group size, the global work group size corresponds to the total number of parallel work items.
2. Boxplots for all experiments can be found at http://coinse.kaist.ac.kr/projects/adpoopencv.
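The relationship in Note 1 can be made concrete. Under OpenCL 1.x, `clEnqueueNDRangeKernel` rejects a local work group size that does not divide the global size evenly. The local size must also not exceed the kernel's maximum work-group size, which can be queried with `clGetKernelWorkGroupInfo` and `CL_KERNEL_WORK_GROUP_SIZE`. OpenCL 2.0 relaxes the divisibility rule with non-uniform work groups. The sketch below enumerates the 1-D local sizes that satisfy both OpenCL 1.x rules, which is one plausible way to build a candidate search space. The helper names and the limit of 256 are assumptions for illustration.

```python
from typing import List


def valid_local_sizes(global_size: int, max_group_size: int) -> List[int]:
    """Hypothetical helper: 1-D local sizes allowed under OpenCL 1.x rules,
    i.e. sizes that divide global_size evenly and do not exceed the
    device/kernel work-group limit."""
    return [s for s in range(1, max_group_size + 1) if global_size % s == 0]


def num_work_groups(global_size: int, local_size: int) -> int:
    """Hypothetical helper: how many work groups the global range splits into."""
    if global_size % local_size != 0:
        raise ValueError("OpenCL 1.x requires the local size to divide the global size")
    return global_size // local_size


# Example: 640 work items (e.g. one per pixel of a 640-pixel row)
# on a device whose assumed work-group limit is 256.
sizes = valid_local_sizes(640, 256)
print(sizes)                     # [1, 2, 4, 5, 8, 10, 16, 20, 32, 40, 64, 80, 128, 160]
print(num_work_groups(640, 64))  # 10
```

Note that 256, a common default limit, is not a valid choice here because it does not divide 640. This is one reason a fixed default local size is not always usable or best.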
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Sohn, J., Lee, S., Yoo, S. (2016). Amortised Deep Parameter Optimisation of GPGPU Work Group Size for OpenCV. In: Sarro, F., Deb, K. (eds) Search Based Software Engineering. SSBSE 2016. Lecture Notes in Computer Science, vol 9962. Springer, Cham. https://doi.org/10.1007/978-3-319-47106-8_14
DOI: https://doi.org/10.1007/978-3-319-47106-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47105-1
Online ISBN: 978-3-319-47106-8
eBook Packages: Computer Science; Computer Science (R0)