Abstract
As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel programming. OpenCL is an open standard for parallel computing that helps alleviate this difficulty by providing a portable set of abstractions for device memory hierarchies. However, OpenCL requires that the programmer explicitly control data transfer and device synchronization, two tedious and error-prone tasks. This paper introduces Maestro, an open source library for data orchestration on OpenCL devices. Maestro provides automatic data transfer, task decomposition across multiple devices, and autotuning of dynamic execution parameters for some types of problems.
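To make the idea of task decomposition across multiple devices concrete, the following is a minimal sketch of one common approach: splitting a one-dimensional index range into contiguous chunks sized in proportion to per-device weights (for example, relative throughput measured in a short benchmark). This is not Maestro's API or its actual algorithm; the function name `split_work` and the weighting scheme are assumptions made purely for illustration.

```python
def split_work(total_items, weights):
    """Split range(total_items) into contiguous (start, end) chunks.

    Hypothetical helper, not part of Maestro or OpenCL: `weights` holds one
    positive number per device (e.g. measured relative throughput), and each
    device receives a share of the items proportional to its weight.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")

    total_weight = float(sum(weights))
    chunks = []
    start = 0
    cumulative = 0.0
    last = len(weights) - 1
    for i, w in enumerate(weights):
        cumulative += w
        if i == last:
            # The last device absorbs any rounding remainder.
            end = total_items
        else:
            # Truncate the cumulative share; clamp to stay within bounds.
            end = min(total_items, int(total_items * cumulative / total_weight))
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # Assumed scenario: one device is roughly three times faster than the other.
    for device, (lo, hi) in enumerate(split_work(1000, [3, 1])):
        print(f"device {device}: items [{lo}, {hi})")
```

Each chunk would then be transferred to and processed on its own device. In a real system the weights could be refined at run time, which is the kind of dynamic parameter an autotuner might adjust.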
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Spafford, K., Meredith, J., Vetter, J. (2010). Maestro: Data Orchestration and Tuning for OpenCL Devices. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_26
DOI: https://doi.org/10.1007/978-3-642-15291-7_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer Science, Computer Science (R0)