Abstract
Applying appropriate data structures is critical to attaining superior performance on heterogeneous many-core systems. A heterogeneous many-core system comprises a host for control-flow management and a device for massively parallel data processing. However, the host and device favor different data structures: the host prefers Array-of-Structures (AoS) for ease of programming, while the device requires Structure-of-Arrays (SoA) for efficient data access. These conflicting preferences cost programmers excessive effort in transforming data structures between the two parts. Separately designed kernels with different coding styles also make programs difficult to maintain. This paper addresses this issue by proposing a fully automated data layout transformation framework. Programmers can maintain code in AoS style on the host, while the data layout is converted to SoA when transferred to the device. The proposed framework streamlines the design flow and delivers up to a 177% performance improvement.
Copyright information
© 2014 IFIP International Federation for Information Processing
Cite this paper
Tseng, YY., Huang, YH., Lai, BC.C., Lin, JL. (2014). Automatic Data Layout Transformation for Heterogeneous Many-Core Systems. In: Hsu, CH., Shi, X., Salapura, V. (eds) Network and Parallel Computing. NPC 2014. Lecture Notes in Computer Science, vol 8707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44917-2_18
DOI: https://doi.org/10.1007/978-3-662-44917-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44916-5
Online ISBN: 978-3-662-44917-2