Abstract
This paper describes the implementation of a runtime library for asynchronous communication on the Cell BE processor. The library provides several services that allow the compiler to generate code that maximizes the chances of overlapping communication with computation. It is organized as a software cache whose main services are data lookup, data placement and replacement, data write-back, memory synchronization, and address translation. The implementation guarantees that these services can be fully decoupled when handling memory references, which gives the compiler the opportunity to arrange the generated code so that computation overlaps with communication as much as possible. The paper also describes the mechanism needed to overlap the communication caused by write-back operations with computation, and presents the compiler's basic algorithms and optimizations for code generation. The system is evaluated by measuring bandwidth and global update ratios with two benchmarks from the HPCC benchmark suite: Stream and Random Access.
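To make the idea of decoupled services concrete, the following is a minimal sketch and not the paper's implementation. It assumes a tiny direct-mapped cache over a simulated main memory, with `memcpy` standing in for the asynchronous DMA transfers a Cell SPE would issue. Every name here (`sc_lookup`, `sc_place`, `sc_writeback`, `sc_sync`, `sc_translate`) and every constant is hypothetical. The point is only that each service is a separate call, so generated code could issue placement early and delay synchronization and translation until the data is actually needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128u   /* bytes per cache line (assumed value) */
#define NUM_LINES 4u     /* direct-mapped and tiny, for illustration only */
#define MEM_SIZE  4096u  /* size of the simulated main memory */

/* Simulated main memory; "global addresses" are offsets into this array. */
static uint8_t main_memory[MEM_SIZE];

typedef struct {
    uint32_t tag;              /* line-aligned global address held by the line */
    int      valid;
    int      dirty;
    uint8_t  data[LINE_SIZE];  /* local-store copy of the line */
} sc_line_t;

typedef struct {
    sc_line_t lines[NUM_LINES];
    unsigned  misses;
    unsigned  writebacks;
} sc_cache_t;

static uint32_t sc_base(uint32_t addr)  { return addr & ~(LINE_SIZE - 1u); }
static unsigned sc_index(uint32_t addr) { return (addr / LINE_SIZE) % NUM_LINES; }

/* Look up: reports hit (1) or miss (0) without moving any data. */
static int sc_lookup(const sc_cache_t *c, uint32_t addr) {
    const sc_line_t *l = &c->lines[sc_index(addr)];
    return l->valid && l->tag == sc_base(addr);
}

/* Write back: flushes one line if dirty (memcpy stands in for a DMA put). */
static void sc_writeback(sc_cache_t *c, unsigned idx) {
    sc_line_t *l = &c->lines[idx];
    if (l->valid && l->dirty) {
        memcpy(&main_memory[l->tag], l->data, LINE_SIZE);
        l->dirty = 0;
        c->writebacks++;
    }
}

/* Placement/replacement: evicts the conflicting line, then fetches the new
 * one (memcpy stands in for an asynchronous DMA get). */
static void sc_place(sc_cache_t *c, uint32_t addr) {
    unsigned idx = sc_index(addr);
    sc_line_t *l = &c->lines[idx];
    assert(sc_base(addr) + LINE_SIZE <= MEM_SIZE);
    sc_writeback(c, idx);
    memcpy(l->data, &main_memory[sc_base(addr)], LINE_SIZE);
    l->tag = sc_base(addr);
    l->valid = 1;
    c->misses++;
}

/* Synchronization: on real hardware this would wait for pending DMA
 * transfers; it is a no-op here because memcpy completes immediately. */
static void sc_sync(sc_cache_t *c) { (void)c; }

/* Translation: maps a global address to its local copy. The line must
 * already be resident; for_write marks it dirty for later write-back. */
static uint8_t *sc_translate(sc_cache_t *c, uint32_t addr, int for_write) {
    sc_line_t *l = &c->lines[sc_index(addr)];
    assert(l->valid && l->tag == sc_base(addr));
    if (for_write) l->dirty = 1;
    return &l->data[addr & (LINE_SIZE - 1u)];
}
```

In this sketch a caller would check `sc_lookup`, call `sc_place` on a miss, perform unrelated computation, and only then call `sc_sync` followed by `sc_translate`. With real asynchronous DMA, the work between placement and synchronization is where communication and computation could overlap.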
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balart, J. et al. (2008). A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor. In: Adve, V., Garzarán, M.J., Petersen, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2007. Lecture Notes in Computer Science, vol 5234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85261-2_9
DOI: https://doi.org/10.1007/978-3-540-85261-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85260-5
Online ISBN: 978-3-540-85261-2
eBook Packages: Computer Science, Computer Science (R0)