Extending Distributed Shared Memory for the Cell Broadband Engine to a Channel Model
As the performance gains from traditional processors decline, alternative processor designs are becoming available. One such processor is the CELL-BE processor, which can theoretically deliver a sustained performance close to 205 GFLOPS per processor. Unfortunately, this high performance comes at the price of a quite complex programming model. In this paper we present an easy-to-use, CSP-like communication method that enables transfers of shared memory objects. The channel-based communication method can significantly reduce the complexity of massively parallel programs. By implementing a few scientific computational cores, we show that the performance and scalability of the system are acceptable for most problems.
Keywords: CSP, CELL-BE, DSMCBE, channel communication
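To make the idea of CSP-like channel communication concrete, the following is a minimal sketch of an unbuffered (rendezvous) channel that hands an object from one process to another. It is not the DSMCBE API: the `Channel` class, its `write` and `read` methods, and the `run_demo` helper are hypothetical names chosen for illustration, and Python threads stand in for the processing elements of the CELL-BE.

```python
import threading


class Channel:
    """Hypothetical unbuffered channel: write() returns only after a reader took the item."""

    def __init__(self):
        self._write_lock = threading.Lock()  # serializes writers: one pending item at a time
        self._cond = threading.Condition()
        self._item = None
        self._has_item = False

    def write(self, item):
        with self._write_lock:
            with self._cond:
                self._item = item
                self._has_item = True
                self._cond.notify_all()
                # Block until a reader has consumed the item (rendezvous).
                while self._has_item:
                    self._cond.wait()

    def read(self):
        with self._cond:
            # Block until a writer has offered an item.
            while not self._has_item:
                self._cond.wait()
            item = self._item
            self._item = None
            self._has_item = False
            self._cond.notify_all()  # wake the waiting writer
            return item


def producer(channel, blocks):
    for block in blocks:
        channel.write(block)
    channel.write(None)  # assumed end-of-stream marker


def consumer(channel, results):
    while True:
        block = channel.read()
        if block is None:
            break
        results.append(sum(block))


def run_demo():
    channel = Channel()
    results = []
    blocks = [[1, 2, 3], [4, 5], [6]]
    workers = [
        threading.Thread(target=producer, args=(channel, blocks)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the producer and consumer share no state other than the channel, which is the property that tends to simplify parallel programs: all synchronization is confined to `write` and `read`.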