Exploiting Locality on the Cell/B.E. through Bypassing

  • Pieter Bellens
  • Josep M. Perez
  • Rosa M. Badia
  • Jesus Labarta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5657)


Cell Superscalar (CellSs) provides a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at the function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code, and on a runtime library tailored for the Cell/B.E. that orchestrates the concurrent execution of the application. In the context of our parallel runtime, we analyse the effect of the bandwidth of the Element Interconnect Bus (EIB) on an application's performance. We introduce a technique called bypassing that can increase the observed bandwidth and improve the execution time of applications with a distributed computation pattern. Although the integration of bypassing with CellSs is work in progress, we present results for five fundamental linear algebra kernels to demonstrate the applicability of bypassing and to attempt to quantify the benefit that can be reaped.

Copyright information

© IFIP International Federation for Information Processing 2009

Authors and Affiliations

  • Pieter Bellens (1)
  • Josep M. Perez (1)
  • Rosa M. Badia (1, 3)
  • Jesus Labarta (1, 2)

  1. Barcelona Supercomputing Center, Spain
  2. Universitat Politecnica de Catalunya, Spain
  3. Consejo Superior de Investigaciones Cientificas, Spain