Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System

  • Cédric Augonnet
  • Samuel Thibault
  • Raymond Namyst
  • Maik Nijhuis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5657)


Core specialization is currently one of the most promising ways to design power-efficient multicore chips. However, approaching the theoretical peak performance of such heterogeneous multicore architectures with specialized accelerators is a complex issue. While substantial effort has been devoted to efficiently offloading parts of the computation, designing an execution model that unifies all computing units is the main challenge.

We therefore designed the StarPU runtime system to provide portable support for heterogeneous multicore processors to high-performance applications and compiler environments. StarPU provides a high-level, unified execution model that is tightly coupled to an expressive data management library. In addition to our previous results on using multicore processors alongside graphics processors, we show that StarPU is flexible enough to efficiently exploit the heterogeneous resources in the Cell processor. We present a scalable design supporting multiple different accelerators while minimizing the overhead on the overall system. Using experiments with classical linear algebra algorithms, we show that StarPU improves programmability and provides performance portability.


Cholesky Decomposition; Execution Model; Runtime System; Multicore Processor; Cell Processor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© IFIP International Federation for Information Processing 2009

Authors and Affiliations

  • Cédric Augonnet (1)
  • Samuel Thibault (1)
  • Raymond Namyst (1)
  • Maik Nijhuis (2)

  1. INRIA Bordeaux Sud-Ouest – LaBRI, University of Bordeaux, France
  2. Vrije Universiteit Amsterdam, Netherlands
