
Dependence-Based Code Generation for a CELL Processor

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4382)

Abstract

Obtaining high performance on the STI CELL processor requires substantial programming effort because its architectural features must be explicitly managed, with separate codes required for two different types of cores (PPE and SPE). Research at IBM has developed a single source-image compiler for CELL that performs vectorization but uses OpenMP to specify cross-core parallelism. In this paper, we present and evaluate an alternative dependence-based compiler approach that automatically generates parallel and vector code for CELL from a single source program with no parallelism directives. In contrast to OpenMP, our approach can also handle loop nests that carry dependences. To preserve correct program semantics, we employ on-chip communication mechanisms to implement barrier and unidirectional synchronization primitives. We also implement strategies to boost performance by managing DMA data movement, improving data alignment, and exploiting memory reuse in the innermost loop.
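The role of barrier synchronization for a loop nest that carries a dependence can be illustrated with a minimal sketch. This is not the paper's generated code: Python threads stand in for the SPE cores, `threading.Barrier` stands in for the on-chip barrier primitive, and the names `sequential`, `parallel_inner`, and `num_workers` are hypothetical. The loop nest is an assumed example in which row `i` reads row `i - 1`, so the outer loop carries the dependence while the inner loop over `j` is dependence-free and can be split across workers.

```python
import threading


def sequential(a, b):
    """Reference loop nest: row i reads row i-1, so the i loop carries a dependence."""
    n, m = len(a), len(a[0])
    for i in range(1, n):
        for j in range(m):
            a[i][j] = a[i - 1][(j + 1) % m] + b[i][j]
    return a


def parallel_inner(a, b, num_workers=4):
    """Split the dependence-free j loop across workers.

    A barrier after each i iteration preserves the dependence carried by
    the i loop: no worker reads row i before every worker has finished it.
    """
    n, m = len(a), len(a[0])
    barrier = threading.Barrier(num_workers)

    def worker(w):
        # Block partition of the j range owned by worker w.
        lo = w * m // num_workers
        hi = (w + 1) * m // num_workers
        for i in range(1, n):
            for j in range(lo, hi):
                # May read a column of row i-1 that another worker wrote.
                a[i][j] = a[i - 1][(j + 1) % m] + b[i][j]
            barrier.wait()  # row i is complete before any worker starts row i+1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Calling `parallel_inner` and `sequential` on copies of the same input should give identical arrays. Removing the `barrier.wait()` call would allow a worker to read a neighbouring column of row `i - 1` before its owner has written it, which is the kind of ordering violation the synchronization is there to prevent.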



Editor information

George Almási, Călin Caşcaval, Peng Wu

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Zhao, Y., Kennedy, K. (2007). Dependence-Based Code Generation for a CELL Processor. In: Almási, G., Caşcaval, C., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2006. Lecture Notes in Computer Science, vol 4382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72521-3_6

  • DOI: https://doi.org/10.1007/978-3-540-72521-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72520-6

  • Online ISBN: 978-3-540-72521-3

  • eBook Packages: Computer Science (R0)
