Scalability Evaluation of a Polymorphic Register File: A CG Case Study

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6566)

Abstract

We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient (CG) method as a case study. We focus on a heterogeneous multi-processor architecture and take into account critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In this scenario, simulation results suggest that absolute speedups of up to 200 times can be obtained for the Sparse Matrix Vector Multiplication kernel. Moreover, when an equal number of workers, in the range 1-256, is employed, our design is between 1.7 and 4.2 times faster than a Cell PPU-based system. We also study the impact of memory latency and cache bandwidth on the sustainable speedups of the system. Our tests suggest that a 128-worker configuration requires the caches to deliver 1638.4 GB/sec in order to preserve 80% of its peak speedup.
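
For readers unfamiliar with the kernel named in the abstract, the following is a minimal scalar sketch of Sparse Matrix Vector Multiplication and of the Conjugate Gradient loop that calls it once per iteration. It is not the authors' implementation and does not model the Polymorphic Register File. The compressed sparse row (CSR) storage format, the function names `spmv_csr`, `dot` and `conjugate_gradient`, and the parameters `tol` and `max_iter` are all assumptions chosen for illustration.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given in CSR form.

    tol and max_iter are illustrative defaults, not values from the paper.
    """
    x = [0.0] * len(b)   # initial guess x0 = 0
    r = list(b)          # residual r0 = b - A * x0 = b
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = spmv_csr(values, col_idx, row_ptr, p)  # the SpMV kernel
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
solution = conjugate_gradient([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [1.0, 2.0])
```

Each CG iteration performs one SpMV plus a few vector updates and dot products, which is why the SpMV kernel is a natural target for acceleration. As a back-of-the-envelope reading of the last figure in the abstract, and assuming the bandwidth were shared evenly among workers (the abstract does not say so), 1638.4 GB/sec across 128 workers corresponds to 12.8 GB/sec per worker.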

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ciobanu, C.B., Martorell, X., Kuzmanov, G.K., Ramirez, A., Gaydadjiev, G.N. (2011). Scalability Evaluation of a Polymorphic Register File: A CG Case Study. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_2

  • DOI: https://doi.org/10.1007/978-3-642-19137-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19136-7

  • Online ISBN: 978-3-642-19137-4

  • eBook Packages: Computer Science, Computer Science (R0)