LPA: A First Approach to the Loop Processor Architecture

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4917)

Abstract

Current processors frequently run applications containing loop structures. However, traditional processor designs do not take into account the semantic information of the executed loops, failing to exploit an important opportunity. In this paper, we take our first step toward a loop-conscious processor architecture that has great potential to achieve high performance and relatively low energy consumption.

In particular, we propose to store simple dynamic loops in a buffer, namely the loop window. Loop instructions are kept in the loop window along with all the information needed to build the rename mapping. Therefore, the loop window can directly feed the execution back-end queues with instructions, avoiding the need for using the prediction, fetch, decode, and rename stages of the normal processor pipeline. Our results show that the loop window is a worthwhile complexity-effective alternative for processor design that reduces front-end activity by 14% for SPECint benchmarks and by 45% for SPECfp benchmarks.
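
The replay idea described above can be illustrated with a small trace-driven model. This is only a sketch under stated assumptions, not the design evaluated in the paper: the function name `simulate`, the three states (`idle`, `fill`, `replay`), the default `capacity`, and the rule that a loop is captured after one taken backward branch are all hypothetical choices made for illustration. The model ignores renaming entirely and only counts how many dynamic instructions would come from the normal front end versus the loop window.

```python
def simulate(trace: list[int], capacity: int = 32) -> dict[str, int]:
    """Count instructions supplied by the front end vs. a toy loop window.

    `trace` is a dynamic stream of instruction indices (hypothetical PCs).
    Only "simple" loops are handled: a straight-line body closed by one
    taken backward branch. All policies here are illustrative assumptions.
    """
    counts = {"front_end": 0, "loop_window": 0}
    state, head, tail, prev = "idle", None, None, None
    for pc in trace:
        if state != "idle":
            # Inside a tracked loop the next PC is either the loop head
            # (after the closing branch) or the next sequential instruction.
            expected = head if prev == tail else prev + 1
            if pc != expected:
                state = "idle"        # loop exit or irregular control flow
            elif state == "fill" and prev == tail:
                state = "replay"      # one full iteration has been captured
        if state == "idle" and prev is not None and pc <= prev and prev - pc < capacity:
            # Taken backward branch whose body fits: start capturing the loop.
            head, tail, state = pc, prev, "fill"
        counts["loop_window" if state == "replay" else "front_end"] += 1
        prev = pc
    return counts


# Toy trace: two prologue instructions, a 3-instruction loop run 5 times,
# then two epilogue instructions.
result = simulate([0, 1] + [2, 3, 4] * 5 + [5, 6])
```

In this toy trace the first iteration is needed to observe the backward branch and the second to fill the window, so only the last three iterations (9 of 19 instructions) are replayed. The fraction saved depends entirely on the assumed policy and trace, and should not be read as reproducing the paper's measurements.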

Editor information

Per Stenström, Michel Dubois, Manolis Katevenis, Rajiv Gupta, Theo Ungerer

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

García, A., Santana, O.J., Fernández, E., Medina, P., Valero, M. (2008). LPA: A First Approach to the Loop Processor Architecture. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_19

  • DOI: https://doi.org/10.1007/978-3-540-77560-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77559-1

  • Online ISBN: 978-3-540-77560-7

  • eBook Packages: Computer Science, Computer Science (R0)