LPA: A First Approach to the Loop Processor Architecture

García, Alejandro; Santana, Oliverio J.; Fernández, Enrique; Medina, Pedro; Valero, Mateo

doi:10.1007/978-3-540-77560-7_19

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4917)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

Current processors frequently run applications containing loop structures. However, traditional processor designs do not take into account the semantic information of the executed loops, failing to exploit an important opportunity. In this paper, we take our first step toward a loop-conscious processor architecture that has great potential to achieve high performance and relatively low energy consumption.

In particular, we propose to store simple dynamic loops in a buffer, namely the loop window. Loop instructions are kept in the loop window along with all the information needed to build the rename mapping. Therefore, the loop window can directly feed the execution back-end queues with instructions, avoiding the need for using the prediction, fetch, decode, and rename stages of the normal processor pipeline. Our results show that the loop window is a worthwhile complexity-effective alternative for processor design that reduces front-end activity by 14% for SPECint benchmarks and by 45% for SPECfp benchmarks.
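The mechanism described above can be illustrated with a toy trace-driven model. This is only a minimal sketch under stated assumptions, not the authors' design: the function name `simulate`, the three states, the 128-entry capacity, the unit-sized instruction addresses, and the rule that a loop is "simple" when its body contains no taken branch other than the closing backward branch are all hypothetical choices made for illustration. The sketch counts how many dynamic instructions pass through the front end and how many could instead be supplied by a loop window; it does not model rename mapping or the back-end queues.

```python
from typing import List, Optional, Tuple

# A dynamic instruction is (pc, taken_target); taken_target is None unless
# the instruction is a taken branch. Unit-sized instructions are assumed.
Trace = List[Tuple[int, Optional[int]]]


def simulate(trace: Trace, capacity: int = 128) -> Tuple[int, int]:
    """Return (instructions through the front end, instructions from the window)."""
    front_end = from_window = 0
    state = "idle"            # idle -> capture -> active
    start = end = -1          # pcs of the first instruction and closing branch
    body: List[int] = []      # pcs stored in the hypothetical loop window
    pos = 0                   # next expected slot while the window is active

    for pc, target in trace:
        if state == "active":
            if pc == body[pos]:
                from_window += 1
                if pc != end:
                    pos += 1
                elif target == start:
                    pos = 0               # next iteration, still in the window
                else:
                    state = "idle"        # loop exit
                continue
            state = "idle"                # control left the stored loop

        front_end += 1                    # normal predict/fetch/decode/rename path
        if state == "capture":
            in_order = pc == start + len(body)
            inner_jump = target is not None and pc != end
            if in_order and not inner_jump:
                body.append(pc)
                if pc == end:
                    state = "active" if target == start else "idle"
                    pos = 0
                continue
            state = "idle"                # not a simple loop, give up

        # A taken backward branch whose body fits marks a candidate loop.
        if target is not None and target <= pc and pc - target + 1 <= capacity:
            state, start, end, body = "capture", target, pc, []

    return front_end, from_window
```

In this sketch the first iteration of a loop is spent detecting the backward branch and the second is spent filling the window, so only the third and later iterations bypass the front end. The ratio `from_window / (front_end + from_window)` on a given trace is a rough analogue of the front-end activity reduction the abstract reports, although the figures it produces say nothing about the paper's measured results.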

Author information

Authors and Affiliations

Universitat Politècnica de Catalunya, Spain
Alejandro García & Mateo Valero
Universidad de Las Palmas de Gran Canaria, Spain
Oliverio J. Santana, Enrique Fernández & Pedro Medina
Barcelona Supercomputing Center, Spain
Mateo Valero

Editor information

Per Stenström, Michel Dubois, Manolis Katevenis, Rajiv Gupta, Theo Ungerer

About this paper

Cite this paper

García, A., Santana, O.J., Fernández, E., Medina, P., Valero, M. (2008). LPA: A First Approach to the Loop Processor Architecture. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_19

DOI: https://doi.org/10.1007/978-3-540-77560-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77559-1
Online ISBN: 978-3-540-77560-7
eBook Packages: Computer Science, Computer Science (R0)
