MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

Van Craeynest, Kenzo; Eyerman, Stijn; Eeckhout, Lieven

doi:10.1007/978-3-540-92990-1_10

Kenzo Van Craeynest⁶,
Stijn Eyerman⁶ &
Lieven Eeckhout⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5409))

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

978 Accesses
4 Citations
3 Altmetric

Abstract

Threads experiencing long-latency loads on a simultaneous multith- reading (SMT) processor may clog shared processor resources without making forward progress, thereby starving other threads and reducing overall system throughput. An elegant solution to the long-latency load problem in SMT processors is to employ runahead execution. Runahead threads do not block commit on a long-latency load but instead execute subsequent instructions in a speculative execution mode to expose memory-level parallelism (MLP) through prefetching. The key benefit of runahead SMT threads is twofold: (i) runahead threads do not clog resources on a long-latency load, and (ii) runahead threads exploit far-distance MLP.

This paper proposes MLP-aware runahead threads: runahead execution is only initiated in case there is far-distance MLP to be exploited. By doing so, useless runahead executions are eliminated, thereby reducing the number of speculatively executed instructions (and thus energy consumption) while preserving the performance of the runahead thread and potentially improving the performance of the co-executing thread(s). Our experimental results show that MLP-aware runahead threads reduce the number of speculatively executed instructions by 13.9% and 10.1% for two-program and four-program workloads, respectively, compared to MLP-agnostic runahead threads while achieving comparable system throughput and job turnaround time.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Cazorla, F.J., Fernandez, E., Ramirez, A., Valero, M.: Optimizing long-latency-load-aware fetch policies for SMT processors. International Journal of High Performance Computing and Networking (IJHPCN) 2(1), 45–54 (2004)
Article Google Scholar
Cazorla, F.J., Ramirez, A., Valero, M., Fernandez, E.: Dynamically controlled resource allocation in SMT processors. In: MICRO, pp. 171–182 (December 2004)
Google Scholar
Chou, Y., Fahs, B., Abraham, S.: Microarchitecture optimizations for exploiting memory-level parallelism. In: ISCA, pp. 76–87 (June 2004)
Google Scholar
Dundas, J., Mudge, T.: Improving data cache performance by pre-executing instructions under a cache miss. In: ICS, pp. 68–75 (July 1997)
Google Scholar
El-Moursy, A., Albonesi, D.H.: Front-end policies for improved issue efficiency in SMT processors. In: HPCA, pp. 31–40 (February 2003)
Google Scholar
Eyerman, S., Eeckhout, L.: A memory-level parallelism aware fetch policy for SMT processors. In: HPCA, pp. 240–249 (February 2007)
Google Scholar
Eyerman, S., Eeckhout, L.: System-level performance metrics for multi-program workloads. IEEE Micro. 28(3), 42–53 (2008)
Article Google Scholar
Glew, A.: MLP yes! ILP no! In: ASPLOS Wild and Crazy Idea Session (October 1998)
Google Scholar
Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., Roussel, P.: The microarchitecture of the Pentium 4 processor. Intel. Technology Journal Q1 (2001)
Google Scholar
John, L.K.: Aggregating performance metrics over a benchmark suite. In: John, L.K., Eeckhout, L. (eds.) Performance Evaluation and Benchmarking, pp. 47–58. CRC Press, Boca Raton (2006)
Google Scholar
Kessler, R.E., McLellan, E.J., Webb, D.A.: The Alpha 21264 microprocessor architecture. In: ICCD, pp. 90–95 (October 1998)
Google Scholar
Luo, K., Gummaraju, J., Franklin, M.: Balancing throughput and fairness in SMT processors. In: ISPASS, pp. 164–171 (November 2001)
Google Scholar
Mutlu, O., Kim, H., Patt, Y.N.: Techniques for efficient processing in runahead execution engines. In: ISCA, pp. 370–381 (June 2005)
Google Scholar
Mutlu, O., Stark, J., Wilkerson, C., Patt, Y.N.: Runahead execution: An alternative to very large instruction windows for out-of-order processors. In: HPCA, pp. 129–140 (February 2003)
Google Scholar
Perelman, E., Hamerly, G., Calder, B.: Picking statistically valid and early simulation points. In: Malyshkin, V.E. (ed.) PaCT 2003. LNCS, vol. 2763, pp. 244–256. Springer, Heidelberg (2003)
Chapter Google Scholar
Raasch, S.E., Reinhardt, S.K.: The impact of resource partitioning on SMT processors. In: Malyshkin, V.E. (ed.) PaCT 2003. LNCS, vol. 2763, pp. 15–26. Springer, Heidelberg (2003)
Google Scholar
Ramirez, T., Pajuelo, A., Santana, O.J., Valero, M.: Runahead threads to improve SMT performance. In: HPCA, pp. 149–158 (February 2008)
Google Scholar
Sherwood, T., Perelman, E., Hamerly, G., Calder, B.: Automatically characterizing large scale program behavior. In: ASPLOS, pp. 45–57 (October 2002)
Google Scholar
Snavely, A., Tullsen, D.M.: Symbiotic jobscheduling for simultaneous multithreading processor. In: ASPLOS, pp. 234–244 (November 2000)
Google Scholar
Tullsen, D.: Simulation and modeling of a simultaneous multithreading processor. In: Proceedings of the 22nd Annual Computer Measurement Group Conference (December 1996)
Google Scholar
Tullsen, D.M., Brown, J.A.: Handling long-latency loads in a simultaneous multithreading processor. In: MICRO, pp. 318–327 (December 2001)
Google Scholar
Tullsen, D.M., Eggers, S.J., Emer, J.S., Levy, H.M., Lo, J.L., Stamm, R.L.: Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. In: ISCA, pp. 191–202 (May 1996)
Google Scholar
Tullsen, D.M., Eggers, S.J., Levy, H.M.: Simultaneous multithreading: Maximizing on-chip parallelism. In: ISCA, pp. 392–403 (June 1995)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Electronics and Information Systems (ELIS), Ghent University, Belgium
Kenzo Van Craeynest, Stijn Eyerman & Lieven Eeckhout

Authors

Kenzo Van Craeynest
View author publications
You can also search for this author in PubMed Google Scholar
Stijn Eyerman
View author publications
You can also search for this author in PubMed Google Scholar
Lieven Eeckhout
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

IRISA, Campus de Beaulieu, 35042, Rennes Cedex, France
André Seznec
Intel Corporation, Massachusetts Microprocessor Design Center, 77 Reed Road, MA 01749, Hudson, USA
Joel Emer
School of Informatics, Institute for Computing Systems Architecture, King’ s Buildings, EH9 3JZ, Edinburgh, United Kingdom
Michael O’Boyle
Department of Electrical Engineering, Princeton University, 34 Olden Street, NJ 08544-5263, Princeton, USA
Margaret Martonosi
Department of Computer Science, University of Augsburg, 86135, Augsburg, Germany
Theo Ungerer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Van Craeynest, K., Eyerman, S., Eeckhout, L. (2009). MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_10

Download citation

DOI: https://doi.org/10.1007/978-3-540-92990-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics