
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Abstract

Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog shared processor resources without making forward progress, thereby starving other threads and reducing overall system throughput. An elegant solution to the long-latency load problem in SMT processors is to employ runahead execution. Runahead threads do not block commit on a long-latency load; instead, they execute subsequent instructions in a speculative execution mode to expose memory-level parallelism (MLP) through prefetching. The key benefit of runahead SMT threads is twofold: (i) runahead threads do not clog resources on a long-latency load, and (ii) runahead threads exploit far-distance MLP.
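
To make the far-distance MLP argument concrete, here is a toy Python model (not the authors' simulator) of which independent long-latency loads can issue while a blocking load is outstanding. It assumes a hypothetical trace format (the `Instr` class) and mimics the invalid-value (INV) propagation used in runahead execution: anything that depends on outstanding miss data cannot execute, so a dependent load cannot prefetch. A normal out-of-order thread only sees the instructions that fit behind the blocking load in the reorder buffer (ROB), whereas a runahead thread keeps going past that point. The 128-entry ROB and 350-instruction runahead budget are illustrative values, not parameters from the paper.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """One dynamic instruction in a toy trace (hypothetical format)."""
    is_load: bool = False
    long_miss: bool = False        # load misses all the way to memory
    srcs: tuple[int, ...] = ()     # trace indices of producer instructions


def prefetches_issued(trace, head, window):
    """Indices of long-latency loads that can send their miss to memory while
    the load at `head` is outstanding, looking `window` instructions ahead.

    Mimics runahead INV-bit propagation: the outstanding load's value is
    invalid, any consumer of an invalid value is invalid, and a load with an
    invalid address operand cannot issue (so it cannot prefetch)."""
    invalid = {head}
    issued = []
    for i in range(head + 1, min(head + 1 + window, len(trace))):
        ins = trace[i]
        if any(s in invalid for s in ins.srcs):
            invalid.add(i)              # depends on missing data: skipped
        elif ins.is_load and ins.long_miss:
            issued.append(i)            # independent miss: overlaps (MLP)
            invalid.add(i)              # its data is not back yet either
    return issued


ROB_SIZE = 128          # illustrative reorder-buffer size
RUNAHEAD_BUDGET = 350   # illustrative instructions pre-executed per episode

trace = [Instr() for _ in range(400)]                    # mostly ALU ops
trace[0] = Instr(is_load=True, long_miss=True)           # blocking load
trace[1] = Instr(srcs=(0,))                              # consumes its value
trace[3] = Instr(is_load=True, long_miss=True)           # near, independent
trace[300] = Instr(is_load=True, long_miss=True)         # far, independent
trace[301] = Instr(is_load=True, long_miss=True, srcs=(300,))  # pointer chase

# Normal mode: only the ROB_SIZE - 1 entries behind the head are visible.
print(prefetches_issued(trace, 0, ROB_SIZE - 1))     # [3]
# Runahead mode: the thread keeps going past the full-window stall.
print(prefetches_issued(trace, 0, RUNAHEAD_BUDGET))  # [3, 300]
```

In this model, the ROB window alone overlaps the miss at position 3 but not the one at position 300, while a runahead episode reaches both. The pointer-chasing load at 301 stays invalid in either case, because its address depends on data that has not yet returned.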

This paper proposes MLP-aware runahead threads: runahead execution is initiated only when there is far-distance MLP to be exploited. Doing so eliminates useless runahead executions, which reduces the number of speculatively executed instructions (and thus energy consumption) while preserving the performance of the runahead thread and potentially improving the performance of the co-executing thread(s). Our experimental results show that, compared to MLP-agnostic runahead threads, MLP-aware runahead threads reduce the number of speculatively executed instructions by 13.9% and 10.1% for two-program and four-program workloads, respectively, while achieving comparable system throughput and job turnaround time.
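
The abstract does not say how far-distance MLP is detected. One plausible mechanism, sketched below in Python purely as an assumption, is a per-load MLP-distance predictor in the spirit of the authors' earlier MLP-aware fetch policy (Eyerman and Eeckhout, HPCA 2007). For each load PC, the predictor remembers how far ahead the farthest independent long-latency load was found, and the thread enters runahead only if that distance exceeds what the ROB can already cover. The class and function names, the last-value update rule, the 2048-entry table, the 512-instruction horizon and the 128-entry ROB are all hypothetical.

```python
from collections import defaultdict


class MLPDistancePredictor:
    """Hypothetical last-value MLP-distance predictor indexed by load PC.

    The MLP distance of a long-latency load is how many instructions ahead
    the farthest independent long-latency load was found (0 if none)."""

    def __init__(self, table_size=2048):
        self.table_size = table_size
        self.table = defaultdict(int)     # table index -> last distance seen

    def _index(self, pc):
        return pc % self.table_size

    def predict(self, pc):
        return self.table[self._index(pc)]

    def train(self, pc, observed_distance):
        self.table[self._index(pc)] = observed_distance


def observed_mlp_distance(independent_miss_positions, head, horizon):
    """Distance from the load at `head` to the farthest independent
    long-latency load within `horizon` instructions after it (0 if none)."""
    ahead = [p - head for p in independent_miss_positions
             if 0 < p - head <= horizon]
    return max(ahead, default=0)


def should_enter_runahead(predictor, load_pc, rob_size):
    """Enter runahead only for far-distance MLP: a companion miss at distance
    d >= rob_size cannot be in the ROB together with the blocking load."""
    return predictor.predict(load_pc) >= rob_size


ROB_SIZE = 128
HORIZON = 512
pred = MLPDistancePredictor()
# Load at PC 0x400: last time, an independent miss followed 300 instrs later.
pred.train(0x400, observed_mlp_distance([1000, 1300], head=1000, horizon=HORIZON))
# Load at PC 0x480: its only companion miss was 10 instructions later.
pred.train(0x480, observed_mlp_distance([2000, 2010], head=2000, horizon=HORIZON))

print(should_enter_runahead(pred, 0x400, ROB_SIZE))  # True: far-distance MLP
print(should_enter_runahead(pred, 0x480, ROB_SIZE))  # False: ROB already covers it
```

When the predictor reports no far-distance MLP, the thread simply does not enter runahead mode. The abstract does not specify how the blocking load is handled instead, so the sketch leaves that policy out.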

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Van Craeynest, K., Eyerman, S., Eeckhout, L. (2009). MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_10

  • DOI: https://doi.org/10.1007/978-3-540-92990-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92989-5

  • Online ISBN: 978-3-540-92990-1

  • eBook Packages: Computer Science, Computer Science (R0)
