
Performance Potential of Effective Address Prediction of Load Instructions

Chapter in High Performance Memory Systems

Abstract

Modern, deeply pipelined, out-of-order, speculative microprocessors are still plagued by the latency of load instructions. This latency is dominated by the time needed to resolve the load's source operands, to compute its effective address, and to fetch its data from the caches or main memory. This chapter examines the performance potential of hiding a load's data fetch latency using effective address prediction. By predicting a load's effective address early in the pipeline, we can initiate the cache access early and thereby improve performance.
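As a rough illustration of why early address prediction helps, the toy model below treats a load's latency as the sum of the three serial phases named above: resolving source operands, computing the effective address, and fetching the data. It assumes (hypothetically) that a correctly predicted address lets the cache access start as soon as the load enters the pipeline, so the fetch overlaps the other two phases, and that the loaded value is used only once the computed address confirms the prediction. The cycle counts are made-up parameters, not measurements from this chapter. A wrong prediction is modeled as giving no benefit and, mirroring the chapter's stated simplification, costing no extra penalty.

```python
from dataclasses import dataclass


@dataclass
class LoadTiming:
    """Hypothetical per-load cycle counts for the three serial phases."""
    resolve_operands: int  # cycles until the base register value is available
    compute_address: int   # cycles spent generating the effective address
    fetch_data: int        # cache or memory access latency


def baseline_latency(t: LoadTiming) -> int:
    # Without prediction, the three phases happen one after another.
    return t.resolve_operands + t.compute_address + t.fetch_data


def predicted_latency(t: LoadTiming, prediction_correct: bool) -> int:
    if not prediction_correct:
        # Assumption: a wrong prediction gives no benefit and no penalty.
        return baseline_latency(t)
    # The early cache access overlaps operand resolution and address
    # computation; the data is consumed once the real address matches.
    return max(t.resolve_operands + t.compute_address, t.fetch_data)


if __name__ == "__main__":
    t = LoadTiming(resolve_operands=6, compute_address=1, fetch_data=12)
    print(baseline_latency(t))          # 19
    print(predicted_latency(t, True))   # 12
    print(predicted_latency(t, False))  # 19
```

In this model the benefit is largest when the data fetch is long relative to operand resolution, which is why the chapter focuses on hiding the data fetch latency.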

Current effective address predictors for load instructions are based on either the history or the context of the specific load. In addition, researchers have examined load-load dependence predictors to prefetch cache misses. This chapter examines the performance potential of using a load-load dependence predictor to predict the effective addresses of load instructions and issue them early in the pipeline. We call this predictor the DEAP predictor.
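To make the distinction concrete, here is a minimal sketch of three per-load predictors, each indexed by the load's program counter (PC): a last-address predictor and a stride predictor, both history-based, and a load-load dependence predictor that forms the address from the value most recently returned by an earlier "producer" load plus a learned offset, the pattern typical of pointer chasing. The unbounded dictionaries, the `max_offset` parameter, and the way the dependence predictor learns its producer (by matching the address against recently loaded values) are illustrative assumptions, not the DEAP design evaluated in this chapter. Context-based predictors are omitted for brevity.

```python
from typing import Dict, Optional, Tuple


class LastAddressPredictor:
    """Predicts that a load reuses the address it used last time."""
    def __init__(self) -> None:
        self.last: Dict[int, int] = {}

    def predict(self, pc: int) -> Optional[int]:
        return self.last.get(pc)

    def update(self, pc: int, addr: int) -> None:
        self.last[pc] = addr


class StridePredictor:
    """Predicts the last address plus the most recently observed stride."""
    def __init__(self) -> None:
        self.state: Dict[int, Tuple[int, int]] = {}  # pc -> (last addr, stride)

    def predict(self, pc: int) -> Optional[int]:
        if pc not in self.state:
            return None
        last, stride = self.state[pc]
        return last + stride

    def update(self, pc: int, addr: int) -> None:
        last, _ = self.state.get(pc, (addr, 0))
        self.state[pc] = (addr, addr - last)


class LoadLoadDependencePredictor:
    """Hypothetical dependence-based predictor:
    address = value loaded by a producer load + a fixed offset (p = p->next)."""
    def __init__(self, max_offset: int = 64) -> None:
        self.max_offset = max_offset
        self.last_value: Dict[int, int] = {}        # producer pc -> last loaded value
        self.link: Dict[int, Tuple[int, int]] = {}  # consumer pc -> (producer pc, offset)

    def predict(self, pc: int) -> Optional[int]:
        if pc not in self.link:
            return None
        producer, offset = self.link[pc]
        value = self.last_value.get(producer)
        return None if value is None else value + offset

    def update(self, pc: int, addr: int, value: int) -> None:
        # Assumption: learn a link from the first recently loaded value that
        # lies within a small non-negative offset of this load's address.
        if pc not in self.link:
            for producer, v in self.last_value.items():
                if 0 <= addr - v <= self.max_offset:
                    self.link[pc] = (producer, addr - v)
                    break
        self.last_value[pc] = value


if __name__ == "__main__":
    # Pointer chase p = p->next over scattered nodes, with 'next' at offset 8.
    nodes = [0x1000, 0x5040, 0x2200, 0x7000, 0x3100]
    pc = 0x40
    dep, stride = LoadLoadDependencePredictor(), StridePredictor()
    dep_hits = stride_hits = 0
    for cur, nxt in zip(nodes, nodes[1:]):
        addr = cur + 8
        dep_hits += dep.predict(pc) == addr
        stride_hits += stride.predict(pc) == addr
        dep.update(pc, addr, value=nxt)
        stride.update(pc, addr)
    print(dep_hits, stride_hits)  # 2 0
```

On this scattered linked list the stride predictor never hits, while the dependence-based sketch starts predicting correctly once it has linked the load to its own previously loaded value.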

We show that, across our seven benchmarks from the SPEC95 and Olden suites, DEAP can improve the accuracy of effective address prediction by 28% on average over a perfect combination of last-address, stride-address, and context-based address predictors. We also find that an ideal hybrid of these four predictors (one that always picks the right predictor for each load) can potentially achieve performance close to that of a Perfect predictor in most cases. We obtain our timing results with an oracle-based simulation approach, which allows us to measure an upper bound on the performance gain from effective address prediction using a mostly realistic pipeline. However, our timing simulation does not account for the penalty of effective address mispredictions, and it assumes a zero-cycle latency from address prediction resolution to address predictor update.
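The ideal hybrid described above can be scored offline from a trace: for each dynamic load, the hybrid counts as correct whenever at least one component predictor was correct, which is what always picking the right predictor amounts to. The sketch below assumes each component's predictions have already been recorded per dynamic load (`None` meaning no prediction was made); the predictor names and the toy trace are hypothetical and carry no results from this chapter.

```python
from typing import Dict, List, Optional


def accuracy(predictions: List[Optional[int]], actual: List[int]) -> float:
    """Fraction of dynamic loads whose address was predicted exactly."""
    if not actual:
        return 0.0
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


def oracle_hybrid_accuracy(per_predictor: Dict[str, List[Optional[int]]],
                           actual: List[int]) -> float:
    """A load counts as correct if any component predicted it correctly,
    i.e. an ideal selector picks a right predictor whenever one exists."""
    if not actual:
        return 0.0
    hits = 0
    for i, a in enumerate(actual):
        if any(preds[i] == a for preds in per_predictor.values()):
            hits += 1
    return hits / len(actual)


if __name__ == "__main__":
    actual = [0x100, 0x108, 0x110, 0x5008]
    recorded = {
        "last":   [None, 0x100, 0x108, 0x110],
        "stride": [None, None, 0x110, 0x118],
        "dep":    [None, None, None, 0x5008],
    }
    for name, preds in recorded.items():
        print(name, accuracy(preds, actual))
    print("ideal hybrid", oracle_hybrid_accuracy(recorded, actual))  # ideal hybrid 0.5
```

Because the oracle selector never pays for choosing wrongly, this score is an upper bound for any realistic hybrid built from the same components, in the same spirit as the chapter's oracle-based timing methodology.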




Copyright information

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Ahuja, P.S., Emer, J., Klauser, A., Mukherjee, S.S. (2004). Performance Potential of Effective Address Prediction of Load Instructions. In: Hadimioglu, H., Kuskin, J., Torrellas, J., Kaeli, D., Nanda, A. (eds) High Performance Memory Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8987-1_15


  • DOI: https://doi.org/10.1007/978-1-4419-8987-1_15

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-6477-4

  • Online ISBN: 978-1-4419-8987-1

  • eBook Packages: Springer Book Archive
