Performance Potential of Effective Address Prediction of Load Instructions

Ahuja, Pritpal S.; Emer, Joel; Klauser, Artur; Mukherjee, Shubhendu S.

doi:10.1007/978-1-4419-8987-1_15

Abstract

Modern, deeply pipelined, out-of-order, and speculative microprocessors are still plagued by the latency of load instructions. This latency is dominated by the latencies to resolve the source operands of the load, to compute its effective address, and to fetch the load’s data from caches or the main memory. This chapter examines the performance potential of hiding a load’s data fetch latency using effective address prediction. By predicting the effective address of a load early in the pipeline, we can initiate the cache access early, thereby improving performance.
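To make the overlap concrete, here is a minimal sketch of a toy latency model. The function name, the cycle counts, and the rule that a correctly predicted load completes once both the early cache access and the real address computation have finished are illustrative assumptions, not figures or mechanisms taken from the chapter.

```python
def load_latency(operand_cycles, agen_cycles, fetch_cycles, predicted_correctly):
    """Cycles until a load's data is available and verified (toy model).

    Without a usable prediction the three phases run back to back.
    With a correct early prediction, the cache access is assumed to start
    at cycle 0 and overlap operand resolution and address generation.
    """
    serialized = operand_cycles + agen_cycles + fetch_cycles
    if not predicted_correctly:
        # Assumption: no extra mis-prediction penalty is modeled here.
        return serialized
    # The real address must still be computed to verify the prediction.
    return max(operand_cycles + agen_cycles, fetch_cycles)


# Hypothetical cycle counts: 2 to resolve operands, 1 to add, 3 to hit in cache.
baseline = load_latency(2, 1, 3, predicted_correctly=False)
with_prediction = load_latency(2, 1, 3, predicted_correctly=True)
print(baseline, with_prediction)
```

In this toy setting the predicted load finishes in half the cycles of the serialized one; real pipelines would see a gain that depends on how early the prediction is available.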

The current generation of effective address predictors for a load instruction is based on either the history or the context of the specific load. In addition, researchers have examined load-load dependence predictors to prefetch cache misses. This chapter examines the performance potential of using a load-load dependence predictor to predict effective addresses of load instructions and issue them early in the pipeline. We call this predictor the DEAP predictor.
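The difference between a history-based predictor and a dependence-based one can be sketched as follows. This is not the chapter's DEAP design, whose details are not given in the abstract. It is one plausible reading in which a consumer load's address is the value returned by a producer load plus a learned offset. The class names, the `producer_pc` argument, and the node addresses are all hypothetical.

```python
class StridePredictor:
    """History-based: predicts last address plus last observed stride."""

    def __init__(self):
        self.state = {}  # load PC -> (last address, last stride)

    def predict(self, pc):
        if pc not in self.state:
            return None
        last, stride = self.state[pc]
        return last + stride

    def update(self, pc, addr):
        last, _ = self.state.get(pc, (addr, 0))
        self.state[pc] = (addr, addr - last)


class DependencePredictor:
    """Hypothetical load-load dependence predictor (illustration only)."""

    def __init__(self):
        self.link = {}   # consumer PC -> (producer PC, offset)
        self.value = {}  # producer PC -> most recently loaded value

    def predict(self, pc):
        if pc not in self.link:
            return None
        producer, offset = self.link[pc]
        if producer not in self.value:
            return None
        return self.value[producer] + offset

    def update(self, pc, addr, producer_pc):
        # Assumption: the producer load is known, e.g. from dataflow tracking.
        if producer_pc in self.value:
            self.link[pc] = (producer_pc, addr - self.value[producer_pc])

    def record_value(self, pc, value):
        self.value[pc] = value


def run_demo():
    """Walk a linked list via `p = p->next`; return (dep_hits, stride_hits)."""
    nodes = [0x1000, 0x5040, 0x2300, 0x7780, 0x3100]  # made-up node addresses
    next_offset, pc = 8, 0x400
    dep, stride = DependencePredictor(), StridePredictor()
    dep_hits = stride_hits = 0
    for i in range(len(nodes) - 1):
        addr = nodes[i] + next_offset  # address of the `next` field
        value = nodes[i + 1]           # pointer value the load returns
        dep_hits += dep.predict(pc) == addr
        stride_hits += stride.predict(pc) == addr
        dep.update(pc, addr, producer_pc=pc)  # the load feeds itself
        stride.update(pc, addr)
        dep.record_value(pc, value)
    return dep_hits, stride_hits


print(run_demo())
```

On this irregular pointer chase the stride predictor never matches, while the dependence sketch predicts correctly once it has learned the producer link and offset. That is the kind of load the abstract suggests history- and context-based predictors handle poorly.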

We show that on average DEAP can improve the accuracy of effective address prediction by 28% over a perfect combination of last address, stride address, and context-based address predictors across our seven benchmarks from the SPEC95 and Olden suites. We find that an ideal hybrid of these four predictors—a predictor that always picks the right predictor for a load—can potentially achieve performance close to that of a Perfect predictor in most cases. We use an oracle-based simulation approach to evaluate our timing results. This method allows us to measure the upper bound of the performance from effective address prediction using a mostly realistic pipeline. However, our timing simulation method does not account for the penalty due to mis-prediction of an effective address and assumes a zero-cycle latency from address prediction resolution to address predictor update.
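The notion of an ideal hybrid can be illustrated with a short sketch: a load counts as correctly predicted if at least one component predictor produced the right address. The function names and the tiny address trace below are hypothetical and are not data from the chapter.

```python
def accuracy(predictions, actual):
    """Fraction of loads whose predicted address equals the actual one."""
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


def ideal_hybrid_accuracy(per_predictor, actual):
    """Accuracy of an oracle that always picks a correct component, if any."""
    hits = sum(
        any(preds[i] == a for preds in per_predictor.values())
        for i, a in enumerate(actual)
    )
    return hits / len(actual)


# Made-up trace of effective addresses for one static load.
actual = [8, 16, 24, 40, 40]
per_predictor = {
    "last": [None, 8, 16, 24, 40],    # last-address predictions
    "stride": [None, 8, 24, 32, 56],  # stride predictions
}

for name, preds in per_predictor.items():
    print(name, accuracy(preds, actual))
print("ideal hybrid", ideal_hybrid_accuracy(per_predictor, actual))
```

Here each component is right on a different load, so the oracle hybrid covers both. This union-of-hits view gives an upper bound: a real selector would sometimes choose the wrong component.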

Author information

Authors and Affiliations

Intel Corporation, Shrewsbury, MA, USA
Pritpal S. Ahuja, Joel Emer, Artur Klauser & Shubhendu S. Mukherjee

Editor information

Editors and Affiliations

Dept. of Computer and Information Science, Polytechnic University, Brooklyn, NY, USA
Haldun Hadimioglu
Atheros Communications, Inc., Sunnyvale, CA, USA
Jeffrey Kuskin
Dept. of Computer Science, University of Illinois, Urbana, IL, USA
Josep Torrellas
Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
David Kaeli
IBM TJ Watson Research Ctr., Yorktown Heights, NY, USA
Ashwini Nanda

About this chapter

Cite this chapter

Ahuja, P.S., Emer, J., Klauser, A., Mukherjee, S.S. (2004). Performance Potential of Effective Address Prediction of Load Instructions. In: Hadimioglu, H., Kuskin, J., Torrellas, J., Kaeli, D., Nanda, A. (eds) High Performance Memory Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8987-1_15

DOI: https://doi.org/10.1007/978-1-4419-8987-1_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-6477-4
Online ISBN: 978-1-4419-8987-1
eBook Packages: Springer Book Archive
