Abstract
Modern, deeply pipelined, out-of-order, and speculative microprocessors are still plagued by the latency of load instructions. This latency is dominated by the latencies to resolve the source operands of the load, to compute its effective address, and to fetch the load’s data from caches or the main memory. This chapter examines the performance potential of hiding a load’s data fetch latency using effective address prediction. By predicting the effective address of a load early in the pipeline, we can initiate the cache access early, thereby improving performance.
The current generation of effective address predictors for a load instruction is based on either the history or the context of the specific load. In addition, researchers have examined load-load dependence predictors to prefetch cache misses. This chapter examines the performance potential of using a load-load dependence predictor to predict effective addresses of load instructions and issue them early in the pipeline. We call this predictor the DEAP predictor.
We show that on average DEAP can improve the accuracy of effective address prediction by 28% over a perfect combination of last address, stride address, and context-based address predictors across our seven benchmarks from the SPEC95 and Olden suites. We find that an ideal hybrid of these four predictors (a predictor that always picks the right predictor for a load) can potentially achieve performance close to that of a Perfect predictor in most cases. We use an oracle-based simulation approach to evaluate our timing results. This method allows us to measure the upper bound of the performance from effective address prediction using a mostly realistic pipeline. However, our timing simulation method does not account for the penalty of mispredicting an effective address, and it assumes a zero-cycle latency from address prediction resolution to address predictor update.
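The idea of an ideal hybrid can be illustrated with a small sketch. The code below is not the chapter's simulator and does not model DEAP or the context-based predictor; it assumes two simplified history-based predictors (last address and stride) with unbounded per-load tables and no confidence counters, and counts a load as correctly predicted when at least one component predictor guesses its address. All class names, function names, and the example trace are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LastAddressPredictor:
    """Predicts that a load reuses the address it accessed last time."""
    table: dict = field(default_factory=dict)  # load PC -> last address

    def predict(self, pc):
        return self.table.get(pc)  # None when the load has no history yet

    def update(self, pc, addr):
        self.table[pc] = addr


@dataclass
class StridePredictor:
    """Predicts the last address plus the most recently observed stride."""
    table: dict = field(default_factory=dict)  # load PC -> (last address, stride)

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride = entry
        return last + stride

    def update(self, pc, addr):
        entry = self.table.get(pc)
        stride = addr - entry[0] if entry else 0
        self.table[pc] = (addr, stride)


def oracle_hybrid_accuracy(trace, predictors):
    """Fraction of loads for which at least one predictor was correct.

    Models an oracle selector: it always picks a correct component
    predictor whenever one exists. Every predictor is updated
    immediately after each load (zero-cycle update, an idealization).
    """
    if not trace:
        return 0.0
    correct = 0
    for pc, addr in trace:
        guesses = [p.predict(pc) for p in predictors]
        if addr in guesses:
            correct += 1
        for p in predictors:
            p.update(pc, addr)
    return correct / len(trace)


# Hypothetical trace of (load PC, effective address) pairs:
# one load walking an array with stride 8, one load that repeats an address.
TRACE = [
    (0x400, 100), (0x400, 108), (0x400, 116), (0x400, 124),
    (0x408, 200), (0x408, 208), (0x408, 208),
]
```

Calling `oracle_hybrid_accuracy(TRACE, [LastAddressPredictor(), StridePredictor()])` gives the hybrid's accuracy, and passing a single predictor gives that predictor's accuracy alone. On this trace the hybrid covers loads that neither component covers by itself, which is why an oracle selector gives an upper bound rather than a realistic estimate.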
Copyright information
© 2004 Springer Science+Business Media New York
About this chapter
Cite this chapter
Ahuja, P.S., Emer, J., Klauser, A., Mukherjee, S.S. (2004). Performance Potential of Effective Address Prediction of Load Instructions. In: Hadimioglu, H., Kuskin, J., Torrellas, J., Kaeli, D., Nanda, A. (eds) High Performance Memory Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8987-1_15
DOI: https://doi.org/10.1007/978-1-4419-8987-1_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-6477-4
Online ISBN: 978-1-4419-8987-1
eBook Packages: Springer Book Archive