Speculative Prefetching of Induction Pointers

  • Artour Stoutchinin
  • José Nelson Amaral
  • Guang R. Gao
  • James C. Dehnert
  • Suneel Jain
  • Alban Douillet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2027)

Abstract

We present an automatic approach to prefetching data for linked-list data structures. The main idea rests on the observation that linked-list elements are frequently allocated at a constant distance from one another in the heap. When such lists are traversed, a regular pattern of memory accesses with constant stride emerges. This regularity in the memory footprint of linked lists enables a prefetching framework in which the address of an element accessed in a future loop iteration is predicted dynamically from the stride observed in previous iterations.
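As an illustration of the stride-prediction idea, here is a minimal hand-written sketch in C. It is not the compiler transformation described in the paper. The `node` layout, the function name `sum_with_stride_prefetch`, and the prefetch distance `AHEAD` are hypothetical, and `__builtin_prefetch` is assumed to be available (GCC and Clang provide it). The loop measures the distance between the current and the previous element and speculatively prefetches the address a few strides ahead; a wrong guess costs bandwidth but does not affect correctness, because a prefetch does not fault.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; the layout is an assumption for illustration. */
typedef struct node {
    struct node *next;
    int value;
} node;

/* Hypothetical prefetch distance, in loop iterations. */
enum { AHEAD = 4 };

/* Sum the list while speculatively prefetching a predicted future node. */
long sum_with_stride_prefetch(const node *head) {
    long sum = 0;
    const node *prev = NULL;

    for (const node *p = head; p != NULL; p = p->next) {
        if (prev != NULL) {
            /* Stride between the last two elements visited. Unsigned
               arithmetic wraps, so negative strides also work. */
            uintptr_t stride = (uintptr_t)p - (uintptr_t)prev;
            /* Guess where the element AHEAD iterations away lives. */
            uintptr_t guess = (uintptr_t)p + stride * (uintptr_t)AHEAD;
            /* Hint only: read access, keep in all cache levels. */
            __builtin_prefetch((const void *)guess, 0, 3);
        }
        sum += p->value;
        prev = p;
    }
    return sum;
}
```

When the nodes were allocated back to back, the guessed address is the node that will actually be visited `AHEAD` iterations later; when they were not, the prefetch is simply wasted.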

We automatically identify pointer-chasing recurrences in loops that access linked lists. This identification uses a surprisingly simple method that looks for induction pointers, that is, pointers that are updated in each loop iteration by a load with a constant offset. We integrate induction pointer prefetching with loop scheduling. A key intuition incorporated in our framework is to insert prefetches only if processor resources and memory bandwidth are available. To estimate the available memory bandwidth, we calculate the number of potential cache misses in one loop iteration. Our estimation algorithm applies graph coloring to a memory access interference graph derived from the control flow graph. We implemented the prefetching framework in an industry-strength production compiler and performed experiments on ten benchmark programs with linked lists. We observed performance improvements between 15% and 35% in three of them.
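The definition of an induction pointer can be made concrete with a toy example. The sketch below assumes a hypothetical three-opcode intermediate representation (`instr`, `opcode`, and `find_induction_pointers` are invented for illustration and are not taken from the paper or from any real compiler). It flags a register as an induction pointer when its only definition in the loop body is a load of the form `r = load [r + constant]`, which is what `p = p->next` compiles to. A production analysis would also need to follow recurrences that pass through copies or several instructions; this sketch does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: just enough to express a pointer-chasing loop. */
typedef enum { OP_LOAD, OP_ADD, OP_STORE } opcode;

typedef struct {
    opcode op;
    int dst;      /* destination register (unused for OP_STORE) */
    int base;     /* base address register, or first operand of OP_ADD */
    long offset;  /* constant offset, or immediate operand of OP_ADD */
} instr;

/* Loads and adds write their destination register; stores do not. */
static bool defines_register(const instr *in, int reg) {
    return (in->op == OP_LOAD || in->op == OP_ADD) && in->dst == reg;
}

/* Collect registers updated once per iteration by r = load [r + const].
   Returns the number of registers written to out (at most max_out). */
size_t find_induction_pointers(const instr *body, size_t n,
                               int *out, size_t max_out) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        /* Candidate: a load whose destination is also its base. */
        if (body[i].op != OP_LOAD || body[i].dst != body[i].base)
            continue;
        /* Reject registers that are redefined elsewhere in the body. */
        size_t defs = 0;
        for (size_t j = 0; j < n; j++)
            if (defines_register(&body[j], body[i].dst))
                defs++;
        if (defs == 1 && found < max_out)
            out[found++] = body[i].dst;
    }
    return found;
}
```

For a loop body such as `r2 = load [r1 + 8]; r3 = r3 + 1; r1 = load [r1 + 0]`, only `r1` is reported: `r2` is loaded through a different base register, and `r3` is updated by arithmetic rather than by a load.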

Keywords

Basic Block; Memory Reference; Memory Bandwidth; Initiation Rate; Control Path
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Artour Stoutchinin (1)
  • José Nelson Amaral (2)
  • Guang R. Gao (3)
  • James C. Dehnert (4)
  • Suneel Jain (5)
  • Alban Douillet (3)

  1. STMicroelectronics, Grenoble, France
  2. Department of Computing Science, University of Alberta, Edmonton, Canada
  3. Computer Architecture and Parallel System Laboratory, University of Delaware, Newark, USA
  4. Transmeta Co., Santa Clara, USA
  5. Hewlett-Packard Co., Cupertino, USA
