Characterizing the Impact of Prefetching on Scientific Application Performance

  • Collin McCurdy
  • Gabriel Marin
  • Jeffrey S. Vetter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


In order to better understand the impact of hardware and software data prefetching on scientific application performance, this paper introduces two analysis techniques, one micro-architecture-centric and the other application-centric. We use these techniques to analyze representative full-scale production applications from five important Exascale target areas. We find that despite a great diversity in prefetching effectiveness across and even within applications, there is a strong correlation between regions where prefetching is most needed, due to high levels of memory traffic, and where it is most effective. We also observe that the application-centric analysis can explain many of the differences in prefetching effectiveness observed across the studied applications.


Keywords: Performance evaluation, Data streaming, Prefetching





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Collin McCurdy (1)
  • Gabriel Marin (2, corresponding author)
  • Jeffrey S. Vetter (1, 3)

  1. Oak Ridge National Laboratory, Oak Ridge, USA
  2. University of Tennessee, Knoxville, USA
  3. Georgia Institute of Technology, Atlanta, USA
