A Profiling Tool for Detecting Cache-Critical Data Structures

  • Jie Tao
  • Tobias Gaugler
  • Wolfgang Karl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)

Abstract

Poor cache behavior can significantly limit the speedup and scalability of parallel applications. Optimizing a program for cache locality can therefore yield considerable performance gains. As a consequence, programmers usually perform cache locality optimization to reach the expected performance of their applications.

In this work, we developed a data profiling tool, dprof, that supports users in this task by allowing them to detect the optimization targets in their programs. In contrast to similar tools, which mostly focus on code regions, we address data structures because they are the objects that programmers work with directly. Based on the Performance Monitoring Unit (PMU) provided by modern processors, dprof can find cache-critical variables, arrays, or even a segment of an array. It can also narrow these access hotspots down to the most concrete position, such as individual functions and code lines. This feature allows the user to apply dprof for efficient cache optimization.
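To make the idea of data-centric attribution concrete, the following is a minimal sketch, not dprof's actual implementation. It assumes that the profiler has already obtained sampled cache-miss events as pairs of a data address and a code location, and that a symbol table with start addresses and sizes is available. The symbol names, addresses, segment size, and the function `attribute_misses` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (name, start address, size in bytes).
SYMBOLS = [
    ("grid", 0x1000, 4096),
    ("weights", 0x3000, 1024),
    ("counter", 0x4000, 8),
]

# Assumed granularity used to split a data structure into segments.
SEGMENT_BYTES = 1024


def attribute_misses(samples, symbols=SYMBOLS, segment_bytes=SEGMENT_BYTES):
    """Map sampled (address, code_location) miss events to data structures.

    Returns three Counters: misses per variable, misses per
    (variable, segment index), and misses per (variable, code_location).
    """
    table = sorted(symbols, key=lambda s: s[1])
    starts = [start for _, start, _ in table]
    per_var, per_segment, per_location = Counter(), Counter(), Counter()

    for address, location in samples:
        # Find the last symbol whose start address is <= the sampled address.
        idx = bisect_right(starts, address) - 1
        if idx < 0:
            continue  # address lies below every known symbol
        name, start, size = table[idx]
        if address >= start + size:
            continue  # address falls in a gap between symbols
        per_var[name] += 1
        per_segment[(name, (address - start) // segment_bytes)] += 1
        per_location[(name, location)] += 1

    return per_var, per_segment, per_location


if __name__ == "__main__":
    # Made-up samples, purely to exercise the function.
    demo = [
        (0x1000, "solve.c:10"),
        (0x1400, "solve.c:10"),
        (0x1410, "solve.c:12"),
        (0x3004, "init.c:5"),
    ]
    by_var, by_segment, by_location = attribute_misses(demo)
    print(by_var.most_common())
    print(by_segment.most_common())
    print(by_location.most_common())
```

Ranking the three counters gives, in this simplified model, the variables, array segments, and code lines with the most sampled misses. How a real tool obtains the samples from the PMU and resolves addresses to symbols is outside the scope of this sketch.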

Keywords

Cache Line, Code Line, Virtual Address, Cache Performance, Cache Level
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jie Tao (1, 2)
  • Tobias Gaugler (3)
  • Wolfgang Karl (3)
  1. Department of Computer Science and Technology, Jilin University, P.R. China
  2. Institut für wissenschaftliches Rechnen, Forschungszentrum Karlsruhe GmbH, Germany
  3. Institut für Technische Informatik, Universität Karlsruhe (TH), Germany
