Analysis of Data Reuse in Task-Parallel Runtimes

  • Miquel Pericàs
  • Abdelhalim Amer
  • Kenjiro Taura
  • Satoshi Matsuoka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


This paper proposes a methodology to study the data reuse quality of task-parallel runtimes. We introduce a coarse-grain version of the reuse distance method called Kernel Reuse Distance (KRD). The metric is a low-overhead alternative designed to analyze data reuse at the socket level while minimizing perturbation of the parallel schedule. Using the KRD metric we show that reuse depends considerably on the system configuration (sockets, cores) and on the runtime scheduler. Furthermore, we correlate KRD with hardware metrics such as cache misses and work time inflation. Overall, we find that KRD can be used effectively to assess data reuse in parallel applications. The study also reveals that several current runtimes suffer from severe bottlenecks at scale, which often dominate performance.
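The classic (fine-grain) reuse distance that KRD coarsens counts the number of distinct data elements touched between two consecutive accesses to the same element; a cold (first) access has infinite distance. A minimal sketch of that fine-grain computation, assuming a simple list of element identifiers as the access trace (the trace and names are illustrative, not the paper's KRD tool, which instead treats whole kernel invocations as the unit of access):

```python
def reuse_distances(trace):
    """Return the reuse distance of each access in `trace`.

    The reuse distance of an access is the number of distinct elements
    accessed since the previous access to the same element, or None for
    a cold (first) access. This is the naive O(n * window) formulation;
    production tools use a tree-based structure for efficiency.
    """
    last_seen = {}   # element -> index of its most recent access
    distances = []
    for i, elem in enumerate(trace):
        if elem in last_seen:
            # Distinct elements touched strictly between the two accesses.
            window = trace[last_seen[elem] + 1 : i]
            distances.append(len(set(window)))
        else:
            distances.append(None)  # cold miss: no previous access
        last_seen[elem] = i
    return distances


# Example: the second access to 'a' sees 2 distinct elements in between.
print(reuse_distances(["a", "b", "c", "a", "b"]))  # [None, None, None, 2, 2]
```

A small reuse distance indicates a reuse likely to hit in cache; the distribution of these distances against cache capacity is what reuse-distance analysis compares.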







This work has been supported by a JSPS postdoctoral fellowship (P-12044). We would like to thank the anonymous reviewers for their valuable feedback.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Miquel Pericàs (1), corresponding author
  • Abdelhalim Amer (2)
  • Kenjiro Taura (3)
  • Satoshi Matsuoka (1, 2)

  1. Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan
  2. Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, Japan
  3. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
