Analysis of Data Reuse in Task-Parallel Runtimes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8551)

Abstract

This paper proposes a methodology for studying the data reuse quality of task-parallel runtimes. We introduce a coarse-grained version of the reuse distance method called Kernel Reuse Distance (KRD). The metric is a low-overhead alternative designed to analyze data reuse at the socket level while minimizing perturbation of the parallel schedule. Using the KRD metric, we show that reuse depends considerably on the system configuration (sockets, cores) and on the runtime scheduler. Furthermore, we correlate KRD with hardware metrics such as cache misses and work time inflation. Overall, we find that KRD can be used effectively to assess data reuse in parallel applications. The study also reveals that several current runtimes suffer from severe bottlenecks at scale, which often dominate performance.
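The KRD metric described in the abstract is a coarse-grained (per-kernel, per-socket) variant of the classical reuse distance of Mattson et al., where the distance of an access is the number of distinct addresses touched since the previous access to the same address. As a minimal illustration of the underlying idea (not the paper's KRD implementation, which operates at task granularity), a per-access stack-distance computation can be sketched as follows; the function name and trace are illustrative only:

```python
from collections import OrderedDict

def reuse_distances(trace):
    """Reuse (LRU stack) distance of each access in an address trace.

    The distance of an access is the number of *distinct* addresses
    touched since the previous access to the same address; a first
    access has infinite distance, represented here as None.
    """
    stack = OrderedDict()  # insertion order models the LRU stack
    distances = []
    for addr in trace:
        if addr in stack:
            # distinct addresses accessed since the last use of addr
            keys = list(stack.keys())
            distances.append(len(keys) - 1 - keys.index(addr))
            del stack[addr]
        else:
            distances.append(None)  # cold miss: infinite distance
        stack[addr] = True  # (re)insert at the top of the stack
    return distances

# trace: a b c a b b -> distances: inf inf inf 2 2 0
print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
```

A histogram of these distances directly yields miss ratios for any fully associative LRU cache size, which is what makes the metric attractive for correlating with measured cache misses; KRD trades per-access precision for low overhead by applying the same idea at the granularity of whole kernels.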



Acknowledgments

This work has been supported by a JSPS postdoctoral fellowship (P-12044). We would like to thank the anonymous reviewers for their valuable feedback.

Author information

Corresponding author: Miquel Pericàs.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pericàs, M., Amer, A., Taura, K., Matsuoka, S. (2014). Analysis of Data Reuse in Task-Parallel Runtimes. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation. PMBS 2013. Lecture Notes in Computer Science, vol 8551. Springer, Cham. https://doi.org/10.1007/978-3-319-10214-6_4

  • DOI: https://doi.org/10.1007/978-3-319-10214-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10213-9

  • Online ISBN: 978-3-319-10214-6

  • eBook Packages: Computer Science (R0)
