Global Task Data-Dependencies in PGAS Applications

Schuchart, Joseph; Gracia, José

doi:10.1007/978-3-030-20656-7_16

Global Task Data-Dependencies in PGAS Applications

Joseph Schuchart¹⁸ &
José Gracia¹⁸

Conference paper
First Online: 17 May 2019

1073 Accesses
5 Citations

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11501))

Abstract

Recent years have seen the emergence of two independent programming models challenging the traditional two-tier combination of message passing and thread-level work-sharing: partitioned global address space (PGAS) and task-based concurrency. In the PGAS programming model, synchronization and communication between processes are decoupled, providing significant potential for reducing communication overhead. At the same time, task-based programming allows to exploit a large degree of shared-memory concurrency. The inherent lack of fine-grained synchronization in PGAS can be addressed through fine-grained task synchronization across process boundaries. In this work, we propose the use of task data dependencies describing the data-flow in the global address space to synchronize the execution of tasks created in parallel on multiple processes. We present a description of the global data dependencies, describe the necessary interactions between the distributed scheduler instances required to handle them, and discuss our implementation in the context of the DASH PGAS framework. We evaluate our approach using the Blocked Cholesky Factorization and the LULESH proxy app, demonstrating the feasibility and scalability of our approach.

We gratefully acknowledge funding by the German Research Foundation (DFG) through the German Priority Programme 1648 Software for Exascale Computing (SPPEXA) in the SmartDASH project and would like to thank all members of the DASH team. We would like to thank the members of the Innovative Computing Lab at the University of Tennessee, Knoxville for their support on PaRSEC.

Joseph Schuchart is a doctoral student at the University of Stuttgart and the main author, claiming exclusive authorship of the design and implementation of the API and distributed scheduler as well as leading authorship of the global task dependency design, the evaluation scenario selection, and interpretation of experimental results.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 74.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Agullo, E., Aumage, O., Faverge, M., Furmento, N., Pruvost, F., Sergent, M., Thibault, S.P.: Achieving high performance on supercomputers with a sequential task-based programming model. IEEE Trans. Parallel Distrib. Syst. (2018). https://doi.org/10.1109/TPDS.2017.2766064
Article Google Scholar
Amarasinghe, S., et al.: Exascale software study: software challenges in extreme scale systems. Technical report, DARPA IPTO, Air Force Research Labs (2009)
Google Scholar
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–11, November 2012. https://doi.org/10.1109/SC.2012.71
Belli, R., Hoefler, T.: Notified access: extending remote memory access programming models for producer-consumer synchronization. In: IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2015)
Google Scholar
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemariner, P., Dongarra, J.: Dague: a generic distributed DAG engine for high performance computing, pp. 1151–1158. IEEE, Anchorage (2011)
Google Scholar
Chamberlain, B.L., Callahan, D., Zima, H.P.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21, 291–312 (2007)
Article Google Scholar
Chapman, B.M., Eachempati, D., Chandrasekaran, S.: OpenMP. In: Balaji, P. (ed.) Programming Models for Parallel Computing, pp. 281–322. MIT Press, Cambridge (2015)
Google Scholar
Charles, P., et al.: X10: an object-oriented approach to non-uniform cluster computing. In: ACM Sigplan Notices (2005)
Article Google Scholar
Duran, A., et al.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. (2011). https://doi.org/10.1142/S0129626411000151
Article MathSciNet Google Scholar
Fürlinger, K., et al.: DASH: data structures and algorithms with support for hierarchical locality. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8806, pp. 542–552. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14313-2_46
Chapter Google Scholar
Gómez-Iglesias, A., Pekurovsky, D., Hamidouche, K., Zhang, J., Vienne, J.: Porting scientific libraries to PGAS in XSEDE resources: practice and experience. In: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, XSEDE 2015. ACM (2015)
Google Scholar
Grossman, M., Kumar, V., Budimlic, Z., Sarkar, V.: Integrating asynchronous task parallelism with OpenSHMEM (2016). https://www.cs.rice.edu/~zoran/Publications_files/asyncshmem2016.pdf
Hoque, R., Herault, T., Bosilca, G., Dongarra, J.: Dynamic task discovery in parsec: a data-flow task-based runtime. In: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2017. ACM (2017). https://doi.org/10.1145/3148226.3148233
Kaiser, H., Heller, T., Adelstein-Lelbach, B., Serio, A., Fey, D.: HPX: a task based programming model in a global address space. In: PGAS 2014. ACM (2014). http://doi.acm.org/10.1145/2676870.2676883
Kalé, L., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of OOPSLA 1993 (1993)
Google Scholar
Karlin, I., Keasler, J., Neely, R.: Lulesh 2.0 updates and changes. Technical report LLNL-TR-641973 (2013)
Google Scholar
Kumar, V., Zheng, Y., Cavé, V., Budimlić, Z., Sarkar, V.: HabaneroUPC++: a compiler-free PGAS library. In: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, PGAS 2014. ACM (2014). https://doi.org/10.1145/2676870.2676879
Long, B.: Additional parallel features in fortran. SIGPLAN Fortran Forum, 16–23, July 2016. https://doi.org/10.1145/2980025.2980027
Article Google Scholar
Marjanović, V., Labarta, J., Ayguadé, E., Valero, M.: Overlapping communication and computation by using a hybrid MPI/SMPSs approach. In: Proceedings of the 24th ACM International Conference on Supercomputing, ICS 2010. ACM (2010). http://doi.acm.org/10.1145/1810085.1810091
OpenMP Architecture Review Board: OpenMP Application Programming Interface, Version 4.5 (2015). http://www.openmp.org/mp-documents/openmp-4.5.pdf
Reinders, J.: Intel threading Building Blocks: Outfitting C++ for Multicore Processor Parallelism. O’Reilly & Associates, Sebastopol (2007)
Google Scholar
Robison, A.D.: Composable parallel patterns with Intel Cilk Plus. Comput. Sci. Eng. (2013). https://doi.org/10.1109/MCSE.2013.21
Article Google Scholar
Saraswat, V., et al.: The Asynchronous Partitioned Global Address Space Model (2017)
Google Scholar
Schuchart, J., Kowalewski, R., Fuerlinger, K.: Recent experiences in Using MPI-3 RMA in the DASH PGAS runtime. In: Proceedings of Workshops of HPC Asia, HPC Asia 2018. ACM (2018). https://doi.org/10.1145/3176364.3176367
Schuchart, J., Nachtmann, M., Gracia, J.: Patterns for OpenMP task data dependency overhead measurements. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 156–168. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_11
Chapter Google Scholar
Schuchart, J., Tsugane, K., Gracia, J., Sato, M.: The impact of taskyield on the design of tasks communicating through MPI. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 3–17. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_1
Chapter Google Scholar
Shudler, S., Calotoiu, A., Hoefler, T., Wolf, F.: Isoefficiency in practice: configuring and understanding the performance of task-based applications. SIGPLAN Not., January 2017. https://doi.org/10.1145/3155284.3018770
Article Google Scholar
Slaughter, E., Lee, W., Treichler, S., Bauer, M., Aiken, A.: Regent: a high-productivity programming language for HPC with logical regions. In: SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, November 2015. https://doi.org/10.1145/2807591.2807629
Tejedor, E., Farreras, M., Grove, D., Badia, R.M., Almasi, G., Labarta, J.: A high-productivity task-based programming model for clusters. Concurr. Comput. Pract. Exp. (2012). https://doi.org/10.1002/cpe.2831
Article Google Scholar
Tillenius, M.: SuperGlue: a shared memory framework using data versioning for dependency-aware task-based parallelization. SIAM J. Sci. Comput. (2015). http://epubs.siam.org/doi/10.1137/140989716
Tsugane, K., Lee, J., Murai, H., Sato, M.: Multi-tasking execution in PGAS language XcalableMP and communication optimization on many-core clusters. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. ACM (2018). https://doi.org/10.1145/3149457.3154482
YarKhan, A.: Dynamic task execution on shared and distributed memory architectures. Ph.D. thesis (2012)
Google Scholar
Yelick, K., et al.: Productivity and performance using partitioned global address space languages. In: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, PASCO 2007. ACM (2007)
Google Scholar
Zhou, H., Idrees, K., Gracia, J.: Leveraging MPI-3 Shared-memory extensions for efficient PGAS runtime systems. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015. LNCS, vol. 9233, pp. 373–384. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48096-0_29
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

High Performance Computing Center Stuttgart (HLRS), Nobelstraße 19, 70569, Stuttgart, Germany
Joseph Schuchart & José Gracia

Authors

Joseph Schuchart
View author publications
You can also search for this author in PubMed Google Scholar
José Gracia
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Joseph Schuchart .

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Michèle Weiland
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Guido Juckeland
Technical University of Munich, Munich, Germany
Carsten Trinitis
Ohio State University, Columbus, USA
Ponnuswamy Sadayappan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Schuchart, J., Gracia, J. (2019). Global Task Data-Dependencies in PGAS Applications. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science(), vol 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_16

Download citation

DOI: https://doi.org/10.1007/978-3-030-20656-7_16
Published: 17 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20655-0
Online ISBN: 978-3-030-20656-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics