Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition

  • Conference paper
  • In: Job Scheduling Strategies for Parallel Processing (JSSPP 2015, JSSPP 2016)

Abstract

Large-scale scientific computation on heterogeneous supercomputers equipped with accelerators has recently been receiving attention. However, traditional static job execution and memory management methods are insufficient for harnessing heterogeneous computing resources, including memory, efficiently, since they incur larger data movement costs and lower resource utilization. This paper takes the Cholesky decomposition, an important linear algebra kernel, as the target for optimization. We describe a scalable data-driven scheduling method and a heterogeneous memory management method that improve resource utilization and reduce the amount of data movement. Through a performance evaluation on TSUBAME2.5, a heterogeneous supercomputer with NVIDIA GPUs, we demonstrate the efficiency of the proposed task scheduling method and of data replacement strategies that take data reusability into account.
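
The abstract refers to the tiled Cholesky decomposition, whose tile-level kernels and their data dependencies form the task graph that a data-driven scheduler distributes across nodes and GPUs. The sketch below is only an illustration of that task structure under common conventions, not the authors' implementation: a sequential NumPy version of the right-looking tiled algorithm, with the four standard kernel types (POTRF, TRSM, SYRK, GEMM) marked as comments. The function name tiled_cholesky and the tile size nb are hypothetical.

    # Illustrative sketch only (not the paper's code): sequential tiled
    # right-looking Cholesky. Each commented kernel (POTRF/TRSM/SYRK/GEMM)
    # corresponds to one task in the DAG a data-driven scheduler would run.
    import numpy as np

    def tiled_cholesky(A, nb):
        """In-place lower-triangular Cholesky of A using nb-by-nb tiles."""
        n = A.shape[0]
        assert n % nb == 0
        T = n // nb                                   # tiles per dimension

        def tile(i, j):                               # view of tile (i, j)
            return A[i*nb:(i+1)*nb, j*nb:(j+1)*nb]

        for k in range(T):
            # POTRF: factor the diagonal tile, A[k,k] = L[k,k] L[k,k]^T
            tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
            for i in range(k + 1, T):
                # TRSM: A[i,k] <- A[i,k] L[k,k]^{-T}
                tile(i, k)[:] = np.linalg.solve(tile(k, k), tile(i, k).T).T
            for i in range(k + 1, T):
                # SYRK: A[i,i] <- A[i,i] - A[i,k] A[i,k]^T
                tile(i, i)[:] -= tile(i, k) @ tile(i, k).T
                for j in range(k + 1, i):
                    # GEMM: A[i,j] <- A[i,j] - A[i,k] A[j,k]^T
                    tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
        return np.tril(A)

    # Usage: factor a small symmetric positive definite matrix and verify.
    n, nb = 8, 2
    M = np.random.rand(n, n)
    A = M @ M.T + n * np.eye(n)
    L = tiled_cholesky(A.copy(), nb)
    print(np.allclose(L @ L.T, A))                    # expected: True

In a distributed, data-driven execution, each of these tile operations becomes a task, and the tiles it reads and writes define the dependency edges the scheduler follows when assigning work to CPUs and GPUs.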

Notes

  1. In the Cholesky decomposition, each task depends on at most two tasks.

  2. Note that the input tile data is received by a work thread, not by the ignition thread.

Acknowledgment

This research was supported by the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) research project.

Author information

Corresponding author

Correspondence to Yuki Tsujita.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tsujita, Y., Endo, T. (2017). Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition. In: Desai, N., Cirne, W. (eds.) Job Scheduling Strategies for Parallel Processing. JSSPP 2015, JSSPP 2016. Lecture Notes in Computer Science, vol. 10353. Springer, Cham. https://doi.org/10.1007/978-3-319-61756-5_4

  • DOI: https://doi.org/10.1007/978-3-319-61756-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61755-8

  • Online ISBN: 978-3-319-61756-5

  • eBook Packages: Computer Science, Computer Science (R0)
