Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition

  • Conference paper
  • In: Job Scheduling Strategies for Parallel Processing (JSSPP 2015, JSSPP 2016)

Abstract

Large-scale scientific computation on heterogeneous supercomputers equipped with accelerators has recently been receiving attention. However, traditional static job execution and memory management methods are insufficient for harnessing heterogeneous computing resources, including memory, efficiently, since they incur larger data movement costs and lower resource utilization. This paper takes the Cholesky decomposition, an important linear algebra kernel, as the target for optimization. We describe a scalable data-driven scheduling method and a heterogeneous memory management method that improve resource utilization and reduce the amount of data movement. Through a performance evaluation on TSUBAME2.5, a heterogeneous supercomputer with NVIDIA GPUs, we demonstrate the efficiency of the proposed task scheduling method and of data replacement strategies that take data reusability into account.
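
The abstract refers to the tiled Cholesky decomposition, whose tile-level kernels and their data dependencies form the task graph that a data-driven scheduler distributes across nodes and GPUs. The sketch below is only an illustration of that task structure under common conventions, not the authors' implementation: a sequential NumPy version of the right-looking tiled algorithm, with the four standard kernel types (POTRF, TRSM, SYRK, GEMM) marked as comments. The function name tiled_cholesky and the tile size nb are hypothetical.

    # Illustrative sketch only (not the paper's code): sequential tiled
    # right-looking Cholesky. Each commented kernel (POTRF/TRSM/SYRK/GEMM)
    # corresponds to one task in the DAG a data-driven scheduler would run.
    import numpy as np

    def tiled_cholesky(A, nb):
        """In-place lower-triangular Cholesky of A using nb-by-nb tiles."""
        n = A.shape[0]
        assert n % nb == 0
        T = n // nb                                   # tiles per dimension

        def tile(i, j):                               # view of tile (i, j)
            return A[i*nb:(i+1)*nb, j*nb:(j+1)*nb]

        for k in range(T):
            # POTRF: factor the diagonal tile, A[k,k] = L[k,k] L[k,k]^T
            tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
            for i in range(k + 1, T):
                # TRSM: A[i,k] <- A[i,k] L[k,k]^{-T}
                tile(i, k)[:] = np.linalg.solve(tile(k, k), tile(i, k).T).T
            for i in range(k + 1, T):
                # SYRK: A[i,i] <- A[i,i] - A[i,k] A[i,k]^T
                tile(i, i)[:] -= tile(i, k) @ tile(i, k).T
                for j in range(k + 1, i):
                    # GEMM: A[i,j] <- A[i,j] - A[i,k] A[j,k]^T
                    tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
        return np.tril(A)

    # Usage: factor a small symmetric positive definite matrix and verify.
    n, nb = 8, 2
    M = np.random.rand(n, n)
    A = M @ M.T + n * np.eye(n)
    L = tiled_cholesky(A.copy(), nb)
    print(np.allclose(L @ L.T, A))                    # expected: True

In a distributed, data-driven execution, each of these tile operations becomes a task, and the tiles it reads and writes define the dependency edges the scheduler follows when assigning work to CPUs and GPUs.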

Notes

  1. In the Cholesky decomposition, each task depends on at most two tasks.

  2. Note that the input tile data is received by a work thread, not by the ignition thread.

Acknowledgment

This research was supported by the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) research project.

Author information

Corresponding author

Correspondence to Yuki Tsujita.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tsujita, Y., Endo, T. (2017). Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition. In: Desai, N., Cirne, W. (eds.) Job Scheduling Strategies for Parallel Processing. JSSPP 2015, JSSPP 2016. Lecture Notes in Computer Science, vol. 10353. Springer, Cham. https://doi.org/10.1007/978-3-319-61756-5_4

  • DOI: https://doi.org/10.1007/978-3-319-61756-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61755-8

  • Online ISBN: 978-3-319-61756-5

  • eBook Packages: Computer Science, Computer Science (R0)
