An Open MPI Extension for Supporting Task Based Parallelism in Heterogeneous CPU-GPU Clusters

  • Conference paper
High Performance Computer Applications (ISUM 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 595)

Abstract

In this work, we identify and analyze some of the patterns that appear in the development and deployment of scientific applications on clusters equipped with heterogeneous computing resources.

The main contributions of this work are the identification of these patterns and the design and implementation of an Open MPI extension that supports the development and deployment of applications programmed with a task-based approach.
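
The task-based approach can be pictured with a small, self-contained sketch. It is not the extension's API, which is not reproduced here: `Task`, `Worker`, and `schedule` are hypothetical names, the relative `speed` values merely stand in for a CPU core and a GPU, and the greedy earliest-finish-time rule is one common heuristic for heterogeneous devices, assumed here purely for illustration. Tasks are executed locally rather than sent over MPI.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    """A unit of work: a function, its arguments and an estimated cost."""
    name: str
    func: Callable[..., Any]
    args: tuple
    cost: float  # hypothetical cost estimate, in arbitrary work units


@dataclass
class Worker:
    """Stands in for one CPU core or GPU; `speed` is its relative throughput."""
    name: str
    speed: float
    busy_until: float = 0.0
    assigned: List[str] = field(default_factory=list)


def schedule(tasks: List[Task], workers: List[Worker]) -> Dict[str, Any]:
    """Greedy earliest-finish-time assignment: tasks are taken largest first,
    and each goes to the worker that would finish it soonest."""
    results: Dict[str, Any] = {}
    for task in sorted(tasks, key=lambda t: t.cost, reverse=True):
        best = min(workers, key=lambda w: w.busy_until + task.cost / w.speed)
        best.busy_until += task.cost / best.speed
        best.assigned.append(task.name)
        # In this sketch the task runs locally; a real runtime would ship it
        # to the chosen device or to a remote MPI process.
        results[task.name] = task.func(*task.args)
    return results


if __name__ == "__main__":
    workers = [Worker("cpu0", speed=1.0), Worker("gpu0", speed=4.0)]
    tasks = [Task(f"t{i}", pow, (i, 2), cost=4.0) for i in range(5)]
    print(schedule(tasks, workers))
    for w in workers:
        print(w.name, w.assigned, w.busy_until)
```

With one worker four times faster than the other, the faster one receives four of the five equal-cost tasks, which is the kind of imbalance a scheduler on a heterogeneous CPU-GPU cluster has to account for.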

To illustrate how to use the extension, we present the implementation and performance evaluation of two sample applications: the N-Body problem and general matrix multiplication (GEMM).
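
General matrix multiplication maps naturally onto tasks because every block of the output matrix can be computed independently. The following is a hedged sketch, not the paper's implementation: it uses the BLAS-style definition C = alpha*A*B + beta*C, splits C into square tiles, and runs every tile task sequentially where a cluster runtime would dispatch the tasks to CPUs or GPUs. The tile size and the helper names `make_gemm_tasks`, `run_gemm_task`, and `tiled_gemm` are illustrative assumptions.

```python
from itertools import product
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def make_gemm_tasks(n: int, m: int, tile: int) -> List[Tuple[int, int]]:
    """Split an n x m output matrix into tile x tile blocks; each block's
    top-left corner identifies one independent task."""
    return list(product(range(0, n, tile), range(0, m, tile)))


def run_gemm_task(A: Matrix, B: Matrix, C: Matrix, i0: int, j0: int,
                  tile: int, alpha: float, beta: float) -> None:
    """Compute one block of C = alpha*A*B + beta*C in place."""
    n, k, m = len(A), len(B), len(B[0])
    for i in range(i0, min(i0 + tile, n)):
        for j in range(j0, min(j0 + tile, m)):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]


def tiled_gemm(A: Matrix, B: Matrix, tile: int = 2, alpha: float = 1,
               beta: float = 0, C: Optional[Matrix] = None) -> Matrix:
    """Task-decomposed GEMM. Tasks run sequentially here, but each one only
    reads A, B and its own block of C, and writes only that block."""
    n, m = len(A), len(B[0])
    if C is None:
        C = [[0] * m for _ in range(n)]
    for i0, j0 in make_gemm_tasks(n, m, tile):
        # Hypothetical dispatch point: each (i0, j0) task could go to a
        # different CPU core or GPU in a heterogeneous cluster.
        run_gemm_task(A, B, C, i0, j0, tile, alpha, beta)
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(tiled_gemm(A, B, tile=1))
```

Because each tile writes a disjoint block of C, the order in which tasks complete does not change the result, which is what makes this decomposition safe to distribute across devices.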

Author information

Correspondence to Uriel Cabello.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Cabello, U., Rodríguez, J., Meneses-Viveros, A. (2016). An Open MPI Extension for Supporting Task Based Parallelism in Heterogeneous CPU-GPU Clusters. In: Gitler, I., Klapp, J. (eds) High Performance Computer Applications. ISUM 2015. Communications in Computer and Information Science, vol 595. Springer, Cham. https://doi.org/10.1007/978-3-319-32243-8_10

  • DOI: https://doi.org/10.1007/978-3-319-32243-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32242-1

  • Online ISBN: 978-3-319-32243-8

  • eBook Packages: Computer Science, Computer Science (R0)
