Skip to main content

CAPE: A Checkpointing-Based Solution for OpenMP on Distributed-Memory Architectures

  • Conference paper
  • First Online:
Parallel Computing Technologies (PaCT 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11657))

Included in the following conference series:

  • 578 Accesses

Abstract

CAPE, which stands for Checkpointing-Aided Parallel Execution, is a framework that automatically translates and provides runtime functions to execute OpenMP programs on distributed-memory architectures based on checkpointing techniques. In order to execute an OpenMP program on distributed-memory systems, CAPE uses a set of templates to translate an OpenMP source code into a CAPE source code which is then compiled using a regular C/C++ compiler. This code can be executed on distributed-memory systems under the support of the CAPE framework.

This paper aims at presenting the design and implementation of a new execution model based on Time-stamp Incremental Checkpoints. The new execution model allows CAPE to use resources efficiently, avoid the risk of bottlenecks, overcome the requirement of matching the Bernstein’s conditions. As a result, these approaches make CAPE improving the performance, ability as well as reliability.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Basumallik, A., Eigenmann, R.: Towards automatic translation of OpenMP to MPI. In: Proceedings of the 19th Annual International Conference on Supercomputing, pp. 189–198. ACM (2005)

    Google Scholar 

  2. Bull, J.M., O’Neill, D.: A microbenchmark suite for OpenMP 2.0. ACM SIGARCH Comput. Archit. News 29(5), 41–48 (2001)

    Article  Google Scholar 

  3. Chen, Z., Sun, J., Chen, H.: Optimizing checkpoint restart with data deduplication. Sci. Program. 2016, 11 (2016)

    Google Scholar 

  4. Cores, I., Rodríguez, M., González, P., Martín, M.J.: Reducing the overhead of an MPI application-level migration approach. Parallel Comput. 54, 72–82 (2016)

    Article  MathSciNet  Google Scholar 

  5. Dorta, A.J., Badía, J.M., Quintana, E.S., de Sande, F.: Implementing OpenMP for clusters on top of MPI. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds.) EuroPVM/MPI 2005. LNCS, vol. 3666, pp. 148–155. Springer, Heidelberg (2005). https://doi.org/10.1007/11557265_22

    Chapter  Google Scholar 

  6. EPCC: EPCC OpenMP micro-benchmark suite. https://www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking/epcc-openmp-micro-benchmark-suite

  7. Ha, V.H., Renault, E.: Design and performance analysis of CAPE based on discontinuous incremental checkpoints. In: 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (2011)

    Google Scholar 

  8. Ha, V.H., Renault, É.: Discontinuous incremental: a new approach towards extremely lightweight checkpoints. In: 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), pp. 227–232. IEEE (2011)

    Google Scholar 

  9. Ha, V.H., Renault, E.: Improving performance of CAPE using discontinuous incremental checkpointing. In: 2011 IEEE 13th International Conference on High Performance Computing and Communications (HPCC), pp. 802–807. IEEE (2011)

    Google Scholar 

  10. Heo, J., Yi, S., Cho, Y., Hong, J., Shin, S.Y.: Space-efficient page-level incremental checkpointing. In: Proceedings of the 2005 ACM symposium on Applied computing, pp. 1558–1562. ACM (2005)

    Google Scholar 

  11. Hoeflinger, J.P.: Extending OpenMP to clusters. White Paper, Intel Corporation (2006)

    Google Scholar 

  12. Huang, L., Chapman, B., Liu, Z.: Towards a more efficient implementation of OpenMP for clusters via translation to global arrays. Parallel Comput. 31(10), 1114–1139 (2005)

    Article  Google Scholar 

  13. Karlsson, S., Lee, S.-W., Brorsson, M.: A fully compliant OpenMP implementation on software distributed shared memory. In: Sahni, S., Prasanna, V.K., Shukla, U. (eds.) HiPC 2002. LNCS, vol. 2552, pp. 195–206. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36265-7_19

    Chapter  Google Scholar 

  14. Li, C.C., Fuchs, W.K.: Catch-compiler-assisted techniques for checkpointing. In: 20th International Symposium Fault-Tolerant Computing. FTCS-20. Digest of Papers, pp. 74–81. IEEE (1990)

    Google Scholar 

  15. Morin, C., Lottiaux, R., Vallée, G., Gallard, P., Utard, G., Badrinath, R., Rilling, L.: Kerrighed: a single system image cluster operating system for high performance computing. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 1291–1294. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45209-6_175

    Chapter  Google Scholar 

  16. OpenMP ARB: OpenMP application program interface version 4.0 (2013)

    Google Scholar 

  17. Plank, J.S., Beck, M., Kingsley, G., Li, K.: Libckpt: Transparent checkpointing under unix. Computer Science Department (1994)

    Google Scholar 

  18. Renault, É.: Distributed implementation of OpenMP based on checkpointing aided parallel execution. In: Chapman, B., Zheng, W., Gao, G.R., Sato, M., Ayguadé, E., Wang, D. (eds.) IWOMP 2007. LNCS, vol. 4935, pp. 195–206. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69303-1_22

    Chapter  Google Scholar 

  19. Sato, M., Harada, H., Hasegawa, A., Ishikawa, Y.: Cluster-enabled OpenMP: an OpenMP compiler for the SCASH software distributed shared memory system. Sci. Program. 9(2–3), 123–130 (2001)

    Article  Google Scholar 

  20. Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl. 19(1), 49–66 (2005)

    Article  Google Scholar 

  21. Tran, V.L., Renault, É., Ha, V.H.: Improving the reliability and the performance of CAPE by using MPI for data exchange on network. In: Boumerdassi, S., Bouzefrane, S., Renault, É. (eds.) MSPN 2015. LNCS, vol. 9395, pp. 90–100. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25744-0_8

    Chapter  Google Scholar 

  22. Tran, V.L., Renault, E., Ha, V.H.: Analysis and evaluation of the performance of CAPE. In: IEEE International Symposium on IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress, pp. 620–627. IEEE (2016)

    Google Scholar 

  23. Tran, V.L., Renault, É., Ha, V.H., Do, X.H.: Implementation of OpenMP data-sharing on cape. In: 9th International Symposium on Information and Communication Technology SoICT 2018, pp. 359–366. ACM (2018)

    Google Scholar 

  24. Tran, V.L., Renault, É., Ha, V.H., Do, X.H.: Time-stamp incremental checkpointing and its application for an optimization of execution model to improve performance of cape. Informatica 42(3) (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Van Long Tran .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Tran, V.L., Renault, É., Ha, V.H. (2019). CAPE: A Checkpointing-Based Solution for OpenMP on Distributed-Memory Architectures. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science(), vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-25636-4_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25635-7

  • Online ISBN: 978-3-030-25636-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics