PastryGridCP: A Decentralized Rollback-Recovery Protocol for Desktop Grid Systems

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8285)

Abstract

Desktop Grids are composed of several thousand resources and are characterized by high resource volatility, caused by voluntary disconnections or failures. This volatility can prevent applications from running to completion. PastryGrid is a system that manages desktop grid resources and user applications over a fully decentralized P2P network. In this paper we present PastryGridCP, our checkpoint-based rollback-recovery protocol designed for the decentralized Desktop Grid system PastryGrid. It provides fault tolerance for grid applications and ensures that they run to completion, transparently to users. We conducted our experiments on 110 nodes of Grid’5000. The results validate our protocol and show that it improves application performance.
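
The general idea of checkpoint-based rollback recovery named in the abstract can be illustrated with a minimal sketch. This is not the PastryGridCP protocol, whose details are not given on this page. It is a generic single-process simulation, and every name in it (`CheckpointStore`, `NodeFailure`, `run_task`, `run_with_recovery`, the `interval` and `fail_at` parameters) is a hypothetical introduced for illustration. An in-memory dictionary stands in for whatever storage a real system would use to keep checkpoints available after a node disappears.

```python
import pickle


class CheckpointStore:
    """Hypothetical stand-in for checkpoint storage that survives a node loss."""

    def __init__(self):
        self._blobs = {}

    def save(self, task_id, state):
        # Serialize a snapshot so later mutation of `state` cannot alter it.
        self._blobs[task_id] = pickle.dumps(state)

    def load(self, task_id):
        blob = self._blobs.get(task_id)
        return pickle.loads(blob) if blob is not None else None


class NodeFailure(Exception):
    """Simulates a node leaving the grid in the middle of a task."""


def run_task(task_id, total_steps, store, interval, fail_at=None):
    """Sum 0..total_steps-1, resuming from the latest checkpoint if one exists."""
    state = store.load(task_id) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise NodeFailure(f"node lost before step {fail_at}")
        state["acc"] += state["step"]
        state["step"] += 1
        # Take a checkpoint every `interval` completed steps.
        if state["step"] % interval == 0:
            store.save(task_id, state)
    return state["acc"]


def run_with_recovery(task_id, total_steps, store, interval, fail_at):
    """Run the task; on failure, restart it from the last saved checkpoint."""
    try:
        return run_task(task_id, total_steps, store, interval, fail_at)
    except NodeFailure:
        # Assumption: another node takes over and rolls back to the checkpoint.
        return run_task(task_id, total_steps, store, interval)


if __name__ == "__main__":
    store = CheckpointStore()
    result = run_with_recovery("t1", 10, store, interval=3, fail_at=7)
    print(result)
```

In this run the simulated failure occurs before step 7, so the restarted task rolls back to the checkpoint taken after step 6 and repeats only one step instead of the whole computation. The user-visible result is the same as for a failure-free run, which is the sense of "transparent" used in the abstract.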


© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Abbes, H., Louati, T. (2013). PastryGridCP: A Decentralized Rollback-Recovery Protocol for Desktop Grid Systems. In: Kołodziej, J., Di Martino, B., Talia, D., Xiong, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2013. Lecture Notes in Computer Science, vol 8285. Springer, Cham. https://doi.org/10.1007/978-3-319-03859-9_11

  • DOI: https://doi.org/10.1007/978-3-319-03859-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03858-2

  • Online ISBN: 978-3-319-03859-9

  • eBook Packages: Computer Science, Computer Science (R0)