Checkpoints, which store the intermediate results of a computation, have a fundamental impact on the computing throughput of Desktop Grid systems such as BOINC. Currently, BOINC workers store their checkpoints locally. A major limitation of this approach is that when a worker leaves a computation unfinished, no other worker can resume it from the last stable checkpoint. The task must then be restarted from scratch once the original machine is no longer available.
To overcome this limitation, we propose to share checkpoints between nodes. To organize this mechanism, we arrange nodes to form complete graphs (cliques), where nodes share all the checkpoints they compute. Cliques function as survivable units, where checkpoints and tasks are not lost as long as one of the nodes of the clique remains alive. To simplify construction and maintenance of the cliques, we take advantage of the central supervisor of BOINC. To evaluate our solution, we combine simulation with some real data to answer the most fundamental question: what do we need to pay for increased throughput?
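The survivability property of a clique can be illustrated with a minimal sketch. This is not the chapter's implementation: the `Node` and `Clique` classes, their method names, and the use of an integer step as the checkpoint are all hypothetical, and the central BOINC supervisor that builds and maintains cliques is left out. The sketch only shows that a task's latest checkpoint stays reachable as long as one member of the clique is alive.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Node:
    # Hypothetical model of a worker; not part of BOINC.
    name: str
    alive: bool = True
    # task id -> latest checkpointed step held by this node
    checkpoints: Dict[str, int] = field(default_factory=dict)


class Clique:
    """Hypothetical clique in which every live member holds every checkpoint."""

    def __init__(self, names: Iterable[str]) -> None:
        self.nodes = {n: Node(n) for n in names}

    def save(self, worker: str, task: str, step: int) -> None:
        # The worker computes a checkpoint and replicates it to all live members.
        if not self.nodes[worker].alive:
            raise RuntimeError(f"{worker} is not alive")
        for node in self.nodes.values():
            if node.alive:
                node.checkpoints[task] = max(step, node.checkpoints.get(task, 0))

    def leave(self, name: str) -> None:
        # Once a node departs, its local copies are no longer reachable.
        self.nodes[name].alive = False

    def resume_point(self, task: str) -> Optional[int]:
        # Latest checkpoint reachable through any live member, or None if lost.
        steps = [n.checkpoints[task] for n in self.nodes.values()
                 if n.alive and task in n.checkpoints]
        return max(steps) if steps else None


clique = Clique(["a", "b", "c"])
clique.save("a", "task-1", 40)
clique.leave("a")                      # the original worker disappears
print(clique.resume_point("task-1"))   # 40: another member can resume
clique.leave("b")
clique.leave("c")
print(clique.resume_point("task-1"))   # None: the whole clique is gone
```

With purely local storage, the first `leave` call would already force a restart from scratch. In this sketch the price of survivability is one copy of each checkpoint per live member, which is the kind of cost the evaluation sets against the gain in throughput.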
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Araujo, F., Domingues, P., Kondo, D., Silva, L.M. (2008). Using Cliques Of Nodes To Store Desktop Grid Checkpoints. In: Gorlatch, S., Fragopoulou, P., Priol, T. (eds) Grid Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09457-1_3
DOI: https://doi.org/10.1007/978-0-387-09457-1_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09456-4
Online ISBN: 978-0-387-09457-1
eBook Packages: Computer Science, Computer Science (R0)