Checkpointing in Parallel State-Machine Replication

  • Odorico M. Mendizabal
  • Parisa Jalili Marandi
  • Fernando Luís Dotti
  • Fernando Pedone
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8878)


State-machine replication is a popular approach to building fault-tolerant systems. It relies on the sequential execution of commands to guarantee strong consistency. Sequential execution, however, limits performance. Recently, several proposals have suggested parallelizing the execution model of the replicas to improve state-machine replication’s performance. Despite their success in achieving high performance, the implications of these models for checkpointing and recovery are mostly left unaddressed. In this paper, we focus on the checkpointing problem in the context of Parallel State-Machine Replication. We propose two novel algorithms and assess them through simulation and a real implementation.


Keywords: state-machine replication, checkpointing, fault tolerance





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Odorico M. Mendizabal (1, 2)
  • Parisa Jalili Marandi (3)
  • Fernando Luís Dotti (1)
  • Fernando Pedone (3)
  1. Pontifícia Universidade Católica do Rio Grande do Sul – PUCRS, Porto Alegre, Brazil
  2. Universidade Federal do Rio Grande – FURG, Rio Grande, Brazil
  3. University of Lugano – USI, Lugano, Switzerland
