Abstract
The overall performance improvement in Byzantine fault-tolerant state machine replication algorithms has made them a viable option for critical high-performance systems. However, constructing the proofs needed to support these algorithms is complex, and the proofs often make assumptions that may not hold in a particular implementation. Furthermore, the transition from theory to practice is difficult and can introduce subtle bugs that break the assumptions underlying these algorithms. To address these issues, we have developed Hermes, a fault-injector framework that provides an infrastructure for injecting faults into a Byzantine fault-tolerant state machine. Our main goal with Hermes is to help practitioners in the complex process of debugging their implementations of these algorithms and, at the same time, to increase the confidence of possible adopters (e.g., systems researchers and industry) by allowing them to test the implementations. In this paper, we discuss our experiences using Hermes to inject faults into BFT-SMaRt, a high-performance Byzantine fault-tolerant state machine replication library.
© 2013 IFIP International Federation for Information Processing
About this paper
Cite this paper
Martins, R. et al. (2013). Experiences with Fault-Injection in a Byzantine Fault-Tolerant Protocol. In: Eyers, D., Schwan, K. (eds) Middleware 2013. Lecture Notes in Computer Science, vol 8275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45065-5_3
DOI: https://doi.org/10.1007/978-3-642-45065-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45064-8
Online ISBN: 978-3-642-45065-5
eBook Packages: Computer Science (R0)