Abstract
Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, while operating within resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties requires the middleware to provide fault tolerance capabilities that respect the time-critical needs of DRE systems. Conventional middleware solutions, such as Fault-tolerant CORBA (FT-CORBA) and the Continuous Availability API for J2EE, have limited utility for DRE systems because they are heavyweight (e.g., the complexity of their feature-rich fault tolerance capabilities consumes excessive runtime resources), yet incomplete (e.g., they lack mechanisms that enable fault tolerance while maintaining real-time predictability).
This paper makes three contributions to the development and standardization of lightweight real-time and fault-tolerant middleware for DRE systems. First, we discuss the challenges in realizing real-time fault-tolerant solutions for DRE systems using contemporary middleware. Second, we describe recent progress towards standardizing a lightweight CORBA fault-tolerance specification for DRE systems. Third, we present the architecture of FLARe, a prototype based on the OMG real-time fault-tolerant CORBA middleware standardization efforts. FLARe is lightweight (e.g., it leverages only those server- and client-side mechanisms required for real-time systems) and predictable (e.g., it provides fault-tolerant mechanisms that respect the time-critical performance needs of DRE systems).
Copyright information
© 2008 IFIP International Federation for Information Processing
About this paper
Cite this paper
Balasubramanian, J., Gokhale, A., Schmidt, D.C., Wang, N. (2008). Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems. In: Meier, R., Terzis, S. (eds) Distributed Applications and Interoperable Systems. DAIS 2008. Lecture Notes in Computer Science, vol 5053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68642-2_6
DOI: https://doi.org/10.1007/978-3-540-68642-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68639-2
Online ISBN: 978-3-540-68642-2
eBook Packages: Computer Science (R0)