Tracking Service Availability in Long Running Business Activities

  • Werner Vogels
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2910)


An important factor in the successful deployment of federated, web-services-based business activities will be the ability to guarantee reliable distributed operation and execution at scale. Advanced failure management, for example, is essential for any reliable distributed operation, but especially so in the target areas of web service architectures, where activities can be composed of services located at different enterprises and accessed over heterogeneous network topologies. In this paper we describe the first technologies and implementations to come out of the Obduro project, whose goal is to apply the results of scalability and reliability research to globally scalable service-oriented architectures. We present technology developed for tracking the failure and availability of processes involved in long-running business activities within a web services coordination framework. The Service Tracker, the Coordination Service, and related development toolkits are publicly available.
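The abstract does not describe how the Service Tracker detects failures, so the following is only a minimal sketch of one common, generic technique for availability tracking: a timeout-based heartbeat detector. The class `AvailabilityTracker`, its methods, the member names, and the injectable `FakeClock` are all hypothetical illustrations, not the Obduro Service Tracker API. Note that a detector of this kind can falsely suspect a participant that is slow but still alive, so the timeout trades detection speed against accuracy.

```python
import time
from typing import Callable, Dict, List


class AvailabilityTracker:
    """Hypothetical heartbeat-based tracker (not the Obduro Service Tracker API)."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout            # seconds of silence before a member is suspected
        self.clock = clock                # injectable clock so the logic is testable
        self.last_seen: Dict[str, float] = {}

    def register(self, member: str) -> None:
        # Start tracking a participant as if it had just sent a heartbeat.
        self.last_seen[member] = self.clock()

    def heartbeat(self, member: str) -> None:
        # Record fresh liveness evidence; unknown members are implicitly registered.
        self.last_seen[member] = self.clock()

    def unregister(self, member: str) -> None:
        # Stop tracking a participant that left the activity cleanly.
        self.last_seen.pop(member, None)

    def suspected(self) -> List[str]:
        # Members silent for longer than the timeout are suspected to have failed.
        now = self.clock()
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout)

    def available(self) -> List[str]:
        # Members heard from within the timeout are considered available.
        now = self.clock()
        return sorted(m for m, t in self.last_seen.items() if now - t <= self.timeout)


class FakeClock:
    """Hypothetical manually advanced clock for deterministic demos and tests."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


# Usage: two hypothetical participants of a long-running activity.
clock = FakeClock()
tracker = AvailabilityTracker(timeout=5.0, clock=clock)
tracker.register("payment-service")
tracker.register("shipping-service")
clock.now = 3.0
tracker.heartbeat("payment-service")   # only the payment service keeps reporting
clock.now = 7.0
print(tracker.available())   # ['payment-service']
print(tracker.suspected())   # ['shipping-service']
```

Injecting the clock keeps the suspicion rule (silence longer than `timeout`) separate from wall-clock time, which makes the behavior easy to check without sleeping.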


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Werner Vogels, Dept. of Computer Science, Cornell University, Ithaca, USA

Personalised recommendations