Fault Tolerance in Tandem Computer Systems
Tandem builds single-fault-tolerant computer systems. At the hardware level, the system is designed as a loosely coupled multi-processor with fail-fast modules connected via dual paths. It is designed for online diagnosis and maintenance. A range of CPUs may be interconnected via a hierarchical fault-tolerant local network. A variety of peripherals needed for online transaction processing are attached via dual-ported controllers. A novel disc subsystem allows a choice between low cost-per-Mbyte and low cost-per-access. System software provides processes and messages as the basic structuring mechanism. Processes provide software modularity and fault isolation. Process pairs tolerate hardware and transient software failures. Applications are structured as requesting processes making remote procedure calls to server processes. Process server classes utilize multi-processors. The resulting process abstractions provide a distributed system which can utilize thousands of processors. Networking protocols such as SNA, OSI, and a proprietary network are built atop this base. A relational database provides distributed data and distributed transactions. An application generator allows users to develop fault-tolerant applications as though the system were a conventional computer. The resulting system has price/performance competitive with conventional systems.
Keywords: Fault Tolerance, Server Class, Fault Isolation, Process Pair, Control Store