Diagnostic Model and Diagnosis Algorithm of a SIMD Computer
Self-diagnosis of systems comprising large numbers of processors has been studied extensively in the literature. The APEmille SIMD machine, a project of the National Institute of Nuclear Physics (INFN) of Italy, was offered as a test bed for a self-diagnosis strategy based on a comparison model.
Because of the general machine architecture and some design constraints, the standard assumptions of existing diagnosis models are not completely fulfilled by the diagnosis support built into APEmille. This circumstance led to the development of a specific diagnostic model derived from the PMC and comparison models. The new model introduces the concepts of direction-related and direction-independent faults. The consistency of this model with the APEmille architecture is discussed, and possible fault scenarios that are particularly critical for the correctness of the diagnosis are examined. It is shown that the limited hardware redundancy, extended with simple functional tests, is sufficient for obtaining a valid diagnosis with the presented model.
Keywords: Diagnostic Model; Diagnosis Algorithm; Mean Time Between Failure; Fault Scenario; Instruction Decode