
Diagnostic Model and Diagnosis Algorithm of a SIMD Computer

  • Stefano Chessa
  • Balázs Sallay
  • Piero Maestrini
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1667)

Abstract

Self-diagnosis of systems comprising large numbers of processors has been studied extensively in the literature. The APEmille SIMD machine, a project of the National Institute of Nuclear Physics (INFN) of Italy, was offered as a test bed for a self-diagnosis strategy based on a comparison model.

Because of the general machine architecture and some design constraints, the diagnosis support built into APEmille does not completely fulfill the standard assumptions of the existing diagnosis models. This led to the development of a specific diagnostic model derived from the PMC and comparison models. The new model introduces the concept of direction-related and direction-independent faults. The consistency of this model with the APEmille architecture is discussed, and fault scenarios that are particularly critical for the correctness of the diagnosis are examined. It is shown that the limited hardware redundancy is sufficient to obtain a valid diagnosis with the presented model, provided it is extended with simple functional tests.
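The following minimal Python sketch makes the idea of comparison-based diagnosis concrete. It assumes a generic symmetric comparison model: a comparison between two fault-free units yields 0, and a comparison involving exactly one faulty unit yields 1. A comparison between two faulty units is unreliable.

This is not the APEmille-specific model of the paper, which adds direction-related and direction-independent faults. All function names are hypothetical. The exhaustive search over fault sets of size at most `t` is only practical for tiny systems.

```python
from itertools import combinations

def comparison_outcome(u, v, faulty, both_faulty_result=1):
    """Outcome of comparing units u and v under a generic symmetric comparison model.

    0 if both units are fault-free, 1 if exactly one is faulty. When both are
    faulty the real outcome is arbitrary; here it is fixed by the hypothetical
    parameter both_faulty_result so the simulation is deterministic.
    """
    fu, fv = u in faulty, v in faulty
    if not fu and not fv:
        return 0
    if fu != fv:
        return 1
    return both_faulty_result

def syndrome(edges, faulty, both_faulty_result=1):
    """Collect the comparison outcome of every compared pair (the syndrome)."""
    return {e: comparison_outcome(e[0], e[1], faulty, both_faulty_result) for e in edges}

def consistent(edges, syn, candidate):
    """Check whether a hypothesized fault set could have produced the syndrome."""
    for (u, v) in edges:
        fu, fv = u in candidate, v in candidate
        if fu and fv:
            continue  # two faulty units: any outcome is allowed
        expected = 1 if fu != fv else 0
        if syn[(u, v)] != expected:
            return False
    return True

def consistent_fault_sets(units, edges, syn, t):
    """Enumerate all fault sets of size <= t consistent with the syndrome."""
    result = []
    for k in range(t + 1):
        for cand in combinations(units, k):
            if consistent(edges, syn, set(cand)):
                result.append(set(cand))
    return result

# Usage: six units on a ring, each compared with its successor; unit 2 is faulty.
units = list(range(6))
edges = [(i, (i + 1) % 6) for i in units]
syn = syndrome(edges, faulty={2})
print(consistent_fault_sets(units, edges, syn, t=2))  # [{2}]
```

Here the only consistent fault set of size at most 2 is `{2}`, so the diagnosis is unique. In general, a diagnosis algorithm must also handle syndromes for which several fault sets are consistent. The paper addresses such situations through its examination of critical fault scenarios.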

Keywords

Diagnostic Model, Diagnosis Algorithm, Mean Time Between Failure, Fault Scenario, Instruction Decode

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Stefano Chessa (1, 2)
  • Balázs Sallay (1)
  • Piero Maestrini (1)
  1. Istituto di Elaborazione dell’Informazione del CNR, Pisa, Italy
  2. Dipartimento di Matematica, University of Trento, Italy