Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators

Davis, James J.; Cheung, Peter Y. K.

doi:10.1007/978-3-319-30481-6_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9625)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

As susceptibility to faults caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While common fault tolerance strategies frequently incur significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance embodies a proven family of low-overhead error mitigation techniques that can be built upon to create self-verifying circuitry.

In this paper, we present our research into the application of algorithm-based fault tolerance (ABFT) in FPGA-implemented accelerators at reduced levels of precision. This allows for the introduction of a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of logic used for error detection. We describe the implementation of a novel checksum truncation technique, analysing its effects upon overheads and allowed error. Our findings include that bit-width reduction of ABFT circuitry within a fault-tolerant accelerator used for multiplying pairs of \(32 \times 32\) matrices resulted in the reduction of incurred area overhead by 16.7% and recovery of 8.27% of timing model \(f_\text{max}\). These came at the cost of introducing average and maximum absolute output errors of 0.430% and 0.927%, respectively, of the maximum absolute output value under transient fault injection.
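
The abstract does not spell out the truncation scheme, so the following is only a minimal software sketch of the tradeoff it describes, not the authors' circuit. It assumes the classic column-checksum form of ABFT for integer matrix multiplication and models "reduced precision" by discarding the low-order bits of both checksums before they are compared. The names `matmul`, `column_checksum`, `abft_check` and `dropped_bits` are hypothetical and chosen for illustration.

```python
def matmul(a, b):
    """Plain integer matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def column_checksum(m):
    """Sum each column of m, giving one checksum per column."""
    return [sum(col) for col in zip(*m)]


def abft_check(a, b, c, dropped_bits=0):
    """Return the indices of columns of c whose checksum mismatch is visible.

    The reference checksum is (column sums of a) * b, which equals the
    column sums of a * b when no fault has occurred. Both sides lose their
    `dropped_bits` least-significant bits before comparison. Assumption:
    this stands in for a narrower hardware comparator; the paper's actual
    truncation technique may differ.
    """
    reference = matmul([column_checksum(a)], b)[0]
    observed = column_checksum(c)
    return [j for j, (r, o) in enumerate(zip(reference, observed))
            if (r >> dropped_bits) != (o >> dropped_bits)]


# Illustrative 2 x 2 operands (the paper evaluates 32 x 32 matrices).
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# A low-magnitude error: one output element is off by one.
small_fault = matmul(a, b)
small_fault[0][0] += 1

# A high-magnitude error: bit 6 of one output element is flipped.
large_fault = matmul(a, b)
large_fault[0][1] ^= 1 << 6

full_width = abft_check(a, b, small_fault)                   # [0]
truncated = abft_check(a, b, small_fault, dropped_bits=4)    # []
still_seen = abft_check(a, b, large_fault, dropped_bits=4)   # [1]
```

In this model, any checksum discrepancy of magnitude at least `2**dropped_bits` is always flagged, while smaller discrepancies may pass unnoticed. That mirrors the stated tradeoff: narrower detection logic in exchange for tolerating low-magnitude output errors.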

Author information

Authors and Affiliations

Imperial College London, London, SW7 2AZ, UK
James J. Davis & Peter Y. K. Cheung

Corresponding author

Correspondence to James J. Davis.

Editor information

Editors and Affiliations

University of São Paulo, São Carlos, Brazil
Vanderlei Bonato
Imperial College London, London, United Kingdom
Christos Bouganis
AGH University of Science and Technology, Krakow, Poland
Marek Gorgon

Cite this paper

Davis, J.J., Cheung, P.Y.K. (2016). Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators. In: Bonato, V., Bouganis, C., Gorgon, M. (eds) Applied Reconfigurable Computing. ARC 2016. Lecture Notes in Computer Science, vol 9625. Springer, Cham. https://doi.org/10.1007/978-3-319-30481-6_31

DOI: https://doi.org/10.1007/978-3-319-30481-6_31
Published: 13 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30480-9
Online ISBN: 978-3-319-30481-6
eBook Packages: Computer Science, Computer Science (R0)
