Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators

  • Conference paper
Applied Reconfigurable Computing (ARC 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9625)

Abstract

As fault susceptibility caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While common fault tolerance strategies frequently incur significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance is a proven family of low-overhead error mitigation techniques that can be built upon to create self-verifying circuitry.

In this paper, we present our research into the application of algorithm-based fault tolerance (ABFT) in FPGA-implemented accelerators at reduced levels of precision. This introduces a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of the logic used for error detection. We describe the implementation of a novel checksum truncation technique and analyse its effects upon overheads and allowed error. We find that reducing the bit-width of the ABFT circuitry within a fault-tolerant accelerator used for multiplying pairs of \(32 \times 32\) matrices reduced the incurred area overhead by 16.7% and recovered 8.27% of timing model \(f_\text{max}\). These gains came at the cost of introducing average and maximum absolute output errors of 0.430% and 0.927%, respectively, of the maximum absolute output value under transient fault injection.
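
The tradeoff described above can be illustrated with a small software model. The sketch below is not the authors' circuit or their checksum truncation technique. It is a minimal Python illustration of checksum-based ABFT for matrix multiplication, under the assumption that reduced precision can be modelled by ignoring the low-order bits of the checksum residual. The names `matmul`, `column_sums`, `abft_check` and the parameter `dropped_bits` are hypothetical and chosen only for this example.

```python
def matmul(a, b):
    """Plain integer matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def column_sums(m):
    """Column checksum vector: the sum of each column of m."""
    return [sum(col) for col in zip(*m)]


def abft_check(a, b, c, dropped_bits=0):
    """Return the indices of columns of c that fail the checksum test.

    The expected column checksums of a @ b equal (column sums of a) @ b,
    so they can be computed without forming the full product again.

    Assumption for illustration only: reduced precision is modelled by
    discarding the low `dropped_bits` bits of the absolute residual, so
    residuals smaller than 2**dropped_bits go unobserved.
    """
    expected = matmul([column_sums(a)], b)[0]
    observed = column_sums(c)
    return [j for j, (e, o) in enumerate(zip(expected, observed))
            if (abs(e - o) >> dropped_bits) != 0]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[0][1] += 3  # inject a low-magnitude error into one output element
    print("full-width check flags columns:", abft_check(a, b, c))
    print("reduced-width check flags columns:", abft_check(a, b, c, 2))
```

In this model, an injected error of magnitude 3 is flagged by the full-width check but missed once two low-order bits are dropped, while any error of magnitude 4 or more is still flagged. This mirrors, in software only, the idea of trading observability of low-magnitude errors for narrower detection logic.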

Author information

Corresponding author

Correspondence to James J. Davis.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Davis, J.J., Cheung, P.Y.K. (2016). Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators. In: Bonato, V., Bouganis, C., Gorgon, M. (eds) Applied Reconfigurable Computing. ARC 2016. Lecture Notes in Computer Science, vol 9625. Springer, Cham. https://doi.org/10.1007/978-3-319-30481-6_31

  • DOI: https://doi.org/10.1007/978-3-319-30481-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30480-9

  • Online ISBN: 978-3-319-30481-6

  • eBook Packages: Computer Science, Computer Science (R0)
