Runtime Verification with Patching and Hardware Checkers
With this chapter we begin our discussion of the functional correctness solutions that can be pursued past the release of a new microprocessor design, when the device is already shipped and installed in an end-customer’s system. Error detection at this stage of the processor’s life cycle entails monitoring its behavior dynamically, by observing the internal state of the device with dedicated hardware components residing on the silicon die. In addition to error detection, runtime validation solutions, also called in-the-field solutions, must include an effective recovery and error bypass algorithm, to ensure minimal performance loss and forward progress of the system even in presence of bugs. To make the case for dynamic validation in this chapter, we first discuss the type, criticality and number of escaped design errors reported in several processors products.We then overview two major classes of runtime solutions: checker-based approaches and patching-based ones: themain difference between these two techniques lies in the underlying error detection mechanism. Checker-based solutions focus on verifying high-level system invariants, usually specified at design-time and then mapped to dedicated hardware components. Patching techniques address bugs of which the manufacturer becomes aware after product release and provide programmablemeans to describe these bugs so that the system can later identify their occurrence at runtime. We then contrast these two frameworks in terms of error coverage, usage flow and performance overhead, and present in detail some of the most popular academic and industrial solutions known today for each of the two classes.
KeywordsControl Logic Soft Error Transient Fault Virtual Machine Monitor Main Core
Unable to display preview. Download preview PDF.
- [AMD05]Advanced Micro Devices, Inc. Revision Guide for AMD AthlonTM64 and AMD OpteronTMProcessors, August 2005. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf.
- [Aus00]Todd Austin. DIVA: A dynamic approach to microprocessor verification. Journal of Instruction-Level Parallelism, 2:1–26, May 2000.Google Scholar
- [BDF+03]Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In SOSP, Proceedings of the ACM Symposium on Operating Systems Principles, pages 164–177, October 2003.Google Scholar
- [BHS99]Eric B. Brett, David P. Hunter, and Sharon L. Smith. Moving Atom toWindows NT for Alpha. Compaq DIGITAL Technical Journal, 10(2):1–26, January 1999.Google Scholar
- [BP80]Henry Baker and Clinton Parker. High level language programs run ten times faster in microstore. ACM SIGMICRO Newsletter, 11(3-4):171–177, 1980.Google Scholar
- [Car90]Adrian Carbine. U.S. Patent no. 5253255: Scan mechanism for monitoring the state of internal signals of a VLSI microprocessor chip. Intel Corporation, November 1990.Google Scholar
- [CMA08]Kypros Constantinides, OnurMutlu, and Todd Austin. Online design bug detection: RTL analysis, flexible mechanisms, and evaluation. In MICRO, Proceedings of the International Symposium on Microarchitecture, pages 282–293, November 2008.Google Scholar
- [CMH00]David Van Campenhout, Trevor Mudge, and John P. Hayes. Collection and analysis of microprocessor design errors. IEEE Design and Test of Computers, 17(4):51–60, 2000.Google Scholar
- [DBB07]Andrew DeOrio, Adam Bauserman, and Valeria Bertacco. Chico: An on-chip hardware checker for pipeline control logic. In MTV, Proceedings of the International Workshop on Microprocessor Test and Verification, pages 91–97, December 2007.Google Scholar
- [DBB08]Andrew DeOrio, Adam Bauserman, and Valeria Bertacco. Post-silicon verification for cache coherence. In ICCD, Proceedings of the International Conference on Computer Design, pages 348–355, October 2008.Google Scholar
- [DDJ06]DDJ Microprocessor Center, 2006. http://www.x86.org/.
- [GC95]Michael D. Goddard and David S. Christie. U.S. Patent no. 5796974: Microcode patching apparatus and method. Advanced Micro Devices, Inc., November 1995.Google Scholar
- [IBM05]International Business Machines Corporation. IBM PowerPC 750GX and 750GL RISC Microprocessor Errata Notice, July 2005. http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/CC0918EAD88CB09E87256F5C006FF9E6.
- [Int00]Intel Corporation. IntelR StrongARMR SA-1100 Microprocessor Specification Up-date, February 2000.Google Scholar
- [Int02a]Intel Corporation. IntelR CeleronR Processor Specification Update, September 2002. http://developer.intel.com/design/celeron/specupdt/24374847.pdf.
- [Int02b]Intel Corporation. IntelR PentiumR II Processor Invalid Instruction Erratum Overview, July 2002. http://developer.intel.com/design/pentiumii/specupdt/24333749.pdf.
- [Int04]Intel Corporation. IntelR PentiumR Processor Invalid Instruction Erratum Overview, July 2004. http://www.intel.com/support/processors/pentium/sb/cs-013151.htm.
- [Int05]Intel Corporation. IntelR PentiumR III Processor Specification Update, May 2005. http://download.intel.com/design/PentiumIII/specupdt/24445355.pdf.
- [Kon09]Dusko Koncaliev. Bugs in the Intel microprocessors, 2009. http://www.cs.earlham.edu/˜dusko/cs63/.
- [MBS08]Albert Meixner, Michael E. Bauer, and Daniel J. Sorin. Argus: Low-cost, comprehensive error detection in simple cores. IEEE Micro, 28(1):52–59, 2008.Google Scholar
- [McG07]Harlan McGhan. The gHost in the machine. Microprocessor Report, 21(3):11–33, March 2007.Google Scholar
- [Mic09]Microsoft Corporation. /QIfdiv (Enable PentiumR FDIV Fix), 2009. http://msdn2.microsoft.com/en-us/library/ms856573.aspx.
- [MP99]Kevin J. McGrath and James K. Pickett. U.S. Patent no. 6438664: Microcode patch device and method for patching microcode using match registers and patch routines. Advanced Micro Devices, Inc., October 1999.Google Scholar
- [MS06]Albert Meixner and Daniel J. Sorin. Dynamic verification of memory consistency in cache-coherent multithreaded computer architectures. In DSN, Proceedings of the International Conference on Dependable Systems and Networks, pages 73–82, June 2006.Google Scholar
- [MS07]Albert Meixner and Daniel J. Sorin. Error detection using dynamic dataflow verification. In IPACT, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 104–118, September 2007.Google Scholar
- [NMGT06]Jun Nakano, Pablo Montesinos, Kourosh Gharachorloo, and Josep Torrellas. Re-ViveI/O: efficient handling of I/O in highly-available rollback-recovery servers. In International Symposium on High-Performance Computer Architecture, pages 203–214, February 2006.Google Scholar
- [PG74]Gerald J. Popek and Robert P. Goldberg. Formal requirements for virtualizable third generation architectures. Communications of the ACM, 17(7):412–421, 1974.Google Scholar
- [SE04]Amitabh Srivastava and Alan Eustace. ATOM: A system for building customized program analysis tools. ACM SIGPLAN Notices, 39(4):528–539, April 2004.Google Scholar
- [SMHW02]Daniel J. Sorin, Milo M.K. Martin, Mark D. Hill, and David A. Wood. SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery. In ISCA, Proceedings of the International Symposium on Computer Architecture, pages 123–134, 2002.Google Scholar
- [SNC+07]Smruti Sarangi, Satish Narayanasamy, Bruce Carneal, Abhishek Tiwari, Brad Calder, and Josep Torrellas. Patching processor design errors with programmable hardware. IEEE Micro, 27(1):12–25, 2007.Google Scholar
- [STT06]Smruti Sarangi, Abhishek Tiwari, and Josep Torrellas. Phoenix: Detecting and recovering from permanent processor design bugs with programmable hardware. In MICRO, Proceedings of the International Symposium on Microarchitecture, pages 26–37, December 2006.Google Scholar
- [WB07]Ilya Wagner and Valeria Bertacco. Engineering trust with semantic guardians. In DATE, Proceedings of Design, Automation and Test in Europe Conference, pages 743–748, April 2007.Google Scholar
- [WB09]Ilya Wagner and Valeria Bertacco. Caspar: Hardware patching for multi-core processors. In DATE, Proceedings of Design, Automation and Test in Europe Conference, pages 658–663, April 2009.Google Scholar
- [WBA08]IlyaWagner, Valeria Bertacco, and Todd Austin. Using field-repairable control logic to correct design errors in microprocessors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(2):380–393, 2008.Google Scholar