Practical Partial Hardware Reverse Engineering Analysis
- 99 Downloads
Reverse engineering typically requires expensive equipment, skilled technicians, time, a cross section of the component to be sliced out and a dedicated reconstruction software. In this paper, we present a low-cost alternative, combining fast frontside sample preparation, electron microscopy imaging, error-free standard cell recognition and within and between-die standard cell statistical analysis (SCSA). Step-by-step, we depict the process to access the transistor’s drain/source area, to acquire the full area of a single chip layer, to adapt pattern recognition for standard cells and to analyze the standard cell width, local/global location and occurrences number. The inner workings of each step are accompanied by results on 45–65-nm FCBGA devices enabling to locate specific areas (e.g. registers, hardware accelerator). We particularly point out the importance of such design information extraction for local fault injection and hardware assurance. The primary goal is to analyze how much design information of a complex integrated circuit can be retrieved with minimal costs and without outsourcing.
KeywordsStandard cell Partial reverse engineering Pattern recognition Statistical analysis Countermeasures
To counteract the difficulty of the standard reverse engineering process, we propose to retrieve sensitive information of a component (e.g. registers location and hardware accelerator) by only analyzing where the transistors’ drain/source are located. Having such information is enough to reduce the area of interest for a subsequent localized attack (e.g. electromagnetic or laser attack); check the authenticity of the circuit (e.g. hardware trojan detection); or understand the underlying hardware layer after a side-channel technique such as photon-emission analysis.
Our goal is not to reverse engineer a complete chip but instead to gain partial design information for particular purposes. They lie in the area of malicious circuit modification detection but also in combined attacks where such technique would decrease the number of samples and attack time needed. Thus, a complete attack could be applied thanks to some extracted spatial information combined with standard side-channel (e.g. power) extracted temporal information. Also, extracted spatial information can be analyzed once chip sub-functions have been roughly localized with a more global technique.
Most of the drawbacks of hardware reverse engineering disappear (cost, time, manual corrective action), and we retrieve the standard cells function or a specific group of standard cells by location (absolute and cell-to-cell), occurrences number, and width/shape analysis, which we refer to as standard cell statistical analysis (SCSA). The methodology herein is depicted from sample acquisition to a few recognition examples.
Utilizing such an approach, locating specific cells can be done regardless of the device technology node and package. For instance, it can reduce attack rating for the identification and exploitation phase and can be used in conjunction with laser fault attacks  to bypass security mechanisms . Multi-spot (bypass/fault capability) and high-power (through the substrate capability) platforms are commercially available, increasing security threat (redundancy and software check can be defeated). While technology node approaches 7 nm, the size of the implemented transistor/single standard cell is larger, and laser energy (pulse duration/power) can be reduced enough to only perturb a single standard cell below the peak of the Gaussian shape beam.
In the past, Nohl  reversed a ciphering circuit made of 400 NAND gate equivalent (GE) from optical images using normalized cross-correlation. Also, Courbon  retrieved the location of a single type of standard cell (a flip-flop cell) on a 0.5-mm2 area device manufactured in a 130-nm process. To the best of our knowledge, we are the first to develop, and explain step by step, a low-cost full area (single layer) standard cells extraction methodology on a 45-nm device (Mgates), while analyzing IC design requirements, methodology limits and countermeasures. The aforementioned methodology takes its sample preparation roots in the failure analysis world, its image processing roots in the cell (biology) analysis world.
The paper is organized as follows: we start by talking about IC design and geometries, before introducing the multi-field steps to localize standard cells in Section 2. Then, we introduce the device under investigation in Section 3, and put into practice the methodology in Section 4. Finally, we investigate partial reverse engineering applications in Section 5 and present ways on how to extend this work in Section 6.
2 From Integrated Circuit Design to Standard Cell Physical Extraction
2.1 IC Design
An IC designer uses a certain number of off-the-shelf macros (IP royalty fees apply) combined with a certain number of standard cells (from a chosen process design kit (PDK)), a ratio that primarily depends on project cost and design (i.e. timing) constraints. For instance, ARM cores hard macros are widely present at the moment in embedded devices, such as mobile phones and smart cards. There are similarities between products, as standard cells and hard macros are re-used across a large variety of devices. Herein, we analyze standard cells and hard macros XY localizations. In the era of specialization (i.e. dedicated ASIC for machine learning/server) and open-source hardware (based on RISC V instruction set architecture (ISA)), investigating hardware implementation is paramount.
2.2 IC Geometries
Integrated circuit area (length and width expressed in mm) is wider compared to the thickness of each metal layer (few hundreds nm), hence the planarity problem when delayering. Adding to the high density of transistors per mm2, this leads to long imaging time. The smallest feature (for not advanced process) is generally the transistor gate width, corresponding to the technology node. A transistor controls how much current flows through from source to drain, depending on the voltage applied on the gate. Such capability is used to obtain various Boolean functions (or to create a current amplifier). Drain and source are created by local doping (boron, phosphorus) of the semiconductor substrate which is silicon based. From bottom to top, following the substrate, we find poly-silicon that forms the transistors’ gates (separated by a dielectric Si02 down to 32 nm, then replaced by Hafnium-based (higher permittivity) dielectric). Typically, a first metal layer is then used to interconnect transistors, thus forming standard cells (NAND, OR, FLIP-FLOP). Then, non-basic functions, such as a 32-bit counter, are formed by interconnecting multiple standard cells together, while power/clock are routed in top metal layers. Metal layers are separated by a dielectric (SiO2 (glass)), and vias allow vertical connections between the subsequent layers.
2.3 Sample Preparation
ICs running secure applications come in various formats—smart card, system-on-chip (SoC), package-on-package (PoP) (the die thickness being 130 μm for smart cards and PoPs due to fitting requirements). However, we reckon that it is possible to extract the die of any circuit at almost no cost: a combination of sharp cutting tools, acids (i.e. HNO3), hot plates and protection equipments . Once the die is extracted, it is possible to easily reach the transistors’ active region using HF acid. This has very interesting features in terms of cost, full area application, speed and required skills, while the technique allows several samples to be prepared at once. There is no need of cross-sectioning, and the technique is independent of the technology node. In this paper, we show how easily one can reach such layer of a circuit, manufactured with a 45-nm process and packaged in a flip-chip ball grid array (FCBGA).
2.4 Sample Imaging
Scanning electron microscopy (SEM) is a standard for imaging deep sub-micron integrated circuits as optical microscopy has a smaller depth of focus and is limited by light diffraction (coating techniques can limit the impact of the latter but requires an extra step and thus variable). Detector type, aperture size, probe current, accelerating voltage, magnification, scanning speed and image resolution can be easily tuned. Despite being less prompt to contrast changes compared to optical microscopy, it is worth ensuring that the prepared integrated circuit remains as flat as possible after attaching it with carbon tape, given the large area to be acquired. The SEM only gives a grayscale intensity for each pixel (a certain secondary or backscattered electrons detector count), and the image is thus saved in a single-channel format (saving memory space). There are many parameters to set (mainly accelerating voltage, probe current and time per pixel), impacting acquisition time and signal-to-noise ratio. Here, we particularly point out practical features and considerations, pros and cons of SEM imaging with respect to our application.
2.5 Images Alignment
Newer SEMs include proprietary tools (e.g. ZEISS ATLAS, FEI MAPS) dedicated to large-area acquisition; it is thus easy to scan a specified area with a specific magnification, image rotation, time per pixel (dwell) and image overlap and then have the tool performing the alignment task. Another option (if a SEM without large-area acquisition dedicated software is used) is to directly use SEM APIs to write an acquisition recipe and use offline tools for alignment. Herein, we demonstrate the use of an offline artefact-free alignment tool.
2.6 Pattern Recognition
There has been an attempt to automate or semi-automate integrated circuit reverse engineering in the open-source community, Degate . This software is quite interesting as the user can load images and directly process them. However, we found some limitations in terms of pattern recognition rate, timing performance, adjusting grid lines or loading large images. While we also implement a normalized cross-correlation function  as a kernel to recognize patterns, we specifically create a lighter custom tool dedicated to single-layer analysis, fast and robust with respect to possible SEM images (sample preparation and foundry). We propose an algorithm taking into account the possible artefacts arising from previous methodology steps. A single missing pattern could ruin our statistics, and therefore we ensure that no false recognition is obtained with standard pattern recognition algorithms. We are thus able to automatically collect labelled data (error-free) and create a standard cell (single layer) library. This library can be used as it is or be the starting point for multi-sample analysis using machine learning techniques to speed up analysis.
2.7 Statistical Analysis
At the layer of interest, various repetitive shapes are visualized. They correspond to basic functions such as INV, AND, OR, MUX, DEC, half adder, DFF and latch. Having only drain and source remaining on our images, we cannot directly retrieve the function of a standard cell (as poly and M1 layers are missing). Whatever the device type, the number of these base functions is very low (few tens only). Additionally, base functions are split  into different instances as the number of inputs, the presence of signal such as reset/clock, the drive strength and different voltage domains (for a SoC) differ. Those instances are each optimally designed depending on speed, power, area requirements and foundry capabilities. The chip designer uses such instances from the design kit to implement all his/her functions (or directly use other IPs), resulting in a chip with about 200kGE (gate equivalent) for a smart card digital logic, versus a SoC with several tens/hundreds millions standard cell occurrences for the logic only. In this paper, the goal is to give a first approach on recognizing cells based on absolute/relative location, number of occurrences, width and shape of pattern within a single chip and between chips.
3 Device Under Investigation
Silicon substrate (650–850 μm)
Doped areas (transistors’ drain and source)
Poly-Silicon (transistors’ gate)
Stack of 7 + metal layers and dielectrics (ascending about 0.2 to 0.9 μm)
Passivation: Si3N4/SiO2/Si3N4 (0.6/0.1/0.6 μm)
Polyimide (5 μm)
4 Step by Step Practical Implementation
4.1 Frontside Sample Preparation
If this layer is satisfactory for your reverse engineering application (chip identification, integrity verification), a quick manual polishing (not done in either Fig. 1 images) removes copper residues, while backscattered electrons SEM imaging prevents the visualization of surface scratches (SEM image in Fig. 1).
To sum up the whole sample preparation process, its main benefits are its speed (less than 40 min), cost per sample (few $), whole sample surface application (about 100 mm2), technology node independence (45-65-90-130 nm in this paper), effectiveness (100% success rate) and accessibility (no required skills).
4.2 Frontside Image Acquisition
Regarding the sub-polyimide surface, imaging layer ICs’ features are quite large at the top metal layers. However, using an optical microscope requires a nicely polished surface. Also, the lack of imaging depth of field is problematic for large areas. In fact, SEM remains the most interesting tool for direct imaging (without required signal processing), and this layer needs far less scans due to the top layer geometries. Unless a shield is present, the top metal layer can thus be directly imaged.
Astigmatism can be set at the centre of the device.
Three focus points (for interpolation) can be taken at three chip sides.
Contrast/luminosity is a tricky parameter, different secondary electrons re-emission rates (no coating, not uniform in SEM chamber) can be problematic.
Also, using a multi-beam SEM (up to 91 simultaneous beams) would have decreased the acquisition time to less than 10 min. We used a proprietary SEM manufacturer software (additional) to acquire the full area that added a 10% overlay between each image. It individually saves images, but also provides a globally aligned image.
4.3 Image Alignment
As images are also individually saved, we thus move to an offline alternative for alignment. The same set of images has been aligned with this second approach (example image with bottom image on Fig. 4). We are able to align all images together, making compatible large image acquisition and pattern recognition. It only takes several minutes and is completely automated (matrix dimension detection, overlap calculation). Image alignment is still an area of research (mainly for speed concerns) but 2D image alignment problematics have been resolved time ago in other fields such as biology where electron microscopy is also used or standard optical acquisition.
4.4 Image Processing
4.4.1 Standard Cell Statistical Analysis Flow
Pattern recognition is then performed on obtained images. The former is specifically tuned for our task. After automatically checking for preparation/imaging artefacts, patterns are found on the chip along power lines and ranked per size. Standard correlation techniques with multiple iterations loops (with decreasing correlation coefficient) are used to avoid false detection. Information about pattern location, size and occurrences are saved. Then, co-location information combined with computer architecture and technology/tool-specific constraints allow making hypothesis on the retrieved standard cells. The main aim is to ensure that no false positives are obtained with the tool allowing on one side to have non-false positive for statistical purposes but also to obtain a dictionary of error-free patterns.
4.4.2 Enhanced Pattern Recognition Robustness via Artefacts Correction
If any tungsten remains on the surface, it will be adjacent to a NMOS/PMOS area, and therefore only affects the background of the image. Such artefact can thus be easily spotted (based on edge detection).
Large stains can be present on a circuit (non-cleanroom environment) but can be detected as nothing should be located over the substrate polarization contact (or, in other words, no crossing element between two transistors of the same type).
- Part of a shape can be missing (over etching, as seen in Fig. 5); therefore, the tool checks the presence of NMOS and PMOS components (we cannot have one without the other). If missing, an analysis of the specified area is performed and some filtering enables the retrieval of the original missing shape (as would still let a trace in the silicon).
4.4.3 Enhanced Statistical Analysis via Design Rule
A small pattern can be part of a larger pattern. One approach is to recognize larger patterns first.
Patterns are present along power rails; therefore, possible rotations of the pattern are limited. For instance, the PMOS side (usually larger than NMOS) will be located on the positive rail side. Also, the highest correlation points will only be located at the same extremity of the patterns.
The size of the complete layer has quite a large print, e.g. for this 10 × 10 mm die results in a 22.7-GB image (even if grayscale encoded only on 8bits (1byte)). We need a clever manipulation of the image (RAM constraint).
The logic only can be acquired (or another part can be acquired with less resolution; SEMs do not provide this function yet).
While substrate polarization contacts may not be present in all circuits, background can always be retrieved by analyzing intensity values across the pattern height. For instance, a pattern is found at a location if the intensity (gradient) is not continuous (a change of intensity is found between NMOS drain/source and Si and then between Si and PMOS drain/source).
Figure 5 shows a typical case where a standard cell with a different current drive strength (fan-out) (compared to the selected standard cell) has not been recognized. There are also two standard cells with a partly missing transistor side that are recognized. We expect this behaviour with the aforementioned parameters. We want to be independent of possible within-cells imaging fluctuations or missing substrate polarization contacts.
4.5 Single-Chip Information Extraction
The area analysis (e.g. a flip-flop is usually the largest element)
The number of transistors (e.g. a NAND cell has 4 transistors)
Localization (e.g. a group of gates next to the memory could be used for deciphering)
Co-localization (e.g. two groups of cells often linked)
Occurrences (e.g. 32 spatially close occurrences for a specific 32-bit register or counter (analyzing the shape too), or 64 spatially close occurrences for a XOR-based ciphering circuit)
Global number of occurrences in the chip
Subsequently, an area with possible XOR gates, large quantities of possible NAND gates and possible DFF gates may be the hint for a crypto coprocessor location (e.g. DES). The presence of ‘rare’ occurrences standard cell in a limited area may indicate the presence of a crypto coprocessor too. Post pattern recognition, it is possible to display occurrences of pattern that appear everywhere before moving to an empty area to vizualize recognized pattern.
For some circuits such as the processor under investigation, motherboard manufacturers require information on the processor; a datasheet is thus made public. Using the latter, one can thus assume a certain number of 8-/16-/32-bit registers or a certain function being in a certain area (based on registers description and ballout definition respectively) or a certain number of expected core registers or specific function registers (each FF will be next to the other, timing constraint) present in a certain voltage domain (possible thick pwell). For some circuits, it is a complete black box approach despite knowing the general architecture of the device (e.g. ARM based) or accessing public documents (e.g. public parts of certification results). Unfortunately, the complexity of the circuit does not permit to continue further statistics on the chip.
In the following, we discuss how practical it is to use standard cell statistical analysis outputs (text, file or graph format) for the two main aspects of this paper: precise laser fault attacks and hardware trojan detection.
5 Single-Cell Localization Direct Applications
5.1 Spatial Information for Laser Setup
Despite the main sample of the study being a 45-nm technology node SoC, we note that a single standard cell (several μm2) can be perturbed at once. Indeed, the laser beam has a Gaussian shape, a spot diameter of a μm and an easily controllable energy (duration by power) reaching the area of interest . This single-layer reverse engineering will help to place the laser spot at the area we are interested in. Symmetrically, we can first launch a laser fault attack to then analyze the situation using the underlying hardware structure. However, if a secure device is attacked this way, detectors might detect the intrusion leading to extra consideration to be taken for the attack (e.g. remove power before sensitive data erase). One can also think about the potential of such an approach together with photon counting techniques (specifically without timing capabilities, e.g. only a CCD camera is used).
5.2 Spatial Information for Integrity Checking
Another use case is a fabless chip designer/manufacturer (or anyone with a design reference) that would like to analyze the integrity of its components at wafer reception. The success rate of such technique is only dependant on the sample preparation/pattern recognition process as there is no triggering element. The standard cell statistical analysis can be applied on a defective device coming from a lot (or a defective die taken from a wafer) to no affect cost and yield. The correlation is made by comparing the list of standard cells physically extracted using our methodology and a design output file. Specifically, this could be done with the Design Exchange Format (DEF) file, where each gate instance is listed with its XY position. A DEF file does not include proprietary inner standard cell information, hypothetically more compliant.
5.3 Extending Scanning Electron Microscopy Use
Part of our approach and obtained data can be used as a starting point for machine learning–based (convolutional neural networks) fast pattern recognition, as our data is labelled with no false positives. In this paper, we choose a frontside destructive approach. It would be obviously more interesting to perform standard cell analysis from the backside of the device in a non-invasive way. Laser scanning microscopy has been used in the past and would be an interesting method to compare with (thinning required, setup, cells distinction (fan-out)).
6 Perspectives and Opening
6.1 Chip to Chip Analysis, Different Process Analysis
6.2 Machine Learning Framework
Machine learning is ideally used for prediction and requires some training data. It particularly makes sense to use it for domain where time is the main criteria (speech recognition). The first concern is to be able to reach the same level of detection while drastically reducing the recognition processing time. Retrieved shapes with standard correlation technique are used as an error-free dictionary to build up the machine learning model. It would be interesting to evaluate such framework in terms of error rate but also for denoising microscopy images. The latter could resolve low-resolution images and further reduce methodology time (scanning). It would be interesting to propose and share a SEM image benchmark or an online tool where test images can be loaded and analyzed according to a specifically trained model.
The partial reverse engineering main interest is its possible application on multiple circuits giving thus more data to be compared with even in a black box approach to assess countermeasures/extract design information. When designing a circuit for secure applications, countermeasures against reverse engineering first appeared in the set of required features, and so before fault attack countermeasures. Actually, it exists proprietary countermeasures at active/poly/via/metal1 layers to hide functionality of a component from non through vias to dummy cells and programmable logic using local oxide breakdown  and used by pay-TV, telecommuncations and smart card industries . Most countermeasure techniques are based on principles such as logic-locking  and netlist/physical obfuscation (doping, dummy via/cell, oxide breakdown, electric charge) .
The idea behind our technique is to analyze how standard cells distribution participate to design information extraction (and for authenticity verification too). In future work, it will be interesting to characterize IC camouflaging protection with our tool for multiple reasons. IC camouflaging is typically not applied on the entire die. Also, it would be interesting to combine in practice aforementioned statistics and local attacks. Common Criteria (CC) attack classification would be affected if less samples, expertise and time are required.
An alternative to high-cost reverse engineering is presented and applied on a commercial 45-nm SoC. This is a first step towards automatic partial design information extraction. The methodology includes any package die extraction, drain/source layer access and SEM imaging, pattern recognition and a new approach called standard cells statistical analysis (SCSA). We particularly characterize each step of the methodology and point out the low cost and time resources needed to start partial reverse engineering investigations. Single-layer reverse engineering mainly addresses combined attacks (such as EM observation/perturbation and laser fault attacks) and malicious hardware modification detection problematics.
Dr Franck Courbon is an Early Career Fellow jointly funded by The Isaac Newton Trust and The Leverhulme Trust under the agreement ECF-2017-606.
- 1.Randy T, Dick J (2009) The state-of-the-art in IC reverse engineering. CHESGoogle Scholar
- 2.Advanced IC reverse engineering techniques: in depth analysis of a modern smart card, Blackhat 2015Google Scholar
- 3.Harrod B (2016) Rapid analysis of various emerging nanoelectronics (RAVEN)Google Scholar
- 5.Principe EL, Asadizanjani N, Forte D, Tehranipoor M, Chivas R, DiBattista M, Silverman S, Marsh M, Piche N, Mastovich J (2017) Steps toward automated deprocessing of integrated circuits. ISTFAGoogle Scholar
- 6.Nanoscale X-ray tomosynthesis for rapid assessment of IC dice, Richard Lanza AIDA-2020 meeting, 2018Google Scholar
- 7.Vasselle A., Thiebeauld H., Maouhoub Q, Morisset A, Ermeneux S (2017) Laser-induced fault injection on smartphone bypassing the secure boot FDTCGoogle Scholar
- 8.Champeix C, Borrel N, Dutertre JM, Robisson B, Lisart M, Sarafianos A SEU sensitivity and modeling using pico-second pulsed laser stimulation of a D Flip-Flop in 40 nm CMOS technology. In: 2015 IEEE international symposium on defect and fault tolerance in VLSI and nanotechnology systems (DFTS)Google Scholar
- 9.Nohl K, Evans D, Starbug S, Plötz H (2008) Reverse-engineering a cryptographic RFID tag. UsenixGoogle Scholar
- 10.Courbon F, Loubet-Moundi P, Fournier JJA, Tria A (2014) Increasing the efficiency of laser fault injections using fast gate level reverse engineering. International Symposium on Hardware-Oriented Security and Trust, HOSTGoogle Scholar
- 11.Beck F (1998) Integrated circuit failure analysis: a guide to preparation techniquesGoogle Scholar
- 12.Schobert M (2009) http://www.degate.org/
- 13.Lewis JP (1995) Fast normalized cross-correlationGoogle Scholar
- 14.Faraday Technology Corporation, 90 nm Logic SP-RVT (Low-K) ProcessGoogle Scholar
- 15.Hatami N, Gavet Y, Debayle J (2017) Classification of time-series images using deep convolutional neural networksGoogle Scholar
- 16.Courbon F, Loubet-Moundi P, Fournier J, Tria A (2014) Adjusting laser injections for fully controlled faultsGoogle Scholar
- 17.Rajendran J, Sam M, Karri R (2013) Security analysis of integrated circuit camouflaging, CCSGoogle Scholar
- 18.Cocchi R Camouflage circuitry and programmable cells to secure semiconductor designs during manufacturing. In: 2015 National Aerospace and Electronics Conference (NAECON)Google Scholar
- 19.Inside Secure accelerates strategy in Silicon IP business with SypherMedia acquisition, 7th November 2017Google Scholar
- 20.Yasin M, Sinanoglu O (2017) Evolution of logic locking, VLSI-SoCGoogle Scholar
- 21.Chakraborty RS, Bhunia S HARPOON: an obfuscation-based SoC design methodology for hardware protectionGoogle Scholar
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.