# Design of a Branch-Based Carry-Select Adder IP Portable in 0.25 $\mu m$ Bulk and Silicon-On-Insulator CMOS Technologies

#### Amaury Nève and Denis Flandre

Laboratoire de Microélectronique, Université Catholique de Louvain, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium

#### Abstract:

By reducing the parasitic node capacitances, the Branch-Based Logic design style can improve the performance of digital circuits. To benefit from the full potential of this design style and to port it to different technologies, the specific features of each technology must be taken into account. We investigate the case of three advanced 0.25  $\mu$ m CMOS technologies: bulk, Partially-Depleted SOI and Fully-Depleted SOI. The design of a 16-bit carry-select Branch-Based adder IP is discussed. The Branch-Based adder shows lower power consumption than an implementation with conventional CMOS logic gates.

**Key words:** Digital design, SOI Technology, Low Voltage Low Power design

#### 1. INTRODUCTION

As the integration density and operating frequencies of digital IPs steadily rise, dynamic power dissipation becomes a serious concern. Most of the usual techniques for decreasing the power consumption of logic cells carry major drawbacks. Decreasing the supply voltage  $V_{dd}$  leads to higher delays or requires a lower threshold voltage  $V_{TH}$  in order to maintain speed. This in turn increases the off-state leakage currents  $I_{off}$  through the devices, and thus leads to higher static power dissipation. Decreasing the operating frequency of the IP core is not a good solution if high-speed performance goals must be achieved. There is a need for low-power techniques that do not result in excessive speed degradation.

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-0-387-35597-9\_40

M. Robert et al. (eds.), SOC Design Methodologies

This paper compares Branch-Based Logic (BBL) and conventional static CMOS cells for three advanced 0.25  $\mu$ m CMOS processes: bulk, Partially-Depleted Silicon-On-Insulator (PD SOI) and Fully-Depleted (FD) SOI. We demonstrate that BBL circuits have lower dynamic power dissipation than conventional CMOS with a negligible speed loss.

#### 2. THE BRANCH-BASED LOGIC DESIGN STYLE

In Branch-Based Logic (BBL), a function is implemented with branches instead of standard logic gates or pass-gates [1]. Each branch is a chain of NMOS or PMOS devices between a supply rail and the output node. The design methodology consists of writing the logic equation as a sum of products, where each product represents one branch in the cell. Any logic function can be written as a sum of products.
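The mapping from a sum of products to branches can be sketched as a purely logical model. The function `f = a.b + c` used here is a hypothetical example, not a cell from this paper, and the model says nothing about sizing or about which network (pull-up or pull-down) a given branch belongs to.

```python
from itertools import product

# Each inner tuple is one product term, i.e. one branch.
# Each literal is (signal_name, value_that_turns_the_device_on).
# Hypothetical example function: f = a.b + c
BRANCHES = [
    (("a", 1), ("b", 1)),
    (("c", 1),),
]

def branch_conducts(branch, inputs):
    """A branch conducts only if every series device in it is switched on."""
    return all(inputs[name] == value for name, value in branch)

def evaluate(branches, inputs):
    """The function is asserted as soon as any one branch conducts."""
    return int(any(branch_conducts(br, inputs) for br in branches))

def stack_heights(branches):
    """Number of series devices in each branch (one device per literal)."""
    return [len(br) for br in branches]

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, evaluate(BRANCHES, {"a": a, "b": b, "c": c}))
```

The stack height of a branch equals the number of literals in its product term, which is why long products are the ones most exposed to the stacking effects discussed later.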

Figure 1 shows a Carry-Select (CS) circuit implemented in conventional CMOS logic, with NAND and NOR gates. Figure 2 shows the same function implemented in BBL. All the branches are connected to the output node, without any other connection between the branches.

It has been reported that the main advantage of BBL is the reduction of the parasitic capacitances [2], resulting from two effects. Firstly, the node capacitances in one logic cell are reduced, thanks to the absence of wiring between two branches. Secondly, as some cells can be designed in one delay stage instead of a cascade of two stages, the number of internal nodes in the cell is reduced. All this results in a lower parasitic capacitance to be switched in one clock period and contributes to better power and speed performances.



Figure 1. Conventional CMOS implementation of CS-C1.



Figure 2. BBL implementation of CS-C1.

#### 3. DESIGN OF THE 16-BIT ADDER

For comparison purposes, we designed two 16-bit carry-select adders [3], the first with conventional CMOS logic gates, the second using BBL cells. Thanks to the independent carry network, the carry-select architecture is very fast. Our adder is composed of four parts, each one featuring two 4-bit adders, the first with the carry-in at "0", the second with the carry-in at "1". A multiplexer, whose control signal is generated by the carry network, selects the correct output. The great advantage of this structure is that the carry signals coming from the 4-bit adders, referenced as  $C_{\#}^{0}$  and  $C_{\#}^{1}$  (with #=3, 7, 11 or 15) in Figure 4, are computed in parallel and arrive at approximately the same time. They are then fed into the Carry-Select (CS) boxes, which compute the control signals for the multiplexers and the final carry-out. No time is lost waiting for a carry signal to ripple through all the adder cells. The logic equations of the CS boxes are given in Figure 3. The 16-bit adder architecture is depicted in Figure 4.

$$\begin{aligned} C_3 &= {C_3}^1\,C_{\text{IN}} + {C_3}^0 \\ C_7 &= {C_7}^1\,{C_3}^1\,C_{\text{IN}} + {C_7}^1{C_3}^0 + {C_7}^0 \\ C_{11} &= {C_{11}}^1\,{C_7}^1\,{C_3}^1\,C_{\text{IN}} + {C_{11}}^1\,{C_7}^1{C_3}^0 + {C_{11}}^1\,{C_7}^0 + {C_{11}}^0 \\ C_{15} &= {C_{15}}^1\,\,{C_{11}}^1\,{C_7}^1\,{C_3}^1\,C_{\text{IN}} + {C_{15}}^1\,\,{C_{11}}^1\,{C_7}^1\,{C_3}^0 \\ &+ {C_{15}}^1\,\,{C_{11}}^1\,{C_7}^0 + {C_{15}}^1\,\,{C_{11}}^0 + {C_{15}}^0 \end{aligned}$$

Figure 3. Logic equations of the carry-select boxes.
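As a sanity check, the equations of Figure 3 can be evaluated in software and compared with plain integer addition. This is a minimal sketch: the function names are illustrative, and `c0[k]` and `c1[k]` stand for the carry-out of 4-bit block `k` with its carry-in forced to 0 and 1.

```python
import random

def block_carry(a4, b4, cin):
    """Carry-out of the 4-bit addition a4 + b4 + cin."""
    return (a4 + b4 + cin) >> 4

def cs_carries(a, b, cin):
    """Evaluate the Figure 3 equations for 16-bit operands; returns (C3, C7, C11, C15)."""
    c0, c1 = [], []
    for k in range(4):
        a4 = (a >> (4 * k)) & 0xF
        b4 = (b >> (4 * k)) & 0xF
        c0.append(block_carry(a4, b4, 0))
        c1.append(block_carry(a4, b4, 1))

    c3 = (c1[0] & cin) | c0[0]
    c7 = (c1[1] & c1[0] & cin) | (c1[1] & c0[0]) | c0[1]
    c11 = ((c1[2] & c1[1] & c1[0] & cin) | (c1[2] & c1[1] & c0[0])
           | (c1[2] & c0[1]) | c0[2])
    c15 = ((c1[3] & c1[2] & c1[1] & c1[0] & cin)
           | (c1[3] & c1[2] & c1[1] & c0[0])
           | (c1[3] & c1[2] & c0[1])
           | (c1[3] & c0[2])
           | c0[3])
    return c3, c7, c11, c15

def reference_carries(a, b, cin):
    """Carries out of bits 3, 7, 11 and 15 obtained by integer addition."""
    out = []
    for bits in (4, 8, 12, 16):
        mask = (1 << bits) - 1
        out.append(((a & mask) + (b & mask) + cin) >> bits)
    return tuple(out)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(10_000):
        a, b, cin = rng.getrandbits(16), rng.getrandbits(16), rng.getrandbits(1)
        assert cs_carries(a, b, cin) == reference_carries(a, b, cin)
    print("Figure 3 equations agree with integer addition on all sampled inputs")
```

Each equation is the flattened form of the recurrence "carry-out = (carry with cin=1 AND incoming carry) OR carry with cin=0", which is what lets every CS box be written directly as a sum of products.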



Figure 4. Architecture of the 16-bit carry-select adder.
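The architecture can be illustrated with a behavioural model. The sketch below is an assumption-laden software analogy, not the circuit itself: the loop passes the selected carry from one part to the next sequentially, whereas in the hardware the carry network computes all multiplexer controls in parallel.

```python
def add4_both(a4, b4):
    """Precompute both 4-bit results: (sum, carry_out) for carry-in 0 and for carry-in 1."""
    r0 = a4 + b4
    r1 = a4 + b4 + 1
    return (r0 & 0xF, r0 >> 4), (r1 & 0xF, r1 >> 4)

def carry_select_add16(a, b, cin=0):
    """Behavioural model of a 16-bit carry-select adder made of four 4-bit parts."""
    result = 0
    carry = cin
    for k in range(4):
        a4 = (a >> (4 * k)) & 0xF
        b4 = (b >> (4 * k)) & 0xF
        with_cin0, with_cin1 = add4_both(a4, b4)
        # Multiplexer: the incoming carry selects one of the two precomputed results.
        s4, carry = with_cin1 if carry else with_cin0
        result |= s4 << (4 * k)
    return result, carry

if __name__ == "__main__":
    print(carry_select_add16(0x1234, 0x0FF0))
```

Both candidate results of every part are available before the select signal is known, so only the multiplexer delay is added once the carries are resolved.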

CS-C0 and CS-C1 are implemented in one stage in BBL (figures 2 and 5), while the conventional CMOS implementation requires two successive stages. CS-C2 has two stages linked by an inverter (figure 6), while the CMOS equivalent is composed of four successive stages.



Figure 5. BBL implementation of CS-C0.



Figure 6. BBL Implementation of CS-C2.

The 4-bit adders also use a carry-select architecture, with 1-bit adders and a carry network whose structure is similar to that of the 16-bit level. The BBL 1-bit half-adders have a very compact single-stage implementation, while the CMOS half-adder requires two successive stages.

#### 4. DESIGN IN THREE 0.25 $\mu$m CMOS TECHNOLOGIES

#### 4.1 Figures of merit

The previously described cells were simulated with ELDO, a SPICE-like circuit simulator, using the parameters of three different CMOS processes: bulk silicon 0.25  $\mu$ m, Partially-Depleted (PD) SOI 0.25  $\mu$ m and Fully-Depleted (FD) SOI 0.25  $\mu$ m. To help interpret our results, figures of merit for each technology are given in Tables 1 and 2. Except for the threshold voltage, all values are normalized to bulk-Si data, e.g.  $I_{DSAT} = 1$  for bulk NMOS. From the threshold voltage data, it can be observed that the bulk process has been optimised for speed, unlike the two SOI processes, whose parameters are rather conservative. But thanks to the lower body effect and the better sub-threshold slope, the FD SOI transistors show a high current in the ON-state and a low leakage current in the OFF-state. The rather high relative value of the OFF-current for PD SOI NMOS devices is associated with the parasitic floating-body effect at high drain voltages. The figure of merit we use for the capacitance combines the gate and the source/drain capacitances. The values are given for 25°C as well as for 80°C, since the latter is a typical operating temperature for high-performance circuits.

Table 1. Threshold voltage ( $V_{TH}$ ) values and figures of merit for the NMOS devices, normalized to bulk (data corresponding to bulk = 1), for the three considered technologies at 25°C and at 80°C (value between brackets): 0.25  $\mu$ m bulk Si, 0.25  $\mu$ m PD SOI and 0.25  $\mu$ m FD SOI. Nominal channel length is 0.25  $\mu$ m.  $V_{TH}$  = Threshold voltage; IDsat = Saturation Drain-to-Source Current; Ioff = OFF-state leakage current when the gate voltage VG=0 V; Cox = gate oxide capacitance per unit area; Cj = Source/Drain-to-substrate capacitance per unit area; LS,D=Source/Drain contact region extensions.

| NMOS    | $V_{TH}$ | IDsat @ VD=VG=1.5 V | Ioff @ VD=1.5 V | $\frac{2}{3} C_{OX} L + C_j L_{S,D}$ |
|---------|----------|---------------------|-----------------|--------------------------------------|
| Bulk-Si | 0.57 V   | 1 (1)               | 1 (1)           | 1                                    |
| PD SOI  | 0.55 V   | 1.02 (0.95)         | 5.48 (2.27)     | 0.77                                 |
| FD SOI  | 0.50 V   | 1.21 (1.09)         | 0.04 (0.03)     | 0.77                                 |

Table 2. Threshold voltage values and figures of merit for the PMOS devices, normalized to bulk (data corresponding to bulk = 1), for the three considered technologies at 25°C and at 80°C (value between brackets): 0.25  $\mu$ m bulk Si, 0.25  $\mu$ m PD SOI and 0.25  $\mu$ m FD SOI.

| PMOS    | $V_{TH}$ | IDsat @ VD=VG=1.5 V | Ioff @ VD=1.5 V | $\frac{2}{3} C_{OX} L + C_j L_{S,D}$ |
|---------|----------|---------------------|-----------------|--------------------------------------|
| Bulk-Si | -0.61 V  | 1 (1)               | 1 (1)           | 1                                    |
| PD SOI  | -0.60 V  | 0.72 (0.65)         | 0.02 (0.02)     | 0.85                                 |
| FD SOI  | -0.52 V  | 0.97 (0.83)         | 10E-4 (10E-4)   | 0.85                                 |

#### 4.2 Methodology

Each cell has been optimised for speed in the three technologies. For the optimisation process, each cell is loaded with two CMOS inverters, which represents a load similar to what the cell will see in the complete adder. The nominal gate length is set to the minimum allowed drawn dimension, i.e.  $L = 0.25~\mu m$ . To determine the W/L-ratio of the transistors, the input pattern that activates the top transistor of the critical branch of the cell is applied at the input. Inside one branch, the W/L-ratio is increased when going from the device closest to the output to the device closest to the supply rails, with a ratio of 1.5 between successive transistors in the stack. The W/L-ratio of the other branches is further tuned to lower the capacitance of the output node, to which all the branches are connected.
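The tapering rule inside one branch can be written out as a short sketch. Since the gate length is fixed, scaling W/L by 1.5 amounts to scaling the width by 1.5. The function name and the starting width of 1.0 µm are illustrative assumptions; the paper does not list the actual widths.

```python
def stack_widths(w_output, n_devices, ratio=1.5):
    """Widths of the series devices in one branch, ordered from the device
    closest to the output node to the device closest to the supply rail."""
    if n_devices < 1:
        raise ValueError("a branch needs at least one device")
    return [w_output * ratio ** i for i in range(n_devices)]

if __name__ == "__main__":
    # Hypothetical three-device stack starting at 1.0 um next to the output.
    print(stack_widths(1.0, 3))  # [1.0, 1.5, 2.25]
```

Keeping the smallest device next to the output is consistent with the stated goal of lowering the capacitance of the output node.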

#### 4.3 Bulk 0.25 $\mu$m

BBL design implies the use of stacks of series transistors, which are particularly affected by the large substrate effect of bulk CMOS. The impact on the delay can be estimated by comparing the case where the substrate of the transistors is connected to the supply rails with the case where the substrate is connected to the source. Except for the half-adder (referenced as ha\_cin0), the substrate effect causes a larger delay increase for the cells implemented in BBL, due to the activation of a stack of two or three PMOS transistors in the worst case (Table 3).

Table 3. Percentage delay increase due to the bulk CMOS substrate effect in BBL and in conventional CMOS cells. To cancel out the effect of the substrate-source potential, we connected source and substrate for each transistor in the circuit. Vdd=1.5V; T=80 °C.

|         | Conventional CMOS | BBL   |
|---------|-------------------|-------|
| CS-C0   | 6.6%              | 12.9% |
| CS-C1   | 7.0%              | 24.5% |
| CS-C2   | 10.1%             | 12.4% |
| Ha_cin0 | 8.8%              | 5.0%  |

Table 4. Influence of the threshold voltage non-uniformity on the delay of the BBL and conventional CMOS cells in bulk-Si. The numbers represent the relative variation between worst case and best case. Vdd=1.5 V;T=80°C;  $\sigma_{VTH,NMOS}=10$  mV;  $\sigma_{VTH,PMOS}=15$  mV.

|         | Conventional CMOS | BBL  |
|---------|-------------------|------|
| CS-C0   | 0.11              | 0.09 |
| CS-C1   | 0.13              | 0.05 |
| CS-C2   | 0.12              | 0.10 |
| Ha_cin0 | 0.11              | 0.09 |

In order to evaluate the impact of the threshold voltage non-uniformity, we considered a maximum threshold voltage variation  $\Delta V_{TH} = \pm 3\sigma_{VTH}$  around the nominal  $V_{TH}$  value. In bulk-Si CMOS, we obtain a delay difference of about 10 % between the best case and the worst case, the variation being slightly lower in BBL (Table 4). This is intuitively interpreted as follows. In BBL, one branch is activated in the worst case, while in CMOS, at least two cascaded sub-cells must switch, each one contributing to the increase of the impact of  $V_{TH}$  variations on the delay.
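The $\pm 3\sigma_{VTH}$ window can be made concrete with the values quoted in this paper. The sketch below combines the threshold voltage magnitudes of Tables 1 and 2 with the $\sigma_{VTH}$ values given in the captions of Tables 4 and 8; the function and dictionary names are illustrative, and the code only computes the voltage window, not the resulting delay.

```python
def vth_window_mv(vth_nominal_mv, sigma_mv, n_sigma=3):
    """Return (low, nominal, high) threshold-voltage magnitudes in mV."""
    delta = n_sigma * sigma_mv
    return vth_nominal_mv - delta, vth_nominal_mv, vth_nominal_mv + delta

# (|VTH| in mV, sigma_VTH in mV) per device type.
BULK = {"NMOS": (570, 10), "PMOS": (610, 15)}
FD_SOI = {"NMOS": (500, 15), "PMOS": (520, 30)}

if __name__ == "__main__":
    for name, tech in (("bulk", BULK), ("FD SOI", FD_SOI)):
        for device, (vth, sigma) in tech.items():
            print(name, device, vth_window_mv(vth, sigma))
```

For bulk this gives a window of ±30 mV for NMOS and ±45 mV for PMOS, against ±45 mV and ±90 mV in FD SOI.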

Comparing the absolute delay values, including the body effect, the two small cells (ha\_cin0 and CS-C0) are faster in BBL, whereas CS-C1 and CS-C2 are slower in BBL (Table 5). For these two cells, this can be associated with the stacks of three PMOS transistors that are activated in the worst case. In the conventional CMOS cells, a stack of three NMOS devices is activated in the worst case. Moreover, the high number of branches connected to the output node in BBL results in a higher parasitic capacitance at this node, even if the total cell capacitance is lower.

In bulk, all the BBL cells consume between 10 % and 58 % less dynamic power than the conventional CMOS cells (Table 6). This is associated with a reduction of the short-circuit current and with the lower internal node capacitances.

Table 5. Speed increase for the BBL vs. the conventional CMOS implementation of the basic building blocks of the 16-bit adder for BULK, SOI-PD and SOI-FD technologies. Vdd=1.5 V; T=80°C.

|         | Bulk   | SOI-PD | SOI-FD |
|---------|--------|--------|--------|
| CS-C0   | 24.3%  | 24.4%  | 25.6%  |
| CS-C1   | -20.2% | -8.7%  | -12.6% |
| CS-C2   | -11.9% | -12.6% | -16.4% |
| Ha_cin0 | 29.9%  | 31.2%  | 32.5%  |

Table 6. Dynamic power reduction for the BBL vs. the conventional CMOS implementation. Vdd=1.5 V; T=80°C.

|         | Bulk  | SOI-PD | SOI-FD |
|---------|-------|--------|--------|
| CS-C0   | 58.5% | 22.7%  | 21.4%  |
| CS-C1   | 32.3% | -4.8%  | 28.9%  |
| CS-C2   | 27.2% | -16.8% | 7.7%   |
| Ha_cin0 | 10.6% | 32.0%  | 16.3%  |

#### 4.4 Partially-Depleted SOI 0.25 µm

The floating body in PD SOI transistors is responsible for parasitic behaviors such as kink and hysteresis effects [4]. The latter denotes the dependence of the threshold voltage on the initial conditions or on the history of the body charge. Table 7 presents the delay variations resulting from initial input conditions that lead to different body potentials. Except for CS-C0, the delay variation is slightly higher in BBL due to the taller stacks, but this effect remains small, even when compared to the impact of $V_{TH}$ variations.

The floating body is also known to result in the discharge of internal nodes due to the parasitic bipolar effect. The use of BBL, which is a static design style, minimizes the risk of erroneous states, unlike dynamic logic [5].

Concerning the delay, in PD SOI, only the small cells perform better in BBL than in conventional CMOS logic, as in bulk (Table 5). Concerning power, unlike in bulk, the large PD SOI BBL cells consume more than their conventional CMOS equivalents due to floating-body effects (Table 6).

Table 7. Delay variation due to different initial conditions in BBL and conventional CMOS cells in PD SOI. The input signal is set at an initial value which is "Low" or "High". Vdd=1.5 V; T=80°C.

|         | Conventional CMOS | BBL  |
|---------|-------------------|------|
| CS-C0   | 2.4%              | 0.8% |
| CS-C1   | 3.5%              | 5.1% |
| CS-C2   | 0.1%              | 5.0% |
| Ha_cin0 | 1.6%              | 6.7% |

#### 4.5 Fully-Depleted SOI 0.25 $\mu$ m

FD SOI transistors show better on/off performance and much weaker floating-body effects than PD SOI devices thanks to the complete depletion of the body. Moreover, the rise of the threshold voltage with increasing source-to-substrate voltage is lower in FD SOI than in bulk. Figure 7 shows the evolution of the delay when adding transistors to the stack. The FD SOI technology is thus particularly appropriate when stacks of transistors are used, as in BBL. The lower substrate effect enables us to use one more transistor in the stack for the same delay as in bulk-Si.

Despite this superior performance, the concern about higher threshold voltage non-uniformity in FD SOI, when compared to bulk and PD SOI, has delayed the acceptance of FD SOI in the semiconductor industry. When including a  $\Delta V_{TH} = \pm 3\sigma_{VTH}$  variation around the nominal  $V_{TH}$  value in FD SOI, a difference of about 20 % is obtained for the delay between the best case and the worst case for both design styles (Table 8), which is indeed larger than the 10 % we obtained in bulk. With better control of the uniformity of the silicon film thickness and related fabrication steps, this effect can be reduced, as shown in [6]. Moreover, FD SOI cells do not show the additional delay variations due to floating-body effects seen in PD SOI, and remain much faster than bulk cells, even in the worst case.

The two small cells are faster in BBL than in conventional CMOS logic for FD SOI (Table 5). As in bulk, the dynamic power consumption is reduced when using the Branch-Based design style (Table 6).

Table 8. Influence of the threshold voltage non-uniformity on the delay of the BBL and conventional CMOS cells in FD SOI. The numbers represent the relative variation between worst case and best case. Vdd=1.5 V; T=80°C;  $\sigma_{VTH,NMOS}=15$  mV;  $\sigma_{VTH,PMOS}=30$  mV.

|         | Conventional CMOS | BBL  |
|---------|-------------------|------|
| CS-C0   | 0.19              | 0.23 |
| CS-C1   | 0.22              | 0.17 |
| CS-C2   | 0.20              | 0.18 |
| Ha_cin0 | 0.15              | 0.18 |

#### 5. RESULTS FOR THE COMPLETE ADDER

The logic cells described above were used to implement two 16-bit carry-select adders, the first with conventional CMOS logic gates, the second with BBL. Figure 8 compares the power-delay products of the two versions of the adder in each technology. The data points representing the BBL adder are all shifted to the left compared to the points related to the adder in conventional CMOS logic, which means that the former consumes less dynamic power for a similar delay. Indeed, in bulk-Si and in FD SOI, the dynamic power is reduced by 13 % and 16 % respectively when comparing the BBL adder to the conventional CMOS adder. The delay is increased by 4 % and 2 % respectively, which is negligible. In PD SOI, the power reduction is the highest, reaching 36 %, with a delay increase of less than 2 %.

Figure 7. Delay as a function of the number of devices in a stack of NMOS transistors with W/L=(2.5/0.25) for bulk-Si and FD SOI. The increase of the delay is the smallest for FD SOI thanks to the lower substrate effect. Vdd=1.5 V; T=80°C.

From the point of view of the static power, the branch-based design style is also beneficial. Thanks to the lower number of leakage paths between  $V_{dd}$  and ground, the BBL 16-bit adder reduces static power consumption by 23 %, 18 % and 34 % in bulk-Si, PD-SOI and FD-SOI respectively, when compared to the conventional CMOS design.

If it is kept in mind that the two SOI processes are experimental processes, not optimised for speed, some trends can be identified in the comparison with bulk (Table 9). Even if the delay is larger in PD SOI than in bulk, the PD SOI 16-bit adder is able to achieve a 20 % power-delay product improvement over bulk. When moving the design to the FD SOI process, a reduction of 20 % delay and 35% dynamic power is obtained at 1.5 V, resulting in a power-delay improvement of nearly 50 %. When lowering the supply voltage down to 1 V, the dynamic power consumption is reduced by 40 % and the delay is still 20% lower than in bulk.

The static power consumption is also significantly lower in FD-SOI than in PD-SOI and bulk, thanks to the better sub-threshold slope, which allows a reduction of the threshold voltage without increasing the OFF-state leakage current.

The total active area is very similar for the adder in the three technologies. The poor performance of the SOI PMOS devices in our study leads to an optimization point where the widths of the PMOS devices in the critical branches are larger than for the equivalent bulk-Si devices. This is compensated by the use of smaller widths for the SOI NMOS devices compared to bulk-Si. In addition, thanks to the better layout efficiency of SOI, NMOS and PMOS devices can be abutted and easily interconnected, thus saving total die area. On the other hand, the use of larger PMOS sizes in FD SOI would improve the matching between devices and reduce the delay variations associated with the threshold voltage non-uniformities.

Table 9. Delay, power consumption, power-delay product and active area of the 16-bit BBL adder for the three technologies. Vdd = 1.5 V;  $T = 80^{\circ}\text{C}$ ; f = 200 MHz.

|                     | BULK     | PD-SOI   | FD-SOI   |
|---------------------|----------|----------|----------|
| Delay               | 1.833 ns | 1.978 ns | 1.455 ns |
| Static Power        | 1.747 μW | 656 nW   | 6.527 nW |
| Dynamic Power       | 9.7 mW   | 7.6 mW   | 6.3 mW   |
| Power-Delay Product | 17.8 pJ  | 15.0 pJ  | 9.2 pJ   |
| Active Area         | 3428 μm² | 3410 μm² | 3408 μm² |
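The power-delay products in Table 9 follow from the delay and dynamic power rows, which the short sketch below reproduces. The dictionary and function names are illustrative; the numbers are copied from Table 9.

```python
TABLE9 = {
    "bulk":   {"delay_ns": 1.833, "dyn_power_mw": 9.7},
    "PD-SOI": {"delay_ns": 1.978, "dyn_power_mw": 7.6},
    "FD-SOI": {"delay_ns": 1.455, "dyn_power_mw": 6.3},
}

def pdp_pj(delay_ns, dyn_power_mw):
    """Power-delay product in pJ (1 ns x 1 mW = 1 pJ)."""
    return delay_ns * dyn_power_mw

def reduction_vs_bulk(tech, key):
    """Relative reduction of TABLE9[tech][key] with respect to bulk (0.2 means 20 % lower)."""
    return 1.0 - TABLE9[tech][key] / TABLE9["bulk"][key]

if __name__ == "__main__":
    for tech, row in TABLE9.items():
        print(tech, round(pdp_pj(**row), 1), "pJ")
    print("FD-SOI delay reduction:", round(reduction_vs_bulk("FD-SOI", "delay_ns"), 3))
    print("FD-SOI power reduction:", round(reduction_vs_bulk("FD-SOI", "dyn_power_mw"), 3))
```

Rounded to one decimal, the products match the tabulated 17.8 pJ, 15.0 pJ and 9.2 pJ.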



Figure 8. Delay vs. dynamic power for the BBL and the conventional CMOS 16-bit carry-select adder. Vdd = 1.5 V; F = 200 MHz; T = 80 °C.

#### 6. CONCLUSION

We investigated a methodology to reliably port digital IP cores from conventional CMOS to the BBL design style, and from bulk to SOI processes. Specifically, we studied the potential of Branch-Based Logic design for high-performance IP cores. The analysis was done using the parameters of three advanced 0.25 µm CMOS processes: bulk, PD SOI and FD SOI. Each of these technologies has specific features that have to be taken into account during the design: the influence of the body effect on the stacks in bulk, the floating-body effects in PD-SOI and the spread of the threshold voltage in FD-SOI. A comparison between Branch-Based design and conventional CMOS logic design reveals similar trends for the different processes: the delay is lower in small BBL cells only, but all the BBL cells have lower dynamic power consumption in bulk and FD SOI. The complete 16-bit Branch-Based carry-select adder shows a lower dynamic power dissipation for a similar delay compared to the conventional CMOS version. Moreover, we showed that the FD-SOI performance is already sufficiently better than that of bulk-Si to maintain a comfortable power-delay product improvement, even though much larger $V_{TH}$ non-uniformities than in bulk would have to be accommodated. Finally, delay variations with $V_{TH}$ do not appear higher in FD SOI than the sum of the delay variations with $V_{TH}$ and those associated with the floating-body effects in PD SOI.

#### 7. REFERENCES

- [1] Masgonty J.M., Arm C. and Piguet C., "Technology- and Power Supply-Independent Cell Library", in proceedings of the IEEE Custom Integrated Circuits Conference, 1991, pp. 25.5.1-25.5.4.
- [2] Masgonty J.-M., Mosch P. and Piguet C., "Branch-Based Digital Cell Libraries", in proceedings of EURO ASIC 1991, pp. 27-31.
- [3] Hwang K., "Computer Arithmetic: Principles, Architecture and Design", Wiley, New York, 1979.
- [4] Fossum J.G., "Designing reliable SOI CMOS Circuits with Floating-Body Effects", in Proceedings of the European Solid-State Device Research Conference 1998, pp. 34-41.
- [5] Canada M. et al., "A 580 MHz RISC Microprocessor in SOI", in proceedings of the IEEE International Solid-State Circuit Conference, 1999, pp. 430-431.
- [6] Vanmackelberg M. et al., "0.25 μm fully-depleted SOI MOSFET's for RF mixed analog-digital circuits, including a comparison with partially-depleted devices with relation to high frequency noise parameters", in Solid-State Electronics, vol. 46, n°3, 2002, pp. 379-386.