This chapter discusses the gate-level building blocks that have been used to design the ultra-low-voltage prototypes of this work. The aim was not only to operate at very low supply voltages in a variation-resilient manner, but also to reach speeds of n × 10 MHz. Such targets can only be achieved when attention is paid to both the transistor-level basic circuits and the architectural level (discussed in Chap. 4).

Careful design of logic gates is crucial if they are to work efficiently in the ultra-low-voltage region. Their topology has a large impact not only on the variation-resilience of the total design, but also on the delay, leakage power and active energy consumption. Therefore, Sect. 3.1 provides an elaborate comparison of circuit topologies, ranging from very common logic families to more exotic topologies that have been specifically proposed for operation in the ultra-low-voltage region [33]. An in-depth analysis of the characteristics of these logic families leads, in Sect. 3.2, to the circuit topologies that are preferred in this work.

Section 3.3 continues this discussion of basic building blocks by exploring various memory elements. It examines not only their functional differences, but also the trade-offs that accompany operation at low supply voltages.

Section 3.4 then summarizes the different sizing options of the basic building blocks employed in the four prototypes that will be presented in Chaps. 5 and 6. Finally, Sect. 3.5 concludes this chapter.

3.1 Circuit Topology Comparison

Numerous circuit topologies can implement a given logic function. Several characteristics are important to evaluate the quality of a logic gate: speed, dynamic energy, leakage power, variation-resilience, robustness and area. This section discusses various topologies and determines their suitability for use with ultra-low supply voltages [33].

Unless stated otherwise, the circuit topologies in this comparison are evaluated at a supply voltage of 200 mV. The analysis is performed for the 90 nm CMOS technology at hand; the sizings and trade-offs are very similar for the 40 nm CMOS technology also used in this work. Where large differences occur, they are explained.

3.1.1 Standard CMOS Logic

3.1.1.1 Concept

Figure 3.1 shows a generic implementation of an n-input standard CMOS logic gate. A standard CMOS gate is a combination of two complementary networks: a Pull-Down Network (PDN) and a Pull-Up Network (PUN). The PDN connects the output to ground when the logic function of the inputs requires a logic ‘0’ at the output. It consists solely of nMOS transistors, because incorporating pMOS transistors would result in a V T loss. The PUN, on the other hand, connects the output to the supply rail when the output should be a logic ‘1’. Correspondingly, the PUN consists only of pMOS transistors. The networks are arranged so that, for any input pattern, one of them is on and the other is off. The inputs are connected to the gates of the nMOS and pMOS transistors. Important for the circuit topology comparison is that standard CMOS logic gates are inherently inverting: the PDN turns on when the inputs are ‘1’, leading to a ‘0’ at the output, and vice versa. It is therefore not possible to realize a non-inverting Boolean function with a single stage of standard CMOS logic.

Fig. 3.1 Generic implementation of an n-input standard CMOS logic gate

3.1.1.2 Ultra-Low-Voltage Operation

Several characteristics of logic gates are used to compare their operation at low supply voltages. The most basic logic gate is an inverter, which is implemented with a single pMOS transistor in the PUN and a single nMOS transistor in the PDN of Fig. 3.1. The characteristics of an inverter provide an excellent measure of the quality of a circuit topology for ultra-low-voltage operation. Therefore, this text first discusses several properties of an inverter. If a comparison based on these properties does not suffice, more complex logic gates are taken into account as well.

First, the DC characteristics are discussed in detail. Figure 3.2a shows the Voltage Transfer Characteristic (VTC) of an inverter, which plots the output voltage V out as a function of the input voltage V in. A first property that can be derived from the VTC is the switching threshold voltage V M of the inverter. It is found graphically at the intersection of the VTC and the line \(V _{\mathrm{out}} = V _{\mathrm{in}}\). V M provides a measure of the gate’s symmetry: if V M equals \(V _{\mathrm{dd}}/2\), the gate is unskewed; otherwise, it is low or high skewed. In general, an unskewed gate is desired, as it provides maximal noise margins. However, a gate is sometimes intentionally skewed if more noise is expected on one of the logic levels, or to save area.

Fig. 3.2 Important DC characteristics of an inverter: (a) voltage transfer characteristic and (b) definition of noise margins

The noise margin is a measure of the sensitivity of a gate to noise [31]: the low noise margin NM L and the high noise margin NM H give the maximal noise level that a logic gate can tolerate while its input is still interpreted correctly. The noise margins are derived from the points on the VTC in Fig. 3.2a where the gain of the inverter equals −1. These unity-gain points define the minimum high and maximum low input voltages, V IH and V IL, and the minimum high and maximum low output voltages, V OH and V OL. Figure 3.2b visualizes the calculation of the noise margins for cascaded gates. NM L is defined as the difference between V IL and V OL, while NM H is the difference between V OH and V IH:

$$\mathit{NM}_{\mathrm{L}} = V_{\mathrm{IL}} - V_{\mathrm{OL}} \tag{3.1}$$
$$\mathit{NM}_{\mathrm{H}} = V_{\mathrm{OH}} - V_{\mathrm{IH}} \tag{3.2}$$

The region between V IL and V IH is called the undefined region because it does not represent a valid digital logic level. Evidently, the noise margins must be larger than 0 for a functional digital circuit, and the higher the noise margins, the lower the gate’s sensitivity to noise. An unskewed gate has equal noise margins, which maximizes immunity to arbitrary noise sources [46]. From an ultra-low-voltage perspective, equal noise margins allow operation at the lowest supply voltage, which makes balanced noise margins highly desirable.
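
The extraction of V IL, V IH, V OL and V OH from a VTC can be sketched numerically. The Python snippet below is a minimal illustration, assuming a hypothetical logistic-shaped VTC (`logistic_vtc`, not a transistor model) sampled on a uniform grid; the helper `noise_margins` locates the two points where the gain crosses −1 and then applies Eqs. (3.1) and (3.2). It compares an unskewed gate with a low-skewed one.

```python
import math

def logistic_vtc(v_in, vdd=0.2, v_m=0.1, k=100.0):
    """Hypothetical smooth inverter VTC (a logistic curve, not a device model).

    v_m plays the role of the switching threshold V_M; k (1/V) sets the
    steepness, so the peak gain magnitude is vdd * k / 4.
    """
    return vdd / (1.0 + math.exp(k * (v_in - v_m)))

def noise_margins(v_in, v_out):
    """Return (NM_L, NM_H) of a sampled, inverting VTC.

    The unity-gain points are where the slope dVout/dVin crosses -1.
    """
    n = len(v_in)
    # Central-difference estimate of the gain (one-sided at both ends).
    gain = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        gain.append((v_out[hi] - v_out[lo]) / (v_in[hi] - v_in[lo]))
    # s changes sign wherever the gain crosses -1.
    s = [g + 1.0 for g in gain]
    points = []
    for i in range(n - 1):
        if (s[i] > 0) != (s[i + 1] > 0):
            t = s[i] / (s[i] - s[i + 1])  # linear interpolation weight
            points.append((v_in[i] + t * (v_in[i + 1] - v_in[i]),
                           v_out[i] + t * (v_out[i + 1] - v_out[i])))
    if len(points) != 2:
        raise ValueError("expected exactly two unity-gain points")
    # First crossing gives (V_IL, V_OH), the second one (V_IH, V_OL).
    (v_il, v_oh), (v_ih, v_ol) = points
    return v_il - v_ol, v_oh - v_ih

vdd = 0.2
v_in = [vdd * i / 2000 for i in range(2001)]
for v_m in (0.10, 0.08):  # unskewed vs. low-skewed gate
    v_out = [logistic_vtc(v, vdd, v_m) for v in v_in]
    nm_l, nm_h = noise_margins(v_in, v_out)
    print(f"V_M = {v_m:.2f} V: NM_L = {nm_l * 1e3:.1f} mV, NM_H = {nm_h * 1e3:.1f} mV")
```

For the unskewed curve both margins come out equal, while shifting V M downwards shrinks NM L and enlarges NM H, which is why balanced margins are pursued.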

The gain of the inverter plays an important role as well. The gain is the slope \(\mathrm{d}V _{\mathrm{out}}/\mathrm{d}V _{\mathrm{in}}\) of the VTC and determines whether the logic gate is regenerative. Regeneration means that a signal deviating from the nominal levels V OL or V OH gradually converges back to those levels after passing through a number of such logic gates. For a gate to be regenerative, the absolute value of its gain should be higher than 1 in the undefined region and lower than 1 in the valid regions. Note that the latter requirement on the low-gain regions directly implies positive noise margins.
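
Regeneration can be illustrated by propagating a degraded logic level through a chain of identical inverters. The sketch below reuses a hypothetical logistic VTC whose steepness parameter `k` sets the peak gain magnitude \(V_{\mathrm{dd}} k/4\); the helper `inverter_chain` and all numbers are illustrative assumptions.

```python
import math

def logistic_vtc(v_in, vdd=0.2, v_m=0.1, k=100.0):
    """Hypothetical inverter VTC; peak |gain| = vdd * k / 4 at v_in = v_m."""
    return vdd / (1.0 + math.exp(k * (v_in - v_m)))

def inverter_chain(v0, stages, **vtc_args):
    """Node voltages along a chain of identical inverters driven by v0."""
    levels = [v0]
    for _ in range(stages):
        levels.append(logistic_vtc(levels[-1], **vtc_args))
    return levels

# A degraded '0' (70 mV instead of about 0 V) enters the chain.
steep = inverter_chain(0.07, 6, k=100.0)   # peak |gain| = 5 > 1
shallow = inverter_chain(0.07, 6, k=10.0)  # peak |gain| = 0.5 < 1
for name, levels in (("steep", steep), ("shallow", shallow)):
    print(name, ", ".join(f"{v * 1e3:.1f}" for v in levels), "mV")
```

With a peak gain of 5, the degraded input is restored to (almost) full rail levels within two stages. With a gain magnitude below 1 everywhere, every stage attenuates the signal towards V M, so the logic value is lost.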

Second, the transient characteristics are discussed. Figure 3.3 defines the various delays of an inverter and, by extension, of any logic gate. The rise time t r and the fall time t f measure the slopes of a waveform: they express how fast a signal transitions between the two logic levels, and are defined between the points where the signal passes through 10 % and 90 % of V dd.

Fig. 3.3 Important transient characteristics of an inverter: definition of the delays

The propagation delay t p is the time required for a signal to travel from the input of a logic gate to its output. It is measured between the 50 % transition points of the input and output waveforms (Fig. 3.3). Because a gate responds differently to a rising and a falling input transition, t pLH and t pHL distinguish between the two corresponding delays. The propagation delay t p is then defined as the average of t pLH and t pHL:

$$t_{\mathrm{p}} = \frac{t_{\mathrm{pLH}} + t_{\mathrm{pHL}}}{2} \tag{3.3}$$
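
The delay definitions of Fig. 3.3 and Eq. (3.3) translate directly into a waveform post-processing routine. The following Python sketch assumes sampled waveforms (such as those exported from a transient simulation) and locates threshold crossings by linear interpolation; the piecewise-linear test waveforms and the helper names (`pwl`, `crossing_time`, `propagation_delay`, `transition_time`) are hypothetical.

```python
def pwl(x, pts):
    """Piecewise-linear waveform through the (time, voltage) points in pts."""
    if x <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return pts[-1][1]

def crossing_time(t, v, level, rising):
    """First time v(t) crosses `level` in the given direction (or None)."""
    for i in range(len(t) - 1):
        v0, v1 = v[i], v[i + 1]
        if (rising and v0 < level <= v1) or (not rising and v0 > level >= v1):
            return t[i] + (level - v0) / (v1 - v0) * (t[i + 1] - t[i])
    return None

def propagation_delay(t, v_in, v_out, vdd, input_rising):
    """t_pHL if the input rises (output falls), t_pLH if the input falls."""
    t_in = crossing_time(t, v_in, 0.5 * vdd, rising=input_rising)
    t_out = crossing_time(t, v_out, 0.5 * vdd, rising=not input_rising)
    return t_out - t_in

def transition_time(t, v, vdd, rising):
    """Rise time (rising=True) or fall time, between 10 % and 90 % of V_dd."""
    t10 = crossing_time(t, v, 0.1 * vdd, rising)
    t90 = crossing_time(t, v, 0.9 * vdd, rising)
    return (t90 - t10) if rising else (t10 - t90)

# Hypothetical waveforms of an inverter (time in ns, voltage in V).
vdd = 0.2
t = [0.01 * i for i in range(1001)]
v_in = [pwl(x, [(1, 0), (2, vdd), (6, vdd), (7, 0)]) for x in t]
v_out = [pwl(x, [(2, vdd), (4, 0), (7, 0), (8, vdd)]) for x in t]
t_phl = propagation_delay(t, v_in, v_out, vdd, input_rising=True)
t_plh = propagation_delay(t, v_in, v_out, vdd, input_rising=False)
t_p = 0.5 * (t_plh + t_phl)  # Eq. (3.3)
print(f"t_pHL = {t_phl:.2f} ns, t_pLH = {t_plh:.2f} ns, t_p = {t_p:.3f} ns")
```

For these waveforms t pHL = 1.5 ns, t pLH = 1.0 ns and hence t p = 1.25 ns; the output fall time is 1.6 ns and its rise time 0.8 ns.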

Naturally, these DC and transient characteristics are compared not only for nominal operation, but also when a logic gate is subjected to inter- and intra-die variations (as discussed in Sect. 2.2.3). Because of the exponential sensitivity to variations in sub-threshold and near-threshold operation, it is of the utmost importance to use variation-resilient circuit topologies.
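
The impact of the exponential current dependence on variability can be illustrated with a small Monte Carlo experiment. The sketch below assumes first-order delay models: delay ∝ exp(V T/(n kT/q)) in sub-threshold, and the alpha-power law delay ∝ V dd/(V dd − V T)^α at an assumed nominal supply of 1 V. It also assumes a Gaussian V T spread of 20 mV. All numbers are illustrative, not data for the technology at hand.

```python
import math
import random
import statistics

N_PHI_T = 1.4 * 0.0259  # assumed slope factor n times kT/q (V)

def delay_subthreshold(vt):
    # I_on ~ exp((V_dd - V_T) / (n kT/q)), so delay ~ exp(V_T / (n kT/q)) up to
    # a factor that is identical for all samples and cancels in sigma/mu.
    return math.exp(vt / N_PHI_T)

def delay_alpha_power(vt, vdd=1.0, alpha=1.3):
    # Alpha-power law: I_on ~ (V_dd - V_T)^alpha and delay ~ V_dd / I_on.
    return vdd / (vdd - vt) ** alpha

def sigma_over_mu(samples):
    return statistics.stdev(samples) / statistics.mean(samples)

random.seed(1)
vts = [random.gauss(0.35, 0.02) for _ in range(20000)]  # assumed V_T spread
cv_sub = sigma_over_mu([delay_subthreshold(v) for v in vts])
cv_super = sigma_over_mu([delay_alpha_power(v) for v in vts])
print(f"sigma/mu of delay: sub-threshold {cv_sub:.2f}, nominal supply {cv_super:.3f}")
```

With these assumptions, the relative delay spread in sub-threshold is more than ten times larger than at nominal supply, because the V T spread enters through an exponential rather than a power law.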

A third consideration for a circuit topology is the area required to implement a logic function, i.e. the area needed to size a logic gate optimally for ultra-low-voltage operation. In general, a larger area leads to higher capacitances, which degrade the operating speed and energy consumption. Furthermore, silicon area is proportional to cost: if a circuit topology requires more area to implement a logic function, the cost of the total system will be higher. The area is expressed as the equivalent number of minimum-sized transistors.

The previously introduced leakage power and dynamic energy consumption are, of course, also essential characteristics in the comparison of circuit topologies, as they are crucial parameters throughout this entire work.

There are various ways to size a standard CMOS logic gate optimally for ultra-low-voltage operation. As seen in Fig. 2.11, a standard CMOS inverter sized conventionally for operation at nominal supply voltage suffers severely from variability at ultra-low supply voltages. Dedicated sub-threshold sizing can partly counter this. To enable ultra-low-voltage operation of a standard CMOS inverter, the nMOS and pMOS transistors should be carefully balanced so that the noise margin is maximized [3]. In this work, the highest sizing priority is given to balanced noise margins, since an imbalance degrades the nominal functionality at ultra-low supply voltages. If the nominal behavior is already skewed, variations will emphasize this imbalance, especially in the snfp and fnsp corners. As a result, the variation-resilience decreases, which is highly undesirable. Therefore, optimizing performance by balancing or minimizing propagation delays comes second, after guaranteeing robustness. Performance is still taken into account, but it is not the decisive factor. Moreover, many measures to improve performance can be taken at the architectural level, as explained further in Chap. 4.

Four possibilities to obtain this optimal sizing for a standard CMOS circuit topology are now discussed: adjusting the width of the transistors, combining stacked transistors with adjusted width, adjusting the length of the transistors, and body biasing. The corresponding schematics are shown in Fig. 3.4.

Fig. 3.4 Different possibilities to obtain optimal sizing for a standard CMOS inverter: (a) width sizing, (b) stacked nMOS, (c) length sizing and (d) body biasing

3.1.1.3 Width Sizing

The first option (Fig. 3.4a), optimal sizing through the width of the transistors, is the most commonly used one. The transistor widths are adjusted to balance the nMOS and pMOS transistors so that equal noise margins are obtained, as balanced noise margins allow the minimum functional supply voltage of the circuit to be lowered. Since the on-current of an nMOS transistor is significantly higher than that of an equally sized pMOS transistor at the same supply voltage, the nMOS width W nMOS is kept minimal, i.e. the factor W n equals 1:

$$W_{\mathrm{nMOS}} = W_{\mathrm{n}} \cdot W_{\mathrm{min}} \tag{3.4}$$
$$W_{\mathrm{pMOS}} = W_{\mathrm{p}} \cdot W_{\mathrm{min}} \tag{3.5}$$

The pMOS width is then modified according to need. The relative width of the pMOS compared to the nMOS is expressed as P p:

$$P_{\mathrm{p}} = \frac{W_{\mathrm{p}}}{W_{\mathrm{n}}} \tag{3.6}$$

Figure 3.5 shows the various parameters that influence the optimal relative width P p, obtained from simulations at a supply voltage of 200 mV. Equal noise margins are obtained at a P p of 11.1. The value of P p at which V M equals \(V _{\mathrm{dd}}/2\) is 11.2. As expected, the optimal V M occurs at almost the same P p as equal noise margins, since both metrics are closely related.

Fig. 3.5 Noise margins, switching threshold voltage and propagation delays of a standard CMOS inverter as a function of the relative width P p of the pMOS compared to the nMOS transistor (\(V_{\mathrm{dd}} = 200\,\mathrm{mV}\))

Another metric that is sometimes used for width sizing is obtaining equal propagation delays t pLH and t pHL, or aiming for a minimal t p. The former occurs at a P p of 8.4, the latter at 7. Minimum overall propagation delay thus requires a smaller pMOS width. The reason is that widening the pMOS improves the t pLH of the inverter by increasing the charging current, but it also degrades the t pHL by adding parasitic capacitance [31], as can be seen in Fig. 3.5.
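
The trade-off between t pLH and t pHL can be captured in a first-order model, a common textbook simplification rather than the simulation behind Fig. 3.5. Assume an inverter driving an identical inverter, a load \(C_{\mathrm{L}} \propto c_{\mathrm{w}} + 1 + P_{\mathrm{p}}\) (in units of the nMOS capacitance, with \(c_{\mathrm{w}}\) a hypothetical fixed wiring/load term), and a per-width current ratio \(r = I_{\mathrm{n}}/I_{\mathrm{p}}\). Then \(t_{\mathrm{pHL}} \propto C_{\mathrm{L}}\) and \(t_{\mathrm{pLH}} \propto C_{\mathrm{L}}\, r/P_{\mathrm{p}}\), so equal delays require \(P_{\mathrm{p}} = r\), whereas \(\mathrm{d}t_{\mathrm{p}}/\mathrm{d}P_{\mathrm{p}} = 0\) yields \(P_{\mathrm{p}} = \sqrt{r\,(1 + c_{\mathrm{w}})}\), which is smaller than r whenever \(1 + c_{\mathrm{w}} < r\). The Python sketch below checks this numerically for the assumed values r = 8.4 and c w = 4.

```python
import math

def delays(p_p, r, c_w):
    """First-order (t_pHL, t_pLH) of an inverter driving an identical one.

    Hypothetical model: C_L = c_w + 1 + p_p (nMOS capacitance units),
    t_pHL ~ C_L / I_n and t_pLH ~ C_L / (p_p * I_p), with I_n = 1 and
    r = I_n / I_p per unit width.
    """
    c_load = c_w + 1.0 + p_p
    return c_load, c_load * r / p_p

def t_p(p_p, r, c_w):
    t_phl, t_plh = delays(p_p, r, c_w)
    return 0.5 * (t_phl + t_plh)  # Eq. (3.3)

r, c_w = 8.4, 4.0                              # assumed values
p_equal = r                                    # t_pLH == t_pHL
p_min_analytic = math.sqrt(r * (1.0 + c_w))    # from d t_p / d P_p = 0
grid = [1.0 + 0.01 * i for i in range(1901)]   # P_p from 1 to 20
p_min_scan = min(grid, key=lambda p: t_p(p, r, c_w))
print(f"equal delays at P_p = {p_equal:.1f}; minimum t_p at "
      f"P_p = {p_min_analytic:.2f} (grid scan: {p_min_scan:.2f})")
```

The model reproduces the qualitative trend of Fig. 3.5, a minimum-t p width below the equal-delay width, but its numbers are not fitted to the 90 nm technology.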

As explained above, priority is given to the sizing that achieves equal noise margins. Therefore, the pMOS is sized with a P p of 11. This large upsizing is a direct consequence of operating in the ultra-low-voltage region. To illustrate the influence of the supply voltage, Fig. 3.6 shows the optimal P p as a function of V dd for the standard CMOS inverter. The required relative width increases significantly when V dd is lowered. Consequently, stacked pMOS transistors, e.g. in a NOR gate, require even larger widths, which leads to large area and capacitance.

Fig. 3.6 Optimal P p for different inverter implementations as a function of V dd

Nevertheless, this sizing is necessary to achieve the essential variation-resilience. Figure 3.7 shows that dedicated sizing of the standard CMOS inverter indeed partly counters the variation sensitivity visualized earlier in Fig. 2.11. Adequately sizing the inverter (\(P_{\mathrm{p}} = 11\)) clearly lowers the variation of the propagation delay in the ultra-low-voltage region compared to a regular-sized standard CMOS inverter (\(P_{\mathrm{p}} = 3\)).

Fig. 3.7 Variation of t p as a function of V dd for different inverter implementations

3.1.1.4 Transistor Stacking

A solution to this excessive pMOS sizing is transistor stacking. Transistor stacking is a leakage reduction technique based on the fact that two stacked off-devices leak significantly less than a single off-device [26]. Figure 3.8 shows the schematic of two stacked nMOS transistors. Leakage is reduced by four different mechanisms [13], all linked to the intermediate voltage V int, which lies between V ss and V dd:

  1. The leakage current through transistor M2 is reduced due to the negative gate-to-source voltage: \(V _{\mathrm{gs}} = -V _{\mathrm{int}}\) (see Eq. (2.5)).

  2. The body effect in M2 increases \(V _{\mathrm{T},\mathrm{M}_{2}}\) due to the positive source-to-bulk voltage: \(V _{\mathrm{sb}} = V _{\mathrm{int}}\) (see Eq. (2.9) and Fig. 2.4). M2 is thus reverse body biased.

  3. The DIBL effect in M2 increases \(V _{\mathrm{T},\mathrm{M}_{2}}\) due to the reduced drain-to-source voltage: \(V _{\mathrm{ds}} = V _{\mathrm{dd}} - V _{\mathrm{int}}\) (see Fig. 2.6).

  4. The DIBL effect in transistor M1 increases \(V _{\mathrm{T},\mathrm{M}_{1}}\) due to the reduced drain-to-source voltage: \(V _{\mathrm{ds}} = V _{\mathrm{int}}\).

Fig. 3.8 Schematic of two stacked nMOS transistors, visualizing the different leakage reduction mechanisms

Therefore, transistor stacking is a very effective way of reducing leakage.
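
The interplay of the four mechanisms can be sketched with a simple sub-threshold current model, \(I \propto \exp\big((V_{\mathrm{gs}} - k_{\mathrm{b}} V_{\mathrm{sb}} + \eta V_{\mathrm{ds}})/(n\,kT/q)\big)\,\big(1 - \exp(-V_{\mathrm{ds}}/(kT/q))\big)\), with a linearized body-effect coefficient k b and a DIBL coefficient η. The Python snippet below solves for V int by bisection, equating the currents of the two stacked devices, and compares the stack leakage with that of a single off-device. The values n = 1.4, η = 0.1 and k b = 0.1 are assumptions chosen for illustration, not parameters of the technology used in this work.

```python
import math

PHI_T = 0.0259   # thermal voltage kT/q at about 300 K (V)
N = 1.4          # assumed sub-threshold slope factor
ETA = 0.1        # assumed DIBL coefficient (V/V)
K_B = 0.1        # assumed linearized body-effect coefficient dV_T/dV_sb (V/V)

def i_sub(v_gs, v_ds, v_sb=0.0):
    """Hypothetical sub-threshold nMOS current, normalized so that the
    V_T0-dependent prefactor equals 1 (it cancels in the ratios below)."""
    v_eff = v_gs - K_B * v_sb + ETA * v_ds   # body effect raises V_T, DIBL lowers it
    return math.exp(v_eff / (N * PHI_T)) * (1.0 - math.exp(-v_ds / PHI_T))

def stack_leakage(vdd, iterations=60):
    """Off-current of two stacked nMOS devices (both gates at 0 V).

    Bisection on the intermediate node voltage V_int until the currents of
    the top device (M2) and the bottom device (M1) match.
    """
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        v_int = 0.5 * (lo + hi)
        i_top = i_sub(-v_int, vdd - v_int, v_sb=v_int)  # mechanisms 1 to 3
        i_bot = i_sub(0.0, v_int)                       # mechanism 4
        if i_top > i_bot:
            lo = v_int   # node would charge further: the root lies above
        else:
            hi = v_int
    return i_bot, v_int

vdd = 0.2
i_single = i_sub(0.0, vdd)
i_stack, v_int = stack_leakage(vdd)
print(f"V_int = {v_int * 1e3:.1f} mV, "
      f"I_off(stack) / I_off(single) = {i_stack / i_single:.2f}")
```

The bisection relies on the current through the top device decreasing, and that through the bottom device increasing, with V int, so exactly one intermediate voltage balances them.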

Stacking the nMOS transistor reduces not only its leakage current I off, but its on-current I on as well. Figure 3.9a shows the effect of stacking on both currents. As a result of the decreased I on, nMOS, the pMOS sizing in a standard CMOS inverter with nMOS stacking can be relaxed without degrading the noise margin. This is visualized in Fig. 3.9b, where the left axis shows the optimal P p as a function of the number of stacked nMOS transistors. The right axis shows that the overall noise margin remains balanced when this optimal P p is used. The effect of stacking on the nMOS currents, and thus on the pMOS sizing, diminishes as more transistors are stacked (Fig. 3.9). Therefore, it is optimal to stack two nMOS transistors, resulting in a relative pMOS width of 6.8. Figure 3.4b shows the schematic of such a stacked nMOS inverter, while Fig. 3.6 illustrates how this stacking relaxes the optimal P p across the supply voltage range.

Fig. 3.9 For \(V_{\mathrm{dd}} = 200\,\mathrm{mV}\): (a) current percentage of stacked nMOS transistors relative to a single nMOS, as a function of the number of stacked nMOS transistors; (b) relative pMOS width P p (left axis) needed to reach a maximal and balanced noise margin (right axis), as a function of the number of stacked nMOS transistors in the inverter

Figure 3.10 shows the different characteristics that influence the optimal sizing of the stacked nMOS inverter. As already mentioned, optimal sizing for noise margins results in a P p of 6.8. For V M, the optimal P p is 7.0. Equal propagation delays are achieved at a P p of 5.0, while minimal t p occurs at a P p of 6.0. Consequently, the pMOS transistor of the stacked nMOS inverter is sized with a P p of 6 to balance the different characteristics. The total equivalent area is then \(6 + 1 + 1 = 8\) for a stacked nMOS inverter, compared to \(11 + 1 = 12\) for a standard CMOS inverter, a total area reduction of 33 %.

Fig. 3.10 Noise margins, switching threshold voltage and propagation delays of a standard CMOS inverter with stacked nMOS transistors as a function of the relative width P p (\(V_{\mathrm{dd}} = 200\,\mathrm{mV}\))

Stacking not only allows relaxed pMOS sizing, but also decreases the leakage through the nMOS transistor, which reduces the static power consumption by 58.7 % compared to the standard CMOS inverter (at \(V _{\mathrm{dd}} = 200\,\mathrm{mV}\)). The stacked nMOS inverter thus offers an area and a leakage power reduction, at the penalty of a 23.5 % increase in nominal propagation delay compared to the standard CMOS inverter.

On the other hand, the stacked nMOS inverter reduces delay variations compared to the standard CMOS inverter. Figure 3.7 showed that sizing the standard CMOS inverter to sub-threshold requirements lowers the variation of t p at ultra-low supply voltages compared to a regular-sized standard CMOS inverter. It also shows that a stacked nMOS inverter decreases the delay variation further. To summarize, nMOS stacking slightly increases the nominal propagation delay, but it reduces the percentage variation of the delay by 3.9 % at 200 mV (Fig. 3.7). Owing to the variation-resilience of the stacked nMOS inverter, smaller design margins are needed to cope with timing variations than for conventional standard CMOS inverters.

3.1.1.5 Length Sizing

Up to now, the weaker pMOS transistor was strengthened by increasing its width, so as to obtain equal drive strengths for both transistors and consequently equal noise margins. Another way to counter the stronger nMOS transistor is to weaken it by increasing its length L nMOS, while keeping its width W nMOS and the width and length of the pMOS transistor minimal. This should decrease the current in all operating regions of the transistor, as follows from the current equations of Sect. 2.1.1. Increasing the length is therefore sometimes used in digital circuits to reduce the leakage or to limit the on-current of a transistor. The resulting schematic of a standard CMOS inverter with nMOS length sizing is shown in Fig. 3.4c, where L p is kept equal to 1 and L n is sized as required.

However, in the 90 nm CMOS technology at hand, the transistor models show unexpected behavior when the length of an nMOS transistor is adapted. This can be seen in Fig. 3.11, where I off and I on are normalized to an nMOS of minimal size and plotted as a function of L nMOS. Both currents increase with increasing length, instead of decreasing as expected. Table 3.1 compares the impact of the different nMOS sizings on its currents. According to the simulations, doubling the length of the nMOS transistor increases the leakage current by a factor of 4.20, whereas nMOS stacking reduces I off to 0.41 times its original value.

Fig. 3.11 Normalized currents I on, nMOS and I off, nMOS as a function of L nMOS (\(W_{\mathrm{nMOS}} = W_{\mathrm{min}}\) and \(V_{\mathrm{dd}} = 200\,\mathrm{mV}\))

Table 3.1 Comparison of normalized I on and I off of nMOS transistors with different sizings

As already discussed in Sect. 2.3.3, transistor models are not always accurate in the weak inversion region. This behavior might be a model artifact, but ignoring it and relying on intuitive transistor behavior instead would severely complicate circuit simulations. Furthermore, the different mechanisms involved in transistor stacking actually yield a larger leakage reduction than doubling the channel length of the transistor [13]. Moreover, in modern deep sub-micron devices, the Reverse Short-Channel Effect (RSCE) may reduce the threshold voltage of longer-channel transistors, making length upsizing less effective for leakage reduction. In general, the dependence of the current on the transistor length is highly sensitive to process and technology parameters. Hence, sizing standard CMOS logic gates by adjusting the transistor length is strongly technology-dependent [4].

3.1.1.6 Body Biasing

A fourth method to balance the on-currents of the nMOS and pMOS transistors is to exploit the body effect (see Sect. 2.1.2.2) to make the transistors weaker or stronger. Reverse Body Biasing (RBB) increases V T, reducing leakage at the cost of performance. Forward Body Biasing (FBB), on the other hand, reduces V T, increasing performance at the cost of higher leakage. RBB is achieved by reducing V BB, n below the ground rail for nMOS transistors, and by increasing V BB, p above the supply rail for pMOS transistors. Conversely, increasing V BB, n above V ss and reducing V BB, p below V dd results in FBB.

This sizing method is visualized in Fig. 3.4d for the standard CMOS inverter. Here, RBB through decreasing V BB, n could be used to increase the threshold voltage of the nMOS transistor until its I on matches that of a minimal pMOS. Alternatively, FBB through decreasing V BB, p could strengthen the pMOS transistor by reducing its threshold voltage.
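
The threshold shift obtained with body biasing can be estimated with the classic square-root body-effect expression \(\Delta V_{\mathrm{T}} = \gamma\big(\sqrt{|2\phi_{\mathrm{F}}| + V_{\mathrm{sb}}} - \sqrt{|2\phi_{\mathrm{F}}|}\big)\), which translates into an exponential change of the sub-threshold current. The Python sketch below applies it to an nMOS whose source is at V ss; γ = 0.3 \(\mathrm{V}^{1/2}\), |2φF| = 0.8 V and n = 1.4 are assumed illustrative values, and the pMOS case follows analogously with mirrored voltages.

```python
import math

GAMMA = 0.3              # assumed body-effect coefficient (V^0.5)
PHI_2F = 0.8             # assumed surface potential |2 phi_F| (V)
N_PHI_T = 1.4 * 0.0259   # assumed slope factor n times kT/q (V)

def delta_vt(v_sb):
    """nMOS threshold shift from the square-root body-effect model.

    v_sb > 0 is reverse body bias (RBB), v_sb < 0 forward body bias (FBB);
    the model requires PHI_2F + v_sb > 0.
    """
    return GAMMA * (math.sqrt(PHI_2F + v_sb) - math.sqrt(PHI_2F))

def subthreshold_current_factor(v_sb):
    """Multiplicative change of the sub-threshold current, exp(-dV_T / (n kT/q))."""
    return math.exp(-delta_vt(v_sb) / N_PHI_T)

for v_bb_n in (-0.3, 0.0, 0.3):   # body voltage V_BB,n with the source at V_ss = 0 V
    v_sb = -v_bb_n
    print(f"V_BB,n = {v_bb_n:+.1f} V: dV_T = {delta_vt(v_sb) * 1e3:+.1f} mV, "
          f"current x {subthreshold_current_factor(v_sb):.2f}")
```

A negative V BB, n (RBB) raises V T and weakens the device; a positive V BB, n (FBB) does the opposite. In practice, forward bias is bounded by the source/drain-to-bulk junctions and by latch-up.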

However, body biasing has several consequences. Firstly, it requires additional supply rails to distribute the body biasing voltages, a triple-well technology, and charge pump circuits to generate the additional supplies. This results in area and energy overhead. Secondly, when body biasing is employed to compensate for inter-die variability, calibration after fabrication is required: each individual die needs to be calibrated during initial measurements. Thirdly, the impact of body biasing decreases for short-channel devices, which limits the scalability of this method. Fourthly, the body biasing voltages are limited by latch-up on one side and by electrical breakdown on the other. Especially in advanced nanometer technologies, where the body effect coefficient γ is reduced, these limits can restrict the effectiveness of body biasing.

Both body biasing techniques have their own separate issues as well. RBB becomes less effective for leakage reduction at shorter channel lengths [41]. RBB increases the sensitivity to process variations, e.g. it worsens the V T variations across a die [28]. FBB reduces the sensitivity to process variations, but suffers severely from temperature dependencies [27].

3.1.1.7 Sizing Conclusion

Because of the aforementioned restrictions of length sizing and body biasing, these two options are discarded in the remainder of this circuit topology comparison. As explained in Chap. 1, this book focuses on ultra-low-voltage circuit design in bulk CMOS technologies. This conclusion is therefore only valid for this type of technology and could be different for others, e.g. SOI.

3.1.1.8 Literature

Many ultra-low-voltage publications use standard CMOS logic, as these gates are readily available in standard cell libraries. In some cases, regular standard cells were resimulated at the low target voltage to check their functionality, and the non-functional cells were discarded. For example, standard CMOS gates with large stacks were often avoided, e.g. in [8, 15, 16, 19, 20, 23, 24, 30]. In most cases, the standard cell library was recharacterized at low supply voltages [1, 7, 17, 25, 37, 48].

Sizing for sub-threshold operation has been done with both width sizing and length sizing, but the former has been used much more often than the latter. Hanson et al. [11] and Bol et al. [6] have suggested increasing the channel length to improve the transistor’s sub-threshold behavior. In measured ultra-low-voltage designs, length upsizing has been employed in, for instance, [9, 24].

Body biasing has been extensively used to compensate for variations after manufacturing of sub- and near-threshold designs, e.g. in [10, 12, 14, 18, 21, 44]. However, as shown in Sect. 1.5, the designs using body biasing do not outperform the other designs.

3.1.2 Pseudo-nMOS Logic

3.1.2.1 Concept

Figure 3.12a shows the generic implementation of an n-input pseudo-nMOS logic gate. The PDN is identical to that of a standard CMOS logic gate, but the PUN has been replaced by a single pMOS transistor whose gate is grounded, so that it is always on and acts as a current-source load. Hence, the PDN realizes the logic function, while the pMOS transistor functions as a load. When the PDN is off, the pMOS load pulls the output to ‘1’. When the PDN turns on, it fights the load. Therefore, the pMOS load must be weak enough for the output to be pulled down to an acceptable ‘0’ level. For this logic gate to work correctly, the pMOS sizing is thus critical.

Fig. 3.12 (a) Generic implementation of an n-input pseudo-nMOS logic gate. (b) Schematic of a pseudo-nMOS inverter

Pseudo-nMOS logic is a form of so-called ratioed logic. In general, ratioed circuits depend on device sizing to produce acceptable output levels. In ratioless logic, on the other hand, the output levels do not depend on the sizing of the devices. The other topologies discussed in this chapter are all ratioless circuits.

3.1.2.2 Ultra-Low-Voltage Operation

Ratioed logic reduces the number of transistors needed to implement a given logic function with respect to standard CMOS logic: an N-input gate requires N + 1 transistors instead of 2N. The advantage of ratioed logic is thus the reduced number of devices and the smaller area. However, ratioed logic also introduces several disadvantages.

Figure 3.12b shows the schematic of a pseudo-nMOS inverter. When the output is pulled high, the operation is the same as for a standard CMOS inverter. When the output is pulled low, the nMOS transistor is turned on while the pMOS load also conducts current. This has two important consequences. Firstly, the nominal low output voltage is higher than V ss, resulting in a decreased low noise margin NM L. Secondly, the inverter has a large static power dissipation due to the direct path from the supply to the ground in the low output state.

The area reduction thus comes at the cost of decreased robustness and static power dissipation. In fact, the sizing of a pseudo-nMOS logic gate is a trade-off between noise margin, power dissipation and delay [31]. The first two parameters get worse as the pMOS size increases. On the other hand, a smaller pMOS results in a longer rise time. Since robustness and leakage are of primary concern for ultra-low-voltage operation, the pMOS transistor is sized minimally in this implementation. The nMOS transistor is then sized to obtain equal noise margins. This results in a W p of 1 and a W n of 6 at a 200 mV supply for the 90 nm CMOS technology at hand.
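
The ratioed nature of the low output level can be illustrated with a simple model. Assume that both the nMOS (V gs = V dd) and the always-on pMOS load operate in sub-threshold at 200 mV, and neglect DIBL. V OL then follows from \(\rho\,(1 - e^{-V_{\mathrm{OL}}/\phi_{\mathrm{t}}}) = 1 - e^{-(V_{\mathrm{dd}} - V_{\mathrm{OL}})/\phi_{\mathrm{t}}}\), where \(\phi_{\mathrm{t}} = kT/q\) and ρ is a hypothetical pull-down to pull-up drive-strength ratio. The Python sketch below solves this by bisection for a few values of ρ; the function name `pseudo_nmos_vol` is illustrative only.

```python
import math

PHI_T = 0.0259  # thermal voltage kT/q at about 300 K (V)

def pseudo_nmos_vol(rho, vdd=0.2):
    """Low output level of a pseudo-nMOS inverter with its input at V_dd.

    Hypothetical sub-threshold model without DIBL: the nMOS (V_gs = V_dd) and
    the always-on pMOS load (V_sg = V_dd) both conduct; rho is the ratio of
    their drive strengths at equal |V_ds|. Solves
        rho * (1 - exp(-V/phi_t)) = 1 - exp(-(V_dd - V)/phi_t)
    by bisection. Returns (V_OL, static current normalized to the load).
    """
    def mismatch(v):
        i_n = rho * (1.0 - math.exp(-v / PHI_T))   # pull-down current
        i_p = 1.0 - math.exp(-(vdd - v) / PHI_T)   # pull-up (load) current
        return i_n - i_p                           # increases with v

    lo, hi = 0.0, vdd
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid   # load still wins: the output settles higher
        else:
            hi = mid
    v_ol = 0.5 * (lo + hi)
    return v_ol, 1.0 - math.exp(-(vdd - v_ol) / PHI_T)

for rho in (1.0, 1.5, 5.0, 60.0):
    v_ol, i_static = pseudo_nmos_vol(rho)
    print(f"rho = {rho:5.1f}: V_OL = {v_ol * 1e3:6.2f} mV, I_static = {i_static:.3f}")
```

V OL, and with it NM L, depends directly on ρ, which is the defining property of ratioed logic. Because ρ itself varies exponentially with threshold-voltage shifts in sub-threshold operation, process corners move V OL as well, while a static current flows through the load whenever the output is low.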

Figure 3.13 visualizes the most important drawback of pseudo-nMOS logic: its sensitivity to variations. In this figure, the VTCs of a pseudo-nMOS inverter (\(W_{\mathrm{p}} = 1\), \(W_{\mathrm{n}} = 6\)) and a standard CMOS inverter (\(P_{\mathrm{p}} = 11\)) are compared in different process corners at a supply of 200 mV. The standard CMOS inverter behaves well under inter-die variations when properly sized for ultra-low-voltage operation. The pseudo-nMOS inverter, by contrast, suffers severely from these inter-die variations, even though it is properly sized for nominal operation. This makes pseudo-nMOS logic unsuitable for the ultra-low-voltage region.

Fig. 3.13 VTC in process corners of a pseudo-nMOS inverter (left) in comparison with a standard CMOS inverter (right) at \(V_{\mathrm{dd}} = 200\,\mathrm{mV}\)

3.1.2.3 Literature

Pseudo-nMOS logic was introduced by Soeleman and Roy in 1999 as a favorable circuit topology for the sub-threshold region [35]. The listed advantages were the reduced area and the improved performance due to the reduced load capacitance, in comparison to standard CMOS logic. Theirs was one of the first groups to research sub-threshold logic and ultra-low-voltage operation in general, and they published a few papers on simulation results of sub-threshold pseudo-nMOS logic until 2001. However, the topology has not been adopted by other groups because of its unacceptably high sensitivity to variations.

3.1.3 Pass Transistor Logic

3.1.3.1 Concept

The previously discussed logic families only allow inputs to drive the gate terminal of a transistor. Pass transistor logic allows inputs to drive not only gate terminals, but also the source/drain terminals of transistors. Fundamentally, transistors are used as switches, as shown in Fig. 3.14. A single pass transistor can be realized with an nMOS or a pMOS transistor. Logic gates can easily be constructed with pass transistor logic; e.g., Fig. 3.14c presents the schematic of a NOR gate implemented with nMOS pass transistor logic. Compared to a standard CMOS implementation of a NOR gate, pass transistor logic requires far fewer transistors. Historically, this has been the main motivation behind its use.

Fig. 3.14

Schematics of pass transistor logic: (a) nMOS switch, (b) pMOS switch and (c) nMOS NOR gate

Note that pass transistor logic is still static logic, as are the previously discussed circuit topologies. The outputs of static logic are always connected to either \(V_{\mathrm{dd}}\) or \(V_{\mathrm{ss}}\) through a low-resistance path, which is advantageous for noise resilience [31]. For example, in the NOR gate depicted in Fig. 3.14c, one of the pass transistors is always conducting, which ensures the static property of pass transistor logic. Section 3.1.5.3 will discuss dynamic logic, which relies on temporary storage on the capacitance of a high-impedance node.

Unfortunately, pass transistor logic always suffers from signal loss: nMOS transistors pass a strong ‘0’ but a weak ‘1’, i.e. a \(V_{\mathrm{T}}\) loss will occur on the logic high level. Equivalently, pMOS transistors pass a strong ‘1’ but a weak ‘0’, i.e. a \(V_{\mathrm{T}}\) loss will occur on the logic low level. Because of this inherent \(V_{\mathrm{T}}\) loss, pass transistor gates cannot be cascaded by connecting the output of a pass transistor to the gate input of a subsequent pass transistor.
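The cascading restriction can be illustrated with a first-order, strong-inversion abstraction, assuming an nMOS switch whose output cannot rise above its gate voltage minus \(V_{\mathrm{T}}\). The supply (1.0 V) and \(V_{\mathrm{T}}\) (0.3 V) are hypothetical values chosen only for illustration, and the sketch ignores sub-threshold conduction and the body effect.

```python
# First-order (strong-inversion) abstraction of an nMOS pass transistor:
# it passes '0' fully but cannot pull its output above V_gate - V_T.
# V_DD = 1.0 V and V_T = 0.3 V are hypothetical illustration values.
V_DD, V_T = 1.0, 0.3


def nmos_pass(v_in: float, v_gate: float, v_t: float) -> float:
    """Output level of an nMOS switch, clamped at 0 V."""
    return max(0.0, min(v_in, v_gate - v_t))


def gate_chain(stages: int) -> float:
    """Each output drives the GATE of the next switch (data input tied to V_dd):
    the reduced gate drive makes the V_T losses add up stage after stage."""
    v = V_DD  # the first gate is driven by a full-swing signal
    for _ in range(stages):
        v = nmos_pass(V_DD, v, V_T)
    return v


def source_chain(stages: int) -> float:
    """Each output drives the SOURCE/DRAIN of the next switch (all gates at V_dd):
    the V_T loss occurs only once."""
    v = V_DD
    for _ in range(stages):
        v = nmos_pass(v, V_DD, V_T)
    return v


for n in (1, 2, 3):
    print(f"{n} stage(s): gate-driven {gate_chain(n):.2f} V, "
          f"source-driven {source_chain(n):.2f} V")
```

Driving the gate of the next switch from a degraded output subtracts another \(V_{\mathrm{T}}\) per stage, whereas passing the signal along the source/drain path loses it only once. With a supply below \(V_{\mathrm{T}}\), as in ultra-low-voltage operation, this first-order expression already clamps to 0 V after one stage; real sub-threshold behavior is more gradual, but the degraded high level remains.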

3.1.3.2 Ultra-Low-Voltage Operation

This inherent \(V_{\mathrm{T}}\) loss makes pass transistor logic unsuitable for operation at ultra-low supply voltages. The voltage drop can be compensated by restoring the output to the supply rails, but this requires additional circuitry after every logic gate, for example an inverter. However, the extra transistors of this level-restoring circuitry compromise the main benefit of pass transistor logic, namely its low transistor count. The next circuit topology offers a more elegant solution to the voltage drop.

3.1.3.3 Literature

One of the first differential implementations of pass transistor logic is described in [47]. A 16-bit multiplier is constructed with Complementary Pass transistor Logic (CPL). CPL consists of differential inputs and outputs, an nMOS-only pass transistor logic network and standard CMOS output inverters. Basically, logic gates are constructed with differential inputs and nMOS pass transistors. The main reason why [47] used CPL was to achieve high speed, owing to the lower input capacitance and higher logic functionality. The published CPL circuits operated at nominal supply.

The only pass transistor based family designed for use at ultra-low supply voltages was proposed by the Berkeley Wireless Research Center in 2007 [2]. This so-called Sense Amplifier-based Pass Transistor Logic (SAPTL) consists of three major components. Firstly, a pass transistor tree, called the stack, computes the desired logic function. Secondly, an inverter drives the root node of the stack and injects signals into it. Thirdly, at the output of the stack, a sense amplifier recovers both voltage swing and performance. The drivers and sense amplifiers thus provide gain to the circuit. Since the pass transistor stack has no \(V_{\mathrm{dd}}\) or \(V_{\mathrm{ss}}\) connections, the only leakage paths appear in the gain circuits. SAPTL can operate synchronously using a clock, or asynchronously using additional hand-shaking circuitry. The authors claim that low leakage and low energy consumption are the main advantages of SAPTL. However, the supply voltage that can be used in SAPTL is limited by the input voltage difference that the sense amplifiers can sense. Reducing the minimum input swing of the sense amplifier makes its design more difficult and will probably increase its area or energy consumption.

3.1.4 Transmission Gate Logic

3.1.4.1 Concept

The main disadvantage of pass transistor logic is the \(V_{\mathrm{T}}\) loss at one of the signal levels. This can be solved by exploiting the complementary properties of nMOS and pMOS transistors: instead of a single transistor, two complementary transistors are placed to pass a signal. This is called a transmission gate, and is visualized in Fig. 3.15a. When the transmission gate conducts, current flows through the parallel combination of the nMOS and pMOS transistor. The nMOS passes a strong ‘0’, while the pMOS passes a strong ‘1’, thereby eliminating the \(V_{\mathrm{T}}\) loss on both logic levels.

Fig. 3.15

Schematics of transmission gate logic: (a) transmission gate and (b) TG NOR gate

When this technique is used to implement logic gates, it is called Transmission Gate (TG) logic. Figure 3.15b shows a NOR gate implemented with TG logic. Compared to pass transistor logic (recall Fig. 3.14c), TG logic requires twice as many transistors, but it eliminates the problematic voltage drop. TG logic is commonly built using equal-sized minimal nMOS and pMOS transistors. Upsizing the pMOS, as in standard CMOS logic, only slightly improves its effective resistance while significantly increasing the capacitance [46]. Unlike standard CMOS logic, TG logic does not need transistor balancing through sizing, since every conducting path includes both an nMOS and a pMOS. The required area is therefore much lower than that of a standard CMOS NOR gate. Hence, TG logic is still attractive from an area point of view, despite having twice as many transistors as pass transistor logic.

Note that TG logic requires complementary input signals, as can be seen in Fig. 3.15. The extra wires increase routing complexity compared to standard CMOS or pseudo-nMOS logic.

3.1.4.2 Ultra-Low-Voltage Operation

This section provides an in-depth analysis of TG logic, including a detailed comparison with standard CMOS logic.

One of the attractive properties of TG logic in ultra-low-voltage operation is that it suffers less from reliability issues due to inter-die variations than standard CMOS logic. An intuitive explanation is given first, followed by supporting simulation results. Figure 3.16a shows the schematic of an inverter implemented in TG logic, while Fig. 3.16b provides an alternative representation of the same TG inverter. This alternative representation shows that the TG inverter is in fact a standard CMOS inverter, extended with an ‘inverse’ standard CMOS inverter. The inverse inverter receives the complementary input signal of the regular inverter, and has an nMOS in its PUN and a pMOS in its PDN. The process corners which are most problematic from a functionality perspective are the fnsp and snfp corners, where the speed difference between the transistors is largest. It is precisely in these corners that the inverse inverter helps significantly, since an nMOS and a pMOS are always in parallel and can compensate for each other’s weaknesses.

Fig. 3.16

Schematics of an inverter: (a) in TG logic, (b) alternative representation in TG logic, (c) in stacked nMOS TG logic

Before evaluating process corner simulation results, the exact sizing of TG logic in the 90 nm technology at hand must be discussed. The TG logic implementation of Fig. 3.16a, with a single nMOS and pMOS transistor in each transmission gate, actually poses problems. These problems are related to the \(I_{\mathrm{on}}/I_{\mathrm{off}}\) ratios discussed in Sect. 2.2.2. In this technology, \(I_{\mathrm{off},\mathrm{nMOS}}\) is only 21.7 times lower than \(I_{\mathrm{on},\mathrm{pMOS}}\) at \(V_{\mathrm{dd}} = 200\,\mathrm{mV}\). Extensive MC simulations of the \(I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}\) ratio yielded a CDF whose critical tail at low current ratios is shown in Fig. 3.17. The nominal current ratio is already very low, and when 6σ intra-die variations are taken into account, the worst-case ratio becomes insufficient, as can be seen from the lognormal distribution fitted to the lower tail of the single-nMOS ratio. Importantly, the \(I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}\) ratio is much smaller than the \(I_{\mathrm{on},\mathrm{nMOS}}/I_{\mathrm{off},\mathrm{pMOS}}\) ratio, as already shown in Fig. 2.8.
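A lognormal fit makes the worst-case ratio easy to estimate: with median \(m\) and log-domain standard deviation \(\sigma\), the value \(k\) standard deviations below the median is \(m\,e^{-k\sigma}\). The sketch below assumes the nominal ratio of 21.7 as the median and uses a hypothetical \(\sigma\) (`SIGMA_LN`); it illustrates the calculation only and does not reproduce the fitted curves of Fig. 3.17.

```python
import math

# Only the nominal ratio (21.7 at 200 mV) comes from the text. It is assumed
# to be the median of the lognormal fit; SIGMA_LN, the standard deviation of
# ln(I_on,pMOS / I_off,nMOS), is a hypothetical value.
NOMINAL_RATIO = 21.7
SIGMA_LN = 0.35  # hypothetical


def lognormal_quantile(median: float, sigma_ln: float, k_sigma: float) -> float:
    """Ratio value k_sigma standard deviations below the median (log domain)."""
    return median * math.exp(-k_sigma * sigma_ln)


def prob_below(threshold: float, median: float, sigma_ln: float) -> float:
    """P(ratio < threshold) for a lognormal with the given median and sigma."""
    z = (math.log(threshold) - math.log(median)) / sigma_ln
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


worst_6sigma = lognormal_quantile(NOMINAL_RATIO, SIGMA_LN, 6.0)
print(f"6-sigma worst-case ratio: {worst_6sigma:.2f}")
print(f"P(ratio < 1): {prob_below(1.0, NOMINAL_RATIO, SIGMA_LN):.2e}")
```

Because the spread acts multiplicatively, even a moderate \(\sigma\) in the log domain shrinks a nominal ratio of 21.7 by roughly an order of magnitude at 6σ, which is why the low nominal value leaves so little margin.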

Fig. 3.17

Critical tail of the CDF of the \(I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}\) ratio obtained with MC simulations. Both the single nMOS and the stacked nMOS implementations are fitted with lognormal distributions (\(V _{\mathrm{dd}} = 200\,\mathrm{\mathrm{mV}}\))

To improve this problematically low \(I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}\) ratio, the nMOS transistor is stacked (as shown in Fig. 3.16c). This decreases \(I_{\mathrm{off},\mathrm{nMOS}}\) (see Table 3.1). Stacking the nMOS thereby mitigates the current ratio problem, since it increases the \(I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}\) ratio while the complementary current ratio remains sufficiently high. Moreover, stacked nMOS transistors yield a significantly higher worst-case current ratio than a single nMOS (Fig. 3.17). Without nMOS stacking, there is a large difference between the rise and fall time of a TG. The rise time is the critical timing specification because it is dominated by the weak pMOS. Stacking the nMOS mainly slows down the faster fall transition, which results in more balanced rise and fall times and thus has only a minor effect on the overall speed of the logic gate. To conclude, the increased robustness and the reduced leakage outweigh the slight speed degradation caused by nMOS stacking.

Note that nMOS stacking is used here to decrease \(I_{\mathrm{off},\mathrm{nMOS}}\), whereas in standard CMOS logic the main reason for nMOS stacking is the reduction of \(I_{\mathrm{on},\mathrm{nMOS}}\).

To evaluate the aforementioned inter-die variation-resilience of TG logic, Fig. 3.18 provides the simulated VTCs in process corners of a stacked nMOS TG inverter (\(W_{\mathrm{p}} = 1\) and \(W_{\mathrm{n}} = 2\) in Fig. 3.16c) and a standard CMOS inverter (\(P_{\mathrm{p}} = 11\)). The TG inverter clearly shows less spread over the different process corners than the standard CMOS inverter. The higher inter-die variation-resilience of TG logic arises from the inclusion of both nMOS and pMOS transistors in each conducting path. With stacked nMOS TG logic, functionality is thus ensured in all process corners.

Fig. 3.18

VTC in process corners of a stacked nMOS TG inverter (left) in comparison with a standard CMOS inverter (right) at \(V _{\mathrm{dd}} = 200\,\mathrm{\mathrm{mV}}\)

Until now, the analysis of TG logic has focused on an inverter. However, using this topology for logic gates offers far more interesting benefits, of which area is only one. In the following analysis, standard CMOS logic with pMOS width upsizing (abbreviated to CMOS) and TG logic extended with nMOS stacking (abbreviated to TG) are compared on various logic gate characteristics. The analysis is performed on a NOR gate, because it is an elementary logic function and a difficult gate in standard CMOS logic, since it requires pMOS stacking. Stacked pMOS transistors in NOR gates require excessive sizes, as can be seen in Fig. 3.19a. The sizing of the TG NOR is shown in Fig. 3.19b. The pMOS transistors are sized minimally, and the nMOS transistors are stacked and have a width of \(2 \cdot W_{\mathrm{min}}\). At first sight, this seems counterintuitive, because increasing the width of a transistor normally increases its \(I_{\mathrm{on}}\) and \(I_{\mathrm{off}}\). However, in this 90 nm CMOS technology, \(I_{\mathrm{off}}\) and \(I_{\mathrm{on}}\) of stacked nMOS transistors with \(2 \cdot W_{\mathrm{min}}\) are reduced by 54 % and 27 %, respectively, compared to minimal-sized stacked nMOS transistors. This is due to the Inverse Narrow Width Effect (INWE), also called the Reverse Narrow Channel Effect, whose impact in the sub-threshold region has been discussed in [49]. INWE only matters for transistor widths close to the minimum width: it effectively reduces the threshold voltage of very narrow transistors. Therefore, slightly increasing the nMOS width is beneficial to further reduce its leakage. Note that INWE is present only in the 90 nm and not in the 40 nm CMOS technology used in this work. As a result, the stacked nMOS transistors in TG logic of the 40 nm prototypes are sized minimally, as will be seen in Sect. 3.4. To summarize, the sizing of TG logic is relaxed considerably compared to CMOS logic, e.g. the area of the CMOS NOR gate is 4.6 times larger than that of the TG NOR.
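The two percentages above translate directly into current-ratio changes. Assuming, as a first-order approximation, that the currents of the minimally sized pMOS are unaffected by the nMOS width change, and denoting the currents of the \(2 \cdot W_{\mathrm{min}}\) stacked nMOS with a prime:

\[
\frac{I_{\mathrm{on},\mathrm{pMOS}}/I'_{\mathrm{off},\mathrm{nMOS}}}{I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}} = \frac{1}{1-0.54} \approx 2.17,
\qquad
\frac{I'_{\mathrm{on},\mathrm{nMOS}}/I_{\mathrm{off},\mathrm{pMOS}}}{I_{\mathrm{on},\mathrm{nMOS}}/I_{\mathrm{off},\mathrm{pMOS}}} = 1-0.27 = 0.73,
\qquad
\frac{I'_{\mathrm{on},\mathrm{nMOS}}/I'_{\mathrm{off},\mathrm{nMOS}}}{I_{\mathrm{on},\mathrm{nMOS}}/I_{\mathrm{off},\mathrm{nMOS}}} = \frac{0.73}{0.46} \approx 1.59
\]

Under this assumption, the critical \(I_{\mathrm{on},\mathrm{pMOS}}/I_{\mathrm{off},\mathrm{nMOS}}\) ratio roughly doubles, at the cost of a 27 % lower complementary ratio \(I_{\mathrm{on},\mathrm{nMOS}}/I_{\mathrm{off},\mathrm{pMOS}}\), which starts from a much larger value (Fig. 2.8).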

Fig. 3.19

Schematics of a NOR gate: (a) in standard CMOS logic and (b) in stacked nMOS TG logic

In the following analysis, the NOR gates are subjected to both inter- and intra-die variations. Because sub-threshold circuits are exponentially sensitive to variations, it is of the utmost importance to design variation-resilient sub-threshold circuits.

Because of the small supply voltage swing, the output signal loss of logic gates is an important characteristic in ultra-low-voltage design. Too much signal loss can cause the subsequent gate to misinterpret the logic value. Signal losses can be overcome by regenerating the signal, e.g. through an inverter. For example, in a datapath with a high logic depth, intermediate signals of cascaded logic gates can be regenerated to ensure correct output levels. However, the lower the signal loss, the less frequently inverters need to be inserted to restore the signal levels to the supply rails.

Figure 3.20 compares the TG and CMOS NOR gates on the signal loss of their output as a percentage of the total supply swing, under inter-die variations. Only the worst-case corners are shown, as a function of \(V_{\mathrm{dd}}\). For signal loss on the logic low level, the logic gates perform worst in the snfp corner, because of the weakened nMOS transistor versus the strengthened pMOS transistor. Likewise, for signal loss on the logic high level, the worst case is the fnsp corner. Figure 3.20 shows that the signal loss worsens as the supply voltage is lowered and the circuits operate deeper in sub-threshold. The TG NOR clearly outperforms the CMOS NOR in signal loss on the logic low level, and TG logic is also the better option for signal loss on the logic high level. The output swing degradation analysis is performed for intra-die variations as well, by carrying out extensive MC simulations at a 200 mV supply. Figure 3.21 demonstrates that the TG NOR performs significantly better under intra-die variations for signal loss on the logic low level, and comparably for the logic high level.
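The signal-loss metric used in Figs. 3.20 and 3.21 can be written down explicitly. The sketch below assumes the loss is the deviation of a logic level from its rail, expressed as a percentage of the swing \(V_{\mathrm{dd}} - V_{\mathrm{ss}}\); the output voltages in the example are hypothetical, not simulation results.

```python
# Sketch of the signal-loss metric: deviation of a logic level from its supply
# rail, as a percentage of the total supply swing V_dd - V_ss (assumed definition).


def signal_loss_low(v_out_low: float, v_dd: float, v_ss: float = 0.0) -> float:
    """Loss on the logic low level, in percent of the swing."""
    return 100.0 * (v_out_low - v_ss) / (v_dd - v_ss)


def signal_loss_high(v_out_high: float, v_dd: float, v_ss: float = 0.0) -> float:
    """Loss on the logic high level, in percent of the swing."""
    return 100.0 * (v_dd - v_out_high) / (v_dd - v_ss)


V_DD = 0.2  # 200 mV supply
# Hypothetical output levels: a 12 mV logic low and a 190 mV logic high.
print(f"low-level loss:  {signal_loss_low(0.012, V_DD):.1f} %")
print(f"high-level loss: {signal_loss_high(0.190, V_DD):.1f} %")
```

Normalizing to the swing makes the curves at different supply voltages comparable: the same absolute deviation of a few millivolts weighs much more heavily at 200 mV than at nominal supply.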

Fig. 3.20

Percentage signal loss for different NOR topologies in the worst-case corner: (a) snfp corner for logic low and (b) fnsp corner for logic high level, as a function of \(V_{\mathrm{dd}}\)

Fig. 3.21

CDF of the signal loss for different NOR topologies of (a) logic low and (b) logic high level for \(V _{\mathrm{dd}} = 200\,\mathrm{\mathrm{mV}}\), obtained with MC simulations around the tntp corner

Note that the signal loss picture changes when multiple TG logic gates are cascaded, as will be discussed in depth at the architectural level in Sect. 4.2.

Another essential characteristic is the variation of the gate delay. As previously mentioned, intra-die variations strongly increase the delay variability. Therefore, Fig. 3.22 shows the variation of the propagation delay as a function of \(V_{\mathrm{dd}}\). Overall, the TG NOR displays smaller delay variations than the CMOS NOR.
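Why intra-die variations hit sub-threshold delay so hard can be shown with a small Monte Carlo sketch. It assumes a textbook first-order model in which the sub-threshold on-current scales as \(e^{-V_{\mathrm{T}}/(n U_{\mathrm{T}})}\), so the delay scales as \(e^{V_{\mathrm{T}}/(n U_{\mathrm{T}})}\), and it expresses delay variation as \(\sigma/\mu\). The threshold-voltage spread and slope factor are hypothetical values, not the technology data behind Fig. 3.22.

```python
import math
import random

# Assumed first-order model: sub-threshold on-current I ~ exp(-V_T / (n * U_T)),
# so gate delay ~ 1 / I ~ exp(V_T / (n * U_T)).
# N_SLOPE and SIGMA_VT are hypothetical values, not taken from the text.
U_T = 0.0259      # thermal voltage kT/q at about 300 K [V]
N_SLOPE = 1.4     # hypothetical sub-threshold slope factor
SIGMA_VT = 0.020  # hypothetical standard deviation of V_T [V]


def sample_delays(n_samples: int, seed: int = 1) -> list[float]:
    """Delays normalized to the nominal gate: exp(dV_T / (n * U_T))."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(0.0, SIGMA_VT) / (N_SLOPE * U_T))
            for _ in range(n_samples)]


def sigma_over_mu(samples: list[float]) -> float:
    """Relative spread (sample standard deviation divided by the mean)."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / (len(samples) - 1)
    return math.sqrt(var) / mu


delays = sample_delays(100_000)
s = SIGMA_VT / (N_SLOPE * U_T)  # standard deviation of ln(delay)
print(f"MC sigma/mu:        {sigma_over_mu(delays):.3f}")
print(f"lognormal sigma/mu: {math.sqrt(math.exp(s * s) - 1.0):.3f}")
```

A Gaussian threshold-voltage spread of a few tens of millivolts turns into a lognormal delay spread with \(\sigma/\mu\) of tens of percent, and the Monte Carlo estimate agrees with the closed-form lognormal value.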

Fig. 3.22

Variation of the propagation delay for different NOR topologies as function of the supply, obtained with MC simulations

An additional important benefit of TG logic is that it has no direct leakage paths from the supply to the ground. As such, a TG logic gate has the attractive property of almost non-existent leakage power. Table 3.2 provides the leakage power figures of both NOR topologies. The leakage power of the CMOS NOR is more than 19 times higher than that of the TG NOR. These numbers only take the inherent leakage of the NOR gates into account, not the leakage contribution of any circuits required to regenerate intermediate signal levels, which will be examined in detail later on.

Table 3.2 Comparison of leakage power for different NOR topologies (\(V _{\mathrm{dd}} = 200\,\mathrm{\mathrm{mV}}\))

To conclude this analysis, TG logic is a very attractive solution for ultra-low-voltage logic gates due to its higher variation-resilience and lower leakage compared to standard CMOS logic. This analysis has been performed on a NOR gate because it is an elementary logic function. Importantly, since all logic gates in TG logic share the same generic structure, the results for other TG logic gates will be very similar to those of the NOR. More information about this will be provided in Sect. 3.2.1.

3.1.4.3 Literature

In the literature, the use of transmission gates in an ultra-low-voltage design has been reported by only one other research group. Wang and Chandrakasan from MIT [42, 43] presented a sub-threshold FFT processor in which transmission gate logic was used, but only for a few specific logic gates, e.g. an XOR and a MUX. This concerned regular transmission gate logic, so no transistor stacking was employed. Moreover, to avoid sneak leakage paths and thus ensure functionality, they buffered all inputs and outputs of the transmission gate cells. This of course adds significantly to the leakage and energy overhead of each inserted TG cell.

3.1.5 Other Topologies

This section covers some other, less frequently used circuit topologies for ultra-low-voltage or ultra-low-energy operation.

3.1.5.1 Sub-Threshold Source-Coupled Logic

Sub-Threshold Source-Coupled Logic (STSCL) was proposed by Tajalli and Leblebici from the Ecole Polytechnique Fédérale de Lausanne (EPFL) in Switzerland [39]. Figure 3.23 shows the schematic of an STSCL inverter. In an STSCL gate, the logic operation takes place mainly in the current domain to achieve a very high speed. The input source-coupled nMOS differential pair switches a constant current between two branches, based on the input logic levels. This differential pair can be expanded to a network of nMOS source-coupled pairs to implement more complex logic functions. The current is converted to an output voltage through the pMOS load transistors. The voltage swing at the output should be large enough to completely switch the current in the input transistors of the next stage. Hence, the load devices must have a sufficiently high resistance. Minimal-sized pMOS transistors with shorted drain-substrate contacts are used as gate-controlled, highly resistive load devices. The bias current (through the nMOS transistor below) is usually kept at very low levels.

Fig. 3.23

Schematic of an STSCL inverter

Operating in the sub-threshold regime, the circuit can be used over a very wide frequency range by adjusting the bias current, without any need to resize the devices. The power consumption of an STSCL gate depends on the tail bias current. Unlike standard CMOS circuits, which draw no constant current, each STSCL gate consumes a constant bias current. This current charges or discharges the load capacitance and thus directly determines the speed of the output transition. The most interesting aspect of STSCL circuits is that both speed and power consumption scale linearly with the bias current, which allows a wide range of operating frequencies. However, because of this static power consumption, STSCL logic is power-efficient mainly in circuits with high activity. Evidently, bias circuits are required to bias both the nMOS current source and the pMOS loads.
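The linear speed and power scaling can be made concrete with a simple first-order model, which is an assumption for illustration rather than the model of [39]: the tail current \(I_{\mathrm{SS}}\) charges the load capacitance \(C_{\mathrm{L}}\) over the output swing \(V_{\mathrm{SW}}\), so the delay is about \(V_{\mathrm{SW}} C_{\mathrm{L}} / I_{\mathrm{SS}}\), while the static power is \(V_{\mathrm{DD}} I_{\mathrm{SS}}\). All numerical values are hypothetical.

```python
# First-order STSCL model (an illustrative assumption):
#   delay ~ V_SW * C_L / I_SS      (tail current charges the load over the swing)
#   power =  V_DD * I_SS           (constant bias current, drawn continuously)
C_L = 5e-15  # hypothetical load capacitance [F]
V_SW = 0.2   # hypothetical output swing [V]
V_DD = 0.4   # hypothetical supply voltage [V]


def stscl_gate(i_ss: float) -> tuple[float, float, float]:
    """Return (delay [s], power [W], energy per delay interval [J]) for bias i_ss [A]."""
    delay = V_SW * C_L / i_ss
    power = V_DD * i_ss
    return delay, power, power * delay


for i_ss in (1e-9, 10e-9, 100e-9):
    d, p, e = stscl_gate(i_ss)
    print(f"I_SS={i_ss:.0e} A  delay={d:.2e} s  power={p:.2e} W  energy={e:.2e} J")
```

In this model, a larger bias current buys proportionally more speed for proportionally more power, so the energy per switching interval \(V_{\mathrm{DD}} V_{\mathrm{SW}} C_{\mathrm{L}}\) stays constant. The bias current keeps flowing when the gate does not switch, however, which is why STSCL is mainly power-efficient at high activity.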

To maintain enough headroom for the current source, a minimum supply voltage of around \(10 \cdot V _{\mathrm{th}}\) is necessary [39]. Measurements of an 8-bit carry-save multiplier in a 0.18 μm CMOS technology [38] confirm that this theoretical value is approximately correct, since the multiplier is functional down to 300 mV. Hence, extremely low-supply operation is not possible with STSCL circuits. Unfortunately, no STSCL system larger than this multiplier has been fabricated, which makes it difficult to assess the scalability of this type of logic. Because STSCL logic cannot be used for true ultra-low-voltage operation, its main advantage would have to be low power consumption, yet it remains questionable whether the constant static power consumption of the logic and the power consumption of the bias circuits undermine this characteristic.

3.1.5.2 Adiabatic Logic

Adiabatic logic for low-power operation has been studied by the group of Schmitt-Landsiedel from the Technical University of Munich. The following information has been summarized from [40]. The idea of adiabatic or energy-recovering logic is to use a pulsed power supply instead of a constant voltage supply. Moreover, adiabatic logic does not switch abruptly from 0 to \(V_{\mathrm{dd}}\), or vice versa; instead, a voltage ramp is used to charge the output and to recover its energy. A slowly varying voltage source dissipates less energy while charging a capacitance if its period is much longer than the time constant of the charging path. Furthermore, when the supply voltage decreases, the output capacitance is discharged and its stored energy can be recovered by the supply source.
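The energy argument can be quantified with the standard RC charging model: charging a capacitance \(C\) through a resistance \(R\) from a voltage step dissipates \(\tfrac{1}{2} C V^2\), whereas a linear ramp of duration \(T\) dissipates \(C V^2 (RC/T)\,[1 - (RC/T)(1 - e^{-T/RC})]\), which tends to \((RC/T)\,C V^2\) for \(T \gg RC\). The sketch evaluates both expressions; R, C and V are hypothetical values, and energy recovery during the ramp-down is not modeled.

```python
import math

R = 10e3    # hypothetical on-resistance of the charging path [ohm]
C = 10e-15  # hypothetical load capacitance [F]
V = 1.0     # hypothetical final voltage [V]


def step_charge_loss(c: float, v: float) -> float:
    """Energy dissipated in R when C is charged by an abrupt 0 -> v step."""
    return 0.5 * c * v * v


def ramp_charge_loss(c: float, v: float, r: float, t_ramp: float) -> float:
    """Energy dissipated in R when C is charged by a linear 0 -> v ramp of
    duration t_ramp, including the residual charging after the ramp ends."""
    x = r * c / t_ramp  # time constant relative to the ramp duration
    return c * v * v * x * (1.0 - x * (1.0 - math.exp(-1.0 / x)))


tau = R * C
for k in (0.1, 1, 10, 100):
    ratio = ramp_charge_loss(C, V, R, k * tau) / step_charge_loss(C, V)
    print(f"T = {k:>5} RC: ramp loss = {ratio:.3f} x step loss")
```

For very short ramps the loss approaches that of a voltage step, while a ramp a hundred times longer than \(RC\) dissipates only about 2 % of it. This is the mechanism behind the frequency-dependent energy savings of adiabatic logic.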

Therefore, adiabatic logic circuits are operated with an oscillating power supply, called the power-clock. Each power-clock cycle consists of four intervals, visualized in Fig. 3.24a. There are four phases of the same power-clock, each shifted by 90°. Cascaded logic gates are powered by successive phases \(\phi_i\) of the power-clock. Therefore, adiabatic logic is inherently pipelined. Where no logic gate is needed at a certain location, buffers must be inserted for synchronization.

Fig. 3.24

Adiabatic logic: (a) four phases of the power-clock, (b) PFAL inverter schematic and (c) ECRL inverter schematic

Two adiabatic logic families have been found to provide the best energy-efficiency: Positive Feedback Adiabatic Logic (PFAL) and Efficient Charge Recovery Logic (ECRL). Both exhibit a memory functionality, PFAL through a latch element (Fig. 3.24b) and ECRL through a cross-coupled pMOS transistor pair (Fig. 3.24c). According to the authors, the area consumption of adiabatic logic is comparable to standard CMOS logic, but this is only true for complex functions and not for basic functions with a few inputs due to the overhead of the memory functionality.

In the evaluate (E) mode of a logic gate powered by \(\phi_0\), the outputs are evaluated from the stable input signals (Fig. 3.24a). These outputs are then kept stable for the subsequent gate in the following mode, i.e. the hold (H) mode. In the recovery (R) mode, energy is recovered by the supply source. The wait (W) mode is inserted for symmetry reasons since, according to the author, symmetric signals are easier to generate.
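The interplay of the four phases can be sketched as a simple lookup, assuming each interval lasts exactly one quarter of the power-clock period and phase \(\phi_k\) lags \(\phi_{k-1}\) by 90°. The function and variable names are hypothetical.

```python
# Sketch of the four-phase power-clock: every phase cycles through the
# intervals E, H, R, W, and phi_k lags phi_(k-1) by a quarter period (90 deg).
# Modeling each interval as exactly one quarter period is an assumption.
INTERVALS = ("E", "H", "R", "W")


def interval(phase_index: int, quarter: int) -> str:
    """Interval of power-clock phase phi_k during a given quarter period."""
    return INTERVALS[(quarter - phase_index) % 4]


# Stage i of a cascade is powered by phi_(i mod 4).
for q in range(4):
    print(f"quarter {q}: " + "  ".join(
        f"phi{k}={interval(k, q)}" for k in range(4)))
```

In every quarter, the phase that is evaluating sees its predecessor in the hold interval, so each gate evaluates from stable inputs. This is also why the cascade is inherently pipelined: a signal advances by one gate per quarter period.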

Adiabatic logic is claimed to save energy compared to standard CMOS logic, but only at moderate operating frequencies. Because some energy losses in adiabatic logic are frequency-dependent, there is an optimum frequency for energy-efficiency. For example, for a 130 nm CMOS technology, this optimum is claimed to lie around 100 MHz. As for standard CMOS logic, voltage scaling reduces the energy of adiabatic logic. However, the expected energy gain of adiabatic logic over standard CMOS logic shrinks as \(V_{\mathrm{dd}}\) is lowered. Moreover, ECRL and PFAL have a functional supply limit. The minimum supply is \(\mathrm{max}(V _{\mathrm{T},\mathrm{nMOS}},V _{\mathrm{T},\mathrm{pMOS}})\) for ECRL and \(2 \cdot V _{\mathrm{T},\mathrm{nMOS}}\) for PFAL. Below these supply voltages, the circuits malfunction. More information about these lower bounds can be found in [40].

Each adiabatic system consists of two main parts: the digital core design made up of adiabatic gates and the generator of the power-clock signals. An efficient generation of the four phases making up the power-clock is essential to get high energy savings compared to standard CMOS logic with its fixed supply voltage.

Two measured datapath elements have been reported in a 130 nm CMOS technology: an 8-bit ripple carry adder in [5] and a Finite Impulse Response (FIR) filter in [40]. However, neither chip has been measured at frequencies beyond 20 MHz due to test setup limitations, which makes it difficult to claim that more energy would be saved at 100 MHz. Moreover, both have been measured at quasi-nominal supplies: the adder at 1.2 V, while the FIR filter was reported to function down to 800 mV, which is not spectacularly low. The largest drawback, however, is that this research group has never measured a full adiabatic system with on-chip power-clock generation. Since this is essential to evaluate the claimed energy savings, it is unclear whether this adiabatic logic really has low-power potential.

3.1.5.3 Dynamic Logic

As opposed to static logic, where the output is always connected to one of the supply rails through a low-resistance path, dynamic circuits rely on temporary storage of signal values on the capacitance of high-impedance circuit nodes [31]. A dynamic circuit can be obtained by replacing the pMOS load of pseudo-nMOS logic with a clocked pull-up pMOS transistor, as shown in Fig. 3.25. As a result, dynamic operation has two modes, depending on the clock level [46]. When the clock is ‘0’, the output is precharged to ‘1’; this is called the precharge mode. When the clock is ‘1’, the clocked pMOS is turned off and the output may remain high or may be discharged through the PDN; this is the evaluation mode. The clocked nMOS foot transistor in Fig. 3.25 is optional, depending on whether the inputs are guaranteed to be ‘0’ during the precharge mode.

Fig. 3.25

Generic implementation of a dynamic gate

Once the output is discharged in the evaluation mode, it cannot be charged again until the next precharge mode. The inputs to the gate can thus make at most one transition during evaluation [31]. Moreover, this must be a low-to-high transition. Therefore, dynamic circuits cannot be cascaded as such, since if their outputs make a transition, it will always be a high-to-low transition. By inserting an inverter after every dynamic gate, this problem can be solved. This is called Domino logic. Consequently, only non-inverting gates can be implemented in Domino logic.
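The cascading problem and its Domino solution can be reproduced with a toy logic-level model. The sketch assumes one evaluation phase split into discrete time steps, one step of delay per gate, and single-nMOS PDNs (dynamic inverters); it is an abstraction for illustration, not an electrical simulation, and all names are hypothetical.

```python
# Toy logic-level model of one evaluation phase (clk = '1'), split into
# STEPS discrete time steps with one step of delay per gate (an assumption).
STEPS = 4


def evaluate_dynamic(input_wave, pdn):
    """Dynamic node waveform: precharged to '1'; once the PDN conducts,
    the node is discharged and stays '0' until the next precharge."""
    node, wave = 1, []
    for x in input_wave:
        if pdn(x):
            node = 0
        wave.append(node)
    return wave


def delayed(wave, initial):
    """Signal as seen by the next gate: one step late, starting at `initial`."""
    return [initial] + wave[:-1]


def inv(wave):
    return [1 - x for x in wave]


def conducts(x):
    """PDN of a dynamic inverter: a single nMOS driven by the input."""
    return x == 1


def naive_cascade(a):
    node1 = evaluate_dynamic([a] * STEPS, conducts)
    # The second gate sees node1's precharged '1' at the start of evaluation.
    node2 = evaluate_dynamic(delayed(node1, initial=1), conducts)
    return node2[-1]  # intended function: NOT(NOT a) = a


def domino_cascade(a):
    out1 = inv(evaluate_dynamic([a] * STEPS, conducts))  # static inverter
    # The inverted precharged node starts at '0' and can only rise.
    node2 = evaluate_dynamic(delayed(out1, initial=0), conducts)
    return inv(node2)[-1]  # intended function: a


for a in (0, 1):
    print(f"a={a}: naive={naive_cascade(a)} domino={domino_cascade(a)}")
```

For a = 1, the naive cascade wrongly discharges the second node during the first step, before the first gate has evaluated, and cannot recover. The inverter of the Domino stage turns the precharged ‘1’ into a ‘0’, so every gate input can only make the allowed low-to-high transition.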

Dynamic logic achieves a reduction in transistor count similar to pseudo-nMOS logic, but avoids its high static power consumption. Furthermore, dynamic logic provides high-speed operation for circuits operating at nominal supply voltage. However, it has several disadvantages in ultra-low-voltage operation [45]. Because the output is precharged to a low supply level, only a small amount of charge is stored on the dynamic node. This node therefore becomes very sensitive to noise and idle leakage. The problem is aggravated by variations, for example when the precharge pMOS transistor is weakened relative to the PDN. Robustness can therefore not be guaranteed for dynamic circuits in ultra-low-voltage operation.

Sub-threshold dynamic logic, called Sub-Domino logic, was proposed by Soeleman et al. In [36], simulations in a 0.35 μm CMOS technology showed that Sub-Domino logic was considerably faster and occupied a smaller area than standard CMOS logic operating in the sub-threshold region. However, variations were not studied in this paper, although a variation-resilient circuit topology is paramount for ultra-low-voltage operation. It is therefore doubtful that dynamic logic operated at ultra-low supply voltages will provide the required robust functionality.

3.2 Chosen Circuit Topologies

This section discusses the chosen circuit topologies which are used in the ultra-low-voltage prototypes of Chaps. 5 and 6.

3.2.1 Logic Gates

The topology used for logic gates is a crucial choice in ultra-low-voltage design to ensure their efficient functionality. It is critical in terms of variation-resilience, energy consumption and speed. From the extensive comparison carried out in Sect. 3.1.4, TG logic has been chosen as preferred topology for logic gates operating in the ultra-low-voltage region. The main reasons for this choice are the inherent robustness of TG logic and its low contribution to leakage. The variation-resilience of TG logic arises from the inclusion of both nMOS and pMOS transistors in each conducting path. Statistically, the effect of variations on both transistors tends to be compensated by the presence of the complementary transistor. The leakage power consumption of TG logic is very low because it does not have direct leakage paths from the supply to the ground.

Another advantage of TG logic is that it uses considerably smaller transistor dimensions compared to standard CMOS logic while achieving better variation-resilience. Moreover, upsizing is often necessary to reduce the sensitivity to variations of ultra-low-voltage standard CMOS logic [22, 29]. These extra margins are not necessary for TG logic. TG design also avoids pMOS stacking and does not require body biasing.

To conclude, TG logic is the most attractive solution for ultra-low-voltage logic gates taking variability into account. Consequently, TG logic is the building block for all logic gates.

Figure 3.26 shows the schematic of the employed TG logic. With this generic logic block, all 2-input logic gates can be constructed. Only the assignment of the inputs needs to be changed to obtain a different logic function, as can be seen in Fig. 3.26. For example, a 2-input OR gate requires the two inputs A and B and the supply voltage \(V_{\mathrm{dd}}\), while its differential equivalent, the NOR gate, has the same inputs at the transistors’ gates, but the complementary input \(\overline{\mathit{A}}\) and \(V_{\mathrm{ss}}\) at their sources. In this manner, all 2-input logic gates can be constructed (OR, NOR, AND, NAND, XOR, XNOR), as well as the 3-input MUX and its differential equivalent. Moreover, TG logic can implement non-inverting gates such as AND and OR directly, which is not possible with a single standard CMOS gate.
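The input-assignment idea can be checked at the logic level with a short sketch. It assumes that the generic block of Fig. 3.26 behaves as a 2:1 multiplexer: the signal on the transistor gates (B and its complement) selects which of the two source inputs is passed. The OR and NOR assignments are consistent with the description above; the assignments for the other gates, and all names in the code, are a hypothetical but consistent choice.

```python
from itertools import product

VDD, VSS = 1, 0


def tg_generic(sel: int, src1: int, src0: int) -> int:
    """Generic TG block modeled as a 2:1 mux: the transmission gates driven by
    sel / sel_bar pass src1 when sel = 1 and src0 when sel = 0."""
    return src1 if sel else src0


# Assumed input assignment (sel, src1, src0) for each 2-input function.
GATES = {
    "OR":   lambda a, b: tg_generic(b, VDD, a),
    "NOR":  lambda a, b: tg_generic(b, VSS, 1 - a),
    "AND":  lambda a, b: tg_generic(b, a, VSS),
    "NAND": lambda a, b: tg_generic(b, 1 - a, VDD),
    "XOR":  lambda a, b: tg_generic(b, 1 - a, a),
    "XNOR": lambda a, b: tg_generic(b, a, 1 - a),
}

REFERENCE = {
    "OR":   lambda a, b: a | b,
    "NOR":  lambda a, b: 1 - (a | b),
    "AND":  lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def mux(s: int, d1: int, d0: int) -> int:
    """3-input MUX: the select drives the gates, the data inputs the sources."""
    return tg_generic(s, d1, d0)


for name, gate in GATES.items():
    ok = all(gate(a, b) == REFERENCE[name](a, b)
             for a, b in product((0, 1), repeat=2))
    print(f"{name:4s} {'ok' if ok else 'MISMATCH'}")
```

Each differential pair of gates (OR/NOR, AND/NAND, XOR/XNOR) shares the same select input and differs only in the complemented source inputs, which mirrors the single-layout, reassigned-inputs approach described in the text.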

Fig. 3.26

Preferred TG logic gate topology: (left) schematic of a generic logic gate and (right) inputs required to implement the feasible logic functionality

In other words, the design and layout of these logic gates are reduced to the design and layout of just one generic logic block. This modular approach considerably simplifies the design of a library of logic gates. The generic block has to be optimized only once for the specific technology at hand, using techniques such as sizing for optimal noise margins and transistor stacking for leakage reduction. The fact that all logic gates have exactly the same layout is also beneficial for mismatch.

TG logic requires differential input signals, but the pipelined architecture which is used in the prototypes provides these signals in an efficient way, as will be discussed in Chap. 4.

3.2.2 Inverter

Transmission gates are, unfortunately, not ideal switches, because they have an associated series resistance [31]. Consequently, TG logic gates suffer from some output signal loss and cannot be cascaded indefinitely. Cascading too many logic gates can deteriorate robustness because of excessive output signal losses. It is thus necessary to regenerate intermediate signal levels. This regeneration can be performed by inverters or by memory elements, such as latches or flip-flops. The inverter topology is discussed in this section, while memory elements are examined in Sect. 3.3.

Figure 3.27 shows the preferred inverter topology. It consists of a standard CMOS inverter extended with nMOS stacking to relax pMOS sizing, as presented in Sect. 3.1.1.4. This type of inverter is preferred to a regular standard CMOS inverter because of the reduced area and the increased variation-resilience.

Fig. 3.27

Schematic of the preferred inverter topology

Note that nMOS stacking serves a different purpose in TG logic than in this preferred inverter topology. In TG logic, the primary reason is to reduce the off-current \(I_{\mathit{off},\mathit{nMOS}}\), whereas in the inverter, nMOS stacking is employed primarily to decrease the on-current \(I_{\mathit{on},\mathit{nMOS}}\).
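Why a lower \(I_{\mathit{on},\mathit{nMOS}}\) helps can be made concrete through the switching threshold \(V_M\). Assuming both transistors operate in sub-threshold with \(V_{DS}\) of a few \(V_T\) and ignoring DIBL, equating \(I_n = I_{0n}\exp((V_{in} - V_{\mathit{th},n})/(nV_T))\) and \(I_p = I_{0p}\exp((V_{\mathit{dd}} - V_{in} - |V_{\mathit{th},p}|)/(nV_T))\) gives \(V_M = \tfrac{1}{2}(V_{\mathit{dd}} + V_{\mathit{th},n} - |V_{\mathit{th},p}|) + \tfrac{nV_T}{2}\ln(I_{0p}/I_{0n})\). The sketch below evaluates this expression; the parameter values, the function name `switching_threshold` and the `stack_factor` by which stacking divides the nMOS current are all hypothetical.

```python
import math

VT = 0.0259  # thermal voltage kT/q near 300 K, in volts
N = 1.5      # hypothetical sub-threshold slope factor


def switching_threshold(vdd, i0n, i0p, vthn=0.4, vthp=-0.4, stack_factor=1.0):
    """Switching threshold V_M of a sub-threshold inverter (first-order model).

    stack_factor is a hypothetical factor by which nMOS stacking divides the
    pull-down current; 1.0 corresponds to a single nMOS transistor.
    """
    i0n_eff = i0n / stack_factor
    return 0.5 * (vdd + vthn - abs(vthp)) + 0.5 * N * VT * math.log(i0p / i0n_eff)


vdd = 0.3
i0n, i0p = 4.0, 1.0  # hypothetical: nMOS four times stronger than pMOS
vm_single = switching_threshold(vdd, i0n, i0p)
vm_stacked = switching_threshold(vdd, i0n, i0p, stack_factor=2.0)
print(f"single nMOS: {vm_single:.4f} V, stacked nMOS: {vm_stacked:.4f} V")
```

With a stronger nMOS, \(V_M\) lies below \(V_{\mathit{dd}}/2\); dividing the nMOS current moves it back toward mid-rail, which is the same effect that upsizing the pMOS would achieve, but without the extra pMOS area.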

3.3 Memory Elements

There are two important types of basic memory elements: the latch and the flip-flop. Both can be used to store information and are controlled by a clock signal. Figure 3.28 visualizes their functionality. Flip-flops are edge-triggered: when the clock makes a low-to-high transition, the input is copied to the output, and the output is held until the next rising clock edge. Latches, on the other hand, work in two phases. When the clock is high, the latch is transparent (T) and the data at the input propagates through to the output. When the clock is low, the latch is locked (L) and the output retains the value it last had when transparent. A latch is therefore said to be level-sensitive. The implementation of both elements in the ultra-low-voltage region is discussed next.

Fig. 3.28 Functionality of (a) a level-sensitive latch versus (b) an edge-triggered flip-flop
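The difference between level-sensitive and edge-triggered behavior can be reproduced with a short sampled-time simulation. This is a behavioral sketch only: the function names `latch` and `flip_flop` are hypothetical, each list element represents one sampling instant, both elements start with output 0, and the clock is assumed to start low.

```python
def latch(clk, d, q=0):
    """Level-sensitive latch: transparent (T) while clk = 1, locked (L) while clk = 0."""
    out = []
    for c, x in zip(clk, d):
        if c:        # transparent: the output follows the input
            q = x
        out.append(q)  # locked: the last value is retained
    return out


def flip_flop(clk, d, q=0):
    """Positive-edge-triggered flip-flop: samples d only on a 0 -> 1 clock transition."""
    out = []
    prev = 0  # assume the clock starts low
    for c, x in zip(clk, d):
        if c and not prev:  # rising clock edge
            q = x
        prev = c
        out.append(q)
    return out


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d   = [1, 1, 0, 0, 0, 0, 1, 1]
print("latch:    ", latch(clk, d))
print("flip-flop:", flip_flop(clk, d))
```

At the third sample the input falls while the clock is still high: the latch passes this change to its output, whereas the flip-flop keeps the value captured at the preceding rising edge.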

Both memory elements are clocked. As a result, when they are used in an architecture, this architecture becomes pipelined. More information on pipelining will be provided in Chap. 4.

3.3.1 Latch

In order to store data, some form of feedback is necessary. Figure 3.29 shows a generic schematic of a latch with a single input and differential outputs. The cross-coupled inverters provide the feedback. One of the cross-coupled inverters is a tristate inverter, which can be switched on and off, so the cross-coupling can be disabled while new data is written. The input then does not have to overpower the feedback inverter, and this ratioless behavior is important for robust ultra-low-voltage operation of the latch. The feedback loop could be implemented in a ratioed fashion as well. However, as already discussed in Sect. 3.1.2, this is undesirable for ultra-low-voltage systems due to their high sensitivity to variations. The interested reader is referred to [8] for a more elaborate discussion of the unsuitability of ratioed latches for sub-threshold operation. Because of the cross-coupled inverters, latches restore the signal levels of their input signals, which is a beneficial characteristic in ultra-low-voltage design.

Fig. 3.29 Generic schematic of a single-input, differential-output latch

The clock signals of Fig. 3.28 will from now on be referred to as enable signals, abbreviated to en. The latch in Fig. 3.29 consists of a regular inverter and two tristate inverters, controlled by a differential enable signal. The latch functionality of the circuit can easily be verified: in the transparent phase, when en is high, the input tristate inverter and the regular inverter pass the input to the output, while the feedback path is cut off by the other tristate inverter. In the locked phase, the input tristate inverter is turned off, while the cross-coupled inverters store the data.
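The verification described above can be written out at logic level. In this minimal sketch, the hypothetical function `latch_step` evaluates the latch of Fig. 3.29 once per sample; the internal node driven by the input tristate inverter is named `out_b` (an assumed name for the complementary output), electrical effects are ignored, and the power-up state is assumed to store out = 0.

```python
def latch_step(en, d, state):
    """One logic-level evaluation of the latch of Fig. 3.29 (hypothetical model).

    state = (out_b, out) holds the differential outputs.
    """
    out_b, out = state
    if en:
        out_b = 1 - d    # transparent: input tristate inverter drives out_b
    else:
        out_b = 1 - out  # locked: feedback tristate inverter closes the loop
    out = 1 - out_b      # regular inverter, always active
    return out_b, out


state = (1, 0)  # assumed power-up state storing out = 0
states = []
for en, d in [(1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]:
    state = latch_step(en, d, state)
    states.append(state)
print([out for _, out in states])
```

While en is low, changes at the input have no effect because only the feedback tristate inverter drives `out_b`; in every sample, both outputs are available and complementary without additional circuitry.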

Two possible implementations of a tristate inverter are shown in Fig. 3.30: a full-CMOS tristate inverter and a TG-based tristate inverter. In the full-CMOS tristate inverter, the enable transistors are placed in the PUN and PDN of the inverter. However, this introduces pMOS stacking, and therefore excessive pMOS sizes are required to ensure good performance in all process corners, as explained in Sect. 3.1.1.3. The TG-based implementation, on the other hand, is switched on and off by a transmission gate. When the transmission gate is switched on, both its nMOS and pMOS transistors are turned on. The worst corner for pMOS stacking is the fnsp (fast nMOS, slow pMOS) corner. When the PUN of the inverter conducts in this corner, the stronger nMOS transistor of the transmission gate compensates for the weaker pMOS. The effect of pMOS stacking is thus reduced in the TG-based tristate inverter, which relaxes the need for excessive sizing to cover process corners. As a result, the TG-based tristate inverter is preferred for ultra-low-voltage operation: it avoids pure pMOS stacking and hence occupies less area and is more variation-resilient [32].

Fig. 3.30 Schematics of (a) a full-CMOS tristate inverter and (b) a TG-based tristate inverter
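The compensation argument for the fnsp corner can be illustrated with a deliberately crude model in which every conducting transistor is replaced by a linear conductance. This is a hypothetical sketch, not a circuit simulation: it ignores the reduced gate overdrive of the nMOS transistor as it passes a rising output and all sub-threshold nonlinearities. The function names and the assumption that the nMOS is ten times more conductive than the pMOS in this corner are illustrative only.

```python
def series(*conductances):
    """Equivalent conductance of elements connected in series."""
    return 1.0 / sum(1.0 / g for g in conductances)


def pull_up_conductance(g_n, g_p, style):
    """Conductance of the enabled pull-up path (hypothetical linear model).

    full_cmos: data pMOS in series with the enable pMOS (pure pMOS stack).
    tg_based:  data pMOS in series with a transmission gate (nMOS parallel to pMOS).
    """
    if style == "full_cmos":
        return series(g_p, g_p)
    if style == "tg_based":
        return series(g_p, g_n + g_p)
    raise ValueError(f"unknown style: {style}")


# Hypothetical fnsp corner: nMOS ten times more conductive than pMOS.
g_n, g_p = 10.0, 1.0
full = pull_up_conductance(g_n, g_p, "full_cmos")
tg = pull_up_conductance(g_n, g_p, "tg_based")
print(f"full-CMOS: {full:.3f}, TG-based: {tg:.3f} (relative to one pMOS)")
```

In the pure pMOS stack, the pull-up strength is halved precisely in the corner where the pMOS is already weakest, whereas the parallel nMOS of the transmission gate keeps the TG-based pull-up close to that of a single pMOS.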

As explained in Sect. 3.2.1, TG logic requires differential input signals. An attractive property of this latch is that it provides differential output signals in an efficient way, since the complementary output signal is already available without the need for extra circuitry.

In the prototypes that will be presented in Chaps. 5 and 6, two types of latches have been used, as shown in Fig. 3.31. Both have differential outputs, which serve as inputs for the TG logic, but their inputs differ. Figure 3.31a shows a single-input latch [32], while Fig. 3.31b shows a latch with differential inputs [34]. The latter can be used when differential input signals are available, i.e. when all TG logic is implemented differentially, whereas the single-input latch is used in non-differential cases. Both latches follow the same methodology as before: the inverters are implemented as stacked-nMOS inverters, and the transmission gates that serve as control switches have stacked nMOS transistors as well.

Fig. 3.31 Schematics of differential-output latches with (a) a single input and (b) differential inputs

Note that the differential implementation of the latch has a few advantages over the single-input one. First, the number of inverters can be reduced when going from a single input to complementary inputs. This seems counterintuitive, but it can be explained by the fact that the input inverter is no longer necessary, since the outputs are still regenerated in both the transparent and the locked phase. In contrast, if the input inverter of the single-input latch were removed, \(\overline{\mathit{out}}\) would not be amplified through an inverter in the transparent phase. Since inverters contribute significantly more leakage than transmission-gate switches, minimizing the number of inverters while still ensuring regeneration of the signal levels minimizes leakage. Second, the fully differential nature of the latch adds to the variation-resilience of the total design, because variations are much less likely to compromise the correct interpretation of two complementary inputs than that of a single input.

3.3.2 Flip-Flop

An edge-triggered flip-flop is constructed by cascading two level-sensitive latches, one transparent during the high level of the clock and the other during the low level. The first latch is called the master and the second the slave. All flip-flops used in the prototypes presented in this work have this master-slave configuration. In the literature, flip-flops are also called registers, but throughout this text, the term ‘flip-flop’ is used.
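The master-slave construction can be checked with the same sampled-time approach. In this sketch, the class `Latch` and the function `master_slave` are hypothetical names; for a positive-edge-triggered flip-flop, the master is assumed to be transparent while the clock is low and the slave while it is high. Within each sample the master is evaluated before the slave, so at a rising edge the slave takes over the value the master held during the preceding low phase.

```python
class Latch:
    """Level-sensitive latch: transparent while en equals transparent_level."""

    def __init__(self, transparent_level, q=0):
        self.transparent_level = transparent_level
        self.q = q

    def step(self, en, d):
        if en == self.transparent_level:
            self.q = d  # transparent: follow the input
        return self.q   # locked: hold the last value


def master_slave(clk, d):
    """Positive-edge flip-flop built from two cascaded latches (sampled-time sketch)."""
    master = Latch(transparent_level=0)  # transparent while clk is low
    slave = Latch(transparent_level=1)   # transparent while clk is high
    out = []
    for c, x in zip(clk, d):
        m = master.step(c, x)
        out.append(slave.step(c, m))
    return out


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d   = [1, 1, 0, 0, 0, 0, 1, 1]
print(master_slave(clk, d))
```

Because the two latches are never transparent at the same time, the input can only reach the output at the low-to-high clock transition, which yields edge-triggered behavior from two level-sensitive elements.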

3.4 Sizing in Different Prototypes

To summarize, Table 3.3 provides the sizing of the basic building blocks which have been discussed in this chapter for the four different prototypes. These prototypes will be presented in Chaps. 5 and 6.

Table 3.3 Sizing of the basic building blocks in the four prototypes

3.5 Conclusion

This chapter explored the design of gate-level building blocks that can ensure robust operation in the ultra-low-voltage region. These basic building blocks will be used to build the prototypes of Chaps. 5 and 6. The decisive criterion in the evaluation of the circuit topologies has been variation-resilience. On this basis, preferred implementations for logic gates, inverters, latches and flip-flops have been identified. The following chapter will make use of these building blocks when discussing the various architectural sub- and near-threshold trade-offs.