13.1 Basics

  1. Model metal wires in layers

  2. Understand the parallel plate model for wire capacitance

  3. Examine options to reduce wire capacitance

  4. Write the equation for wire resistance

  5. Examine options to reduce wire resistance

  6. Understand the skin effect

  7. Derive a simplified model for skin effect.

Metal wires on an ASIC are deposited using PVD. They are then either dry etched or polished down using CMP (Sects. 7.5 and 7.7). Metal wires are also built in multiple layers, with some technologies offering more wire layers than others (Sect. 8.5). The cross section of a wafer with multiple deposited wire layers is shown in Fig. 13.1.

Fig. 13.1

Cross section showing different wire layers

The word “wire” simply means any long section of conductive material used to carry a signal. For example, consider the layout in Fig. 8.20, where polysilicon is used as a local wire inside a cell to connect the inputs of transistors. Also consider the layout in Fig. 12.11, where a long diffusion wire carries ground while a long polysilicon line carries the word-line.

As we will shortly see, metal wires are significantly better in terms of delay and power than wires made of silicon, but sometimes area considerations imposed by the layout force the designer to use silicon wires. Thus, we have to be able to characterize both metal and silicon wires.

We are concerned with wires because they impact the performance of gates. Wires add capacitance and potentially resistance that significantly change the delay of gates by changing their loading. They also increase the dynamic power consumption of circuits by increasing the switched capacitance as well as dissipating their own power in their resistance.

Figure 13.2 shows a very simple model for a wire running over the substrate. The wire has a length L, width W, and thickness tw. It lies a distance t above the substrate and is separated from it by oxide, mostly thick field oxide.

Fig. 13.2

Simplified model for wire capacitance. The wire is floating in oxide at a distance t above the substrate. L is dictated by design, minimum W by design rules, tw and t by technology

Since the substrate and wire are conductive and the oxide is insulating, there is a capacitance between the wire and the substrate. Notice that as the wire runs its length, it passes over other metal layers, polysilicon, and diffusion. Strictly, its capacitance should be divided among all these layers. However, for the majority of its run, the wire passes over the substrate and the wells, and thus its capacitance can be reduced to a capacitance between the wire layer and the substrate/well.

The substrate and well both lie at signal ground. Thus the wire-to-substrate capacitance lies between the wire and ground. If the height of the wire above the substrate is much smaller than both its length and width, i.e., t ≪ L, W, then the capacitance can be considered a parallel plate capacitance where the upper plate is the wire and the lower plate is the substrate. The value of the capacitance is

$$ C_{\rm wire} = \frac{\varepsilon A}{d} = \frac{{\varepsilon_{0} \varepsilon_{\rm rox} WL}}{t} $$

The height of the wire above the substrate, t, is dictated by the particular process and the metal layer. The relative permittivity of the oxide is a material property. The width of the wire has a lower bound dictated by the design rules (Sect. 8.5). The designer has control over W and L within the limits of DRC; the rest of the capacitance expression is dictated by technology. Thus we can define

$$ C_{\rm wire} = \frac{{\varepsilon_{0} \varepsilon_{\rm rox} }}{t}WL = C_{a} WL $$

where Ca is the wire capacitance per unit area, dictated by the technology. Note, however, that Ca is specific to each metal layer: because it depends on t, higher metal layers have lower Ca. We can also define a capacitance per unit length Cl:

$$ C_{\rm wire} = \frac{{\varepsilon_{0} \varepsilon_{\rm rox} W}}{t}L = C_{l} L $$

This expression is useful when the designer always uses the minimum W dictated by DRC. In such a case, W also becomes “dictated by technology” and L becomes the only design parameter.
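The parallel plate estimate is easy to evaluate numerically. The sketch below computes Cwire along with Ca and Cl; all technology numbers (oxide thickness, wire dimensions) are illustrative assumptions, not values from any specific process:

```python
# Parallel-plate estimate of wire-to-substrate capacitance.
# All technology numbers below are illustrative assumptions.
EPS0 = 8.854e-12        # F/m, permittivity of free space
EPS_ROX = 3.9           # relative permittivity of SiO2

def wire_cap(W, L, t):
    """C = eps0 * eps_rox * W * L / t; dimensions in meters, result in farads."""
    return EPS0 * EPS_ROX * W * L / t

# Assumed example: a 0.2 um wide, 100 um long wire, 1 um above the substrate
W, L, t = 0.2e-6, 100e-6, 1.0e-6
Ca = EPS0 * EPS_ROX / t        # capacitance per unit area (F/m^2)
Cl = EPS0 * EPS_ROX * W / t    # capacitance per unit length (F/m)
print(wire_cap(W, L, t))       # identical to Ca*W*L and to Cl*L
```

For these assumed numbers the wire contributes well under a femtofarad, but note how the result grows linearly with L.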

As we will see in the next section, wire capacitance increases the time-constant of a gate. Thus, it is in our interest to reduce wire capacitance. The equation suggests several ways to do this:

  • Increase t. This is dictated by technology. While it might seem like the height dimension is open and we can always distribute metal layers to greater heights, this is not strictly true. Putting metal layers higher means we also have to separate the metal layers from each other more. This means that when metal layers connect to each other, the vias have to be much deeper, which increases the resistance of the vias and makes them more prone to electromigration and other defects.

  • Reduce W. DRC imposes a lower limit on W. More importantly, reducing W increases resistance, as we will shortly see.

  • Reduce L. Length is dictated by the design. PAR tools (Sect. 8.7) make significant efforts to reduce the length of interconnects, but closure cannot always be guaranteed.

  • Reduce the relative permittivity. This is a viable option, although it imposes technical challenges.

Table 13.1 lists the relative permittivity of select materials. In the context of solid insulators, silicon dioxide has a reasonably low permittivity. However, we can do better. The best relative permittivity is achieved by air, but this would not give structural support for metal layers or protection for the die. Some polymers like polyethylene and polypropylene have better permittivity than silicon dioxide. However, because silicon dioxide can be grown or deposited precisely and easily in CMOS processes, the inertia to keep using oxide, especially native oxide, as an insulator is very strong.

Table 13.1 Select dielectrics and their relative permittivity

Wires also have resistance. Looking back at Fig. 13.2, current flows through the front cross section of the wire. The cross-sectional area through which current flows is Wtw, thus the resistance of the wire is

$$ R = \frac{L}{{\sigma Wt_{w} }} $$

The resistance of the wire has a devastating effect on the delay of gates, more so than capacitance. The following are factors that can help control resistance:

  • Reduce L. As with capacitance, this is dictated by the design and the effort of the PAR tools. Because L affects both C and R in the same way, the imperative to reduce L is even stronger.

  • Increase W. This creates more area for current flow but increases capacitance.

  • Increase tw. To a first order, this helps reduce R without impacting C. We will see in Sect. 13.2 that this is not entirely accurate due to inter-wire capacitance, but increasing tw is still a viable option for controlling wire resistance.

  • Increase conductivity. Although some technical, chemical, and economic factors may limit this option, using materials with higher conductivity is a win-win proposition. This is the main reason most modern interconnects are made of copper instead of the historical aluminum, despite the fact that copper is much harder to pattern (Sect. 7.7).

Table 13.2 shows that even when conductivity is increased by changing the metal, the increase is limited to at most an order of magnitude. As we will see in Sect. 13.3, the problem of resistance is a lot more obvious for silicon wires where conductivity can be four orders of magnitude lower than metals.

Table 13.2 Selected metals and their conductivity. Most modern processes use copper. Legacy CMOS processes used aluminum

Examining the resistance equation, we see that only W and L are within the designer’s control. Thus, we can divide the resistance expression into a technology-controlled portion and a designer-controlled portion:

$$ R = \frac{L}{{\sigma Wt_{w} }} = \frac{1}{{\sigma t_{w} }}\frac{L}{W} = R_{s} \frac{L}{W} $$

The technology-defined portion of the equation Rs is called the square resistance or the sheet resistance. As the equation shows, it is the resistance of any square wire, i.e., a wire where W = L, regardless of the value of W.

Square resistance is sometimes a more useful parameter than conductivity because it also folds in the technology-specific parameter tw. Conductivity is material dependent while square resistance is dictated by both the material and the process. Note that square resistance will be different for different metal layers in the same process if the thickness of metal layers is different.
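A short numeric sketch of sheet resistance, using an assumed copper conductivity and an assumed layer thickness (illustrative values, not from any specific process):

```python
# Sheet (square) resistance: Rs = 1/(sigma * tw); then R = Rs * L/W.
# Conductivity and thickness below are illustrative assumptions.
SIGMA_CU = 5.8e7    # S/m, conductivity of copper
tw = 0.5e-6         # m, assumed metal thickness for this layer

Rs = 1.0 / (SIGMA_CU * tw)   # ohms per square

def wire_res(L, W):
    return Rs * L / W

# Any square wire (W == L) has resistance Rs, regardless of its size:
print(wire_res(1e-6, 1e-6), wire_res(50e-6, 50e-6), Rs)
```

Both square wires evaluate to the same Rs, illustrating why resistance is conveniently counted in “squares.”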

We will shortly see (Sect. 13.2) that the resistance of metal wires is very small; it is usually negligible next to their capacitance and to the resistance of silicon wires. However, in modern chips, even metal wires have resistance that has to be taken into consideration. The cause is a phenomenon called the skin effect.

The skin effect is a phenomenon where AC current fails to flow through the entire cross section of a wire. Instead, the current flows through an outer shell or skin of the wire as shown in Fig. 13.3.

Fig. 13.3

The skin effect. Current tends to flow more intensely toward the perimeter of a metal wire, with the effect more pronounced the higher the frequency of the signal

The skin effect is highly dependent on frequency. The higher the frequency, the thinner the skin through which the current effectively flows. Characterizing this behavior precisely is a complicated electromagnetic problem.

The skin effect does not strictly mean that current only flows in the outer shell. Instead it means that the largest current density flows near the outer border of the wire, decreasing monotonically away from it and becoming minimum near the center of the wire. The current density can be characterized as

$$ J = J_{0} e^{{ - \left( {1 + j} \right)d/\delta }} $$

where J0 is the maximum current density at the surface, d is the depth below the surface, and δ is a parameter indicating the depth at which the current density drops to 1/e of its maximum value at the surface. Simplified models of the skin effect consider current to flow uniformly at depths shallower than δ and no current to flow at depths deeper than δ.

Thus δ, the skin depth, is a critical parameter. It can be estimated as

$$ \delta = \sqrt {\frac{1}{\pi f\mu \sigma }} \sqrt {\sqrt {1 + \left( {\frac{2\pi f\varepsilon }{\sigma }} \right)^{2} } + \frac{2\pi f\varepsilon }{\sigma }} $$

For typical values of permittivity, conductivity, and frequency, the entire second square root is typically near unity. Thus the expression for skin depth most often used in ASICs is approximately

$$ \delta = \sqrt {\frac{1}{\pi f\mu \sigma }} $$
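The simplified expression is straightforward to evaluate. The sketch below computes δ for copper (taking μ ≈ μ0, since copper is non-magnetic) at a few frequencies:

```python
import math

# Simplified skin depth: delta = sqrt(1/(pi*f*mu*sigma)), evaluated
# for copper. mu is taken as mu0 since copper is non-magnetic.
MU0 = 4 * math.pi * 1e-7   # H/m, permeability of free space
SIGMA_CU = 5.8e7           # S/m, conductivity of copper

def skin_depth(f):
    return math.sqrt(1.0 / (math.pi * f * MU0 * SIGMA_CU))

for f in (1e9, 1e10, 1e11):
    print(f, skin_depth(f))
# At 10 GHz the copper skin depth is roughly 0.66 um.
```

At 10 GHz the skin depth is already below a micron, comparable to wire cross-sectional dimensions in modern processes.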

According to the simplified skin effect model in Fig. 13.4, the skin effect has no impact at frequencies where the skin depth is large enough for the peripheral current-carrying shells to overlap. In other words, if δ is at least tw/2 or at least W/2, the current-carrying skins overlap along that dimension, every point of the cross section falls within a skin, and the entire wire carries current, leading to no skin effect.

Fig. 13.4

Simplified skin effect model (right). Consider current to flow uniformly in an outer shell and not at all near the center

Thus, the skin effect is observed only when δ is smaller than both half-dimensions of the cross section:

$$ \delta < \hbox{min} \left\{ {W,t_{w} } \right\}/2 $$

The skin depth is a function of the metal permeability and conductivity. But more critically, it is a function of frequency. If we substitute the expression for skin depth into the inequality above, we can translate it into a condition on the frequency at which the skin effect becomes visible:

$$ \begin{array}{*{20}l} {\sqrt {\frac{1}{\pi f\mu \sigma }} < \hbox{min} \left\{ {W,t_{w} } \right\}/2} \hfill \\ {\frac{1}{\pi f\mu \sigma } < \frac{{(\hbox{min} \left\{ {W,t_{w} } \right\})^{2} }}{4}} \hfill \\ {f > \frac{4}{{\pi \mu \sigma (\hbox{min} \left\{ {W,t_{w} } \right\})^{2} }}} \hfill \\ \end{array} $$

The danger of skin effect is that it significantly reduces the available cross section, thus causing effective resistance to increase. The resistance of a wire suffering from skin effect according to the model in Fig. 13.4 is

$$ \begin{aligned} R & = \frac{L}{{\sigma \left( {2W\delta + 2\left( {t_{w} - 2\delta } \right)\delta } \right)}} \\ R & = \frac{L}{{\sigma \left( {2W\delta + 2t_{w} \delta - 4\delta^{2} } \right)}} \\ \end{aligned} $$
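A small numeric sketch of this shell-model resistance; the dimensions are assumed, and the shell area formula applies only while δ is below both W/2 and tw/2 (otherwise the whole cross section conducts and the DC formula holds):

```python
# Wire resistance under the simplified shell model of Fig. 13.4.
# The shell area formula applies while delta is smaller than both
# W/2 and tw/2; otherwise the whole cross section conducts.
# All dimensions below are illustrative assumptions.
SIGMA_CU = 5.8e7   # S/m, conductivity of copper

def wire_res_skin(L, W, tw, delta, sigma=SIGMA_CU):
    if delta < min(W, tw) / 2:
        area = 2 * W * delta + 2 * (tw - 2 * delta) * delta
    else:
        area = W * tw   # no skin effect: full cross section conducts
    return L / (sigma * area)

# Assumed example: 1 mm long wire with a 1 um x 1 um cross section
R_dc = wire_res_skin(1e-3, 1e-6, 1e-6, delta=1e-3)    # huge delta -> DC value
R_hf = wire_res_skin(1e-3, 1e-6, 1e-6, delta=0.2e-6)  # thin skin
print(R_dc, R_hf)   # the high-frequency resistance is noticeably larger
```

With the assumed 0.2 µm skin depth, the conducting area shrinks and the resistance rises well above its DC value.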

13.2 Lumped C Wires

  1. Model a wire running over a substrate as a parallel plate capacitor

  2. Use the lumped capacitor model to estimate total delay in CMOS

  3. Understand that the skin effect reduces the validity of the lumped capacitor model

  4. Recognize inter-wire capacitance

  5. Understand why inter-wire capacitance increases in modern technology.

Metal wires are used to carry signals over long distances. The outputs of CMOS gates are provided through MOSFET drains, thus through the diffusion layer. The output is through a common PMOS and NMOS drain node. As shown in Fig. 8.20 in Sect. 8.3, the two drains have to be connected through a metal line. When we avoid metal wires, we do so to avoid the overhead of contacting the metal layer. However, as shown in Fig. 8.20, CMOS outputs are in the metal layer in any case, and thus, wires from the output of a CMOS gate to the input of another CMOS gate might as well be in metal layers.

Figure 13.5 shows a CMOS inverter feeding another inverter through an intermediate metal wire. The metal wire has very limited resistance due to the high conductivity of the metal. Thus, we can model the wire as a single, large, lumped capacitor between the wire and ground (Sect. 13.1). The ground represents the substrate and well over which the metal line runs. Notice that along its path the metal line also runs over polysilicon, diffusion, and other metal layers. However, the overwhelming majority of its plate area lies over the substrate/well.

Fig. 13.5

Cascaded CMOS inverters with connecting metal wire

Figure 13.6 shows the delay model of the network in Fig. 13.5 with the lumped capacitance of the metal wire added. The model is not very different from that where wires are considered ideal short circuits (Chap. 3). The effect of the metal wire is to increase the loading capacitance on the first inverter. Thus the time-constant at out1 for an ideal (short circuit) wire versus a lumped capacitance wire is

Fig. 13.6

Model for time-constant with metal wire capacitance (Cwire) taken into consideration

$$ \begin{array}{*{20}c} {\tau_{\rm ideal} = R_{n1} \left( {C_{d1} + C_{g2} } \right)} \\ {\tau_{\rm lumped} = R_{n1} \left( {C_{d1} + C_{g2} + C_{\rm wire} } \right)} \\ \end{array} $$

The wire capacitance Cwire can be calculated as in Sect. 13.1. The time-constants above are for high to low transitions at out1, but the only difference in low to high transitions would be to replace Rn1 with Rp1. The metal wire obviously increases delay by increasing the time-constant through increasing the capacitive load on the gate.
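The two time-constants are easy to compare numerically. All component values below are illustrative assumptions:

```python
# Time-constants with and without the lumped wire capacitance.
# All component values are illustrative assumptions.
Rn1 = 5e3       # ohms, driver channel resistance
Cd1 = 2e-15     # F, self-loading of the driving inverter
Cg2 = 3e-15     # F, gate capacitance of the receiving inverter
Cwire = 5e-15   # F, lumped wire capacitance

tau_ideal = Rn1 * (Cd1 + Cg2)
tau_lumped = Rn1 * (Cd1 + Cg2 + Cwire)
print(tau_ideal, tau_lumped)   # the wire adds exactly Rn1 * Cwire
```

For these assumed values the wire doubles the time-constant, which is why wire capacitance cannot be ignored even when wire resistance can.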

The lumped capacitance model is simple and useful. It allows us to make quick estimates of the impact of wires on delay and power. However, it has some limitations we must be aware of, primary among them: inter-wire capacitance and skin effect.

As discussed in Sect. 13.1, the skin effect reduces the available area for current flow in a wire. At very high frequencies, this can significantly increase the resistance of metal wires, eventually making the lumped capacitance model misleading. In such cases, metal wires have to be modeled similarly to silicon wires (Sect. 13.3).

In Sect. 13.4, we will find that one way to address increasing wire delays in modern technologies is to increase the relative thickness of wires tw. This leads to the second limitation of the parallel plate capacitance model: increasing tw leads to more inter-wire capacitance.

Figure 13.7 shows adjacent wires in a CMOS technology with a large channel length. The wires are much wider than they are thick, thus W ≫ tw. Individual wires are also very widely separated. As shown, every wire has two types of capacitance. The first is the wire-to-substrate capacitance we have been discussing so far. The second is inter-wire capacitance, which exists between any two wires in the same metal layer because their facing metal surfaces are separated by the insulating oxide. The inter-wire plates have area L∙tw, and thus the inter-wire capacitance has the value:

Fig. 13.7

Inter-wire and wire to substrate capacitance in old technology

$$ C_{\rm interwire} = \frac{{\varepsilon_{\rm ox} t_{w} L}}{{L_{s} }} $$

where Ls is the separation between the two wires. This inter-wire capacitance is small and negligible relative to the wire-to-substrate capacitance for two reasons: tw is very small, especially relative to W, and Ls is very large relative to tw. Thus, traditionally we would ignore inter-wire capacitance and use a single lumped wire-to-substrate capacitance.
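The shift between the two regimes can be sketched numerically. The dimensions below are illustrative assumptions: an older process with wide, thin, widely separated wires versus a modern one with thick, closely packed wires:

```python
# Comparing wire-to-substrate and inter-wire capacitance for one wire.
# All dimensions below are illustrative assumptions.
EPS_OX = 3.9 * 8.854e-12   # F/m, permittivity of SiO2

def caps(W, tw, t, Ls, L):
    c_sub = EPS_OX * W * L / t      # to the substrate
    c_int = EPS_OX * tw * L / Ls    # to one neighbouring wire
    return c_sub, c_int

old = caps(W=1.0e-6, tw=0.2e-6, t=1.0e-6, Ls=2.0e-6, L=100e-6)
new = caps(W=0.1e-6, tw=0.3e-6, t=1.0e-6, Ls=0.1e-6, L=100e-6)
print(old)  # old process: inter-wire capacitance is negligible
print(new)  # modern process: inter-wire capacitance dominates
```

With the assumed geometries, the ranking of the two capacitances flips between the two processes, matching the contrast between Figs. 13.7 and 13.8.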

Figure 13.8 shows the situation in a more modern process. The same equation still applies for calculating inter-wire capacitance. However, the value of this capacitance is much more important due to two factors: tw is increased, or at least not scaled as fast as W, for reasons that will become clear in Sect. 13.4. Also, Ls is reduced as modern technologies scale down dimensions, pack wires closer to support increased functionality, and use more permissive design rules.

Fig. 13.8

Inter-wire and wire to substrate capacitance in a modern technology

Wire to substrate capacitance increases delay by increasing the capacitive load on gate outputs. Inter-wire capacitance has a more profound impact: it allows wires to couple to each other. Thus, it allows a transition on a wire to be, at least partially, transferred to another wire. Separation DRC rules usually stipulate that higher metal layers be more widely spaced than lower layers. This is because higher metal layers are also thicker, and thus should be more widely spaced to prevent them from coupling.

13.3 Silicon Wires

  1. Recognize cases where silicon wires have to be used

  2. Understand where diffusion capacitance comes from

  3. Use the Elmore delay method to calculate delay for a distributed RC wire

  4. Use the Elmore method to calculate delay in a loaded and driven wire.

Silicon wires should only be used for very short wires. However, as shown repeatedly in Chap. 12, sometimes we do use silicon over considerable distances to save on area. The main problem with silicon wires is their resistance. The sheet resistance of polysilicon and silicon is 2–3 orders of magnitude higher than commonly used ASIC metals. In this section, we will develop a model to deal with wires where both resistance and capacitance are significant. This can also be applied to high-frequency metal lines where skin effect has a significant impact.

The resistance and capacitance of polysilicon lines can be modeled similarly to metal lines in Sect. 13.2. The resistance of diffusion wires can also be modeled similarly. However, its capacitance to substrate needs a little more consideration.

Figure 13.9 shows a diffusion wire running through the substrate. The capacitance here is a diffusion capacitance of a reverse-biased PN junction. As discussed in Sect. 1.9, this capacitance is nonlinear and difficult to characterize. However, we can still consider it proportional to the length and width of the wire.

Fig. 13.9

Diffusion wire to substrate capacitance

In wires where there is resistance and capacitance, it is very challenging to develop a model to use in circuits. At first glance, it might look like we can calculate the wire capacitance and wire resistance from Sect. 13.1 and then use these values to modify the time-constant of the circuit. However, the resistance and capacitance of the wire are intertwined and cannot be lumped at a single location.

Would the capacitance be lumped at the beginning of the wire or at its end? Or should it be divided between the beginning and the end? The reality is that both the capacitance and the resistance are fully distributed throughout the wire. At every location there is capacitance to the substrate and resistance through the cross section (Fig. 13.10).

Fig. 13.10

RC wire model

The wire resistance, wherever we decide to model it, will exist serially through the wire. This will create an RC ladder structure where the time-constant cannot be calculated in a straightforward manner as in Chap. 3. To overcome this, we introduce the Elmore time-constant, a method to approximate the equivalent time-constant of a network which we cannot reduce to a single capacitance and resistance.

The Elmore time-constant is fairly easy to calculate for tree networks where the feedforward is purely resistive, and there is no feedback. Figure 13.11 shows an example of such a network. There are four capacitive nodes separated by resistances. To find the time-constant at node “out” in response to a transition at node “in”, we calculate the Elmore time-constant.

Fig. 13.11

Sample network for estimating Elmore delay. Elmore delay only works for trees and cannot function where there are loops

The Elmore time-constant is found by calculating a partial time-constant for each capacitance and then adding these partial time-constants. The partial time-constants are calculated by multiplying the capacitance of the node by the value of resistance from the input node to the capacitance node, as long as that resistance is also part of the resistance from the input to the output.

Thus, the four partial time-constants in Fig. 13.11 are

$$ \begin{array}{*{20}c} {\tau_{1} = C_{1} R_{1} } \\ {\tau_{2} = C_{2} R_{1} } \\ {\tau_{3} = C_{3} \left( {R_{1} + R_{3} } \right)} \\ {\tau_{4} = C_{4} \left( {R_{1} + R_{3} + R_{4} } \right)} \\ \end{array} $$

Notice that the resistance multiplying C2 is only R1, since R2 is not part of the input-to-output resistive path. The time-constant at out is thus

$$ \tau = R_{1} \left( {C_{1} + C_{2} + C_{3} + C_{4} } \right) + R_{3} \left( {C_{3} + C_{4} } \right) + R_{4} C_{4} $$

This expression shows another way to calculate the Elmore time-constant: each resistance should be multiplied by those capacitances that see the resistance in the path from input to output. For example, R3 is only present in the path from input to C3 and C4, and is thus multiplied only by these two capacitances.
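Both groupings can be checked numerically for the network of Fig. 13.11. The component values below are assumed for illustration:

```python
# Elmore time-constant for the tree of Fig. 13.11 (values assumed).
# Each capacitance is multiplied by the resistance shared between the
# input->capacitance path and the input->output path.
R1, R2, R3, R4 = 1e3, 4e3, 2e3, 3e3            # ohms
C1, C2, C3, C4 = 1e-15, 2e-15, 1e-15, 2e-15    # farads

tau = (C1 * R1
       + C2 * R1               # R2 is not on the input->out path
       + C3 * (R1 + R3)
       + C4 * (R1 + R3 + R4))

# Regrouped form: each resistance times the capacitances downstream of it
tau_alt = R1 * (C1 + C2 + C3 + C4) + R3 * (C3 + C4) + R4 * C4
print(tau, tau_alt)   # both groupings give the same time-constant
```

Note that R2 never appears in the sum: it carries no current on the input-to-output path, so its branch contributes only its capacitance C2.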

Figure 13.12 shows a resistive wire divided into RC sections. Because the resistance and the capacitance are distributed throughout the wire, the model is more accurate the more sections it is divided into. The model is most accurate when the number of sections is infinite.

Fig. 13.12

Distributed RC sectional model

Now assume the total wire resistance is Rwire and it is divided into N sections, then the resistance of a section is Rw = Rwire/N. Similarly if Cwire is the total wire capacitance then Cw = Cwire/N.

Calculating the Elmore time-constant at the output of the N sections in Fig. 13.12:

$$ \begin{array}{*{20}c} {\tau = R_{w} C_{w} + 2R_{w} C_{w} + 3R_{w} C_{w} + \ldots + NR_{w} C_{w} } \\ {\tau = R_{w} C_{w} \left( {1 + 2 + 3 + \ldots + N} \right)} \\ \end{array} $$

The bracket contains an arithmetic series with first term and common difference both equal to 1:

$$ \tau = R_{w} C_{w} .\frac{{N\left( {N + 1} \right)}}{2} $$

As N tends to infinity:

$$ \tau = R_{w} C_{w} .\frac{N.N}{2} = \frac{{R_{w} N.C_{w} N}}{2} = \frac{{R_{\rm wire} C_{\rm wire} }}{2} $$

This result suggests that a wire with distributed resistance and capacitance can be represented by a lumped resistance and a lumped capacitance, as long as one of the two has its value halved. However, this is misleading: the result is only valid when the wire is driven by a null impedance and drives a null load.
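The convergence toward Rwire·Cwire/2 is easy to observe numerically. The wire totals below are assumed for illustration:

```python
# N-section ladder: tau = Rw*Cw*N(N+1)/2 with Rw = Rwire/N, Cw = Cwire/N.
# As N grows the result converges to Rwire*Cwire/2. Totals are assumed.
Rwire, Cwire = 1e3, 1e-12   # ohms, farads

def tau_ladder(N):
    Rw, Cw = Rwire / N, Cwire / N
    return Rw * Cw * N * (N + 1) / 2

for N in (1, 2, 10, 100, 10000):
    print(N, tau_ladder(N))
# tau_ladder(N) -> Rwire*Cwire/2 = 5e-10 s
```

A single lumped section (N = 1) overestimates the time-constant by a factor of two; even N = 10 is already within 10% of the limit.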

To derive a more realistic result, consider Fig. 13.5, but assume that the wire connecting the two inverters is also resistive. In this case, the delay model is as shown in Fig. 13.13. Rdrv is the drive resistance, which is Rn1 for the high-to-low delay and Rp1 for the low-to-high delay. Cd1 is the total self-loading of the first inverter and lies before the wire. Cg2 is the total gate capacitance of the second inverter and lies at the end of the wire.

Fig. 13.13

Sectional model used to model wire in an inverter pair. This differs from Fig. 13.12 in that the driver has finite impedance and the load is non-null

To find the time-constant at the input of the second inverter, we use the Elmore method:

$$ \begin{array}{*{20}c} {\tau = R_{\rm drv} C_{d1} + \left( {R_{\rm drv} + R_{w} } \right)C_{w} + \left( {R_{\rm drv} + 2R_{w} } \right)C_{w} + \ldots + \left( {R_{\rm drv} + NR_{w} } \right)C_{w} + \left( {R_{\rm drv} + NR_{w} } \right)C_{g2} } \\ {\tau = R_{\rm drv} C_{d1} + R_{\rm drv} NC_{w} + R_{w} C_{w} \left( {1 + 2 + \ldots + N} \right) + R_{\rm drv} C_{g2} + NR_{w} C_{g2} } \\ {\tau = R_{\rm drv} C_{d1} + R_{\rm drv} C_{\rm wire} + \frac{{R_{\rm wire} C_{\rm wire} }}{2} + R_{\rm drv} C_{g2} + R_{\rm wire} C_{g2} } \\ \end{array} $$

We can confirm that for Rwire = 0, the result reduces to that in Sect. 13.2 for the lumped C model.
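The derivation can also be checked numerically by summing the discrete ladder directly and comparing it against the closed form. All component values are assumed:

```python
# Discrete Elmore sum for a driven, loaded N-section wire, checked
# against the closed form derived above. All values are assumed.
Rdrv, Cd1, Cg2 = 2e3, 1e-15, 2e-15   # driver resistance, edge capacitances
Rwire, Cwire = 1e3, 1e-12            # wire totals
N = 100_000
Rw, Cw = Rwire / N, Cwire / N

tau = Rdrv * Cd1
for i in range(1, N + 1):
    tau += (Rdrv + i * Rw) * Cw      # i-th wire section
tau += (Rdrv + N * Rw) * Cg2         # load at the far end of the wire

closed = (Rdrv * Cd1 + Rdrv * Cwire + Rwire * Cwire / 2
          + Rdrv * Cg2 + Rwire * Cg2)
print(tau, closed)   # agree up to the O(1/N) discretization error
```

The residual difference is exactly the Rwire·Cwire/(2N) term dropped when taking N to infinity.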

13.4 Scaling Wires

  1. Derive the scaling behavior of wires

  2. Understand the distinction between local and global wires

  3. Recognize that resistance is the main culprit in the dismal behavior of wire delay scaling

  4. Realize that wire thickness is an open dimension for preserving wire conductance

  5. Distinguish wire thickness for lower local wires and higher global wires.

How significant is wire delay relative to gate delay? Historically, wire delay was very small and was considered negligible relative to gate delay, because gates were large and their capacitances dominated the time-constants of circuits. In modern circuits, wires scale much worse than gates. This has led to a pattern where gate delay consistently decreased with technology while wire delays remained constant or got worse. This trend has made wire delays as important as gate delays, if not more so. It puts a strain on PAR tools and favors designs with less routing congestion.

Consider a gate-to-gate time-constant where the wire is negligible. The time-constant consists of a transistor channel resistance and a capacitance consisting of a gate and a drain component (Sect. 3.5). If the technology advances so that all dimensions are scaled down by K, this is usually accompanied by a much smaller drop in supply voltage, which we will give the symbol U. Thus

$$ \begin{array}{*{20}c} {W,L,t_{\rm ox} \to \frac{W}{K},\frac{L}{K},\frac{{t_{\rm ox} }}{K}} \\ {V_{\rm th} ,V_{\rm DD} \to \frac{{V_{\rm th} }}{U},\frac{{V_{\rm DD} }}{U}} \\ \end{array} $$

Voltage has to scale down slower than dimensions to preserve noise margins. Notice that historically, according to Moore’s law, K has been around 2 every two technology nodes. If voltages had kept up with this scale, we would be using supplies significantly below the noise floor.

Capacitance per unit area has the following scaling behavior:

$$ C_{\rm ox} = \frac{\varepsilon }{{t_{\rm ox} }} \to \frac{K\varepsilon }{{t_{\rm ox} }} = KC_{\rm ox} $$

And thus gate capacitance has the scaling behavior:

$$ C_{\rm gate} = C_{\rm ox} WL \to KC_{\rm ox} .\frac{W}{K}.\frac{L}{K} = \frac{{C_{\rm gate} }}{K} $$

The current flowing through a velocity saturated device has the following scaling behavior:

$$ I = WC_{\rm ox} v_{\rm sat} \left( {V_{\rm gs} - V_{\rm th} - V_{\rm dssat} /2} \right) $$
$$ I \to \frac{W}{K}.KC_{\rm ox} v_{\rm sat} .\frac{{V_{\rm gs} - V_{\rm th} - \frac{{V_{\rm dssat} }}{2}}}{U} = \frac{I}{U} $$

And thus the channel resistance of a MOSFET does not scale

$$ R = \frac{V}{I} \to \frac{V}{U}.\frac{U}{I} = R $$

The scaling behavior of gate time-constant is

$$ \tau = RC_{\rm gate} \to \frac{{RC_{\rm gate} }}{K} = \frac{\tau }{K} $$

which means that the delay of a velocity saturated device scales down at the same rate as dimensions.

If we try to repeat the same analysis for wires, we hit an obstacle. The dimensions of a wire, according to Fig. 13.2, are W, L, and tw; its separation from the substrate is t. We can assume that W, t, and tw all scale down by K, the same ratio that shrinks gate dimensions.

However, the length of the wire does not scale down in any predictable pattern. The length of wires depends on placement and routing, the size of the die, and the complexity of the design. With every technology node, the dimensions of the device decrease, but typically, the size of the die increases. The ratio by which the die size increases is usually smaller than, and independent of, the ratio of device dimension scaling.

We distinguish two extreme cases for wires: long range wires, and short range wires. Short range wires are used to connect devices in the same gate. This could be, for example, the metal section used to connect PMOS and NMOS drains in a CMOS inverter. Long range wires carry the signal across large distances, usually from one side of the die to another. They normally connect modules and subsystems.

The length of short range wires can be assumed to scale the same way as device dimensions, as do W, tw, and t. The length of long range wires actually increases; thus we assume its scaling behavior is

$$ L \to LK_{w} $$

where Kw is the ratio by which the die dimension rises.

For short range wires, the capacitance scaling is

$$ C_{\rm wire} = \frac{\varepsilon WL}{t} \to \varepsilon .\frac{W}{K}.\frac{L}{K}.\frac{K}{t} = \frac{{C_{\rm wire} }}{K} $$

Resistance scaling is

$$ R_{\rm wire} = \frac{L}{{\sigma Wt_{w} }} \to \frac{L}{K}.\frac{K}{W}.\frac{K}{{t_{w} }}.\frac{1}{\sigma } = KR_{\rm wire} $$

The wire time-constant does not scale

$$ \tau = R_{\rm wire} C_{\rm wire} \to KR_{\rm wire} .\frac{{C_{\rm wire} }}{K} = R_{\rm wire} C_{\rm wire} $$

For long range wires, capacitance scaling is

$$ C_{\rm wire} = \frac{\varepsilon WL}{t} \to \varepsilon .\frac{W}{K}.LK_{w} .\frac{K}{t} = \frac{{C_{\rm wire} K_{w} }}{K} $$

And resistance scaling is

$$ R_{\rm wire} = \frac{L}{{\sigma Wt_{w} }} \to LK_{w} .\frac{K}{W}.\frac{K}{{t_{w} }}.\frac{1}{\sigma } = K_{w} K^{2} R_{\rm wire} $$

Leading to time-constant scaling of

$$ \tau = R_{\rm wire} C_{\rm wire} \to K_{w} K^{2} R_{\rm wire} .\frac{{C_{\rm wire} K_{w} }}{K} = KK_{w}^{2} R_{\rm wire} C_{\rm wire} $$

Thus wire delay at best does not scale at all, and at worst it increases linearly with technology and quadratically with the dimension of the die. Compared to the scaling of gate delays, both kinds of wires have their relative delay increase with technology. But long range wires are of particular concern.
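The scaling factors derived above can be tabulated for concrete (illustrative) values of K and Kw:

```python
# Scaling factors from the derivations above. K is the dimension
# scaling ratio, Kw the die growth ratio; both values illustrative.
K, Kw = 2.0, 1.1

tau_gate_factor = 1 / K       # velocity-saturated gate delay shrinks
tau_short_factor = 1.0        # short range wire delay does not scale
tau_long_factor = K * Kw**2   # long range wire delay grows

print(tau_gate_factor, tau_short_factor, tau_long_factor)
```

With these assumed ratios, gate delay halves each node while long range wire delay more than doubles, so the wire-to-gate delay ratio worsens by nearly 5x per node.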

Modern processes use higher conductivity metals and lower permittivity insulators to decrease resistance and capacitance, respectively. However, both options are limited by available materials and are barely enough to keep up with the effects of technology scaling on wires.

If we examine the scaling behavior of wires, we find that resistance is the primary culprit in the dismal behavior of the time-constant. Resistance always increases, and for long range wires it increases quadratically with the technology scaling parameter.

Digging deeper, we find that the problem is that the available area for current flow decreases quadratically because both W and tw decrease by a factor K. W has to scale with technology because wires have to get narrower to allow the same number of wires to fit between gates. In fact, if we use scalable design rules (Sect. 8.5), we find that the minimum width of wires is always proportional to the minimum dimensions of transistors.

However, there is no fundamental limitation on tw. The wire thickness can be scaled independently of W and t (Fig. 13.14). The wire thickness only affects the total height of the oxide stack over the substrate, a dimension we are normally not concerned with.

Fig. 13.14
figure 14

The top figure shows a scaling scheme where all dimensions of the wire scale similarly (except length). The bottom figure shows a scheme where the height (tw) of the wire does not scale or scales slower than the rest of the dimensions

If we assume the wire thickness scales by a factor Kt < K, then the time-constant of a long range wire scales by

$$ \begin{array}{*{20}c} {C_{\rm wire} = \frac{\varepsilon WL}{t} \to \varepsilon .\frac{W}{K}.LK_{w} .\frac{1}{t} = \frac{{C_{\rm wire} K_{w} }}{K}} \\ {R_{\rm wire} = \frac{L}{{\sigma Wt_{w} }} \to LK_{w} .\frac{K}{W}.\frac{{K_{t} }}{{t_{w} }}.\frac{1}{\sigma } = K_{w} K_{t} KR_{\rm wire} } \\ {\tau = R_{\rm wire} C_{\rm wire} \to K_{w} K_{t} KR_{\rm wire} .\frac{{C_{\rm wire} K_{w} }}{K} = K_{t} K_{w}^{2} R_{\rm wire} C_{\rm wire} } \\ \end{array} $$

And for short range wires:

$$ \begin{array}{*{20}c} {C_{\rm wire} = \frac{\varepsilon WL}{t} \to \varepsilon .\frac{W}{K}.\frac{L}{K}.\frac{K}{t} = \frac{{C_{\rm wire} }}{K}} \\ {R_{\rm wire} = \frac{L}{{\sigma Wt_{w} }} \to \frac{L}{K}.\frac{K}{W}.\frac{{K_{t} }}{{t_{w} }}.\frac{1}{\sigma } = K_{t} R_{\rm wire} } \\ {\tau = R_{\rm wire} C_{\rm wire} \to K_{t} R_{\rm wire} .\frac{{C_{\rm wire} }}{K} = \frac{{K_{t} }}{K}.R_{\rm wire} C_{\rm wire} } \\ \end{array} $$

If we choose the extreme case of not scaling down wire thickness at all, i.e., Kt = 1, then long range wire delay becomes dependent only on die dimension. Short range wire delay would then scale down similarly to gate delay.
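The scaling factors derived above can be checked numerically. The sketch below simply encodes the factor expressions for short and long range wires; the numeric values of K and Kw are illustrative assumptions:

```python
# Scaling factors for wire C, R, and tau = RC, as derived above.
# K: technology scaling factor (> 1), Kw: die dimension growth factor,
# Kt: wire thickness scaling factor (Kt = K is full scaling, Kt = 1 unscaled).

def short_range_factors(K, Kt):
    C = 1.0 / K          # C_wire -> C_wire / K
    R = Kt               # R_wire -> Kt * R_wire
    return C, R, R * C   # tau -> (Kt / K) * tau

def long_range_factors(K, Kw, Kt):
    C = Kw / K           # C_wire -> (Kw / K) * C_wire
    R = Kw * Kt * K      # R_wire -> Kw * Kt * K * R_wire
    return C, R, R * C   # tau -> Kt * Kw**2 * tau

# Full scaling (Kt = K): short range tau is unchanged,
# while long range tau grows as K * Kw**2.
K, Kw = 1.4, 1.1
_, _, tau_s = short_range_factors(K, Kt=K)
_, _, tau_l = long_range_factors(K, Kw, Kt=K)
print(tau_s)   # ~1.0: does not scale
print(tau_l)   # K * Kw**2
```

Setting `Kt=1.0` in `long_range_factors` reproduces the claim above: the long range time-constant factor reduces to Kw², depending only on die dimension.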

However, keeping tw unscaled or scaling it slower than other dimensions, combined with the scaling down of W increases the inter-wire capacitance. This is because the distance between wires drops while the lateral area of the wire plates facing each other increases.

Using different metal layers judiciously allows us to utilize the advantages of unscaled tw without suffering from inter-wire capacitance. In most CMOS technologies higher metal layers have larger tw while lower metals tend to be thinner. Long range wires tend to be much fewer than short range wires.

Thus lower, thinner metal layers are used to perform local short range connections. Higher metal layers are used to make the far fewer long range connections. These few wires can then be kept far apart from each other to control inter-wire capacitance. This is illustrated in Fig. 13.15.

Fig. 13.15
figure 15

Different layers for different communications. Lower layer wires are kept thin and are packed closer together. This allows local communication to happen without increased inter-wire capacitance. Higher level metals are wider, thicker, and kept further apart, reducing both their resistance and inter-wire capacitance at the expense of density. Higher level metals are used for infrequent long range routing

13.5 Interchip Communication

  1. Understand the requirements of interchip communication

  2. Understand components of the pad circuitry

  3. Realize the need for ESD protection

  4. Design a circuit for level conversion

  5. Recognize when wires need to be modeled as transmission lines

  6. Compare source and load termination scenarios in terms of signal settling behavior.

All wires on a chip eventually end up at or come from output or input pins. At the output pin, the signal travels over a PCB track to another chip’s input pin. This setup is shown in Fig. 13.16. The nature of signals on chips and on PCB tracks is very different. The PCB is an exposed, unprotected environment, and signals on PCB tracks have to travel huge distances compared to on-chip signals. Thus, signals on the PCB tend to be larger in order to better resist noise and interference. PCB tracks are also very wide to reduce resistive drops over the large distances.

Fig. 13.16
figure 16

PCB interchip communication. The PCB is an insulating substrate. Conductive copper tracks connect chip pins to each other

The wide, long PCB tracks offer considerable capacitive loads to chips. The output pins have to drive this load while transforming the signal into levels more suitable for off-chip communication. Thus pins contain complicated circuitry to manage the transition to and from the outside world.

Figure 13.17 shows a pin pad. The pad is a very large metal square on the periphery or top of the circuit depending on package type (Sect. 14.6). The pad protrudes through the overglass and is connected to the pin by metal wires during packaging, allowing communication to the outside world. As shown in Fig. 13.17, interface circuitry is attached to the pad, interceding between it and the core of the die.

Fig. 13.17
figure 17

Pin pad. The pad is a large metal surface to be contacted while bonding. Interface circuitry allows for signal conditioning. Reverse-biased guard rings isolate the pin circuitry from the core, protecting both from each other

Pads are surrounded by one or more guard rings. A guard ring is a closed loop of semiconductor material of the opposite doping type to its surrounding environment: if the pad circuitry sits in the p substrate, we use n-type rings; if it sits in an n-well or n substrate, we use p-type rings. The rings are biased to form reverse-biased junctions with their environment, isolating the pad from the rest of the circuit. This prevents the large signals in the pads from interfering with the small signals in the core and pad interface, and prevents the high frequency noise of the core from affecting the pad interface.

The pin, pad, and interface have to support the following functions:

  • Drive large off-chip capacitance for output pins

  • Provide electrostatic discharge protection for input pins

  • Provide level conversion for both types of pins

  • Protect the pad from latch-up

Drive off-chip capacitance

This problem stems from the fact that PCB tracks provide huge capacitive loads due to their length and width while die core driver gates are significantly smaller. Thus the problem is how to optimally drive a large capacitance starting from a small gate.

This problem is discussed in detail in Sect. 4.2. The best way to drive such a load is through a sequence of progressively larger inverters. The optimal sizing of each inverter and the optimal number of stages are derived systematically in Sect. 4.2 and can be easily obtained using logical effort.

If we follow the equal fan-out sizing in Sect. 4.2, the inverters in the later stages of the buffer will be very large. These inverters contain transistors with extremely large W, capable of driving the large off-chip capacitance.
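The equal fan-out sizing can be sketched numerically. The numbers below (input capacitance, load capacitance, and a target stage effort of about 4) are illustrative assumptions, not values from the text:

```python
import math

# Illustrative buffer-chain sizing: drive C_load from a gate with input
# capacitance C_in using equal fan-out stages, as in Sect. 4.2.
def buffer_chain(C_in, C_load, target_fanout=4.0):
    F = C_load / C_in                           # total electrical effort
    N = max(1, round(math.log(F) / math.log(target_fanout)))
    f = F ** (1.0 / N)                          # actual per-stage fan-out
    sizes = [f ** i for i in range(N)]          # sizes relative to first stage
    return N, f, sizes

# e.g. an assumed 10 fF core gate driving an assumed 50 pF PCB track load
N, f, sizes = buffer_chain(C_in=10e-15, C_load=50e-12)
print(N, round(f, 2))   # 6 stages, fan-out ~4.13 each
```

The last stage ends up hundreds of times wider than the first, which is why pin drivers need the extreme layout techniques discussed next.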

Figure 13.18 shows how such extreme-sized transistors are laid out. The gate is split into parallel fingers over a shared diffusion, which by itself creates a number of transistors in series. Metal lines then short the alternating source and drain regions together, placing the component transistors in parallel and creating a single equivalent, extremely wide transistor.

Fig. 13.18
figure 18

Large transistor layout

Practical pin driver transistors will consist of several of the wide transistors in Fig. 13.18 connected again in parallel, thus adding two levels of parallelism and creating an even wider transistor.

One curious aspect of the layout of output buffers is the preponderance of substrate/well contacts. While a common guideline is to add one contact per transistor, for output pads, we add as many contacts as the area allows. The reason is detailed later under “latch-up”.

ESD protection

Pins are exposed to the outside world. Static charges build up in humans, printed circuit boards, and probes that may come in contact with the chip pins. Because such bodies (especially humans) are significantly larger than chips, the amount of charge they carry can be enormous.

If this charge is transferred to an input pin, it builds up on the polysilicon gates of the first MOSFETs after the input pad. The large charge causes a huge voltage to form across the small MOSFET gate capacitance. Static charges can sit at kilovolt potentials on human bodies; when transferred to the much smaller gate capacitance of a MOSFET, they can produce even larger voltages.

Modern MOSFET oxides tend to be very thin to manage subthreshold conduction (Sect. 10.4). Thus, their breakdown voltage is very low. Voltages as low as a few dozen volts are usually enough to destroy a MOSFET irreparably.

If left unprotected, and in the absence of perfect grounding, input MOSFETs would almost certainly be destroyed. Therefore, all input pad interfaces must contain electrostatic discharge (ESD) protection circuitry.

The ESD protection circuit is shown in Fig. 13.19. The CMOS inverter is not part of the protection circuit, but rather the first CMOS gate in the die. The two MOSFET gates are the ones we need to protect from charge buildup. The protection is provided by the resistor R and the two diodes D1 and D2.

Fig. 13.19
figure 19

ESD protection circuit. Large positive or negative charges at the gates of the MOSFETs would leak through D1 and D2, respectively. This prevents breakdown fields from building on the gates of the MOSFETs

If enough charge builds up to raise the voltage of the anode of D1 above a certain limit (defined as Vx), then D1 turns on. Current then flows through D1, carrying away the excess charge that would otherwise have built up on the gates. The voltage Vx must be chosen high enough that normal operating voltages, which still have to drive the MOSFET gates, do not turn the diode on, but low enough that the diode turns on before the accumulated charge can break down the oxide.

If the built up static charge is negative, then the voltage of the cathode of D2 would drop. D2 turns on before enough negative charge builds up to break the MOSFET gates. When D2 turns on, it allows this excess charge to leak to Vss.

The resistor R plays an important role. It limits the current that flows when the diodes turn on. Without R, D1 and D2 could momentarily conduct enormous currents because of their low resistance once turned on.

The resistance R should be large enough to limit the current caused by a typical ESD event. But R should not be too high, because any current that flows through R dissipates power: not only ESD current but also switching current during normal operation. Because input pins can potentially carry very large currents, this can be a significant concern.

The resistance R is implemented as a passive semiconductor resistor, usually using the polysilicon layer or the diffusion layer, whichever has the lower conductivity. To increase the resistance, the wire is kept as narrow as the design rules allow, and a serpentine layout is used to increase the effective length, as shown in Fig. 13.20.

Fig. 13.20
figure 20

Layout of ESD resistance R
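The resistance of such a serpentine can be estimated by counting squares. This is an illustrative sketch: the sheet resistance and geometry below are assumed values, not from the text:

```python
# Estimate a serpentine resistance by counting squares:
# R = R_sheet * (L_total / W). All numbers are illustrative assumptions.
def serpentine_resistance(r_sheet, width, segment_len, n_segments):
    total_len = segment_len * n_segments   # ignoring corner corrections
    return r_sheet * total_len / width

# e.g. polysilicon with an assumed 25 ohm/sq sheet resistance, an assumed
# minimum width of 0.2 um, and 20 segments of 10 um each -> 1000 squares
R = serpentine_resistance(r_sheet=25.0, width=0.2e-6,
                          segment_len=10e-6, n_segments=20)
print(R)   # ~25 kOhm
```

This shows why the serpentine shape matters: folding 200 µm of minimum-width wire into a compact area yields a thousand squares of resistance.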

Level conversion

Off-chip signals have to be large to combat the large noise and coupling in such an environment. On-chip signals have to be much smaller to manage power and dielectric breakdown. Interface circuits must transform the level of signals between the two domains. One such level-conversion circuit is shown in Fig. 13.21.

Fig. 13.21
figure 21

Level-conversion circuit

Inputs at point A are between ground and a first supply reference level Vdd1. Outputs at point B use ground for “0” and a second supply level Vdd2 for “1”. The static CMOS inverter between A and B uses the first reference level Vdd1, while the sources of the PMOS transistors M3 and M4 are connected to the second supply Vdd2.

When input A is 0 V, M1 is off. The inverter produces an output Vdd1, which turns M2 on, causing output Y to be 0 V. When input A is Vdd1, the inverter outputs 0 V at B. M2 is off. M1 is on, passing 0 V to X. X causes M4 to turn on, passing Vdd2 to Y.

Thus 0 V passes as 0 V while Vdd1 is transferred to Vdd2. This setup also allows us to transform the signal ground, moving inputs at Vss1 to Vss2. This would happen if the sources of M1 and M2 are connected to Vss2 instead of the common ground of the inverter.

Latch-up protection

Latch-up is a concern in all CMOS circuits and is discussed in detail in Sect. 7.7. In input/output pins, latch-up is an even greater threat than in the core, because pins handle the largest current swings in the circuit. The pad interface circuits can therefore see the largest ground and supply drops, making them particularly susceptible to latch-up.

One way latch-up is prevented is by adding well and substrate contacts. These contacts reduce the resistance on the path to supply and ground, reducing the possibility of positive feedback in the parasitic latch. Pad circuitry uses as many well/substrate contacts as area allows, in contrast to normal CMOS circuitry, which typically uses one per transistor.

Inductive effects

So far we have only considered the impact of capacitance and resistance on wire delay. Wires also have inductance. On-chip, the short length of wires and their straight paths keep inductance low. Inductance becomes significant where structures are larger, particularly at pin pads. But in cutting-edge circuits, inductance can have an impact even on on-chip interconnects, especially long range wires.

Off-chip, PCB tracks have substantial inductance, and their behavior cannot be fully predicted just from capacitance and resistance. The danger of inductance is not mainly in its impact on delay, but in the way it changes signal behavior. One such way is Ldi/dt noise which is explored in Sect. 13.6.

But the main way that inductance impacts wires is that it forces us to treat them as transmission lines. Strictly speaking, all wires are transmission lines. But under certain conditions, we can treat them as RC lines.

Figure 13.22 shows a very simplified view of a transmission line (TL) wire. It is treated as a black box. Signals entering the TL on one end travel through it as an electromagnetic wave. The velocity of this wave can be calculated from the inductance and capacitance per unit length, l and c, as

Fig. 13.22
figure 22

Transmission line with unknown termination

$$ v = \frac{1}{{\sqrt {lc} }} $$

This velocity can also be related to the permittivity and the permeability of the medium and the wire as

$$ v = \frac{{c_{0} }}{{\sqrt {\mu_{r} \varepsilon_{r} } }} $$

where c0 is the speed of light in vacuum. Thus for materials with low permeability and permittivity, the velocity of the signal through the wire is very high. This makes the time of flight of the signal extremely short, allowing us to treat wires as RC structures.

If, however, the material has high inductance, or the signal is of very high frequency, then the limited velocity of the signal becomes a factor in calculating how long it takes the signal to get from one point to another. When should we ignore inductance, and when should it be taken into consideration?

In general, if the frequency of the signal is much smaller than the reciprocal of the time of flight of the signal, we can reduce the wire to RC. Thus, there are several factors that affect the decision: the frequency of the signal, the length of the wire, and the velocity through the medium.

In a wire of length Lw, the time of flight Tf is

$$ T_{f} = \frac{{L_{w} }}{v} = \frac{{L_{w} \sqrt {\mu_{r} \varepsilon_{r} } }}{{c_{0} }} = L_{w} \sqrt {lc} = \sqrt {LC} $$

where L and C are the lumped inductance and capacitance of the length of the wire, respectively. Thus transmission line effects can be ignored only if

$$ t_{\rm pd} \gg \sqrt {LC} $$

where tpd is the typical delay in the circuit. Consider how these numbers translate in chip-to-chip communication. Assume two chips communicate using PCB tracks over a length of 20 cm. If the relative permittivity of the PCB insulator is 4 and the relative permeability of copper is 1, what is the range of chip frequencies for which transmission line effects must be taken into consideration?

The velocity in the material is

$$ v = \frac{{c_{0} }}{{\sqrt {\varepsilon_{r} } }} = \frac{{3\times 10^{8} }}{2} = 1.5\times 10^{8} \,{\rm m/s} $$

The time of flight over the PCB track is

$$ T_{f} = \frac{0.2}{{1.5\times10^{8} }} = 1.33\times10^{ - 9} {\rm s} = 1.33\,{\rm ns} $$

Thus if the operating frequency is not much smaller than 750 MHz (the reciprocal of the time of flight), transmission line effects must be taken into consideration.
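The numbers in this example can be reproduced directly:

```python
# Reproduce the chip-to-chip example: 20 cm PCB track, eps_r = 4, mu_r = 1.
c0 = 3e8                  # speed of light in vacuum, m/s
eps_r, mu_r = 4.0, 1.0
Lw = 0.2                  # track length, m

v = c0 / (mu_r * eps_r) ** 0.5   # propagation velocity
Tf = Lw / v                      # time of flight
f_limit = 1.0 / Tf               # frequency beyond which TL effects matter

print(v)              # ~1.5e8 m/s
print(Tf * 1e9)       # ~1.33 ns
print(f_limit / 1e6)  # ~750 MHz
```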

Transmission lines have a characteristic impedance, a property of the geometry of the wire and the surrounding insulation. There are closed forms for the characteristic impedances of different wire geometries; in practice, most transmission lines are designed to have a characteristic impedance of 50 Ω.

Characteristic impedance is important because it defines the reflection coefficients at the terminations of a transmission line. In general, if the impedance terminating one end of a TL is ZL, then the reflection coefficient at that end is

$$ \Gamma = \frac{{Z_{L} - Z_{0} }}{{Z_{L} + Z_{0} }} $$

The reflection coefficient describes a new phenomenon: even if the signal reaches the end of the wire, part of it may reflect back along the wire. Why is this dangerous? Reflections can cause the signal to take much longer than the time of flight to settle at the receiver, thus increasing the effective delay.

Figure 13.23 shows a transmission line connecting two chips. The originating chip is called the transmitter; the chip at the far end is called the receiver. The impedance seen at the transmitter is called the source resistance. To understand the impact of reflections, we must consider specific termination cases. Both the receiver load resistance and the transmitter source resistance matter, because each defines the reflection coefficient at its end. Thus the reflection coefficients at the transmitter (tx) and receiver (rx) are

Fig. 13.23
figure 23

Transmission line between transmitting chip and receiving chip. Zs is purely resistive, thus Zs = Rs

$$ \begin{aligned}\Gamma _{\rm tx} & = \frac{{Z_{s} - Z_{0} }}{{Z_{s} + Z_{0} }} \\\Gamma _{\rm rx} & = \frac{{Z_{L} - Z_{0} }}{{Z_{L} + Z_{0} }} \\ \end{aligned} $$

At the load end of the transmission line, the signal is the sum of the incident and reflected waves, thus

$$ V_{\rm load} = V_{\rm incident} +\Gamma _{\rm rx} V_{\rm incident} = \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} $$

At the source, the initial wave is a voltage divider between the TL characteristic impedance and the source resistance:

$$ V_{\rm initial} = V_{s} .\frac{{Z_{0} }}{{Z_{s} + Z_{0} }} $$

Afterward, each wave arriving back at the source adds to the source voltage a contribution of

$$ \Delta V_{\rm source} = \left( {1 +\Gamma _{\rm tx} } \right)V_{\rm incident} $$

There are many scenarios for what happens to the signal based on the values of the source and load resistance. We are discussing this in the context of CMOS chips, so the “receiver” presents CMOS gates at its input. The input to a CMOS gate is purely capacitive. Thus, for all practical scenarios we will consider the load resistance to be infinite; variation comes only from the value of the source resistance.

Case 1: Zs = Zo and ZL = infinite:

In this case, the reflection factor at the load is 1 and at the source is 0.

Figure 13.24 shows how the waveform progresses at the source and the load for a matched source termination. Initially a signal is injected from the source into the transmission line according to the voltage divider between the source and the transmission line:

Fig. 13.24
figure 24

Source and load voltages for matched source, capacitive load

$$ V_{\rm source} \left( 0 \right) = V_{s} .\frac{{Z_{0} }}{{Z_{0} + Z_{0} }} = \frac{{V_{s} }}{2} $$

This signal reaches the load after one time of flight (Tf); the total signal at the load is the summation of the incident and reflected waves:

$$ V_{\rm load} \left( {{\text{T}}_{f} } \right) = \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} = \frac{{\left( {1 + 1} \right)V_{s} }}{2} = V_{s} $$

The reflected component is Vs/2; this reaches the source at 2Tf, for a total signal at the source of

$$ V_{\rm source} \left( {2{\text{T}}_{f} } \right) = V_{\rm source} \left( 0 \right) + V_{\rm incident} = \frac{{V_{s} }}{2} + \frac{{V_{s} }}{2} = V_{s} $$

Since the reflection coefficient at the source is zero, there is no reflection at the source and the signal settles on both ends at Vs. This shows that if the load is capacitive, then the best thing we can do is match source impedance to the transmission line. This kills reflection at the source and allows the signal to settle after a single time of flight.

Case 2: Zs = 3Zo and ZL = infinite:

This also applies to all cases where the source resistance is larger than the characteristic impedance. The source reflection coefficient is 1/2, and the load coefficient is still 1. Initially the injected signal at the source is

$$ V_{\rm source} \left( 0 \right) = V_{s} .\frac{{Z_{0} }}{{3Z_{0} + Z_{0} }} = \frac{{V_{s} }}{4} $$

After a single time of flight, the signal at the load is

$$ V_{\rm load} \left( {{\text{T}}_{f} } \right) = \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} = \frac{{\left( {1 + 1} \right)V_{s} }}{4} = \frac{{V_{s} }}{2} $$

The reflected component reaches the source after another time of flight:

$$ V_{\rm source} \left( {2{\text{T}}_{f} } \right) = V_{\rm source} \left( 0 \right) + V_{\rm incident} \left( {1 +\Gamma _{\rm tx} } \right) = \frac{{V_{s} }}{4} + \frac{3}{2}.\frac{{V_{s} }}{4} = 0.625V_{s} $$

At 3Tf, the signal at the load becomes

$$ V_{\rm load} \left( {3{\text{T}}_{f} } \right) = V_{\rm load} \left( {{\text{T}}_{f} } \right) + \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} = 0.5V_{s} + \left( {1 + 1} \right)*0.125V_{s} = 0.75V_{s} $$

The pattern expressed by the equations and Fig. 13.25 shows the signal approaching Vs asymptotically at both the source and the load. Evidently the “delay” for the signal to reach the load is much higher than a single time of flight, taking many multiples of Tf to settle. The speed of settling is dominated by the source termination rather than by the resistance and capacitance of the line.

Fig. 13.25
figure 25

Waveforms with a source load higher than characteristic impedance

Case 3: Zs = Zo/3 and ZL = infinite:

This also applies in all cases where the source resistance is lower than the line impedance. The source reflection coefficient is –1/2 and the load reflection coefficient is 1. The initial injected signal at the source is

$$ V_{\rm source} \left( 0 \right) = V_{s} .\frac{{Z_{0} }}{{Z_{0} /3 + Z_{0} }} = \frac{{3V_{s} }}{4} $$

After a single time of flight, the signal at the load is

$$ V_{\rm load} \left( {{\text{T}}_{f} } \right) = \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} = \frac{{\left( {1 + 1} \right)3V_{s} }}{4} = 1.5V_{s} $$

The reflected component reaches the source after another time of flight:

$$ V_{\rm source} \left( {2{\text{T}}_{f} } \right) = V_{\rm source} \left( 0 \right) + V_{\rm incident} \left( {1 +\Gamma _{\rm tx} } \right) = 0.75V_{s} + \left( {1 - 0.5} \right)*0.75V_{s} = 1.125V_{s} $$

The reflected component from the source is

$$ - 0.375V_{s} $$

At 3Tf, the signal at the load becomes

$$ V_{\rm load} \left( {3{\text{T}}_{f} } \right) = V_{\rm load} \left( {{\text{T}}_{f} } \right) + \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} = 1.5V_{s} + \left( {1 + 1} \right)*\left( { - 0.375} \right)V_{s} = 0.75V_{s} $$

From Fig. 13.26 and the equations, we observe ringing at both the source and the load. The signal overshoots on both ends, then rings continuously, asymptotically approaching the final value of Vs.

Fig. 13.26
figure 26

Waveforms for source impedance lower than characteristic impedance

We can easily conclude that when the destination is capacitive, source matching is the best-case scenario, reducing the delay to a single time of flight. When the source mismatches the line, there is either ringing or a slow, creeping approach to the final value. If the source is not matched and we apply a new signal before the old signal settles at the load, inter-symbol interference occurs, leading to loss of information in both symbols.
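The three cases can be checked with a small bounce-diagram calculation. This is a sketch under stated assumptions: an ideal lossless line driven by a step of amplitude Vs, with the load characterized only by its reflection coefficient gamma_rx (1 for an open, capacitive input); the helper name `load_voltage` is ours, not from the text:

```python
# Bounce-diagram simulation of a lossless transmission line.
# Returns the load voltage at t = Tf, 3Tf, 5Tf, ... for a step Vs at t = 0.
def load_voltage(Vs, Zs, Z0, gamma_rx, n_bounces=20):
    gamma_tx = (Zs - Z0) / (Zs + Z0)
    wave = Vs * Z0 / (Zs + Z0)        # initial wave launched into the line
    v_load, history = 0.0, []
    for _ in range(n_bounces):
        v_load += (1 + gamma_rx) * wave   # incident + reflected at the load
        history.append(v_load)
        wave *= gamma_rx * gamma_tx       # round trip back to the load
    return history

Vs, Z0 = 1.0, 50.0
case1 = load_voltage(Vs, Zs=Z0,     Z0=Z0, gamma_rx=1.0)  # matched source
case2 = load_voltage(Vs, Zs=3 * Z0, Z0=Z0, gamma_rx=1.0)  # slow creep
case3 = load_voltage(Vs, Zs=Z0 / 3, Z0=Z0, gamma_rx=1.0)  # ringing
print(case1[0])    # 1.0 : settles in one time of flight
print(case2[:2])   # [0.5, 0.75] : creeps toward Vs
print(case3[:2])   # ~[1.5, 0.75] : overshoots, then rings
```

Setting `gamma_rx=0.0` (a matched load) with `Zs=Z0/3` reproduces the load-matched result discussed below: the load voltage jumps to 0.75Vs in one flight and stays there.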

Source termination should always be the first choice in chip-to-chip communication. Chips naturally have capacitive inputs, which leave us no option but matching at the source. Assuming we do have the option to match at the load, would that be better than source matching? The answer is no, for two reasons: load matching dissipates power, and it does not allow the whole input signal to reach the load.

A resistive load on a transmission line means that DC current flows. Because transmission line impedance is typically low, this current can be large, leading to significant power dissipation in the source, line, and load.

But more critically, assume the source is unmatched and is equal to Zs = Zo/3, but the load is matched ZL = Zo. The initial injected signal at the source is

$$ V_{\rm source} \left( 0 \right) = V_{s} .\frac{{Z_{0} }}{{Z_{0} /3 + Z_{0} }} = \frac{{3V_{s} }}{4} $$

The reflection coefficient at the load is 0. Thus after Tf, the signal at the load is

$$ V_{\rm load} \left( {{\text{T}}_{f} } \right) = \left( {1 +\Gamma _{\rm rx} } \right)V_{\rm incident} = \frac{{\left( {1 + 0} \right)3V_{s} }}{4} = 0.75V_{s} $$

There is no reflection from the load. Thus, the load and source voltages both settle at 0.75Vs, never reaching the full swing value Vs.

13.6 Supply and Ground

  1. Recall supply/ground distribution schemes

  2. Understand the problem with resistive drops in supply lines

  3. Summarize sources of inductance in supply/ground distribution

  4. Realize Ldi/dt drops in supply can be a major problem

  5. Use a decoupling capacitor to ease ground and supply bounces

  6. Trace current and impedance behavior in the presence of a decoupling capacitor

Supply and ground run through wires, similar to signals. However, supply and ground are distinguished by the fact that they must be directly provided to all parts of the chip. With the exception of some transmission gate circuits, all combinational and sequential circuits have to be provided with supply and ground. All the supplies and grounds through the chip originate from at most a few ground and supply pins. Thus supply and ground routing inside a chip is constrained by the fact that they have single origins outside the chip. This can be seen in Figs. 8.28 through 8.30.

The main problem with supply and ground distribution is the resistance of wires. Delay is not a major concern because neither ground nor supply carries a signal. However, resistance is a major issue because it causes the supply to drop and the ground to rise for gates far away from the pins. In Sect. 8.3, we discussed several layout techniques used to reduce these drops. These included wires that grow wider the closer they are to the pin, and distributing ground and supply from both left and right, or even from all four sides of the chip. This helps confine the maximum drops to the center of the chip.

The ultimate solution to this problem is to dedicate an entire metal layer for ground and another for supply. This is particularly popular in technologies where a large number of metal layers are available. In this case an entire higher level layer becomes either a ground or supply. Because the higher level metal layers are also thicker, this allows grounds and supplies to be provided to most locations with minimal drops. Local access to ground and supply is through vias, so special attention has to be given to via resistance and to local distribution of ground and supply.

The problem with resistive supply drop is that it reduces the on-current available to the gate, which increases delay. If this occurs in a path with small slack (Chap. 6), it can cause unexpected setup-time violations. These problems depend on the particular sequence of signals that cause them, and can thus escape traditional fault detection tests (Chap. 14).

A very similar complication occurs due to parasitic inductance on ground and supply lines. We consider not only the parasitic inductance of the wires but also inductance due to the bonding pad, the pin, and even off-chip PCB tracks. Figure 13.27 illustrates these sources of inductance. The largest contributions come from off-die components, simply because their physical dimensions are much larger.

Fig. 13.27
figure 27

Sources of inductance around the ground and supply pins. The pin itself and the PCB track contribute most of the inductance

Figure 13.28 shows the parasitic inductances lumped into a single inductance in series with each of the supply and ground pins. The true supply, marked Vddt, and the true ground, marked 0 V, lie on the far side of the inductors. The ground seen by the chip is marked GND; the supply seen by the chip is marked Vdd.

Fig. 13.28
figure 28

Inductive model for ground and supply pins

With all signals steady, Vdd = Vddt and GND = 0 V. However, signals are rarely steady. In normal operation, tens of millions of CMOS gates are switching at any particular instant. A switching CMOS gate draws current from the supply or sources current into ground in order to charge or discharge its capacitances (Chap. 3).

This switching current leads to dynamic power dissipation. It is a peak current, normally the saturation current of the NMOS or PMOS depending on the direction of switching. It flows for a limited duration; it is not a steady-state current.

However, because a huge number of gates may be switching at a time, their total peak current can be large. This total peak current flows through the supply/ground distribution network and ultimately ends up flowing through the ground and/or supply pins.

Resistance in the wires can cause drops due to these currents; however, inductance can be significantly more dangerous. The inductance current–voltage equation is

$$ V = L.\frac{di}{dt} $$

Thus, the voltage drop across an inductor is not a function of the magnitude of the current, but of the rate at which the current changes. While switching, gates cause large peak currents to flow. If properly designed, gate delays are short, so these currents flow for a very short time. This leads to a very large total di/dt.

The large current slew combined with the large pin inductance leads to a large voltage drop. Thus GND does not remain at 0 V; instead it bounces up to Ldi/dt, while Vdd drops to Vddt − Ldi/dt, where di/dt is the slew rate of the total current drawn from or sourced to the chip.
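To get a feel for the magnitudes involved, consider an illustrative estimate. The inductance and current values below are assumptions, not figures from the text:

```python
# Illustrative L*di/dt estimate (all values are assumptions):
L_pin = 5e-9   # assumed ~5 nH of pin + bond wire + PCB track inductance
di = 1.0       # assumed 1 A total peak switching current
dt = 1e-9      # assumed the current ramps in ~1 ns

v_bounce = L_pin * di / dt
print(v_bounce)   # ~5 V of ground bounce, far beyond any noise margin
```

Even these modest assumed numbers produce a bounce comparable to the entire supply voltage, which is why the phenomenon is taken so seriously.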

This phenomenon is called ground bounce. It is very dangerous and complicated to estimate, since it cannot be predicted from the chip design alone: most of the inductance comes from off-die sources. Thus engineers at all design stages must be cognizant of ground bounce.

The problem with ground bounce is that it is highly variable and has a complex relation to the behavior of the circuit. When there is little switching activity in the circuit, we might not even notice it. When there is large switching activity, it might cause circuit failure. As shown in Fig. 13.29, the bounces repeat, can happen on both lines equally, both lines differently, or on either line alone. It is very difficult to predict in advance.

Fig. 13.29
figure 29

Ground and supply bounce graph. Both ground and supply see transient variations due to inductive drops. The bounces and drops can be in either direction, leading to substantial risk of latch-up

One particular case where ground and supply bounces are noticeable is when the chip is switched on. During this period, abnormally large transient currents are drawn to charge internal ground and supply planes/lines. The bounce resulting from this transient current can easily trigger a latch-up state in the chip (Sect. 7.7), which is a severe failure.

Ground bounce is usually addressed by including a decoupling capacitance, Fig. 13.30. This is a very large capacitor connected between the supply and ground rails. The decoupling capacitor can be manufactured on-chip or assembled off-chip on the PCB. If the capacitor is off-chip, it will not decouple any bounce due to inductance from on-chip sources. If it is on-chip, then the value of the capacitance is limited by area.

Fig. 13.30
figure 30

Decoupling capacitance outside a chip. In this case, C can only decouple Ldi/dt noise due to off-chip inductance

Figure 13.31 shows the action of the decoupling capacitor. We assume only ground bounce occurs. But the same discussion applies to supply bounce or to simultaneous bounce.

Fig. 13.31
figure 31

Model for capacitive decoupling. When the high frequency slew occurs, most charge is drawn into C. This can then leak through L at a low frequency

C is very large, so in steady state it forms a near-perfect open circuit, having no impact on circuit operation. During transients, when large currents are sourced, di/dt is large. This is due to both the large di and the small time dt in which these peaks occur. The small dt indicates these currents are mostly high frequency.

At high frequency, the impedance of C becomes extremely small, especially because C is large. On the other hand, the impedance of L is very high at high frequency, which is why we had significant bounces in the first place.

As shown in Fig. 13.31, i is not only sourced to true ground through L but can also flow into C. Thus i = iL + iC. The majority of the transient current will flow into the capacitor instead of the inductor because the capacitive impedance is much lower during fast current switching. This significantly reduces di/dt observed in the inductor and thus ground bounce.
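The current division above can be sketched with a magnitude-only impedance divider. This ignores phase and resonance, and the component values are hypothetical:

```python
import math

def split_transient(i_total, f_hz, l_henry, c_farad):
    """Divide a transient current between parallel L and C branches.
    Current divides inversely with impedance magnitude."""
    z_l = 2 * math.pi * f_hz * l_henry        # |Z_L| = 2*pi*f*L
    z_c = 1 / (2 * math.pi * f_hz * c_farad)  # |Z_C| = 1/(2*pi*f*C)
    i_c = i_total * z_l / (z_l + z_c)
    i_l = i_total * z_c / (z_l + z_c)
    return i_c, i_l

# Hypothetical: a 1 GHz transient, 5 nH pin inductance, 100 nF decap.
i_c, i_l = split_transient(1.0, 1e9, 5e-9, 100e-9)
# i_c is nearly the entire current; i_l is a tiny fraction.
```

At 1 GHz the inductor presents tens of ohms while the large capacitor presents milliohms, so essentially the whole transient is absorbed by C.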

The current that flows into C leads to charge accumulation on the lower plate. If charge keeps accumulating on C due to repeated current sinking, the capacitor may become overloaded leading to insulator breakdown and failure.

However, the charge stored on the capacitor does not remain there. Figure 13.32 shows the current sourced from the pin, the current into C, and the current into L. When the pin sources a large current in a small time, the majority of the current goes to C. When i dies down, the frequency content drops toward DC, and C is left with a charge, and thus a DC voltage, at its lower plate. At DC, the impedance of C shoots up, and the impedance of L drops to zero. Thus, all the charge on C starts escaping through L. However, this charge seeps out slowly, at low frequency, leading to a negligible Ldi/dt and thus negligible bounce.

Fig. 13.32
figure 32

Currents in C (ic), L (iL), and pin (i) with time

When C stores the extra charge, would it not also cause a ground bounce? Recall that I = CdV/dt. Thus, the flow of current into a capacitor causes its voltage to rise. This is the main reason we use a very large C, because the current that flows into it causes a much smaller voltage rise than the bounce that would be caused by the inductor. The bounce on C is related to the amount of sustained current while the bounce observed on L is dependent on the rate di/dt.
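Comparing the two bounce mechanisms numerically makes the point. The values below are hypothetical; the comparison, not the numbers, is what matters:

```python
def decap_rise(i_amp, dt_sec, c_farad):
    """Voltage rise on the decoupling capacitor: dV = i*dt/C (from I = C dV/dt)."""
    return i_amp * dt_sec / c_farad

def inductor_bounce(l_henry, di_amp, dt_sec):
    """Bounce the same transient would cause on the inductor: V = L*di/dt."""
    return l_henry * di_amp / dt_sec

# Hypothetical transient: 100 mA sustained for 1 ns.
dv_c = decap_rise(100e-3, 1e-9, 100e-9)     # 1 mV on a 100 nF decap
dv_l = inductor_bounce(5e-9, 100e-3, 1e-9)  # 0.5 V on a 5 nH pin
```

The capacitor absorbs the same transient with a voltage disturbance hundreds of times smaller, which is exactly why C must be large.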

To reiterate, decoupling works as follows (Fig. 13.33): when current changes suddenly, the impedance of C drops and L rises. Current flows into C causing a limited increase in voltage due to the size of C. When the current transient dies down, the voltage on C becomes DC, and the impedance of the inductor drops to near null. The stored charge on the capacitor leaks through the inductor to ground at low frequency, leading to little bounce.

Fig. 13.33
figure 33

L and C during phases of bounce resolution. 1 Most of the current goes to C, causing a limited bounce. 2 The charge on the capacitor leaks through the inductor over a longer period

Thus, the role of the decoupling capacitor is to absorb current transients while minimizing their voltage impact, and then to let these transients escape through the inductor over a much longer duration, where they do not cause bounce. Because the same amount of charge escapes over a longer period, the bounce is reduced. The rise in voltage on the capacitor is much smaller than would have been observed on the inductor because capacitor voltage is proportional to charge rather than to the rate of change of current.

13.7 Clock Networks

  1. 1.

    Understand the peculiarities of the clock signal

  2. 2.

    Distinguish skew from jitter

  3. 3.

    List sources of skew and jitter

  4. 4.

    Understand the impact of skew and jitter on setup and hold violations

  5. 5.

    Compare grids and trees for clock distribution

  6. 6.

    Understand how hybrid clock distribution methods can combine the advantages of multiple clock distribution methods.

In Chap. 6, we discussed synchronous circuits as the most important class of digital circuits. This entails a single clock distributed to a very large number of registers throughout a large die. This is not an easy task and practical designs have to think of clocks as being synchronous only in local islands.

The clock signal is distributed all over the chip. Its load is enormous. The reference clock is provided by an external pin. This pin is used as an input to a “clock driver” or a set of clock drivers. Clock drivers have the task of distributing the clock signal to the large load of registers on the chip while minimizing delay. Clock drivers can be designed as optimal inverter chains with logical effort methods used to minimize total delay (Sect. 4.2).

However, regardless of the strength of clock drivers, delays in wires create unavoidable differences between local clocks. Figure 13.34 shows a situation where the wire leading the clock to register R1 is much shorter than that for register R2. Because the clock is distributed throughout the die, differences in wire lengths can be substantial. Due to this effect, clk1 and clk2 are at different phases regardless of the design of the buffer.

Fig. 13.34
figure 34

Differential delay in clock wires. Because the clock travels longer to R2, there is an inevitable skew between clk1 and clk2. This is not a problem that an optimal driver can solve

Clk1 and clk2 will necessarily have the same frequency because they come from the same source. However, they will have a small but important relative delay to each other. This delay is called the “clock skew” and is at least partly determined by the positions of the two registers relative to the driver.
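A rough sketch of how position becomes skew, using the Elmore delay of a distributed RC line, t ≈ r·c·L²/2. The per-unit resistance and capacitance values are hypothetical:

```python
def wire_delay(r_per_um, c_per_um, length_um):
    """Elmore delay of a distributed RC wire: t ~ (R_total * C_total) / 2."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2

# Hypothetical metal wire: 0.1 ohm/um and 0.2 fF/um.
t1 = wire_delay(0.1, 0.2e-15, 100)    # short run to R1: 0.1 ps
t2 = wire_delay(0.1, 0.2e-15, 2000)   # long run to R2: 40 ps
skew = t2 - t1                        # ~40 ps of skew from wire length alone
```

Because delay grows quadratically with length, a 20x longer wire contributes 400x the delay, so skew from wire mismatch accumulates quickly across a large die.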

Thus, a clock distribution network has to be carefully designed. The main objectives of a clock network are

  • Clocks should reach all registers with the same phase, or at least with a well-characterized phase shift (skew)

  • The clock has to be low on phase noise

  • Clock distribution has to be independent of process and temperature variations

  • Preserve a well-known duty cycle

  • Power and area minimization. Clock networks use substantial metal wire resources, leading to power dissipation that could run amok

  • The network should be independent of the design. In other words, a clock network that fulfills the above but is a custom fit for a single layout will not support any late-stage changes to the underlying design

So we are trying to distribute clocks with minimal distortions to all registers. Meanwhile we have to maintain simplicity, openness to modification, and low power operation. This task is made difficult by the many causes of clock distribution imperfections. While Fig. 13.34 suggests the main source of imperfection is variations in wire length, the list of possible sources of imperfection is much longer:

  • Clock generation. The clock comes from an external source. This source is often a crystal oscillator. The oscillator contains phase noise. Thus, the clock not only is imperfect at the source but it also suffers from more phase noise as it travels from the oscillator to the die core over PCB tracks, pins, and pads

  • Buffer mismatches. As we will shortly see, distributing the clock requires multiple levels of buffers. If buffers at the same level are mismatched, this could lead to differential drives and thus relative skew. Matching buffers perfectly is nearly impossible

  • Wire mismatches. This is the classical source we think of when we mention skew. However, in practice, it is only a single component of skew

  • Load mismatches. This is a major source of skew. Because different registers cannot be perfectly matched, they offer a different load to clock distribution branches. Thus even if the driving buffers are matched, and the wire traces to the register are identical, the load will still introduce skew

  • Temperature variations. Because different parts of the die have different activity, they will also have different temperature profiles. The different temperatures cause variations in signal behavior that introduce skew. They also introduce noise that translates into phase noise

  • Noise—particularly coupling noise. This leads to variations in the clock from cycle to cycle, thus phase noise

  • Power supply variations. The level of power and ground that reaches every device varies both by position and time. Because there are drops over power and ground lines (Sects. 8.3 and 13.6), the spatial variations cause variable delay from buffers and thus skew. But variations are also dependent on the momentary current being drawn, which introduces a temporal effect in the form of phase noise

We can see two major types of clock “errors”. There is a spatial phenomenon as shown in Fig. 13.34 that we call “skew”. But there is also a temporal component which represents uncertainty in the phase of the clock from cycle to cycle. We have termed this component “phase noise” so far. Phase noise is also commonly known as jitter, Fig. 13.35. We can summarize the impact of each source of clock imperfection as shown in Table 13.3. Notice that a single source of clock imperfection may manifest in the form of both or either skew and jitter.

Fig. 13.35
figure 35

Jitter in a clock signal

Table 13.3 Sources of jitter and skew

But why do we care if there is skew or jitter in a clock? Because they have a direct impact on the available clock period. This can create setup-time violations that an analysis of gate and interconnect delays alone would never reveal. Even more critically, skew and jitter can tighten the already tight conditions on hold-time. This can create new contamination paths with hold-time violations that can be very difficult to detect.

In Sect. 6.6, we deduced that the condition determining the clock period is

$$ T > t_{\rm CQ} + t_{\rm pd} + t_{\rm su} $$

However, as seen in Fig. 13.36, the available time for CLB propagation, register setup, and register propagation is changed by skew. The second edge of clk2 does not come T seconds after the first edge of clk1. Instead it arrives T + Ts seconds later.

Fig. 13.36
figure 36

Two successive registers with skew. The available period for the pipeline stage to finish calculating is changed by the skew

The period available in the presence of skew is

$$ \begin{array}{*{20}c} {T + T_{s} > t_{\rm CQ} + t_{\rm pd} + t_{\rm su} } \\ {T > t_{\rm CQ} + t_{\rm pd} + t_{\rm su} - T_{s} } \\ \end{array} $$

Thus, positive skew reduces the required clock period. Surprisingly, skew can improve performance! Does this mean that skew never causes new setup-time violations? Notice that the above result holds when data and clock flow in the same direction, which leads to clk2 being delayed relative to clk1. If the clock and data are driven in opposite directions, then clk2 is ahead of clk1. This translates into a negative Ts, which increases the required clock period and leads to new setup-time violations if skew is not considered while calculating the critical path.
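The period condition with skew can be checked mechanically. The timing numbers below are hypothetical; the sign convention follows the text (Ts positive when clk2 lags clk1):

```python
def min_period(t_cq, t_pd, t_su, t_skew):
    """Minimum clock period with skew: T > t_cq + t_pd + t_su - Ts."""
    return t_cq + t_pd + t_su - t_skew

# Hypothetical stage: 100 ps clk-to-Q, 800 ps logic, 50 ps setup.
same_dir = min_period(100e-12, 800e-12, 50e-12, +30e-12)  # 920 ps
opp_dir  = min_period(100e-12, 800e-12, 50e-12, -30e-12)  # 980 ps
```

With clock and data flowing in the same direction, 30 ps of skew buys back 30 ps of period; against the data flow, the same skew costs 30 ps.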

Still, why not always distribute clock and data in the same direction? First, we often do not have the option. Notice that routing data is a very complicated problem (Sect. 8.7). Clock distribution is also a complicated problem in its own right. Thus we cannot always choose the direction in which clocks are driven relative to data.

Also, we have to examine what happens to conditions on hold-time (Sect. 6.5):

$$ t_{\rm hold} < \hbox{min} \left\{ {t_{\rm pd} } \right\} + t_{\rm CQ} $$

Following the same approach in determining clock period, skew impacts hold-time condition in the following way:

$$ t_{\rm hold} + T_{s} < \hbox{min} \left\{ {t_{\rm pd} } \right\} + t_{\rm CQ} $$

Thus, if clock and data are distributed in the same direction, skew would tighten the hold-time condition. If they are distributed in opposite directions, skew would relax the hold-time condition. There are conflicting requirements about clock-data distribution directions. If we relax conditions on hold-time violation, we tighten conditions on setup-time violations and vice versa. As a result, most clock distribution strategies focus on reducing the magnitude of skew, not on managing the direction of clock distribution relative to data.
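The same bookkeeping applies to the hold-time condition. A sketch with hypothetical numbers:

```python
def hold_met(t_hold, t_skew, t_pd_min, t_cq):
    """Hold condition with skew: t_hold + Ts < min(t_pd) + t_cq."""
    return t_hold + t_skew < t_pd_min + t_cq

# Hypothetical: 60 ps hold, 20 ps shortest logic path, 100 ps clk-to-Q.
ok  = hold_met(60e-12, 30e-12, 20e-12, 100e-12)  # True:  90 ps < 120 ps
bad = hold_met(60e-12, 70e-12, 20e-12, 100e-12)  # False: 130 ps > 120 ps
```

The same positive skew that relaxed the setup condition eats directly into the hold margin, illustrating the conflict described above.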

Jitter has a similar impact on clock period, as shown in Fig. 13.37. Here we are assuming positive jitter on the first clock edge and a negative jitter on the second clock edge. This reduces the available period between registers R1 and R2 by two entire jitters, thus

Fig. 13.37
figure 37

Effect of jitter on available period. Because jitter is a stochastic phenomenon we have to assume worst-case edges when calculating both period and hold-time

$$ \begin{array}{*{20}c} {T - 2T_{j} > t_{\rm CQ} + t_{\rm pd} + t_{\rm su} } \\ {T > t_{\rm CQ} + t_{\rm pd} + t_{\rm su} + 2T_{j} } \\ \end{array} $$

Unlike skew, there is no “best case” for jitter. In some successive cycles, jitter might arrive so that conditions are relaxed for the clock cycle. However, this is a stochastic event, and we know for a fact that the worst-case scenario will happen at some point, and thus we have to design for it. This is as opposed to skew, where we can design so that the best case sometimes occurs. This is due to the fact that skew is a spatial deterministic phenomenon while jitter is a temporal stochastic phenomenon.

Because jitter is a random process, there is no single value of Tj. So unlike Ts, we have to use a statistical value for Tj. Using the mean value of Tj is meaningless since jitter is zero-mean. Thus Tj is usually chosen as a few standard deviations of the phase noise distribution. A value between one and two standard deviations is normally enough.
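A sketch of the jitter-aware period budget, taking Tj as k standard deviations of a zero-mean jitter distribution. All numbers are hypothetical:

```python
def min_period_with_jitter(t_cq, t_pd, t_su, sigma_jitter, k=2):
    """T > t_cq + t_pd + t_su + 2*Tj, with Tj = k * sigma of the jitter."""
    t_j = k * sigma_jitter
    return t_cq + t_pd + t_su + 2 * t_j

# Hypothetical: 10 ps rms jitter budgeted at two standard deviations.
t_min = min_period_with_jitter(100e-12, 800e-12, 50e-12, 10e-12, k=2)  # 990 ps
```

Note the factor of 2: worst-case edges on both ends of the cycle must be assumed, so the jitter budget is paid twice.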

Note that jitter does not have a direct impact on hold-time violations. The hold-time condition occurs entirely following a single active edge of the clock. It does not occur due to the budget of available time between two edges. In other words, everything that has to do with hold-time occurs and concludes shortly after a single edge. If jitter moves the active edge, it moves it for all registers simultaneously. Thus regardless of the position of the edge, no new hold-time violations can occur. Skew had the potential to create new hold-time violations because it caused different registers to observe different clocks. Thus, it forces R2 to hold on clk2 but allows R1 to generate outputs on clk1, allowing data to potentially race and create a violation.

At first glance, we might conclude that jitter principally comes from clock generation while skew principally comes from wire delay mismatches. In reality, the majority of jitter is caused by power supply variations, while the majority of skew is caused by load variations. Temperature variations also impact skew significantly, but are too slow to show up as jitter.

Clock distribution networks aim to either minimize skew or to minimize its impact. The two objectives (reducing absolute skew or reducing its impact) have their own advantages and disadvantages and lead to very different topologies.

The most intuitive clock distribution method is the grid. This is shown in Fig. 13.38. This approach has some analogies to the plane power–ground distribution method (Sect. 8.3). In this case we distribute the clock using a grid of metal lines.

Fig. 13.38
figure 38

Grid distribution of clock. An intermediate metal layer is used for routing. The mesh pattern aims to reduce effective resistance and net skew

The clock is distributed in a mesh of redundant horizontal and vertical paths. The highest metal layers are reserved for supply and ground, and thus an intermediate layer is used. The many parallel paths reduce the net resistance from the buffer chain on the top to any point in the grid.

Thus the grid method aims to minimize the absolute skew from the driver to any point on the grid. The maximum skew in Fig. 13.38 occurs at the bottom, which is farthest away from the driver.

Figure 13.39 shows a grid driven from all four directions. This minimizes the skew further by reducing the distance from any point in the grid to the nearest driver. The maximum skew now occurs in a single point in the center of the grid which is equidistant from all four drivers.

Fig. 13.39
figure 39

Clock grid driven from four sides

The size of the grid is obviously a factor in the maximum skew observed, with larger grids suffering larger worst-case skew. Thus, in very large floorplans, the layout is divided into multiple grids as shown in Fig. 13.40. If each sub-grid is driven from all four directions, the maximum skew observed anywhere is put under very tight control.

Fig. 13.40
figure 40

Multiple clock grids on a single die

The grid method for clock distribution has many advantages. It achieves low skew in most locations. It is also independent from the underlying design. The grid can distribute the clock with nearly the same skew to any functional network. Load variations will lead to differences in the achieved skew, however, no changes in the clock grid are possible or necessary if the underlying design is modified. This is particularly valuable in the late stages of the design flow.

However, grids are a brute-force approach to clock distribution. We simply throw a lot of metal at the problem in the hope of solving it. The large metal resources occupy a lot of area, practically monopolizing an entire metal layer. But more critically power dissipation can be very high in the metal resistance.

The opposite approach to the grid is the tree approach. As shown in Fig. 13.41, the tree tries to match delays at different levels of the design. Thus skew at A and B is nearly matched, but is very different from skew at C, D, E, and F. The tree approach relies on the fact that it is not the absolute value of skew that matters, but rather the relative skew between two consecutive registers in a pipeline.

Fig. 13.41
figure 41

Tree clock distribution

Thus if there are two registers at A and B, it is unimportant if they have significant skew relative to registers at C and D, as long as A and B only communicate with each other and not with C or D. Thus, the aim of the clock tree is not to minimize delay from the clock generator to every point on the floorplan, but rather to minimize the relative skew between nearby registers. This is the opposite of the grid approach.

Figure 13.42 shows a popular tree structure for clock distribution called the H tree. In the H tree, progressively smaller H shapes are used to distribute the clock. The aim is to equalize RC delays to points at the same level in the hierarchy.

Fig. 13.42
figure 42

H tree. A fractal of ever smaller H shapes distribute clock to specific locations on the die
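The equal-delay property of the H tree can be illustrated by construction. The sketch below recursively places leaf points and accumulates the wire length from the root to each branch; by symmetry, every leaf sits at the same wire length (the dimensions are arbitrary):

```python
def h_tree(x, y, half, depth):
    """Return the H-tree leaf points and the root-to-leaf wire length.
    Each level draws one H: a horizontal bar of span 2*half and two
    vertical bars of span half. All four branches at a level are
    symmetric, so the wire length is the same for every leaf."""
    if depth == 0:
        return [(x, y)], 0.0
    leaves, length = [], 0.0
    for dx in (-half, half):              # ends of the horizontal bar
        for dy in (-half / 2, half / 2):  # ends of the vertical bars
            sub, sub_len = h_tree(x + dx, y + dy, half / 2, depth - 1)
            leaves += sub
            length = abs(dx) + abs(dy) + sub_len  # wire length to this leaf
    return leaves, length

leaves, path_len = h_tree(0.0, 0.0, 8.0, 3)  # 3 levels -> 4^3 = 64 leaves
```

Matched wire lengths yield matched RC delays only if widths and loads also match, which is precisely what the skew sources listed earlier threaten.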

The tree approach is very efficient. It uses only as much metal as necessary. Thus its area and power budget are not nearly as bad as the grid approach. On the other hand, trees are very susceptible to mismatches. Note that the primary aim is to equalize delay in branches. This aim can and will be sabotaged by many sources of skew unless considered in design. Primary among the saboteurs is load mismatch.

The grid approach was independent of these mismatches because it used brute force to minimize absolute skew rather than equalize relative skew. The other disadvantage of trees is that they are tightly tied to the design. The locations of local drivers and H shapes depend on the final placement of cells and modules in the layout. Thus, if any change is made to the design, a corresponding modification has to be made to the clock network.

Practical clock distribution is usually a hybrid, combining different approaches at different levels. The ultimate aim is to reduce the impact of skew whether by removing it, or by equalizing it, or a combination of both.

Figure 13.43 shows a hybrid clock distribution network. The clock from the pin is fed to a central spine through a cascade of buffers. The central spine is a specialized tree structure. At every level in the spine, a thick metal wire is used to short all the buffer outputs, further reducing the relative skew at that level. In the final level of the spine, the clock is distributed along the vertical direction with minimum skew.

Fig. 13.43
figure 43

Hybrid clock distribution. The spine distributes the clock vertically. H trees create minimal skews to clock domains. Each clock domain is covered by a small clock grid

To distribute the clock with minimal skew along the horizontal direction would require an arbitrarily large number of horizontal spines. Thus, instead an H tree is used to distribute the clock horizontally. The tree carries the clock to local domains. The relative skew between these domains is minimized if the H tree is carefully designed. At each local clock domain, a grid is used to minimize the absolute skew within the domain.

This approach combines the advantages of grid and tree approaches. Efficiency is maintained by using an H tree to distribute the clock to large areas or domains on the chip. In each domain, the grid makes the clock network independent from the design. This allows changes to the design to be independent from the clock network, just as long as such changes are limited to within a domain.

13.8 Metastability

  1. 1.

    Recognize the need for systems with multiple clocks

  2. 2.

    Understand that setup and hold-time violations are inevitable when passing data between clock domains

  3. 3.

    Understand why metastability is a failure

  4. 4.

    Calculate the probability to enter metastability and the probability to stay in it

  5. 5.

    Calculate the MTBF for metastable failures as a function of time

Modern integrated circuits contain billions of transistors. They are also normally systems on a chip, with multiple functional subsystems coexisting on the same piece of silicon. We have so far assumed that all registers in a circuit use the same clock, skew notwithstanding. However, this is neither possible, nor is it useful.

When large systems are designed, subsystems can have wildly different critical paths. Forcing all subsystems to work synchronously is inefficient because it forces all subsystems to work at the speed of the slowest path in the entire chip. Given the fact that signals move mostly within subsystems, and occasionally between them, we are slowing down all our processing for the sake of the occasional intersystem communications.

Additionally, some blocks have to work at different clocks simply because they receive inputs or are expected to provide outputs at different rates. This is particularly common in communication systems where the transceiver moves from symbol to bit domain, requiring very different clock speeds.

Thus, we often end up using multiple clocks on the same chip. By multiple we mean a few. On-chip and especially on-FPGA resources rarely allow anything above nine independent clocks. Notice that independent clocks mean independent clock distribution networks (Sect. 13.7), which imposes severe conditions on the metal layer and on the place and route tool.

Notice also that clocks we call “different” are normally clocks with totally independent frequencies. Clocks whose frequencies are multiples of another clock are usually much easier to handle, and are often not considered “different”.

So what is the problem with using multiple clocks? The issue occurs when data tries to cross from a block using clk1 to a block using clk2. This is the “occasional” inter-subsystem communication we discussed above. However occasional this communication is, it happens, and it must happen reliably (Fig. 13.44).

Fig. 13.44
figure 44

System on a chip. Subsystems are best allowed to operate at their own speed. Inter-subsystem communication is much less frequent than intra-subsystem communication

What happens if we simply pass the data from block 1 using clk1 to block 2 using clk2? As shown in Fig. 13.45, most of the time this would work fine. We might not get a word registered (received) by block 2 in every cycle, but this is to be expected and should not be considered a failure because block 2 understands that there might not be a new sample in every cycle.

Fig. 13.45
figure 45

The problem with passing data between clocks: setup and hold-time violations will inevitably happen

The main problem with crossing clock domains is the creation of unexpected setup and hold-time violations. This is marked clearly in Fig. 13.45. In the cycle marked "setup-time violation", data from block 1 arrives so close to the edge of clk2 that it causes a setup-time violation. This happens when data from block 1 arrives less than tsu2 before the active edge of clk2. A similar problem occurs if the data arrives within thold2 after the active edge of clk2. Thus, there is a window around the active edge of clk2 that is problematic.

The above event will occur with predictable regularity as long as the frequencies of the two clocks are independent. Next, we have to understand why this setup/hold violation is problematic, how it manifests at the output of a register, and how likely it is to occur.

To understand why setup-time violations are harmful, consider Fig. 13.46, recreated from Chap. 6. Setup time is instituted to allow data to pass through and settle at the outputs of inverters I2 and I3. “Settling” in this context means that the inverters are in the low-gain range of their VTC, producing either a DC low output or a DC high output.

Fig. 13.46
figure 46

Internal view of a register, showing setup-time violation. This is reproduced from Chap. 6

If the positive edge of the clock arrives under this condition, the outputs of I2 and I3 will be complements. The positive feedback will close through T2 and the state would be preserved correctly in the master latch.

A setup-time violation in the context explained in Fig. 13.46 entails that not enough time is given for the output to “settle” at the output of either or both of I2 and I3. We can focus on I2 for now. A setup-time violation means that the active edge catches the output of I2, Qi, at a transitional point. Thus, as shown in Fig. 13.47, the inverter is in neither low-gain region. It is in the high-gain transitional region caught in the middle while switching between two low-gain regions.

Fig. 13.47
figure 47

Metastable inverter VTC. If a setup-time violation occurs, the input to master latch inverters is in the high-gain “metastable” region. This is a failure, since proper operation necessitates both latch inverters to be in the low-gain stable regions

An inverter caught in the high-gain region is often called “metastable”. This is strictly a misnomer, but is so commonly used that its use is expected. Strictly speaking, the only “metastable” point in the VTC in Fig. 13.47 is the point (Vin = Vm, Vout = Vm). According to the regenerative property in Chap. 3, the inverter is unstable in the metastable region and will exit it in a few inverter stages. The only stable regions for an inverter are the low-gain regions.

However, the exact midpoint (Vin = Vm, Vout = Vm) is an equilibrium. If the input is exactly at Vm, the output will also be at Vm, and no matter how many inverter stages are cascaded, the outputs will all remain at Vm. However, observing this stability requires the input to sit exactly at the inverter logic threshold. Any deviation will cause the inverter chain to diverge and exit the high-gain region. The exact equations that describe this will be derived shortly. Because we can never ensure any signal holds an exact value, due to noise, interference, and mismatches, it is inevitable that the inverter will leave the logic threshold point. Thus the point is metastable rather than stable.
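The divergence can be sketched with the standard small-signal model of a regenerative circuit: a perturbation v0 away from the metastable point grows as v(t) = v0·e^(t/τ), so the time to reach a valid logic level depends logarithmically on v0. The time constant and voltage levels below are hypothetical:

```python
import math

def resolution_time(tau_sec, v0_volt, v_exit_volt):
    """Time for a perturbation v0 to regenerate to v_exit:
    v(t) = v0 * exp(t/tau)  ->  t = tau * ln(v_exit / v0)."""
    return tau_sec * math.log(v_exit_volt / v0_volt)

# Hypothetical regeneration constant tau = 50 ps, exit level 0.5 V.
t_mv = resolution_time(50e-12, 1e-3, 0.5)  # ~311 ps for a 1 mV start
t_uv = resolution_time(50e-12, 1e-6, 0.5)  # ~656 ps for a 1 uV start
```

The closer the circuit starts to the threshold, the longer it takes to escape; since there is no lower bound on the starting perturbation, there is no upper bound on the resolution time.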

In a less formal manner, we call an inverter caught with an input anywhere in the transition region metastable. If I2 is metastable and the active edge occurs, then the outputs of I2 and I3 will be in the metastable region. An equally bad situation occurs if I2 manages to settle, but I3 is still metastable.

In Sect. 3.2, we showed that inverters in the metastable region will “quickly” regenerate values to full logic after only a few inverter stages. When T2 closes the feedback, I2 and I3 form a loop. This loop should allow the output Qi to settle at a full logic value “quickly”, so why is metastability even a problem? For two reasons:

  • When metastability occurs, there is no guarantee how the inverters will exit it. We know that eventually I2 and I3 should end up in a stable state. However, there is no guarantee that Qi will settle at the correct value of D. What values I2 and I3 resolve to depends on the initial conditions when metastability occurred, as well as device variations in I2 and I3, and noise. This behavior is nearly impossible to predict. We can only assume that we cannot guarantee a resolution to the correct value of D

  • It can take a very long time for I2 and I3 to resolve. Although the high gain of the transitional region suggests a quick resolution of metastability, the time to resolve is also a very strong function of the initial voltage difference between Qi and the output of inverter I3. In all cases, the additional time to resolve metastability is an unbudgeted overhead after the active edge of the clock. Thus, the output Q will not be a valid copy of D a time Tcq after the active edge. This represents a departure from the correct behavior of a register, and thus a failure

Metastability does not mean that we observe electrical values between logic “1” and logic “0” at the output of the register. In some cases such intermediate values can be observed at the output of the master latch Qi. However, Qi is an internal signal that the designer does not get to see.

The output of the register is two inverters removed from the node Qi. These two inverters I4 and I5 in Fig. 13.46 have high-gain transitional regions. Thus, even if Qi is an intermediate value, Q appears as either “1” or “0” according to Sect. 3.3.

However, as shown in Fig. 13.48, due to metastability, point Qi can resolve in many different ways. It can overshoot then settle. It can also resolve out of metastability asymptotically. Depending on the logic thresholds of I4 and I5 and the behavior at Qi, Q can glitch, or it can make a single transition. In all cases, however, metastability is a failure for the following reasons:

Fig. 13.48
figure 48

Behavior at points Qi and Q. Qi can ring (2), or asymptotically approach either value (1 and 3). Q is never an intermediate value. It can glitch and settle down (2), rise to “1” (1), or remain at “0” (3)

  • If Q glitches, that means it settles to a logic value at a time more than Tcq after the clock edge

  • If Q does not glitch, it will settle to a value a time more than Tcq after the clock edge

  • In both the above cases, there is no guarantee that the output Q will settle to the correct value of D

  • Even if Q settles to the “correct” value of D, the fact that it settles more than Tcq after the edge is as good as resolving to the wrong value

Back to the two clock domains in Fig. 13.45. If data passes from clk1 domain to clk2 domain, then a setup-time violation will inevitably occur with a certain frequency. When this violation occurs, block 2 is basically unable to tell if the data coming from block 1 belongs to the upcoming cycle or to the cycle that just ended. This causes a metastable behavior, and the data at the output of the first register in block 2 will appear later than Tcq after the active edge of clk2.

Because all registers downstream in block 2 have clock cycles calculated based on data exiting registers after Tcq, this leads to the wrong values being registered and propagated in the receiver. These wrong values are again not intermediate electrical values; they are either “0” or “1”, but are certainly incorrect. The wrong bit is propagated downstream in the receiver subsystem leading to overall system failure.

Metastability is inevitable when crossing clock domains, and it causes unmitigated failure. To address metastability, we have to find ways to reduce its impact. To do this, we have to figure out how commonly it occurs. After all, if metastability is a rare or infrequent event, why not just ignore it?

Figure 13.49 shows the time budget around the active edge of clk2. This is the budget that determines if we see metastability. The cycle has a period T2. There is a window around the clock edge where an arriving signal from block 1 is metastable. This window is Tsu + Thold, with Tsu seconds before the edge and Thold seconds after the edge. Let Tm = Thold + Tsu.

Fig. 13.49
figure 49

Window for metastability

To find the probability that metastability occurs, we assume that data from block 1 is uniformly distributed over period T2. This assumption is acceptable when the two clock frequencies are fully independent. In this case, the probability that metastability occurs is

$$ P\left( {\rm metastability} \right) = \frac{{T_{m} }}{{T_{2} }} $$

and data arrives from block 1 every T1, thus the rate at which metastability occurs is

$$ R\left( {\rm metastability} \right) = \frac{{T_{m} }}{{T_{2} }}f_{1} = \frac{{T_{m} }}{{T_{1} T_{2} }} $$

Metastability is a failure that happens as the circuit is operating, and the rate R above is its failure rate. This makes metastability a matter of reliability. From Sect. 14.8, we see that mean time between failures (MTBF) is the most useful measure of reliability. MTBF answers the question: if a failure occurs, how long will it take for the same failure to repeat? If MTBF is high for any kind of failure, then we do not need to solve it. This is simply because if the failure occurs, all we have to do is reset the system, and the same failure will not repeat for a long time, represented by MTBF. MTBF is the reciprocal of the failure rate.

Assume typical values for the clocks:

$$ \begin{array}{*{20}c} {T_{m} = 10\,{\rm ps}} \\ {T_{1} = 2\,{\rm ns}} \\ {T_{2} = 1\,{\rm ns}} \\ \end{array} $$

These numbers assume a very tight Tm, thus favoring a lower failure rate. However, we still find a high rate R:

$$ R = \frac{{10\times10^{ - 12} }}{{2\times10^{ - 9} \times 1\times 10^{ - 9} }} = 5\times10^{6} \,{\rm s}^{ - 1} = 5/\upmu {\rm s} $$

This translates into MTBF of

$$ MTBF = 0.2\,\upmu s $$

Thus if a failure due to metastability occurs and we fix it or ignore it, then the next metastability would occur only 0.2 μs later. So obviously metastability is a major issue. It fundamentally interferes with the operation of circuits that pass data between clock domains, rendering them useless. And it happens so frequently that simply ignoring it is unviable. Similar results would be obtained for a very large range of Tm, T1, and T2. Thus none of these parameters can be used to manage MTBF. We need a more fundamental solution.
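These figures are quick to verify numerically. The short sketch below recomputes the rate and MTBF from the values above:

```python
# Values from the text (seconds)
T_m = 10e-12   # metastability window, Tsu + Thold
T1 = 2e-9      # transmitter clock period
T2 = 1e-9      # receiver clock period

rate = T_m / (T1 * T2)   # metastability events per second
mtbf_us = 1e6 / rate     # MTBF in microseconds

print(round(rate))          # 5000000 -> 5 events per microsecond
print(f"{mtbf_us:.1f}")     # 0.2
```

Note that the rate scales only linearly with Tm, T1, and T2, which is why no realistic choice of these parameters can push the MTBF into an acceptable range.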

To find a solution for metastability, we have to better understand how latches exit it. Figure 13.50 shows a model for inverters I2 and I3 and their output nodes X and Qi. The “output” of this circuit is actually the differential voltage VqiVx. We call this differential voltage Vd. When the absolute value of Vd reaches Vdd, the two inverters are considered to have exited metastability.

Fig. 13.50
figure 50

Model for metastable latch. Each amplifier represents one of the inverters I2 and I3 in the transition region. C is the parasitic capacitance at the output of either inverter. We assume everything is matched for simplicity

The two inverters I2 and I3 are in the transitional region, thus their transistors are saturated, and they have to be modeled as amplifiers. The model is shown in Fig. 13.50. The two transistors in each of the inverters are simultaneously in the saturation region, leading to amplifier-like behavior. A deeper discussion of inverters acting as amplifiers is presented in Sect. 12.6.

The transconductance of either inverter is the parallel transconductance of its NMOS and PMOS:

$$ g_{m} = g_{\rm mn} + g_{\rm mp} $$

We will assume inverters I2 and I3 are matched and thus have the same transconductance. Mismatches between inverters are inevitable; however, the general conclusions we draw here are not fundamentally affected by such mismatches.

We also assume the capacitive loading at the outputs of both inverters is equal. This assumption is unlikely to hold exactly, because the complex load offered at the output of I2 will not, in general, equal the loading at the output of I3.

With the matching assumptions in mind, we can perform KCL at X and Qi, the current exiting the inverter amplifiers equals the current entering the capacitors:

$$ \begin{array}{*{20}c} { - g_{m} V_{x} = C.\frac{{{\rm d}V_{\rm Qi} }}{{{\rm d}t}}} \\ { - g_{m} V_{\rm Qi} = C.\frac{{{\rm d}V_{x} }}{{{\rm d}t}}} \\ \end{array} $$

Note the negative signs, which reflect the inverting nature of the amplifiers. Subtracting the second equation from the first:

$$ \begin{array}{*{20}c} {g_{m} \left( {V_{\rm Qi} - V_{x} } \right) = C.\frac{{{\rm d}(V_{\rm Qi} - V_{x} )}}{{{\rm d}t}}} \\ {g_{m} V_{d} = C.\frac{{{\rm d}V_{d} }}{{{\rm d}t}}} \\ \end{array} $$

The time-constant at points X and Qi is

$$ \tau = \frac{C}{{g_{m} }} $$

The differential equation has an exponential solution:

$$ V_{d} = V_{d} \left( 0 \right)e^{t/\tau } $$

where Vd(0) is the initial voltage difference when clk2 catches I2 and I3 in a metastable state; in short, how far apart Vx and VQi are, and thus how far both are from the true metastable point. We can assume that once Vd equals Vdd, the inverters have exited metastability. In fact, it is enough for the two nodes to just enter the low-gain region of the VTC; we do not need to wait for them to reach the full rail values. According to Fig. 13.47, Vd can be as low as Vih − Vil. Thus, to find the time to exit metastability, we use Vd = Vdd, knowing that in reality a slightly lower value would do. This allows us to calculate Texit, the time to exit metastability:

$$ T_{\rm exit} = \tau \ln (V_{\rm DD} /V_{d} \left( 0 \right)) $$
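A small numerical sketch shows how weakly Texit depends on the supply and how strongly it depends on the initial imbalance. The τ and Vdd values below are assumed for illustration, not taken from the text:

```python
import math

tau = 20e-12   # assumed time constant C/gm: 20 ps (illustrative)
Vdd = 1.0      # assumed supply, volts

# Texit grows only logarithmically as the initial imbalance Vd(0) shrinks,
# but it is unbounded: a vanishingly small Vd(0) gives an arbitrarily long exit
for vd0 in (0.5, 1e-3, 1e-6, 1e-9):
    t_exit = tau * math.log(Vdd / vd0)
    print(f"Vd(0) = {vd0:g} V -> Texit = {t_exit * 1e12:.1f} ps")
```

Each factor of 1000 reduction in Vd(0) adds the same fixed increment (τ ln 1000 ≈ 138 ps here) to the exit time.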

Thus, the time to get out of metastability is a function of the time-constant at the output of the inverters. But more critically, it is a strong function of the ratio of the final and initial voltages. Assume Vd(0) is uniformly distributed between 0 V and Vdd. We are still metastable at time t if the growing difference $V_{d}(0)e^{t/\tau}$ has not yet reached Vdd, i.e., if $V_{d}(0) < V_{\rm DD}e^{-t/\tau}$. For a uniform distribution, this probability is simply $e^{-t/\tau}$: it is 1 at t = 0 and drops exponentially toward 0 as t approaches infinity. Thus, the probability that we are still in metastability after a time t is

$$ P\left( {metastable,t} \right) = e^{ - t/\tau } $$

This is the conditional probability that we are still metastable at time t given metastability has already occurred. The probability that we both enter metastability and have not exited at time t is

$$ \begin{array}{*{20}c} {P\left( {still\,metastable,t} \right) = P\left( {metastability} \right).P\left( {metastable,t} \right)} \\ {P\left( {still\,metastable,t} \right) = \frac{{T_{m} }}{{T_{2} }}e^{ - t/\tau } } \\ \end{array} $$

So if metastability occurs, what is the probability that the metastability has not resolved a full cycle later:

$$ P\left( {still\,metastable,T_{2} } \right) = \frac{{T_{m} }}{{T_{2} }}e^{{ - T_{2} /\tau }} $$

And the rate of this event is

$$ R\left( {still\,metastable,T_{2} } \right) = \frac{{T_{m} }}{{T_{1} T_{2} }}e^{{ - T_{2} /\tau }} $$

Leading to an MTBF of

$$ MTBF = \frac{{T_{1} T_{2} }}{{T_{m} }}.e^{{T_{2} /\tau }} $$

Substituting for the typical values:

$$ \begin{array}{*{20}c} {T_{m} = 10\,{\rm ps}} \\ {T_{1} = 2\,{\rm ns}} \\ {T_{2} = 1\,{\rm ns}} \\ {\tau = 10\,{\rm ps}} \\ \end{array} $$

Gives an MTBF of

$$ \begin{array}{*{20}c} {MTBF = \frac{{T_{1} T_{2} }}{{T_{m} }}e^{{T_{2} /\tau }} = \frac{{2000\times1000}}{10}\times e^{{1000/10}} \,{\rm ps}} \\ {MTBF = 1.7\times10^{20} \,{\rm billion}\,{\rm years}} \\ \end{array} $$

Thus, if we observe the output after a full cycle has passed and a failure occurs due to metastability, the same failure would recur only after billions of billions of years, which we can safely say is as good as “never”.
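The arithmetic is easy to check directly. The sketch below reproduces the result using τ = 10 ps, the value implied by the exponent e^(1000/10) in the computation above:

```python
import math

T_m, T1, T2 = 10e-12, 2e-9, 1e-9   # seconds, values from the text
tau = 10e-12                        # 10 ps

mtbf_s = (T1 * T2 / T_m) * math.exp(T2 / tau)          # MTBF in seconds
billion_years = mtbf_s / (3600 * 24 * 365.25 * 1e9)    # convert to billions of years

print(f"{billion_years:.1e}")   # 1.7e+20
```

The exponential factor e^(T2/τ) is what transforms the MTBF from fractions of a microsecond to cosmological timescales; the prefactor T1T2/Tm barely matters.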

Thus, the simplest solution to metastability is not to sample inputs incoming from clock domain 1 at the edge of clk2, but to delay such sampling a whole period of clk2. This might sound wasteful, but it has a pipeline-like nature that means we do not actually “waste” the extra period.

However, finding a systematic way to communicate using this result is difficult. We will do so in the next section. And along the way, we will discover that using this approach to synchronize the two clock domains is only viable for very occasional communications. Burst communication requires different hardware.

13.9 Synchronization

  1. 1.

    Use a two register synchronizer

  2. 2.

    Understand scenarios for the two register synchronizer

  3. 3.

    Trace handshaking using synchronizers

  4. 4.

    Design an asynchronous FIFO

  5. 5.

    Understand the conservative nature of synchronizers in asynchronous FIFOs

  6. 6.

    Understand when asynchronous FIFOs are useful.

Limiting metastability by introducing a cycle of delay is best done in practice by passing data coming from clk1 through two cascaded registers using clk2. Because the output Q2 in Fig. 13.51 samples Q1 a whole cycle later, the MTBF of metastability at Q2 is huge as derived in Sect. 13.8.

Fig. 13.51
figure 51

Two register synchronizer. D is synchronous with clk1, Q2 is synchronous with clk2

The two register approach in Fig. 13.51 is a popular approach to synchronization. We can show how a bit entering D1 in a metastable window will not show up as metastable on Q2. This can be done by tracing all the different ways Q1 can change and observing how this appears on Q2. However, two critical notes help in understanding the behavior at Q2:

  • The value at Q1 due to metastable latching in the master of the first register is never an intermediate value. Q1 will be a full logic value, albeit not necessarily the correct one. This is because there are four inverters in series from D to Q1, helping regenerate Q1 into a full logic value (Sect. 13.8)

  • If the first register is metastable, this can manifest on Q1 in one of three ways: Q1 can glitch, it can register the correct value after more than Tcq, or it can register the wrong value after more than Tcq

Note that if D is maintained for another clk2 cycle, the probability that on the next active edge Q1 has not resolved into the correct value is negligible according to the MTBF calculations in Sect. 13.8. If D makes the transition from 0 to 1, we can distinguish the following scenarios about the behavior of Q1 and Q2. These scenarios are shown in Fig. 13.52 and summarized in Table 13.4:

Fig. 13.52
figure 52

Scenarios 1, 2, and 3 from Table 13.4

Table 13.4 Behavior of two register synchronizer over three cycles
  1. 1.

    The first register can catch the correct value of the transition at D. This can be either because metastability has not occurred, or because metastability has occurred but Q1 has resolved to the correct value of D. There is an important difference between the two cases: if metastability has not occurred, then D appears on Q1 after Tcq; if metastability has occurred, D appears on Q1 more than Tcq after the edge. In both cases, register 2 will see the correct value on Q1 with no setup-time violations on the next clock edge. Thus, the correct value of D appears on Q2 in cycle 2. But is there not a chance that the metastable case takes too long to resolve, causing a setup-time violation at the input of R2? Yes, but as shown in the calculation at the end of the last section, the MTBF of such an event is astronomically large

  2. 2.

    The first register can completely miss the transition in D. Thus Q1 does not sample the transition on D at all, instead remaining at its old value. This can be because D changes after the window of clk2, or because metastability occurs and resolves to the wrong value. In the next cycle, Q1 will certainly register the correct value of D. This correct value will appear on Q2 on the third cycle

  3. 3.

    The first register can glitch, going to the correct value, and back to the wrong value. Q2 will sample Q1 in the first cycle before the glitching happens, because metastability manifests after Tcq. By the time the second cycle comes along, Q1 will sample D without any setup-time violations. This sampling is then handed over to Q2 on the third edge of clk2

Notice what the three scenarios have in common: when a transition occurs on Q2, that transition is correct. It does not glitch, it is not a resolution to a wrong value, and it is free from metastability.

However, there is uncertainty about when exactly the signal appears on Q2. In some scenarios it appears on the second edge, in some others it appears on the third edge. When the signal appears on the second edge, there is no extra delay in the process, it is simply a pipeline. If it appears on the third cycle, then a whole period was “wasted”. However, this waste is not truly waste, it is the inevitable overhead of sometimes having to wait for metastability to resolve itself.
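The scenarios can be condensed into a tiny check. In the sketch below, each scenario is reduced to the Q1 value that register 2 samples on clk2 edges 2 and 3, an illustrative abstraction of the waveforms in Fig. 13.52; in every case Q2 is monotonic and correct by edge 3:

```python
# Q1 as sampled by register 2 on clk2 edges 2 and 3, after D rises 0 -> 1
scenarios = {
    "caught correctly": [1, 1],  # R1 latched the new D on edge 1
    "missed entirely":  [0, 1],  # R1 kept the old value, catches D on edge 2
    "glitched":         [0, 1],  # the glitch settled back before edge 2 sampled Q1
}

for name, q1_samples in scenarios.items():
    q2 = list(q1_samples)           # Q2 simply mirrors Q1 one edge later
    assert q2 == sorted(q2), name   # Q2 never glitches (monotonic)
    assert q2[-1] == 1, name        # correct value by edge 3 at the latest
    print(f"{name}: Q2 over edges 2 and 3 = {q2}")
```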

Signals that pass between clock domains are usually buses rather than single bit signals. If we use the double register synchronizer to try to synchronize a data bus across two clock domains as shown in Fig. 13.53, we face a fundamental problem. The one cycle uncertainty about when data is registered in the receiver makes reading the bus impossible. Some bits in the bus are synchronized on the second cycle, some are synchronized on the third. While we can always read on the third cycle, this reduces the efficiency of the method.

Fig. 13.53
figure 53

Using synchronizers to pass a bus of data. Because Q0 through Q3 can independently transition on the second or third cycle, this method is not viable

Another issue is how long the data should be held at the output of the transmitter. The transmitter needs to keep its data unchanged until the receiver has correctly registered the data. So there has to be some feedback information from the receiver to the transmitter telling the latter that it is safe to change data.

Synchronizers are systematically used to manage clock domain transitions by synchronizing single bit control signals. These signals are handshake controls that manage the exchange of data between the transmitter and the receiver.

The setup is shown in Fig. 13.54. Data is readied on the data bus synchronous with the transmitter clock clk1. Simultaneously, the req flag is raised, also on clk1, indicating that the transmitter is requesting a transfer to the receiver.

Fig. 13.54
figure 54

Two register synchronizer used to transfer bus data

The req flag is synchronized to the receiver clock clk2 using a two register synchronizer. When the receiver senses req, it will register data. It will then generate an acknowledge flag, ack, on the edge of clk2. Ack is synchronized to the transmitter clock, and when read indicates data has been received properly.

The steps of handshaking are

  1. 1.

    A data word is prepared on the bus synchronous with clk1. Simultaneously req is raised on clk1

  2. 2.

    Req is synchronized to the receiver clock clk2 using a two register synchronizer. The receiver will either read the req on the second or third edge of the clock

  3. 3.

    Because req and data were generated on the same clock, when req is synchronized at the receiver, it is safe to read the entire data bus. Thus the receiver registers the data bus

  4. 4.

    The receiver raises the “ack” flag on clk2, indicating the word has been read

  5. 5.

    The ack flag is synchronized to the transmitter clock using a two register synchronizer. The transmitter reads the ack flag on the second or third edge

  6. 6.

    When the transmitter reads ack, it lowers the req signal, indicating it now understands the receiver has registered the data and it is now ready to send a new word

  7. 7.

    The falling req signal is synchronized to the receiver, the receiver then lowers the ack signal, indicating it understands the transmitter might send a new word

  8. 8.

    The falling ack signal is synchronized to the transmitter, indicating the receiver is ready to receive a new word. The cycle can now restart and a new word can be transmitted.
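The eight steps above can be traced with a small cycle-level model. The sketch below assumes, purely for illustration, that clk1 and clk2 tick together and models each two-register synchronizer as a two-stage delay line; all names are hypothetical:

```python
from collections import deque

def run_handshake(word):
    """Trace the four-phase req/ack handshake for one word.
    Illustrative assumptions: both domains step on the same loop iteration,
    and each two-register synchronizer is a two-stage delay pipe."""
    req, ack = 0, 0
    req_pipe = deque([0, 0], maxlen=2)  # req synchronized into the clk2 domain
    ack_pipe = deque([0, 0], maxlen=2)  # ack synchronized into the clk1 domain
    bus, received = None, None
    tx, rx = "raise_req", "wait_req"
    for cycle in range(1, 40):
        req_pipe.append(req); req_seen = req_pipe[0]  # receiver's view of req
        ack_pipe.append(ack); ack_seen = ack_pipe[0]  # transmitter's view of ack
        # Receiver: register data when req arrives, then track its falling edge
        if rx == "wait_req" and req_seen:
            received = bus           # safe: data was stable before req rose
            ack = 1
            rx = "wait_req_low"
        elif rx == "wait_req_low" and not req_seen:
            ack = 0
            rx = "wait_req"
        # Transmitter: raise req with data, drop it on ack, wait for ack to fall
        if tx == "raise_req":
            bus, req = word, 1
            tx = "wait_ack"
        elif tx == "wait_ack" and ack_seen:
            req = 0
            tx = "wait_ack_low"
        elif tx == "wait_ack_low" and not ack_seen:
            return received, cycle   # full four-phase cycle complete
    return received, None

print(run_handshake(0xAB))  # (171, 9)
```

Even with both domains at the same rate and each crossing costing a fixed two cycles, a single word consumes nine cycles end to end, which previews the overhead calculation below.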

This handshaking approach is very safe. The MTBF of failures due to metastability is as shown in Sect. 13.8, indicating metastability is as good as solved. However, the large number of steps and the cycles consumed in every step suggest this method has a significant delay overhead.

To calculate the overhead for sending a single word on the bus, we calculate the number of cycles in every step. Steps where synchronization takes place between the two clock domains consume 1 or 2 cycles, so we will assume that on average they consume 1.5 cycles. Steps where data is simply received by one side take only one cycle. Thus, the total cycle budget (written in order) is

$$ {\text{Cycles}}\,{\text{to}}\,{\text{send}}\,{\text{and}}\,{\text{receive}}\,{\text{one}}\,{\text{sample}} = 1 + 1.5 + 1 + 1 + 1.5 + 1 + 1.5 + 1.5 = 10 $$

This budget assumes very efficient data management on both sides, where any two actions that can be done in the same cycle are. Note also that it is not strictly correct to add the cycles as we did above because some of the cycles are from clk1 while others are from clk2. But in the best case, it takes at least 10 of the shorter of the two clock cycles to get a single data word across, which is very inefficient.
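Summing the listed per-step costs directly (simple addition, ignoring the clk1/clk2 mix noted above):

```python
SYNC, LOCAL = 1.5, 1.0   # average synchronized step vs single-domain step
# Steps 1 through 8 of the handshake, in order
steps = [LOCAL, SYNC, LOCAL, LOCAL, SYNC, LOCAL, SYNC, SYNC]
print(sum(steps))  # 10.0
```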

More efficient handshaking approaches can be found. And some clever solutions, like sending multiple words in parallel on the data bus in Fig. 13.54, can reduce the overhead. But any method that uses handshaking for synchronizing has fundamentally high overhead. Thus handshaking is only viable for communication channels where words are transmitted only occasionally. It is also useful when one of clk1 and clk2 is many multiples of the other.

However, when we need to communicate long bursts of data between two clock domains with unrelated but close clock frequencies, another approach has to be followed. This approach is the asynchronous FIFO.

A FIFO is at its core a RAM. But as shown in Fig. 13.55, it has some additional circuitry to indicate the status of the queue. The FIFO has two ports. One port is used exclusively for writing, the other for reading. There are two pointers in the FIFO, the read pointer and the write pointer. The read pointer indicates the next position we should read from. Thus, if we have just read from position 2, the read pointer should be 3. When the next read is performed, we read from position 3. The write pointer is very similar, it indicates the next position to write to.

Fig. 13.55
figure 55

Asynchronous FIFO schematic

There are two flags derived from the read and write pointers. These flags are critical for correct operation of the FIFO. Notice that the write port must not write to a position that has not been read yet. Only a clear address can be written, and an address is “clear” only if it has already been read.

But what if there are no positions that can be written to? In this case, the entire FIFO is yet to be read, and we must wait until a read is performed before we can write. In this condition, we say the FIFO is full. A FIFO is full when all its locations are unread, Fig. 13.56.

Fig. 13.56
figure 56

Full FIFO condition

Note the read pointer indicates the next location to be read and the write pointer indicates the next position to be written. Note also that addresses in a FIFO are always incremented. Thus, if we write a word and increment the write pointer, and then find that the write pointer is equal to the read pointer, then the FIFO is full. This is because the next position to be written is also the next position to be read, which indicates such position is still unread. There can be no other randomly placed clear positions in the FIFO because both reading and writing take place sequentially by incrementing the address in order.

The other flag is the empty flag, Fig. 13.57. When a FIFO is empty, it has no locations carrying unread information. Thus, we have to wait for a new word to be written to the queue before we try to read. So, if we read a word, increment the read pointer, and then find that the read pointer is equal to the write pointer, the FIFO is empty. This is because the next location to be read is also the next location to be written, which means that location has already been read and has not had new data written to it. There can be no other locations carrying unread data because the FIFO reads and writes in order.

Fig. 13.57
figure 57

Empty FIFO condition

Note that the condition for empty and full is the same: the read and write pointers are equal. The only distinguishing feature between the two is the action that leads to this equality. If we increment the read pointer and find equality, the FIFO is empty. If we increment the write pointer and find equality, the FIFO is full. Thus, the last action before equality decides the state.
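The pointer rules can be captured in a toy single-clock model (the class and names are illustrative, not from the text):

```python
class Fifo:
    """Toy FIFO illustrating the pointer-equality full/empty rule.
    The flags are decided by the last action, as described in the text."""
    def __init__(self, depth):
        self.mem = [None] * depth
        self.depth = depth
        self.rd = self.wr = 0
        self.empty, self.full = True, False

    def write(self, word):
        assert not self.full, "write to full FIFO"
        self.mem[self.wr] = word
        self.wr = (self.wr + 1) % self.depth
        self.empty = False
        self.full = (self.wr == self.rd)   # equality after a write -> full

    def read(self):
        assert not self.empty, "read from empty FIFO"
        word = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.full = False
        self.empty = (self.rd == self.wr)  # equality after a read -> empty
        return word

f = Fifo(4)
for w in range(4):
    f.write(w)
print(f.full)    # True: the write made the pointers equal
print(f.read())  # 0
print(f.full)    # False again after one read
```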

An asynchronous FIFO is a FIFO where the read and write ports use different clocks. Figure 13.58 shows an asynchronous FIFO used to get data across two clock domains. The transmitter handles the write port. The receiver controls the read port. Thus, the transmitter dumps data into the queue while the receiver reads it out.

Fig. 13.58
figure 58

Asynchronous FIFO used to communicate between clock domains. The two rectangles below the empty and full flag calculating blocks are two register synchronizers

The read pointer is generated by the receiver on clk2. The write pointer is generated by the transmitter on clk1. Calculating the empty and full flags requires a comparison (subtraction) of the read and write pointers. However, these two pointers are generated in different clock domains.

As shown in Fig. 13.58, to calculate the full flag, the read pointer is synchronized to clk1 by a two register synchronizer. This is then subtracted from the write pointer, to produce the flag. Similarly, to calculate the empty flag, the write pointer is synchronized to clk2 by a two register synchronizer before the receiver calculates the empty flag.

The synchronization allows the empty and full flags to be calculated reliably without being affected by metastability. However, synchronization randomly takes between 1 and 2 cycles to produce a result; we have to check that such a random delay would not cause a problem. A “problem” in a FIFO occurs if the receiver tries to read a word it has already read, i.e., if it reads from a clear position. It is also problematic if the transmitter tries to write to a location that has not been read, causing such data to be overwritten and lost.

Both problems described above are a result of missing an empty or full flag. So we have to check if this would happen due to the single cycle uncertainty in synchronizers.

The empty flag is raised when a read happens and the read and write pointers become equal. The empty flag is calculated at the receiver. The read pointer is generated by the receiver. The write pointer comes from the transmitter and has to be synchronized to the receiver. Thus if there is synchronization delay while calculating the empty flag, it is due to the write pointer rather than the read pointer.

If we increment the read pointer and find equality, the empty flag is raised. But what if the write pointer has updated and is still being delayed in the synchronizer? In that case, the write pointer we compare against is lower than the actual write pointer. Due to the synchronization delay, we see an equality when we should not, and we raise the empty flag when we should not. So we cause the receiver to skip reading for a cycle when it could have safely read. This can be called conservative or wasteful, but it is certainly not a failure. A failure would be to read when we should not have read.

Similarly, the full flag is calculated at the transmitter. It is triggered by an increment in the write pointer. The write pointer is generated on clk1. We synchronize the read pointer to clk1 to perform the comparison.

Due to the synchronization uncertainty, we could declare a full condition when we did not have to. This would cause the transmitter to stop writing when it could have continued to. Thus, again this is conservative, rather than a failure.
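This conservative behavior can be demonstrated with a stale-pointer sketch. Here the two-register synchronizer is modeled as a two-cycle delay on the write pointer, and pointer wrap-around is ignored for simplicity:

```python
from collections import deque

written = 0                        # true write pointer (transmitter domain)
wr_pipe = deque([0, 0], maxlen=2)  # write pointer crossing into the receiver
read = 0
stalls = 0

for cycle in range(20):
    wr_pipe.append(written)
    wr_seen = wr_pipe[0]           # receiver's view: stale by two cycles
    if cycle % 2 == 0:
        written += 1               # transmitter writes every other cycle
    if read != wr_seen:            # "not empty" judged from the stale pointer
        assert read < written      # never reads a location not yet written
        read += 1                  # safe read
    else:
        stalls += 1                # conservative stall: FIFO merely *looked* empty

print(read, stalls)
```

The assertion never fires: because the synchronized pointer can only lag reality, the receiver occasionally stalls but never reads unwritten data.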

The asynchronous FIFO is extremely efficient at transmitting long bursts of data. Its only delay overhead occurs when the empty or full flags are raised. In conditions where the receiver is much faster than the transmitter, the empty flag is raised often. In conditions where the transmitter is much faster than the receiver, the full flag is raised often. The latter case can be mitigated by increasing the size of the FIFO. In conditions where the two clocks are close to each other in frequency, the communication can be very efficient, with the empty or full flags raised rarely.

The read and write pointers are both multi-bit words. Yet we still pass them through two register synchronizers. We discussed earlier in this section that buses should never be passed through synchronizers because the one cycle uncertainty of synchronization affects different bits differently.

However, the read and write pointers are always Gray encoded. This means that they do not increment in binary order, but rather in a Gray-code order where exactly one bit changes per increment. Thus, the synchronizers are effectively only ever synchronizing a single changing bit, and they can be safely used.
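The single-bit-per-increment property is easy to verify. A minimal sketch using the standard binary-reflected Gray code (the conversion function is the usual one, not specific to this text):

```python
def gray(n: int) -> int:
    """Standard binary-reflected Gray encoding."""
    return n ^ (n >> 1)

# 4-bit pointer sequence, including the wrap from 15 back to 0
codes = [gray(i % 16) for i in range(17)]
for a, b in zip(codes, codes[1:]):
    assert bin(a ^ b).count("1") == 1   # exactly one bit flips per increment

print([format(c, "04b") for c in codes[:4]])  # ['0000', '0001', '0011', '0010']
```

Note that the wrap-around also changes only one bit, which is why the pointer depth must be a power of two for this encoding to remain safe across the wrap.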