Test Flow Selection for Stacked Integrated Circuits
Abstract
Integrated circuits (ICs) with a single chip (die) are typically tested with a test flow consisting of two test instances: (1) wafer sort for the bare chip and (2) package test for the packaged IC. For ICs with stacked chips (3D Stacked ICs), there are many possible test instances, even more test flows, and no commonly used test flow. In this paper, we propose a test flow selection algorithm (TFSA) that obtains a test flow for a given 3D Stacked IC such that the expected total test time to produce each good package is minimized. We implemented the TFSA, three straightforward test flow schemes and an exhaustive search, and experimentally compared the test flow schemes on three different test architecture design approaches. The results demonstrate the importance of having methods both to select the test flow and to design the test architecture.
Keywords
3D IC, Stacked integrated circuits, Test flow, Test time, Yield, Test plan, IEEE 1500, Test architecture, Expected time, Effective yield, Quantity
1 Introduction
The constant development in semiconductor technologies enables increasingly advanced integrated circuits (ICs). Today, it is possible to manufacture wafers where each individual chip (die) contains billions of transistors. After manufacturing, the chips are first cut from the wafer and then wire bonded to connect the chip to the package. Finally, the chips are packaged. The most recent advancement in semiconductor technologies is to stack several chips on top of each other and package them in one IC, a 3D Stacked IC. The chips in such a 3D Stacked IC are connected by through-silicon vias (TSVs) [8].
IC manufacturing is extremely complex, which increases the risk of defects. To detect manufacturing defects, each and every IC is carefully tested. ICs with a single chip are commonly tested with a test flow consisting of two test instances: wafer sort and package test. The bare chip is tested at wafer sort to avoid packaging of defective chips. If no defects are found at wafer sort, the chip is wire bonded, packaged, and retested during package test, as defects may be introduced during wire bonding and packaging. For 3D Stacked ICs there are many more test instances. It is possible to test each individual chip during wafer sort instances, at intermediate test instances where the partially complete stacks can be tested, and at the package test instance where the complete 3D Stacked IC is tested. For a 3D Stacked IC with N chips there are 2N instances when a test may be performed: N during wafer sort, N − 1 for intermediate stacks and 1 for package test. Hence, there are 2^{2N} possible variations of test flows [8].
The test cost of 3D Stacked ICs depends on a large number of factors, such as the hardware manufacturing cost (which includes wafer fabrication, stacking and packaging), DfT (design for test), fault coverage, test resources and test equipment, the test time and the yield. In this paper we reduce the test cost per good 3D Stacked IC produced by selecting a test flow that optimizes two of the major contributing factors: test time and yield. Specifically, we address minimization of the expected test time for each good 3D Stacked IC produced, and implement it along with test architecture optimization as described in previous articles [12, 13].
The expected test time may vary both with the choice of test flow and with the applied test schedule. The time spent on testing defective ICs also contributes to the test cost. Hence, the manufacturing yield needs to be taken into account. For 3D Stacked ICs, where it is possible to stack a number of different chips, it is of interest to know how many chips of each type are needed in the manufacturing process to obtain a fixed number of good 3D Stacked IC packages.
The test cost can be reduced by improving the yield and/or by reducing the time spent on testing. Improving yield implies reducing defects in the manufacturing process, for example, by upgrading production technologies. Yield improvement is not in the scope of this paper. In this paper, we focus on minimizing the time spent on testing.
In this paper, we assume a 3D Stacked IC where the test time and yield at each test instance are known. With the goal of minimizing the expected test time per good 3D Stacked IC package, we propose a method to compute the effective yield, the number of chips that need to be tested at each test instance, and the expected test time, depending on the selected test flow. To find the most suitable test flow, we propose the Test Flow Selection Algorithm (TFSA). We performed experiments on several 3D Stacked ICs to compare the test flow obtained using TFSA against test flows obtained by exhaustive search and three straightforward test flows: wafer sort of each individual chip followed by package test (WSPT), test at all possible instances (TA) and test performed only at package test (PT). The results demonstrate that (1) TFSA produces results that are better than the three straightforward test flow schemes, (2) TFSA produces the optimal test flow in most cases, and (3) the straightforward test flow where wafer sort of each chip is followed by package test of the complete 3D Stacked IC gives the best result among the three straightforward test flow schemes. We also integrated test flow selection with test architecture design, adopting the test planning scheme proposed in [12]. While [12] optimizes the test architecture for a single test flow that consists of wafer sorts of the individual chips followed by package test of the complete 3D Stacked IC, in this paper we generalize the approach to any given test flow. The experimental results validate that it is important to have methods to find the test flow as well as methods to design the test architecture. The test flow model and TFSA comply with all die orientations (face-to-face, face-to-back and back-to-back) as well as all wafer bonding technologies (wafer-to-wafer, die-to-die and die-to-wafer).
The limitations of this work are as follows. First, we assume that all test flows include package test. This is motivated by the fact that if the final test instance (package test) is not performed, defects introduced in the last manufacturing step are never checked. Hence it is self-evident to assume package test to be mandatory. Second, in our experiments we calculate the test time of an intermediate stage as the sum of the test times of each individual die in the partial stack and that of the interconnects. Thus, each die undergoes the same test during wafer sort and all successive intermediate stages. The yields at all intermediate stages are also assumed to be equal during the experiments. Third, in this paper we address reduction of a part of the test cost by minimizing the test time associated with test flow selection. Optimization of all factors contributing to the test cost, such as manufacturing cost or fault coverage, would add even more complexity to the problem of selecting the most suitable test flow, and has therefore not been considered in this paper. The expressions are, however, scalable to accommodate the trade-off with additional contributing factors, which will be addressed in our future work.
The rest of the paper is organized as follows. In Section 2 we discuss related research. The test architecture is elaborated in Section 3. In Section 4, we illustrate with an example at three different yield sets the need of finding a suitable test flow. In Section 5, we introduce notations and formulae, while in Section 6 we present the TFSA. In Section 7 we report the results from the experiments. The paper is concluded in Section 8.
2 Related Work
Several works have addressed test planning for core-based ICs having a single chip with the aim of optimizing the test cost [2, 6, 7]. Design and optimization of test architecture for non-stacked ICs with IEEE 1500 is described in [4, 5, 11, 15]. In [5], Iyengar et al. address optimization of test access mechanisms (TAMs) for System-on-Chips (SoCs) to reduce core test time by balancing core scan chains. Mullane et al. in [11] propose a hybrid scan for non-stacked ICs provided with IEEE 1500 core wrappers, by combining the serial and the parallel ports of the wrapper, resulting in an efficient test access that reduces the test time. However, for 3D Stacked ICs, a test architecture optimized for each chip in the stack during wafer sort may not lead to an optimized test architecture when all the chips are tested jointly during package test.
We have proposed methods to reduce the test time for core-based 3D Stacked ICs, by optimizing the test architecture and the test plan [12, 13]. We used a straightforward test flow, where each individual chip is tested at wafer sort and the complete stack at package test. The test planning approaches were not adapted for arbitrary test flows.
Taouil et al. propose test cost models to predict the impact of test flows on the product quality and overall stack cost at an early design stage, which is important for a trade-off between quality and cost [14]. They present a model that predicts the product quality, defined in terms of Defective Parts Per Million (DPPM), for different test flows. A framework is provided for covering different test flows and cost models to identify the most cost-effective test flow [3]. Simulation results show that test flows that include wafer sort generally reduce the overall cost and that the most cost-effective test flow strongly depends on the stack yield. They concluded, after experiments on different test flows, that adapting the tests according to the stack yield is a good approach. The paper analyzes several test flows for given 3D Stacked ICs, but does not provide a method for the selection of the most suitable test flow. Agrawal et al. in [1] have proposed a low-complexity test flow selection scheme for 3D Stacked ICs to achieve a low test cost. It is shown that the test flow selection method takes considerably less computation time than exhaustive methods that completely explore the exponentially growing search space for 3D Stacked ICs. However, an estimate of how much the test cost increases when the proposed method is used instead of exhaustive search is lacking. Both [1, 14] are built on the compromise between yield and test cost. However, the trade-off between two major contributing factors to the test cost, namely the test time and the manufacturing cost of each component, has been overlooked. The work is not verified against any test architecture design and test planning scheme. To the best of our knowledge, no work has previously proposed a test flow selection algorithm and verified it against any test architecture and test planning scheme.
A new test standard, IEEE P1838, is being developed to enable efficient modular testing of stacked ICs (SICs) [10]. The standard involves a die-level wrapper on each chip in the stack. In addition, an IEEE 1149.1 based TAP controller is provided in the bottom die that controls the wrapper instruction registers (WIRs) of the die wrappers. The works in [12, 13] address test scheduling for SICs to minimize the test cost. In [12] a scalable test architecture is assumed where each chip is provided with an IEEE 1500 based wrapper, in accordance with the developing IEEE P1838 standard. Similarly, [13] assumes an IEEE 1149.1 based test architecture for the SICs for test planning.
3 IEEE 1500 Based Test Architecture
In this section we discuss the test architecture for a 3D Stacked IC, where each chip of the stack is supported by an IEEE 1500 based infrastructure as proposed by [8].
The TSV interconnect between chips may be tested using the boundary scan registers, which connect all inputs/outputs via TSVs. Boundary scan registers are implemented on both chips and are used in the TSV interconnect test. Test stimuli are applied on outgoing TSVs and test responses are captured on incoming TSVs. Since the boundary scan register is a separate register, testing of TSVs cannot be performed concurrently with core tests.
The TSV interconnect tests contribute a constant term to the overall test time and cannot be scheduled concurrently with any core tests. Therefore, the time required to perform TSV interconnect tests is disregarded when addressing the total test time in the remainder of the paper.
4 Impact of Test Flow on the Expected Test Time
In this section we show with an example the effect of test flow on the expected total test time. The example considers a 3D Stacked IC with two chips. Four test instances exist for the 3D Stacked IC, viz, wafer sort of each individual chip (WS1 and WS2, respectively), intermediate test (IT) of the two chips, and package test (PT) of the final 3D Stacked IC.
Table 1 SIC with three different sets of yield
Test instance  WS1  WS2  IT  PT
I_{ij}  I_{11}  I_{21}  I_{22}  I_{32}
Test time T_{ij}  10  10  30  70
Yield y_{ij} (Case 1)  0.90  0.91  0.92  0.93
Yield y_{ij} (Case 2)  0.70  0.71  0.72  0.73
Yield y_{ij} (Case 3)  0.50  0.51  0.52  0.53

Test all (TA): tests are applied at every possible test instance;

Wafer sort and package test (WSPT): each individual chip is tested at wafer sort and the complete 3D Stacked IC is tested at package test;

Package test (PT): testing is only applied to the complete 3D Stacked IC at the final test instance, package test;
The results show that for Case 1 PT has the lowest expected test time, while in Case 2, the test flow with the lowest expected test time is WSPT, and that in Case 3 TA results in the lowest expected test time. Thus, it can be concluded from the results that a straightforward test flow may not provide the lowest expected test time for any given 3D Stacked IC. In this paper we present a method to obtain a test flow for any given 3D Stacked IC, such that the expected total test time is minimized.
5 Expected Total Test Time Estimation
In this section we derive an expression to calculate the expected total test time required to produce each fault-free 3D Stacked IC for any assumed test flow. For reference, we assume the design provided in Table 1 with yield Case 1.
Table 2 List of notations
Input data
N  Number of chips constituting the stack
I_{ij}  Test instance corresponding to the i^{th} chip
T_{ij}  Time taken to test a single unit at instance I_{ij}
y_{ij}  Yield of the manufacturing stage I_{ij}
i  i^{th} chip in the stack, 1 ≤ i ≤ N; index N + 1 is used for package test
j  j = 1 ⇒ wafer sort; j = 2 ⇒ intermediate and package test
Known data
I_{12}  Does not exist; T_{12} = 0, y_{12} = 1
I_{N+1,1}  Does not exist; T_{N+1,1} = 0, y_{N+1,1} = 1
Q_{N+1,2}  = 1
Q_{i1}  = Q_{i−1,2}
Calculated
x_{ij}  Binary decision variable: 1 if a test is performed at instance I_{ij}; 0 otherwise
\(\bar {x}_{ij}\)  1 − x_{ij}
X  Test flow vector composed of x_{ij}: X = (x_{i1}, 1 ≤ i ≤ N), (x_{i2}, 2 ≤ i ≤ N), (x_{N+1,2} = 1)
Q_{ij}  Number of good units that need to be produced at instance I_{ij}
T_{eff}(ij)  Expected time taken to produce each good unit at instance I_{ij}
Y_{ij}  Effective yield at instance I_{ij}
τ  Expected total test time taken by the test flow given by X
The wafer sort instances are illustrated by the boxes in the upper row, where j = 1, which means I_{11} to I_{N1} are the instances for wafer sort. The wafer sort instance of a chip i is indicated with I_{i1} to the left in Fig. 4. We have j = 2 for intermediate tests of partial stacks with i chips, and for package test instances, I_{i2}. In Fig. 5, the bottom row indicates the intermediate test instances and the package test. For an intermediate test instance I_{i2}, the intermediate stack consists of all chips from 1 to i. For example, if i = 3, the intermediate test instance I_{32} consists of the partial stack of chip 1, 2 and 3. The intermediate test instances and package test are illustrated to the right in Fig. 4.
A partial stack with i chips is tested during the intermediate instance I_{i2}, which comprises components from two previous instances: a partial stack with i − 1 chips, and chip i. This is illustrated by arrows connecting instances I_{i−1,2} → I_{i2} and I_{i1} → I_{i2} in Fig. 5.
We now discuss the values that need to be computed to obtain the desired test flow. A test flow is represented by the vector X = (x_{11}...x_{N1}),(x_{22}...x_{N2}),(x_{N+1,2}), with 2N elements that include N wafer sorts, N − 1 intermediate tests and 1 package test. Each element is a binary decision variable x_{ij}: for each box in Fig. 5, we set x_{ij} = 1 when a test is performed at instance I_{ij}, and x_{ij} = 0 when no test is performed at instance I_{ij}. Let us consider the example of a 3D Stacked IC with 2 chips in the stack, as illustrated in Section 4. The vectors for TA, WSPT and PT would be represented as X = (1,1),(1),(1), X = (1,1),(0),(1) and X = (0,0),(0),(1), respectively. For convenience, we also define \(\bar {x}_{ij} = 1 - x_{ij}\).
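As an illustration of this encoding, the three straightforward test flows can be generated for any N with a few lines of Python. The helper names `make_flow` and `straightforward_flows` are hypothetical and chosen only for this sketch.

```python
def make_flow(wafer_sort, intermediate):
    """Build X = (x_11..x_N1), (x_22..x_N2), (x_{N+1,2}) with package test fixed to 1."""
    n = len(wafer_sort)
    if len(intermediate) != n - 1:
        raise ValueError("expected N wafer-sort bits and N - 1 intermediate bits")
    return (tuple(wafer_sort), tuple(intermediate), (1,))


def straightforward_flows(n_chips):
    """Return the TA, WSPT and PT test flows for a stack of n_chips chips."""
    n = n_chips
    return {
        "TA": make_flow([1] * n, [1] * (n - 1)),    # test at every instance
        "WSPT": make_flow([1] * n, [0] * (n - 1)),  # wafer sorts plus package test
        "PT": make_flow([0] * n, [0] * (n - 1)),    # package test only
    }


for name, flow in straightforward_flows(2).items():
    print(name, flow)
```

For N = 2 this prints the three vectors listed in the text, with each group shown as a Python tuple.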
τ is the total expected time taken by the test flow given by X. The total expected time depends on what tests are applied in a test flow. At each instance, the effective yield is computed depending on the tests that have been previously performed. To enable computation of test time, for a test instance I_{ij}, we let Q_{ij} denote the number of good units that need to be produced at instance I_{ij}, T_{eff}(ij) denote the expected time taken to produce each good unit at instance I_{ij}, and Y_{ij} the effective yield. The objective of this paper is to minimize the expected total test time τ.
Finally, it is given for all 3D Stacked ICs that instances I_{12} and I_{N+1,1} do not exist. Instance I_{12} corresponds to the box at the bottom left of Fig. 5, for the intermediate test of only chip 1, which does not exist as at least two stacked chips are tested at any intermediate test. Similarly, for a 3D Stacked IC comprising N chips in the stack, instance I_{N+1,1}, corresponding to the box at the top right corner of Fig. 5, refers to wafer sort of chip N + 1, which also does not exist. Therefore, it may be assumed that these instances require no test time, i.e., T_{12} = T_{N+1,1} = 0, and also have a perfect yield, y_{12} = y_{N+1,1} = 1, for the generic expressions.
In the following Sections 5.1, 5.2, and 5.3 respectively, we elaborate each component of the expression, viz, yield Y_{ij}, quantity Q_{ij} and time T_{eff}(ij).
5.1 Yield
This section presents the effective yield at each test instance, expressed as a function of the given yields of the preceding test instances and of whether a test was performed at each of those instances. We discuss the effective yield first for wafer sort and then for intermediate and package tests.
Let us consider an arbitrary wafer sort test instance I_{i1}, which has the given yield of y_{i1}. As there are no prior tests of the chip at wafer sort, the effective yield depends only on the yield at test instance I_{i1}.
where it is given that y_{N+1,1} = 1, as noted in Table 2, and we set \(\bar {x}_{N+1,1}=1\).
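As a sketch of how these yields compose, assume that the yield of every untested preceding instance simply multiplies into the next instance along the flow. Since instance I_{12} does not exist, x_{12} = 0 and hence \(\bar{x}_{12} = 1\); the convention \(Y_{02} = 1\) is introduced only for this sketch. The recursive form below is an illustrative assumption chosen to agree with the description above, not a quotation of the authors' equations:

\[
Y_{i1} = y_{i1}, \qquad 1 \le i \le N
\]
\[
Y_{i2} = y_{i2} \cdot y_{i1}^{\bar{x}_{i1}} \cdot Y_{i-1,2}^{\bar{x}_{i-1,2}}, \qquad 1 \le i \le N+1, \qquad Y_{02} = 1
\]

For the two-chip example of Section 4 with yield Case 3 and package test only, this gives \(Y_{32} = 0.53 \cdot 0.52 \cdot 0.51 \cdot 0.50 \approx 0.0703\), so roughly 14.2 packages are tested for every good package.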
5.2 Quantity
At any test instance, the number of units tested is greater than the number of good units obtained, due to imperfect (< 1) yield. Therefore, we calculate the expected quantity of good units required at the end of each instance such that a fixed number of good units are obtained from a succeeding manufacturing stage.
Let us start with the package test instance, illustrated by the rightmost box in Fig. 5. With a yield of y_{N+1,2} (< 1), to obtain Q_{N+1,2} = 1 good package we need to test Q_{N+1,2}/y_{N+1,2} packages. Therefore, at the preceding instance I_{N2}, we need to produce Q_{N2} = Q_{N+1,2}/y_{N+1,2} good units. Now, to produce Q_{N2} good units at the intermediate test instance I_{N2}, we need to test \(Q_{N2}/(y_{N2}\cdot y_{N1}^{\bar {x}_{N1}})\) units. It is useful to note here that the number of units that need to be tested during the intermediate test instance I_{N2} increases by a factor of 1/y_{N1} if no test was performed at the wafer sort instance I_{N1}. This is due to the share of defective chips that pass on to the intermediate stack. Consequently, we need \(Q_{N1}=Q_{N2}/(y_{N2}\cdot y_{N1}^{\bar {x}_{N1}})\) good units from each of the preceding instances I_{N1} and I_{N−1,2}, and we back-calculate the number of good units that need to be produced after each test instance down to Q_{11}. The quantity of good units required after each test instance Q_{ij} is formulated as follows.
It should be noted that at any intermediate test instance I_{i2}, one chip i from instance I_{i1} is stacked on each intermediate stack comprising chips 1 to i − 1 obtained from instance I_{i−1,2}. Therefore, we need an equal number of units from the preceding instances I_{i1} and I_{i−1,2}, giving Q_{i1} = Q_{i−1,2}.
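The back-calculation can be sketched in Python as follows. The function names, the list layout (wafer sorts in one list, intermediate instances followed by package test in another) and the multiplicative yield model are assumptions made for this illustration. The value returned for each stack instance is the number of units that must enter it, which is the quantity required from each of its two preceding instances.

```python
def effective_yields(x_ws, x_stack, y_ws, y_stack):
    """Effective yield at each stack instance I_22 .. I_N2, then package test."""
    # An untested chip carries its wafer-sort yield into the stack.
    latent = [1.0 if x else y for x, y in zip(x_ws, y_ws)]
    eff, carried = [], latent[0]
    for k, (x, y) in enumerate(zip(x_stack, y_stack)):
        if k + 1 < len(latent):          # intermediate instance: one more chip is added
            carried *= latent[k + 1]
        eff.append(y * carried)
        carried = 1.0 if x else eff[-1]  # a performed test screens out earlier defects
    return eff


def required_quantities(x_stack, eff, good_packages=1.0):
    """Units that must enter each stack instance to end with `good_packages`."""
    entering = [0.0] * len(eff)
    need = good_packages
    for k in reversed(range(len(eff))):
        if x_stack[k]:
            need /= eff[k]               # failing units are discarded at a test
        entering[k] = need
    return entering


# Two-chip example of Section 4, yield Case 3, test flow TA = (1,1),(1),(1).
y_ws, y_stack = [0.50, 0.51], [0.52, 0.53]
eff = effective_yields([1, 1], [1, 1], y_ws, y_stack)
entering = required_quantities([1, 1], eff)
print([round(q, 4) for q in entering])  # [3.6284, 1.8868]
```

In this example about 1.89 stacks must reach package test and about 3.63 good chips of each type must reach the intermediate test to obtain one good package.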
5.3 Time
Table 3 Effective test time at each test instance for the design in Table 1
Test Flow  Teff(11)  Teff(21)  Teff(22)  Teff(32)  τ 

Case 1  
TA  12.99  12.84  35.06  75.27  136.16 
WSPT  12.99  12.84  0.00  81.81  107.64 
PT  0.00  0.00  0.00  99.89  99.89 
Case 2  
TA  27.18  26.80  57.08  95.89  206.94 
WSPT  27.18  26.80  0.00  133.18  187.16 
PT  0.00  0.00  0.00  267.97  267.97 
Case 3  
TA  72.57  71.15  108.85  132.08  384.64 
WSPT  72.57  71.15  0.00  253.99  397.71 
PT  0.00  0.00  0.00  996.04  996.04 
The objective is to find a suitable test flow for any given 3D Stacked IC, such that the expected total test time τ is minimized.
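The sketch below computes τ for the two-chip example under one model that reproduces the tabulated values above: each performed test costs its test time multiplied by the number of units entering it, the yields of untested instances multiply into the next performed test, and the quantities are back-calculated from one good package. The function name and list layout are assumptions made for this illustration.

```python
def expected_total_test_time(x_ws, x_stack, t_ws, t_stack, y_ws, y_stack):
    """Return tau for one test flow.

    x_ws, t_ws, y_ws: N entries for the wafer sorts I_11 .. I_N1.
    x_stack, t_stack, y_stack: N entries for I_22 .. I_N2 followed by package test.
    """
    n = len(t_ws)
    # Yield each chip carries into the stack: 1 if screened at wafer sort.
    latent = [1.0 if x else y for x, y in zip(x_ws, y_ws)]
    # Forward pass: effective yield at each stack instance.
    eff, carried = [], latent[0]
    for k in range(n):
        if k + 1 < n:                      # intermediate instance adds chip k + 2
            carried *= latent[k + 1]
        eff.append(y_stack[k] * carried)
        carried = 1.0 if x_stack[k] else eff[k]
    # Backward pass: units entering each stack instance, from 1 good package.
    entering, need = [0.0] * n, 1.0
    for k in reversed(range(n)):
        if x_stack[k]:
            need /= eff[k]
        entering[k] = need
    stack_time = sum(x * t * q for x, t, q in zip(x_stack, t_stack, entering))
    # Chips 1 and 2 feed the first stack instance; chip i >= 3 feeds instance i - 1.
    feeds = [0] + list(range(n - 1))
    ws_time = sum(
        x * t * entering[feeds[i]] / y
        for i, (x, t, y) in enumerate(zip(x_ws, t_ws, y_ws))
    )
    return ws_time + stack_time


# Table 1 design, yield Case 3.
t_ws, t_stack = [10, 10], [30, 70]
y_ws, y_stack = [0.50, 0.51], [0.52, 0.53]
flows = {"TA": ([1, 1], [1, 1]), "WSPT": ([1, 1], [0, 1]), "PT": ([0, 0], [0, 1])}
for name, (x_ws, x_stack) in flows.items():
    tau = expected_total_test_time(x_ws, x_stack, t_ws, t_stack, y_ws, y_stack)
    print(name, round(tau, 2))
```

The printed totals agree with the τ column for Case 3 to within rounding of the last digit.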
6 Test Flow Selection Algorithm (TFSA)
In this section, we first detail the Test Flow Selection Algorithm (TFSA) and then analyze the computational complexity of the algorithm.
Given the test time T_{ij} and yield y_{ij} at all test instances I_{ij}, the TFSA generates a test flow, X, by iteratively trying to reduce the expected total test time τ. At each iteration, the test instance that yields the largest reduction in τ is selected. As discussed in the previous section, we represent a test flow with the vector X = (x_{11}...x_{N1}),(x_{22}...x_{N2}),(x_{N+1,2}), where x_{N+1,2} = 1, since package test is always performed.
As noted in Table 2, T_{eff}(12) = 0 and T_{eff}(31) = 0.
When only package test is applied, the actual yield at package test takes the yield at all instances into account.
A variable, Counter, is active between lines 3 → 19, to ascertain 2N − 1 iterations. In this example, Counter iterates from 1 → 3. Variables \(\acute {i}\) and \(\acute {j}\) are reset for the iteration, in line 4.
To scan through all 2N − 1 test instances I_{ij}, variables i and j are defined between lines 5 → 17 and lines 6 → 16, respectively, where 1 ≤ i ≤ N and 1 ≤ j ≤ 2.
During an iteration, each inactive test instance x_{ij} = 0 is set to x_{ij} = 1, in lines 7 → 8. The corresponding test cost \(\acute {\tau }\) is computed in line 9, as a result of the modified test flow, to evaluate whether there is a benefit in including the test instance in the test flow. In the first iteration, the test flow vector is first modified to (1,0),(0),(1).
Note, in this case, wafer sort is applied to chip 1, which means y_{11} is used at test instance I_{11}, and not in test instance I_{32}.
If the new test cost \(\acute {\tau }\) is lower than all previously computed test costs τ (line 10), the test cost is updated as \(\tau = \acute {\tau }\) and the indices i and j are recorded, at lines 11 and 12, respectively. In the example, \(\acute {\tau }=640\) and τ = 996. Hence, a better solution is found, thus replacing τ by \(\acute {\tau }\). The algorithm continues with the test flow (0,1),(0),(1) which gives \(\acute {\tau }=640\), same as the present value of τ = 640. Hence, \(\acute {i}\) and \(\acute {j}\) are not updated. For the test flow (0,0),(1),(1), we get \(\acute {\tau }=558 < \tau =640\), and update indices \(\acute {i}=2\) and \(\acute {j}=2\).
Hence, at the end of the first iteration, in line 18, we update the test flow to X = (0,0),(1),(1).
Similarly, at the end of the second iteration Counter = 2, we will have X = (1,0),(1),(1) and τ = 487. Eventually, at the third and final iteration Counter = 2N − 1 = 3, we will have X = (1,1),(1),(1) and τ = 384. Thus, for this example, (1,1),(1),(1), means wafer sort is applied to both chip 1 and chip 2, intermediate test is applied to the stack of chips 1 + 2 and package test is applied to the complete stack. The expected total test time is 384.
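The walkthrough can be mirrored by a short greedy sketch. It is an illustration only: the cost function is one model that matches the example totals of 996, 558, 487 and 384, the function names are hypothetical, and the loop is assumed to stop early once no remaining instance lowers τ.

```python
def tau(x_ws, x_stack, t_ws, t_stack, y_ws, y_stack):
    """Expected total test time of one flow (assumed multiplicative yield model)."""
    n = len(t_ws)
    latent = [1.0 if x else y for x, y in zip(x_ws, y_ws)]
    eff, carried = [], latent[0]
    for k in range(n):                     # forward pass: effective yields
        if k + 1 < n:
            carried *= latent[k + 1]
        eff.append(y_stack[k] * carried)
        carried = 1.0 if x_stack[k] else eff[k]
    entering, need = [0.0] * n, 1.0
    for k in reversed(range(n)):           # backward pass: required quantities
        if x_stack[k]:
            need /= eff[k]
        entering[k] = need
    feeds = [0] + list(range(n - 1))
    total = sum(x * t * q for x, t, q in zip(x_stack, t_stack, entering))
    total += sum(x * t * entering[feeds[i]] / y
                 for i, (x, t, y) in enumerate(zip(x_ws, t_ws, y_ws)))
    return total


def tfsa(t_ws, t_stack, y_ws, y_stack):
    """Greedy selection: start from package test only, add the best instance per round."""
    n = len(t_ws)
    x_ws, x_stack = [0] * n, [0] * (n - 1) + [1]
    best = tau(x_ws, x_stack, t_ws, t_stack, y_ws, y_stack)
    for _ in range(2 * n - 1):             # at most 2N - 1 rounds
        choice = None
        for bits in (x_ws, x_stack):
            for i, bit in enumerate(bits):
                if bit:
                    continue
                bits[i] = 1                # tentatively activate this instance
                trial = tau(x_ws, x_stack, t_ws, t_stack, y_ws, y_stack)
                bits[i] = 0
                if trial < best:
                    best, choice = trial, (bits, i)
        if choice is None:                 # assumed stop: nothing lowers tau
            break
        choice[0][choice[1]] = 1
    return x_ws, x_stack, best


x_ws, x_stack, best = tfsa([10, 10], [30, 70], [0.50, 0.51], [0.52, 0.53])
print(x_ws, x_stack, round(best, 2))
```

For the Case 3 yields the sketch activates the intermediate test first, then wafer sort of chip 1, then wafer sort of chip 2, which is the order described above. Being greedy, it is not guaranteed to reach the optimal test flow for every input.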
6.1 Complexity Estimation
There are two nested iterations in Algorithm 1. The outer iteration, the for loop between lines 3 → 19, iterates Counter from 1 to 2N − 1, and the inner iteration, the for loop between lines 5 → 17, iterates i from 1 to N. For each i, j takes the two values 1 and 2. Thus, the number of test flow evaluations is the product of the number of iterations of each loop, i.e., (2N − 1) ⋅ N ⋅ 2 = 4N^{2} − 2N, which is of order O(N^{2}).
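To put the quadratic count in perspective, the sketch below compares it with the number of flows an exhaustive search has to evaluate. The figure 2^{2N−1} assumes that package test is fixed to 1, as stated in the limitations; the function names are illustrative.

```python
def tfsa_evaluations(n_chips):
    """Loop-count bound from the text: (2N - 1) * N * 2 = 4N^2 - 2N."""
    return (2 * n_chips - 1) * n_chips * 2


def exhaustive_evaluations(n_chips):
    """Flows with package test always on: 2N - 1 free binary choices (assumption)."""
    return 2 ** (2 * n_chips - 1)


for n in (2, 4, 6, 8, 10):
    print(n, tfsa_evaluations(n), exhaustive_evaluations(n))
```

For a stack of ten chips this gives 380 evaluations against 524288.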
7 Experiments
In this section we present two sets of experiments. First we compare the expected total test times obtained from TFSA with respect to three straightforward test flows and the test flow obtained by exhaustive search. Next, the TFSA is integrated with test planning of core-based 3D Stacked ICs with an IEEE 1500 based test architecture, to optimize the test time.
7.1 Test Flow Selection
The objective is to compare the expected total test time by applying TFSA and that required with the three straightforward test flow schemes (TA, WSPT, and PT) as well as against exhaustive search.
Table 4 Test times and yields of Chips 1 to 10
Chip  1  2  3  4  5  6  7  8  9  10 

Set 1  
Test time  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000 
Yield  0.62  0.66  0.70  0.74  0.78  0.82  0.86  0.90  0.94  0.98 
Set 2  
Test time  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000 
Yield  0.98  0.94  0.90  0.86  0.82  0.78  0.74  0.70  0.66  0.62 
Table 5 Designs
3D Stacked IC designs  Chips in the 3D Stacked ICs as detailed in Table 4. First chip is lowermost.
SIC_{1}  1, 2
SIC_{2}  1, 2, 3
SIC_{3}  1, 2, 3, 4
SIC_{4}  1, 2, 3, 4, 5
SIC_{5}  1, 2, 3, 4, 5, 6
SIC_{6}  1, 2, 3, 4, 5, 6, 7
SIC_{7}  1, 2, 3, 4, 5, 6, 7, 8
SIC_{8}  1, 2, 3, 4, 5, 6, 7, 8, 9
SIC_{9}  1, 2, 3, 4, 5, 6, 7, 8, 9, 10

TFSA generates test flows and corresponding test times very close to exhaustive search in most cases.

PT has a low expected total test time for 3D Stacked ICs with up to three chips in the stack: for SIC_{1} of Set 1, the result is only 1% away from optimum, whereas the optimal is obtained for SIC_{1} and SIC_{2} of Set 2. However, for all other cases, PT produces results that are far from optimum. As the number of chips in the 3D Stacked IC increases, the performance of PT deteriorates.

TA gives expected total test times that are on average about 40% above the optimum for Set 1 and over 80% above the optimum for Set 2.

WSPT is not as efficient as the TFSA. However, it is interesting to note that, for Set 1, WSPT produces optimal results when the number of chips is less than 4, and WSPT is only a few percent away from the optimum when the number of chips in the stack is less than eight.
Table 6 Comparison of expected total test times: total test time (τ) and difference from exhaustive search (%)
3D Stacked IC  Exhaustive  TFSA  TA  PT  WSPT  TFSA (%)  TA (%)  PT (%)  WSPT (%)
Set 1
SIC_{1}  10874  10874  13812  10972  10874  0  27  1  0
SIC_{2}  21656  21656  29402  33588  21656  0  36  55  0
SIC_{3}  38277  38277  53088  86456  38277  0  39  126  0
SIC_{4}  63842  63842  88354  197930  63842  0  38  210  0
SIC_{5}  101631  101631  140179  413792  103042  0  36  302  1
SIC_{6}  159216  178426  215669  801922  162933  12  35  404  2
SIC_{7}  184017  184017  324978  1454734  254110  0  77  691  38
SIC_{8}  346572  346572  482610  2487197  392444  0  39  618  13
SIC_{9}  503730  503730  709280  4028502  601658  0  41  700  19
Set 2
SIC_{1}  4874  4874  11682  4874  8743  0  140  0  79
SIC_{2}  11604  11604  25711  11604  17965  0  122  0  55
SIC_{3}  25702  25702  47430  25702  32619  0  85  10  27
SIC_{4}  45946  45946  80144  55971  55632  0  74  22  21
SIC_{5}  75558  75558  128577  123013  91441  0  70  63  21
SIC_{6}  121359  127854  199481  277057  146744  5  64  128  21
SIC_{7}  188408  188408  302500  646197  231631  0  61  243  23
SIC_{8}  284257  284257  451419  1573533  361253  0  59  454  27
SIC_{9}  421316  421316  665931  4028502  558309  0  58  856  33
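The difference columns appear to be the relative deviation from the exhaustive-search result, rounded to the nearest integer. A small sketch under that assumption, using the Set 1 values of the seven-chip design SIC_{6} copied from the table (the function name is illustrative):

```python
def difference_percent(tau, tau_exhaustive):
    """Relative deviation from the exhaustive-search result, in whole percent."""
    return round(100.0 * (tau - tau_exhaustive) / tau_exhaustive)


# Set 1, SIC_6: exhaustive search versus TFSA, TA, PT and WSPT.
exhaustive = 159216
row = {"TFSA": 178426, "TA": 215669, "PT": 801922, "WSPT": 162933}
for scheme, tau in row.items():
    print(scheme, difference_percent(tau, exhaustive))
```

This reproduces the tabulated 12, 35, 404 and 2 percent for that row.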
Table 7 Comparison of test flows obtained with exhaustive search against TFSA
Design  Test flow for exhaustive search  Test flow for TFSA
Set 1
SIC_{1}  (1, 1), (0), (1)  (1, 1), (0), (1)
SIC_{2}  (1, 1, 1), (0, 0), (1)  (1, 1, 1), (0, 0), (1)
SIC_{3}  (1, 1, 1, 1), (0, 0, 0), (1)  (1, 1, 1, 1), (0, 0, 0), (1)
SIC_{4}  (1, 1, 1, 1, 1), (0, 0, 0, 0), (1)  (1, 1, 1, 1, 1), (0, 0, 0, 0), (1)
SIC_{5}  (1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 0), (1)  (1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 0), (1)
SIC_{6}  (0, 0, 1, 1, 1, 1, 1), (1, 0, 1, 0, 0, 0), (1)  (1, 1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 0, 0), (1)
SIC_{7}  (1, 1, 1, 1, 1, 1, 1, 1), (0, 0, 0, 0, 0, 1, 0), (1)  (1, 1, 1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 0, 0, 0), (1)
SIC_{8}  (1, 1, 1, 1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 1, 0, 0, 0), (1)  (1, 1, 1, 1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 1, 0, 0, 0), (1)
SIC_{9}  (1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 1, 0, 0, 0, 0), (1)  (1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (0, 1, 0, 0, 1, 0, 0, 0, 0), (1)
Set 2
SIC_{1}  (0, 0), (0), (1)  (0, 0), (0), (1)
SIC_{2}  (0, 0, 0), (0, 0), (1)  (0, 0, 0), (0, 0), (1)
SIC_{3}  (0, 0, 1, 1), (1, 0, 0), (1)  (0, 0, 1, 1), (0, 1, 0), (1)
SIC_{4}  (0, 0, 0, 1, 1), (0, 0, 1, 0), (1)  (0, 0, 0, 1, 1), (0, 0, 1, 0), (1)
SIC_{5}  (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 0), (1)  (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 0), (1)
SIC_{6}  (0, 0, 1, 1, 1, 1, 1), (0, 0, 1, 1, 0, 0), (1)  (0, 0, 0, 1, 1, 1, 1), (0, 0, 1, 1, 0, 0), (1)
SIC_{7}  (0, 0, 0, 1, 1, 1, 1, 1), (0, 0, 1, 1, 1, 0, 0), (1)  (0, 0, 0, 1, 1, 1, 1, 1), (0, 0, 1, 1, 1, 0, 0), (1)
SIC_{8}  (0, 0, 0, 1, 1, 1, 1, 1, 1), (0, 0, 1, 1, 1, 0, 0, 0), (1)  (0, 0, 0, 1, 1, 1, 1, 1, 1), (0, 0, 1, 1, 1, 0, 0, 0), (1)
SIC_{9}  (0, 0, 0, 1, 1, 1, 1, 1, 1, 1), (0, 0, 1, 1, 1, 0, 0, 0, 0), (1)  (0, 0, 0, 1, 1, 1, 1, 1, 1, 1), (0, 0, 1, 1, 1, 0, 0, 0, 0), (1)
7.2 Test Architecture Design
In the second set of experiments, the goal is to compare the expected total test times obtained by integrating test architecture design and test planning schemes with different test flows. We evaluate (1) TFSA against the three straightforward test flow schemes (TA, WSPT, and PT) and against an exhaustive search of all possible test flows, and (2) the test flows on three test architecture designs and test planning schemes. The objective here is to integrate test flow selection and test architecture design to obtain the minimal test cost.

Scheme 1: the TAM for each chip is optimized independently of all other chips in the 3D Stacked IC. This means that each chip gets the TAM that is most suitable for its wafer sort. Note that after the optimization additional TAM wires can be added to a chip. For example, if the top chip requires a wide TAM while all other chips only need a narrow TAM, the wide TAM is added to all chips to make testing of the top chip possible at package test.

Scheme 2: the TAM for the lowest chip is optimized and that TAM architecture is used for all chips in the 3D Stacked IC. In this case, all chips use the TAM optimized for wafer sort test of the lowest chip in the 3D Stacked IC.

Scheme SIC: integer linear programming (ILP) is used to optimize the test architecture for a given test flow. Our ILP scheme [12] is extended from only accepting WSPT to allowing an arbitrary test flow.
Table 8 Experimental data
Label  Design  Cores  Time  Yield 

D  d695  11  695828  0.65 
G  g1023  15  731423  0.65 
P  p34392  20  16372887  0.75 
T  t512505  32  165324037  0.75 
Intermediate test  10000  0.65  
Package test  10000  0.75 
Each ITC’02 benchmark in Table 8 represents a chip in a 3D Stacked IC. By combining the four benchmarks in various ways, we constructed 3D Stacked ICs with 2, 3 and 4 chips. In total we created 9 designs (DP, DT, GP, GT, DGP, DGT, DPT, GPT, and DGPT) where for example the DP design is a 3D Stacked IC with 2 chips consisting of d695 and p34392 where d695 is the lowest chip. The test time required to test each unit at any test instance is obtained from test architecture design and test planning using either Scheme 1, Scheme 2 or Scheme SIC, detailed in [12]. Note that the test time obtained from test architecture design and test planning of the 3D Stacked IC does not change with yield or the output quantity [12].
Expected total test times required by 3D stacked ICs for different test flow and test architecture schemes
Test flow    Test architecture scheme             vs SIC (%)
             Scheme SIC  Scheme 1   Scheme 2      Scheme 1  Scheme 2

SIC: DP; TAM width = 10
TFSA         5.3E+6      5.9E+6     6.7E+6        11        14
WSPT         5.3E+6      5.9E+6     6.7E+6        11        14
TA           1.0E+7      1.2E+7     1.4E+7        20        17
PT           8.3E+6      9.5E+6     1.1E+7        14        16
vs TFSA (%)
WSPT         0           0          0
TA           94          102        104
PT           56          62         65

SIC: DT; TAM width = 16
TFSA         3.0E+7      3.7E+7     4.5E+7        23        22
WSPT         3.0E+7      3.7E+7     4.5E+7        23        22
TA           6.2E+7      7.6E+7     9.6E+7        23        26
PT           5.0E+7      5.9E+7     7.2E+7        18        22
vs TFSA (%)
WSPT         0           0          0
TA           106         105        111
PT           66          60         57

SIC: GP; TAM width = 22
TFSA         2.5E+6      2.8E+6     3.1E+6        12        11
WSPT         2.5E+6      2.8E+6     3.1E+6        12        11
TA           4.7E+6      5.3E+6     5.6E+6        13        6
PT           3.8E+6      4.4E+6     4.9E+6        16        11
vs TFSA (%)
WSPT         0           0          0
TA           92          88         82
PT           56          56         60

SIC: GT; TAM width = 22
TFSA         2.2E+7      2.6E+7     3.2E+7        18        23
WSPT         2.2E+7      2.6E+7     3.2E+7        18        23
TA           4.5E+7      5.1E+7     5.6E+7        13        10
PT           3.6E+7      4.2E+7     5.2E+7        17        24
vs TFSA (%)
WSPT         0           0          0
TA           106         94         76
PT           65          61         64

SIC: DGP; TAM width = 25
TFSA         4.1E+6      4.5E+6     5.2E+6        10        16
WSPT         4.4E+6      5.0E+6     5.8E+6        14        16
TA           5.2E+6      6.0E+6     7.0E+6        15        17
PT           4.6E+6      5.3E+6     6.2E+6        15        17
vs TFSA (%)
WSPT         10          12         12
TA           25          31         32
PT           12          16         17

SIC: DGT; TAM width = 25
TFSA         3.7E+7      4.1E+7     4.7E+7        11        15
WSPT         4.1E+7      4.6E+7     5.3E+7        12        15
TA           4.7E+7      5.4E+7     6.2E+7        15        15
PT           4.2E+7      4.8E+7     5.6E+7        14        17
vs TFSA (%)
WSPT         12          14         15
TA           27          33         34
PT           14          18         20

SIC: DPT; TAM width = 30
TFSA         3.5E+7      3.8E+7     4.4E+7        9         16
WSPT         3.7E+7      4.2E+7     4.9E+7        14        17
TA           4.4E+7      5.1E+7     5.9E+7        16        16
PT           3.8E+7      4.3E+7     5.1E+7        13        19
vs TFSA (%)
WSPT         7           9          10
TA           25          30         33
PT           9           13         15

SIC: GPT; TAM width = 16
TFSA         6.5E+7      7.2E+7     8.3E+7        11        15
WSPT         7.0E+7      7.9E+7     9.1E+7        13        15
TA           8.2E+7      9.5E+7     1.1E+8        16        16
PT           7.1E+7      8.1E+7     9.5E+7        14        17
vs TFSA (%)
WSPT         7           9          10
TA           26          32         33
PT           9           13         15

SIC: DGPT; TAM width = 30
TFSA         3.5E+7      4.3E+7     6.3E+7        23        47
WSPT         5.0E+7      6.4E+7     9.9E+7        28        55
TA           5.2E+7      6.2E+7     9.2E+7        19        48
PT           6.8E+7      9.0E+7     1.3E+8        32        44
vs TFSA (%)
WSPT         45          48         57
TA           45          45         45
PT           97          108        110
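The "vs TFSA (%)" rows can be read as the relative increase in expected total test time over the TFSA flow under the same test architecture scheme. The sketch below assumes that definition (the table does not state the formula) and applies it to the Scheme SIC column of the DGPT design. The name `overhead_percent` is illustrative.

```python
def overhead_percent(time, reference):
    """Relative increase of `time` over `reference`, in percent
    (assumed definition of the "vs TFSA (%)" rows)."""
    return 100.0 * (time - reference) / reference

# Scheme SIC column for design DGPT, as printed in the table above.
dgpt_sic = {"TFSA": 3.5e7, "WSPT": 5.0e7, "TA": 5.2e7, "PT": 6.8e7}

if __name__ == "__main__":
    for flow in ("WSPT", "TA", "PT"):
        pct = overhead_percent(dgpt_sic[flow], dgpt_sic["TFSA"])
        print(f"{flow}: {pct:.0f}% more test time than TFSA")
```

Recomputing from the printed times gives values within a few points of the printed percentages. The difference presumably arises because the times in the table are rounded to two significant digits, whereas the percentages were derived from the unrounded times.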
The results indicate that Scheme SIC is best in all cases, although the margin varies considerably. With each test architecture scheme using the test flow obtained by TFSA, Scheme SIC is 9% better than Scheme 2 on design GP, whereas on design DGPT it is 56% better than Scheme 2.
Test flow used with Scheme SIC to minimize test time
Design  Exhaustive                     TFSA

DP      (1, 1), (0), (1)               (1, 1), (0), (1)
DT      (1, 1), (0), (1)               (1, 1), (0), (1)
GP      (1, 1), (0), (1)               (1, 1), (0), (1)
GT      (1, 1), (0), (1)               (1, 1), (0), (1)
DGP     (1, 1, 0), (1, 0), (1)         (1, 1, 0), (1, 0), (1)
DGT     (1, 1, 0), (1, 0), (1)         (1, 1, 0), (1, 0), (1)
DPT     (1, 1, 0), (1, 0), (1)         (1, 1, 0), (1, 0), (1)
GPT     (1, 1, 0), (1, 0), (1)         (1, 1, 0), (1, 0), (1)
DGPT    (1, 1, 1, 0), (1, 1, 0), (1)   (1, 1, 1, 0), (1, 1, 0), (1)
8 Conclusion
 1) TFSA generates test flows and corresponding test times very close to those of the exhaustive search;
 2) TFSA and a test architecture optimized for the 3D Stacked IC perform best with respect to test time; and
 3) WSPT provides the minimum test time among the three straightforward test flow schemes, and in many cases its test time equals that of TFSA.
References
 1. Agrawal M, Chakrabarty K (2015) Test-Cost Modeling and Optimal Test-Flow Selection of 3D-Stacked ICs. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp 1523–1536
 2. Chou RM, Saluja KK, Agrawal VD (1997) Scheduling tests for VLSI systems under power constraints. IEEE Trans VLSI Syst 5(2):175–185
 3. Hamdioui S, Taouil M (2011) Yield Improvement and Test Cost Optimization for 3D Stacked ICs. In: Asian Test Symposium (ATS), pp 480–485
 4. Higgins M, MacNamee C, Mullane B (2010) Design and implementation challenges for adoption of the IEEE 1500 standard. IET Comput Digit Techn 4(1):38–49
 5. Iyengar V, Chakrabarty K, Marinissen EJ (2002) Test Wrapper and Test Access Mechanism Co-Optimization for System-on-Chip. In: Journal of Electronic Testing: Theory and Applications, vol 18, pp 213–230
 6. Iyengar V, Chakrabarty K, Marinissen EJ (2003) Test Access Mechanism Optimization, Test Scheduling, and Tester Data Volume Reduction for System-on-Chip. IEEE Trans Comput 52(12):1619–1632
 7. Larsson E, Arvidsson K, Fujiwara H, Peng Z (2004) Efficient test solutions for core-based designs. IEEE Trans Comput-Aided Des Integr Circ Syst 23(5):758–775
 8. Marinissen EJ, Zorian Y (2009) Testing 3D Chips Containing Through-Silicon Vias. In: IEEE International Test Conference (ITC), pp 1–11
 9. Marinissen EJ, Verbree J, Konijnenburg M (2010) A Structured and Scalable Test Access Architecture for TSV-Based 3D Stacked ICs. In: IEEE VLSI Test Symposium (VTS), pp 1–6
 10. Marinissen EJ, McLaurin T, Jiao H (2016) IEEE Std P1838: DfT Standard-under-Development for 2.5D-, 3D-, and 5.5D-SICs. In: 2016 21st IEEE European Test Symposium (ETS), pp 1–10
 11. Mullane B, Higgins M, MacNamee C (2008) IEEE 1500 Core wrapper optimization techniques and implementation. In: IEEE International Test Conference (ITC), no. 29.2, pp 1–10
 12. SenGupta B, Larsson E (2014) Test Planning and Test Access Mechanism Design for Stacked Chips using ILP. In: IEEE VLSI Test Symposium (VTS), pp 1–6
 13. SenGupta B, Ingelsson U, Larsson E (2012) Scheduling Tests for 3D Stacked Chips under Power Constraints. J Electron Test: Theory Appl (JETTA) 28(1):121–135
 14. Taouil M, Hamdioui S (2012) Yield Improvement for 3D Wafer-to-Wafer Stacked Memories. In: Journal of Electronic Testing: Theory and Applications (JETTA), vol 28, pp 523–534
 15. Yi H, Song J, Park S (2008) Low-Cost Scan Test for IEEE-1500-Based SoC. IEEE Trans Instrum Measur 57(5):1071–1078
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.