Optimally Factored IFIR Filters
This paper presents a new design method and a corresponding architecture for creating FIR filters that are significantly more hardware-efficient than presently known implementations. These optimally factored IFIR filters are also easily pipelined, thereby allowing operation at much higher data-rates. Using examples introduced by previous researchers, we show surprisingly better hardware efficiency. Two such examples show hardware reductions in the vicinity of 50%, relative to the conventional Remez structures, whereas previous research targeting this matter reports more modest results. We also show new features and further benefits that can be obtained by using optimally factored IFIR filters.
Keywords: FIR filter, Factored filter, IFIR, Interpolated FIR, Optimal FIR filter, Cascade filter, Optimal filter design, Filter hardware complexity
An interpolated FIR digital filter (often referred to as an “IFIR filter”) is well known to filter designers [6, 7, 11, 17, 19]. It uses a filter architecture that can be very efficient for making narrow-band lowpass (or, by simple mapping, highpass) filters. The IFIR transfer function H(z) is constructed as a cascade connection of two FIR filters, H(z) = G(z^L)I(z), where the n-tap FIR filter G(z) (often called the model filter) has its argument z replaced by z^L for a positive integer L (called the “stretch factor,” subsequently referred to as “SF”). This replacement is equivalent to “stretching” the length of filter G to become approximately L times as long: more precisely, it will have 1 + (n − 1)L taps, with the majority of the tap coefficients having the value zero, hence having zero hardware cost for such tap-coefficient multipliers and their structural adders. This stretching in the time domain is equivalent to “shrinking” the transfer function G(e^{jω}) by the factor L in the frequency domain, which gives insight as to why such functions can be efficient when used for narrow-band filters. Such frequency-domain shrinking, however, causes unwanted passbands, centered at ω = 2π/L, 4π/L, …, 2π(L − 1)/L, to appear, and these must be removed (or masked) by using the cheap (due to its wider transition band) lowpass filter I(z), called the interpolator or masking filter. Refs. [6, 17, 19] provide more details on IFIR filters and their properties, and [7, 11] show how to choose an optimum stretch factor (SF) so that a filter will most efficiently meet given passband and stopband specifications.
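The stretching and masking described above can be checked numerically. The following is a minimal sketch only: the 5-tap model filter, the 3-tap moving-average interpolator, and the helper names `stretch` and `freq_response` are illustrative assumptions, not filters or code from the paper.

```python
import numpy as np

def stretch(g, L):
    """Coefficients of G(z^L): insert L - 1 zeros between the taps of g."""
    g = np.asarray(g, dtype=float)
    out = np.zeros(1 + (len(g) - 1) * L)
    out[::L] = g
    return out

def freq_response(h, w):
    """Evaluate H(e^{jw}) = sum_k h[k] e^{-jwk} at the frequencies in w."""
    k = np.arange(len(h))
    return np.exp(-1j * np.outer(w, k)) @ h

# Hypothetical model filter and stretch factor (illustrative values only).
g = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 9.0
L = 3
g_stretched = stretch(g, L)

n_taps = len(g_stretched)                  # 1 + (n - 1) L
n_nonzero = np.count_nonzero(g_stretched)  # only the original n taps cost hardware

# Frequency-domain shrinking: the stretched response equals G(e^{jwL}).
w = np.linspace(0.0, np.pi, 64)
shrink_ok = np.allclose(freq_response(g_stretched, w), freq_response(g, L * w))

# The unwanted image passband at w = 2*pi/L has the same gain as DC.
w_image = np.array([2.0 * np.pi / L])
image_gain = abs(freq_response(g_stretched, w_image)[0])
dc_gain = abs(freq_response(g_stretched, np.array([0.0]))[0])

# Hypothetical interpolator I(z): a length-L moving average has a zero at 2*pi/L.
interp = np.ones(L) / L
h = np.convolve(g_stretched, interp)       # H(z) = G(z^L) I(z)
masked_gain = abs(freq_response(h, w_image)[0])
```

Here the moving average is used only because its zero falls exactly on the image frequency; a practical interpolator would be designed to suppress the whole image band.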
Hardware-complexity summary and comparison (order-15 Example 1); parameter B is the data width (wordlength)
1.1 In Regard to Optimal Factoring Filter Design
In the Fig. 4 factored filter, this yields the three fourth-order stages and it leaves the 180° zero and the 87° zero-pair to stand alone as first-order and second-order blocks. Using a similar approach, we now present a method for making optimally factored IFIR filters that has not previously been explored.
Table 1, line 2, gives our assessment of a 10.1% hardware savings to be expected by this Fig. 4 non-IFIR factoring example. This may seem a relatively minor improvement. However, we shall see that this small (degree-15) filter, which does not display many features that would identify it as a particularly good candidate for IFIR implementation (for example, it does not have a very narrow transition band) still achieves a rather impressive 25% hardware reduction once we combine the optimal factoring with the use of the IFIR architecture (as is shown in the last row of Table 1). For larger and more demanding filters, we have found, and will demonstrate herein, that even greater percentage-reductions in hardware can be expected. Before discussing this IFIR factoring further, we shall briefly explain our Table 1 computations. (Similar hardware-assessment techniques will be used throughout our subsequent discussions.)
1.2 Concerning Our Assessments of Filter Hardware Costs
In our subsequent discussions on IFIR transfer functions of the form G(z^L), the presence of stretch factors L > 1 will cause us to consider FIR filter structures having a cascade of numerous z^{−1} delays as alternatives for structures having fewer delays but more numerous (and often more expensive) tap-coefficient multipliers. We consider all FIR filters discussed here to have fixed-point binary multiplier coefficient values, implemented efficiently by circuits that employ hard-wired data shifts and additions (multiplier adders) of this shifted data. This is true (and commonplace) for direct-form as well as transposed-form FIR filter hardware implementations.
The “hardware efficiency” of a circuit will be affected by the number of additions required, which we assess in terms of the number of “multiplier adders” (MA) used. Also, other adders, so-called “structural adders” (SA), will be required to implement, for example, “plus or minus” operations like those shown within the boxes comprising the cascade structure at the top of Fig. 4 or, more generally, the additions for combining data that would take place in a conventional direct-form or a conventional transposed-form FIR filter. Ultimately this “adder hardware” will be assessed as the total number of single-bit “full adders” required to build these multiplier adders and the structural adders. And in doing this for our various examples we also account for the type of simplifications that are routinely employed by one skilled in the art, such as sub-expression sharing.
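As a rough illustration of how multiplier adders can be counted for one hard-wired shift-add coefficient, the sketch below recodes a fixed-point value into canonical signed-digit (CSD) form and counts the additions needed to combine the shifted copies of the data. The choice of CSD and the helper names `csd_digits` and `multiplier_adders` are assumptions made for this example; the sketch does not model sub-expression sharing or the full-adder totals used in the paper's tables.

```python
def csd_digits(value, frac_bits):
    """CSD recoding of a fixed-point value.

    Returns (sign, shift) pairs with value == sum(sign * 2**-shift),
    after rounding value to frac_bits fractional bits.
    """
    n = round(value * (1 << frac_bits))
    digits = []
    pos = 0
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            digits.append((d, frac_bits - pos))
            n -= d
        n >>= 1
        pos += 1
    return digits

def multiplier_adders(value, frac_bits):
    """Adders needed to sum the shifted terms of a single coefficient."""
    return max(len(csd_digits(value, frac_bits)) - 1, 0)

# Coefficient values that appear in the order-59 example of Sect. 3.
terms_a = csd_digits(0.46875, 5)   # two signed shifts: 2**-1 - 2**-5
terms_b = csd_digits(0.5, 5)       # a single shift, no adder
ma_a = multiplier_adders(0.46875, 5)
ma_b = multiplier_adders(0.5, 5)
```

A coefficient that is a single power of two costs only a hard-wired shift, which is why such coefficients are described as trivial.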
In Sect. 2, we now use the small order-15 FIR filter, whose transfer-function magnitude |H(e^{jω})| is plotted in Fig. 1 (dashed line), to illustrate some of the basic concepts for our new optimally factored IFIR filter design and implementation.
Notice that the important new concept of joint (versus individual) sequencing of the two sets of model filter and interpolator filter stages will also be introduced. The resulting filter structures are compared with the non-interpolated optimally factored (Fig. 4) design and with the conventional Remez implementation.
2 Degree-15 Filter Example: Choice of Stretch Factor and New Joint Stage-Sequencing Technique
Following , the choice of an optimum stretch factor for this filter is obtained and, as discussed previously, the two choices, indicated in Fig. 5, are SF = 2 and SF = 3, where, as summarized in Table 1, SF = 3 is the better choice due to its slightly greater hardware efficiency. Notice that the exact hardware costs will ultimately involve the details of any specific implementation.
Table 1 gives the hardware-complexity comparison for the optimally factored IFIR filters versus the conventional Remez (direct-form) FIR filter, as well as the (non-IFIR) optimally factored filter, and the IFIR non-factored filters. Clearly, the optimally factored SF = 3 IFIR filter has the fewest adders and lowest total complexity. Due to the modest 22-dB stopband attenuation target in Example 1, the wordlength of the signal path can require as few as just six bits for the optimally factored IFIR cascade implementations.
3 An Order-59 Filter Example: Factored IFIR Efficiency, and Additional Benefits
Hardware-complexity comparison for order-59 Example 2 FIR filter (B represents the wordlength of the datapath)
Benefit: If desired, the optimally factored IFIR filter easily allows a non-uniform datapath wordlength across the stages of the cascade. This can efficiently deliver better noise performance, as the dynamic range of each stage output can easily be optimally and independently adjusted, as will be discussed in Sect. 4.
Parameter B in Table 2 is the datapath wordlength, which should be at least 12 bits (including the sign bit) to allow a single-stage conventional design to provide enough resolution to be able to realize a 60-dB attenuation of the incoming signal. For the Fig. 15 factored IFIR filter, as discussed earlier, a wordlength of 14 bits is required. Table 2 also provides complexity comparisons of the FIRGAM method , the original CSD implementation of this example filter  (which, being an early CSD filter, was focused on reducing adder costs only), the PMILP algorithm , the minimum-adder MILP , the cascade method , and the genetic algorithm cascade .
Comparisons: Area, Speed, and Power Consumption
The fully pipelined optimally factored structure has fourteen pipelining registers (one register at the output of each of the first fourteen stages in Fig. 15). Table 3 shows that when the factored IFIR filter is not pipelined it has the smallest area, but the longest critical path and, as stated earlier, it is suitable only for applications where high speed is not required. Due to its long critical path, the synthesis tool had to increase its logic gate sizes in order to operate at 100 MHz, resulting in slightly higher power consumption than the pipelined designs. Its maximum operating speed was then 160 MHz. In contrast, the fully pipelined optimally factored designs had the shortest critical paths and hence the synthesis tool was able to achieve very high sampling rates using mostly small gate cells. While the transposed design and the fully pipelined factored design can both reach, at most, a speed of 900 MHz, notice that the transposed design requires a considerable increase in gate sizes (hence, considerable increases in area and power) in order to operate at this speed.
Notice that the conventional transposed design’s area and power requirements at 900 MHz are, respectively, 3.5 times and 53% higher than those of the optimally factored IFIR filter.
Also the conventional direct-form filter can operate only at speeds up to 500 MHz and even at that relatively low speed, it consumes a 2.3 times larger area and 28% more power than the fully pipelined Fig. 15 optimally factored IFIR filter.
Test 1) The input signal is white Gaussian noise (uniform power across all frequencies). We expect the filter to attenuate by 60 dB the portion of the signal within the stopband.
Test 2) The input signal is colored Gaussian noise with uniform power within the stopband. It is a sum of 100 random-phase sinusoids, uniformly distributed across the stopband (ω ≥ 0.14π). We expect a 60-dB attenuation of the entire signal.
Test 3) The input signal is one sinusoid at the passband edge.
Test 4) The input signal is one sinusoid at the stopband edge.
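The test inputs listed above can be generated along the following lines. This is a sketch under stated assumptions: the record length, the random seed, the unit tone amplitudes, and the helper names `multitone`, `rms_db`, and `attenuation_db` are all choices made for this example. Only the stopband edge 0.14π is taken from the text, and the filter under test is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
n_samples = 4096                 # assumed record length

def multitone(freqs, n_samples, rng):
    """Sum of unit-amplitude sinusoids with independent random phases."""
    n = np.arange(n_samples)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(freqs))
    return np.cos(np.outer(n, freqs) + phases).sum(axis=1)

def rms_db(x):
    """RMS level of x in dB."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))))

def attenuation_db(x, y):
    """Attenuation from input x to output y in dB (positive means attenuated)."""
    return rms_db(x) - rms_db(y)

stop_edge = 0.14 * np.pi

# Test 1: white Gaussian noise.
x_white = rng.standard_normal(n_samples)

# Test 2: 100 random-phase tones spread uniformly over the stopband.
# endpoint=False avoids a tone exactly at pi, whose amplitude would depend on its phase.
stop_freqs = np.linspace(stop_edge, np.pi, 100, endpoint=False)
x_stop = multitone(stop_freqs, n_samples, rng)

# Test 4: a single sinusoid at the stopband edge.
x_edge = multitone(np.array([stop_edge]), n_samples, rng)
```

With a filter output `y` for one of these inputs, `attenuation_db(x, y)` gives the overall RMS attenuation that the tests compare against the 60-dB target.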
Figures 18 and 19 show that the Fig. 15 optimally factored IFIR filter is able to fully attenuate (by at least 60 dB) the stopband portions of the input signal (including a sinusoid at the edge of the stopband) and it is able to perfectly pass the passband signals (including a sinusoid at passband edge) with negligible (less than 0.1-dB) attenuation.
Figure 19 shows the progress of the RMS stage outputs throughout the cascade for the two sinusoidal test cases at the passband and stopband edges (Test 3 and Test 4).
An Additional Benefit provided by the inherent flexibility of the optimally factored IFIR filter in Fig. 15:
If a (very modest) 0.019-dB increase is allowed in the passband ripple (i.e., changing from ± 0.1035 to ± 0.1225 dB), then the 8th-stage [1 − 0.46875z^{−3} + z^{−6}] in the Fig. 15 structure can be further simplified to become [1 − 0.5z^{−3} + z^{−6}], while the rest of the cascade factors can remain intact. The resulting modified stage has only trivial coefficients, which yields a further reduction in the shift-add operations necessary for implementing the Fig. 15 filter coefficients (a reduction by 10% from ten down to nine multiplier adders). The importance of this observation is that:
In general, we have found that, given a minor (usually acceptable) allowance in some of the target filter specifications, it is often possible to exploit it to further simplify a specific stage (or stages!) of the optimally factored IFIR filter. In particular, this can be done without the need to change any of the other stages in order to reduce the filter’s overall hardware complexity.
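To see how small the change to the 8th stage is, the following sketch compares the two stage polynomials in isolation. It is only an illustration: the helper names are hypothetical, and the code looks at the single stage alone, so it does not reproduce the 0.019-dB ripple figure, which depends on the whole Fig. 15 cascade.

```python
import numpy as np

def stage_taps(c):
    """Taps of the stage 1 - c z^-3 + z^-6."""
    return np.array([1.0, 0.0, 0.0, -c, 0.0, 0.0, 1.0])

def zero_angles_deg(c):
    """Sorted angles (degrees) of the upper-half-plane zeros, and all zero radii."""
    roots = np.roots(stage_taps(c))
    angles = np.degrees(np.angle(roots))
    return np.sort(angles[angles > 0.0]), np.abs(roots)

def freq_response(h, w):
    """Evaluate H(e^{jw}) = sum_k h[k] e^{-jwk} at the frequencies in w."""
    k = np.arange(len(h))
    return np.exp(-1j * np.outer(w, k)) @ h

angles_orig, radii_orig = zero_angles_deg(0.46875)
angles_mod, radii_mod = zero_angles_deg(0.5)

# Both stages keep their zeros on the unit circle; only the angles move slightly.
max_angle_shift = np.max(np.abs(angles_orig - angles_mod))

# The two stages differ only in the z^-3 tap, so the difference of their
# frequency responses has constant magnitude 0.5 - 0.46875 = 2**-5.
w = np.linspace(0.0, np.pi, 256)
diff = freq_response(stage_taps(0.5), w) - freq_response(stage_taps(0.46875), w)
```

Because every other factor is left untouched, the effect on the overall response is confined to this one stage's contribution, which is what makes a stage-by-stage simplification possible.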
4 A Hardware-Efficient Wideband Filter Design Via Optimally Factored IFIR Implementation: Order-62 Filter Example from [2, 8, 27]
Similar to the order-59 filter in Sect. 3, this filter, referred to as filter L2 in , is a convenient example because several previous publications [2, 8, 18, 21, 22, 25, 27] have chosen to use it when presenting their own filter design and implementation methods. These include the FIRGAM and Remez algorithms , an algorithm (LIM) from , the Partial Mixed-Integer Linear Programming (PMILP) algorithm of  and the single-stage and dual-stage designs using the coefficient optimization algorithms in [21, 22].
We first demonstrate an optimally factored IFIR implementation of filter L2, and we compare its complexity with the above-cited designs. Our filter implementation will also provide the opportunity to demonstrate: ANOTHER BENEFIT of our optimally factored filters: i.e., due to the relatively small size of our FIR factors, it is often possible to find some (otherwise not particularly obvious) opportunities to further reduce the number of add operations required for implementing some FIR coefficients.
Model filter G(z): quantized stages and binary representations, L2 filter using optimal pairing identified in Fig. 21a
The binary values of coefficients for G(z) and I(z), listed in Tables 4 and 5, indicate that most factors can be implemented very cheaply. Indeed, only Factor 5 and Factor 6 (the two largest factors) have coefficients that require more than one MA (multiplier adder) in their implementation. The Appendix explains how we can implement each of these factors with just two MA. (Admittedly, we do somewhat blur the distinction between MA and SA: we increase the number of SA.) Overall, however, we achieve a net reduction of one addition for each factor: we need 2 MA and 8 SA for Factor 5, and the same for Factor 6.
Hardware-complexity comparison for order-62 wideband FIR filter (B represents the wordlength of the datapath)
Test 1) Input signal is an ensemble of 50 random-phase in-band sinusoids (ω ≤ 0.2π). We expect the signal to traverse the factored filter unaffected, and the output to be a delayed version of the input.
Test 2) Input signal is white Gaussian noise (uniform power across all frequencies). We expect to attenuate the portion of the signal that falls within the stopband (ω ≥ 0.28π) by 60 dB.
Test 3) Input signal is colored Gaussian noise with uniform power only in the stopband. We realize this using a sum of 100 random-phase sinusoids, uniformly distributed in the stopband (ω ≥ 0.28π). We expect our filter to attenuate the entire signal by at least 60 dB.
Test 4) Input signal is a sinusoid at ωp = 0.2π passband edge.
Test 5) Input signal is a sinusoid at ωs = 0.28π stopband edge.
A slightly more efficient realization is also possible, employing the inherent flexibility of the factored structure which can (as mentioned for Example 2) accommodate a non-uniform datapath wordlength (i.e., truncation/rounding levels) throughout the cascade. According to Fig. 25, while 15 bits are needed for truncation at the outputs of stages #1, #2, #3, #10, #11, #12 and #13 (to accommodate up to a 6-dB increase in the stage-output RMS values, compared to the RMS of the filter input), only 14 bits are needed for truncation at the outputs of stages #4, #5, #6, #7, #8, and #9.
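A non-uniform wordlength of this kind can be modeled in simulation as sketched below. Several things here are assumptions rather than details from the paper: that all stages share one LSB weight (2^−13 is assumed) so that the 15th bit acts as extra headroom, that overflow saturates rather than wraps, and the names `to_word` and `run_cascade`. The two short stages in the demonstration are placeholders, not the filter's actual factors.

```python
import numpy as np

FRAC_BITS = 13  # assumed LSB weight 2**-13, so a 14-bit word spans [-1, 1)

def to_word(x, bits, frac_bits=FRAC_BITS):
    """Truncate to the LSB grid and saturate to a signed word of `bits` bits."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    return np.clip(np.floor(np.asarray(x, dtype=float) * scale), lo, hi) / scale

# Per-stage output wordlengths as listed in the text (stages numbered from 1).
stage_bits = {s: 15 for s in (1, 2, 3, 10, 11, 12, 13)}
stage_bits.update({s: 14 for s in range(4, 10)})

def run_cascade(x, stages, stage_bits, input_bits=14):
    """Filter x through the stages in order, truncating after every stage."""
    y = to_word(x, input_bits)
    for idx, h in enumerate(stages, start=1):
        y = to_word(np.convolve(y, h), stage_bits[idx])
    return y

# Placeholder stages (hypothetical) whose intermediate output exceeds 1.0:
demo_stages = [np.array([1.0, 1.0]), np.array([0.5, 0.5])]
y_demo = run_cascade([0.75, 0.75], demo_stages, stage_bits)

clipped_14 = to_word(1.5, 14)   # saturates just below 1.0
kept_15 = to_word(1.5, 15)      # fits within the 15-bit range
```

In this model the stages with larger output RMS simply receive one more bit of range, while the remaining stages keep the narrower word.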
Noise analysis for the factored IFIR structure in Fig. 22:
In this paper, an apparently quite superior general method, and a corresponding structure, for achieving significantly more hardware-efficient implementations of FIR filters has been presented. This advancement employs our recently announced “optimal factoring of FIR filters.” We have demonstrated that by applying optimal factoring to well-designed IFIR filters we can implement much better (more hardware-efficient) FIR digital filters. When assessing hardware cost as the sum of the required full adders and flip-flops, we have demonstrated that such optimally factored IFIR filters can provide substantially lower hardware cost than that achieved by the methods presented in previous research publications. (Two of our examples show hardware reductions in the vicinity of 50%, in comparison to conventional Remez implementations. Indeed, the recent publication  shows these results to be quite close to a new “lower bound” for the hardware complexity of any FIR implementation that meets the specifications of these two FIR filters.) As shown in Table 3, our optimally factored IFIR filters can be particularly beneficial when specifications that push the technology speed limits are required, and in these cases the area and power savings for our optimally factored IFIR filters still appear quite substantial. Further properties, benefits, and alternative implementations of these filters have also been demonstrated when implementing well-known examples. This further confirms the utility of the optimally factored IFIR filters in comparison with more conventional implementations.
An extension of this paper’s optimal factoring of IFIR filters to the optimal factoring of FRM (frequency response masking) filters is also evident. (Please see  for FRM details.) Basically, the FRM structure is an extension of the IFIR structure which includes additional FIR-type hardware (for the purpose of facilitating a broader class of FIR filters, including certain highpass and bandpass FIR filters whose direct implementation via an IFIR structure could seem problematic). In Figs. 3(a) and 5 of  it is shown that one can start with an IFIR structure and include two more FIR blocks to obtain an FRM filter implementation that may seem more suited for some desired filters, primarily bandpass FIR structures. While certain complications may arise when attempting to implement the FIR factoring efficiently in an FRM filter (i.e., one basic issue could concern a desire to preserve a pure delay chain z^{−Ln} with what could have a substantial length L, and which may thus seem inconsistent with FIR factoring), it can still be envisioned that the type of FIR factoring that we have presented here could be extended to FRM filters. This would, of course, be a possible topic for future research.
This paper’s optimally factored IFIR structure and the design methods for finding optimal factors, scaling and sequencing them are patent pending.
- 5. M. Faust, C. H. Chang, Optimization of structural adders in fixed coefficient transposed direct form FIR filters, in Proceedings of the IEEE International Symposium on Circuits and Systems (2009), pp. 2185–2188
- 11. A. Mehrnia, A. N. Willson, Jr., On optimal IFIR filter design, in Proceedings of the IEEE International Symposium on Circuits and Systems, vol 3 (2004), pp. 133–136
- 20. W. Schüssler, On structures for nonrecursive digital filters. Arch. Elek. Übertragung 26(6), 255–258 (1972)
- 25. C.-Y. Yao, C.-J. Chien, A partial MILP algorithm for the design of linear phase FIR filters with SPT coefficients. IEICE Trans. Fundam. E85-A, 2302–2310 (2002)
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.