1 Introduction

Window filtering techniques [1] are widely used in signal processing, in both time-domain and frequency-domain processing. For finite impulse response (FIR) filters, several window functions have been developed to meet requirements such as sidelobe reduction and dynamic range. In past decades, hardware-efficient VLSI architectures for window function generators were designed using lookup tables, but these architectures occupied a large area and had higher latency and word length. Moreover, the window length could not be varied with this technique, and fixed-length window functions are inefficient. Therefore, reconfigurable, variable-length window function generators based on the CORDIC algorithm were implemented [2, 3]. The CORDIC algorithm has various applications, such as the design of digital filters [4], FFTs [5] and several window functions. CORDIC is also useful for computing many elementary and transcendental functions used in VLSI design, such as multiplication, division, the hyperbolic tangent and the sigmoid function. CORDIC is hardware-efficient, simple and has low computational complexity. Its major drawback is either high latency or the large hardware cost of the scale factor compensation network. To reduce latency and the number of iterations, parallel CORDIC architectures [6, 7] have been proposed; they reduce latency but increase the hardware cost and the time needed for scale factor compensation. Variable scale factor compensation networks [8, 9] save hardware, but these methods increase area or otherwise affect throughput or latency. Scale-free CORDIC processors [10,11,12] have been suggested as an optimal solution, offering hardware area savings, less complex hardware, high throughput and low latency.
The proposed work studies the design of a scale-free CORDIC processor and its use in two reconfigurable architectures for generating variable-length window functions.

The rest of the paper is organized as follows: Sect. 2 presents an overview of the CORDIC algorithm. Sect. 3 presents the design aspects of the scale-free CORDIC processor. Sect. 4 describes the scale-free CORDIC processor-based architecture for the different window functions. Sect. 5 presents the FPGA implementation results along with the performance measures, and Sect. 6 concludes the paper.

2 CORDIC Algorithm

(A) Basic CORDIC Algorithm

Jack E. Volder invented the original CORDIC algorithm [13, 14]. It converts rectangular coordinates (x, y) to polar coordinates (R, θ) and performs vector rotation using only shift-and-add steps. The basic CORDIC rotation equations are:

$$ x^{{\prime }} = x\cos\theta - y \sin\theta = \cos\theta \left( {x - y\tan\theta } \right) $$
(1)
$$ y^{{\prime }} = y\cos\theta + x\sin\theta = \cos\theta \left( {y + x\tan\theta } \right) $$
(2)

If the rotation angle θ is divided into a set of small angles and the rotation is performed in steps, θ can be approximated by \( \theta = \sum_{i = 0}^{n} \delta_{i} \theta_{i} \), where \( \delta_{i} \in \left\{ 1, - 1 \right\} \) is the sign of rotation (+1 for counterclockwise and −1 for clockwise rotation). Several admissible values may be chosen for the rotation steps. The elementary angles are chosen as \( \theta_{i} = \tan^{ - 1} 2^{ - i} \) because then \( \tan\theta_{i} = 2^{ - i} \), so the multiplication by \( \tan\theta_{i} \) in Eqs. (1) and (2) reduces to an i-bit right shift, which is easy to implement in hardware. The new coordinates after each rotation \( \left( x_{i + 1}, y_{i + 1} \right) \) can therefore be expressed as

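$$ x_{i + 1} = \cos\theta_{i} \left( x_{i} - y_{i}\,\delta_{i}\,2^{ - i} \right) $$
(3)
$$ y_{i + 1} = \cos\theta_{i} \left( y_{i} + x_{i}\,\delta_{i}\,2^{ - i} \right) $$
(4)
$$ K_{i} = \cos \theta_{i} = \frac{1}{\sqrt{1 + 2^{ - 2i}}} $$
(5)
$$ K = \prod_{i = 0}^{n} K_{i} $$
(6)

where K is defined as the scale factor. Because K does not depend on the rotation directions \( \delta_{i} \), the \( \cos\theta_{i} \) factors can be omitted from the iterations and applied once as a constant (K ≈ 0.6073 for a large number of iterations).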
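The following sketch models Eqs. (3)-(6) in floating point to show how the iteration converges; it is not a hardware description. The iteration count n = 16 and the name `cordic_cos_sin` are illustrative assumptions, and the cos θ_i factors are applied once at the end as the constant K.

```python
import math

def cordic_cos_sin(theta, n=16):
    """Circular CORDIC in rotation mode following eqs. (3)-(6), in floating point.
    Converges for |theta| <= sum(atan(2^-i)) ~ 1.743 rad."""
    angles = [math.atan(2.0 ** -i) for i in range(n)]                    # theta_i = atan(2^-i)
    K = math.prod(1 / math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(n))  # product of the K_i
    x, y, z = 1.0, 0.0, theta
    for i in range(n):
        d = 1 if z >= 0 else -1             # delta_i: +1 counterclockwise, -1 clockwise
        # Shift-and-add micro-rotation; the cos(theta_i) factors are applied once via K.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]                  # remaining angle still to be rotated
    return K * x, K * y                     # approximately (cos theta, sin theta)
```

With n = 16 the residual angle is bounded by roughly atan(2^−15), so the results agree with `math.cos`/`math.sin` to about four decimal places.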

(B) Unified CORDIC Algorithm

In 1971, J. S. Walther generalized the CORDIC algorithm into a unified form [15, 16] with three trajectories: circular (m = 1), linear (m = 0) and hyperbolic (m = −1). Each trajectory supports two modes, vectoring and rotation. In vectoring mode, a vector with starting coordinates \( \left( {x_{0} ,\,y_{0} } \right) \) is rotated so that it finally lies on the abscissa, by iteratively driving \( y_{n} \) to zero. In rotation mode, a vector with starting coordinates \( \left( {x_{0} ,\,y_{0} } \right) \) is rotated by an angle \( \theta_{0} \) so that the final value of the angle register converges to zero. The unified CORDIC algorithm is defined as follows:

$$ \left[ {\begin{array}{*{20}c} {x_{i + 1} } \\ {y_{i + 1} } \\ \end{array} } \right] = {\text{K}}_{\text{i}} \left[ {\begin{array}{*{20}c} 1 & { - m\delta_{i} 2^{ - i} } \\ {\delta_{i} 2^{ - i} } & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {x_{i} } \\ {y_{i} } \\ \end{array} } \right] $$
(7)
$$ K_{i} = \frac{1}{{\sqrt {1 + m\,.\,2^{ - 2i} } }} $$
(8)
$$ \theta = \sum\limits_{i = 0}^{n} {\delta_{i} \theta_{i} } $$
(9)
$$ \theta_{i} = \frac{1}{\sqrt{m}}\tan^{ - 1} \left( \sqrt{m}\,2^{ - i} \right) $$
(10)

where \( m = \left\{ {\begin{array}{*{20}c} { - 1} & {for\;hyperbolic} \\ 0 & {for\;linear} \\ 1 & {for\;circular} \\ \end{array} } \right. \)

Combining the three trajectories with the two modes gives six operational modes, which are summarized in Table 1.

Table 1 Modes m of operation for the CORDIC algorithm

Direct computation: multiplication \( x \times y \), division \( \frac{y}{x} \).

Trigonometric and hyperbolic functions: \( \sin z,\,\cos z,\,\tan z,\,\sinh z,\,\cosh z,\,\tanh z,\,\tan^{ - 1} z,\,\tanh^{ - 1} z \).

Additional functions can be computed by combining multiple modes of operation with appropriate initialization:

$$ \begin{array}{ll} \tanh z = \frac{\sinh z}{\cosh z} & \text{modes: } m = -1,\,0 \\ e^{z} = \sinh z + \cosh z & \text{modes: } m = -1 \\ y = f(z) = \frac{1}{1 + e^{-z}} & \text{modes: } m = -1,\,0 \\ \end{array} $$
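A minimal floating-point sketch of Walther's unified iteration (Eq. (7)) that reproduces the three mode combinations listed above. Assumptions not taken from the paper: 32 iterations, circular and linear modes starting at i = 0, hyperbolic mode starting at i = 1 with the customary repeated indices 4, 13, 40, ..., and all function names, which are hypothetical. The hyperbolic rotation converges only for |z| up to about 1.118.

```python
import math

N_ITER = 32  # assumed iteration count (not from the paper)

def indices(m, n_iter=N_ITER):
    """Iteration indices: i = 0, 1, 2, ... for circular/linear; for hyperbolic,
    start at i = 1 and repeat i = 4, 13, 40, ... so that the iteration converges."""
    if m != -1:
        return list(range(n_iter))
    idx, i, repeat = [], 1, 4
    while len(idx) < n_iter:
        idx.append(i)
        if i == repeat:
            idx.append(i)
            repeat = 3 * repeat + 1
        i += 1
    return idx[:n_iter]

def elementary_angle(m, i):
    """theta_i of eq. (10): atan(2^-i), 2^-i or atanh(2^-i) for m = 1, 0, -1."""
    t = 2.0 ** -i
    if m == 1:
        return math.atan(t)
    return t if m == 0 else math.atanh(t)

def cordic(x, y, z, m, mode, n_iter=N_ITER):
    """Unified CORDIC, eq. (7) without the K_i factors.
    'rotation' drives z to 0; 'vectoring' drives y to 0 (requires x > 0)."""
    for i in indices(m, n_iter):
        if mode == "rotation":
            d = 1 if z >= 0 else -1
        else:
            d = -1 if y >= 0 else 1
        x, y, z = (x - m * d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * elementary_angle(m, i))
    return x, y, z

# Length change of the un-scaled hyperbolic iterations (product of sqrt(1 - 2^-2i)).
HYP_GAIN = math.prod(math.sqrt(1 - 2.0 ** (-2 * i)) for i in indices(-1))

def tanh_cordic(z):
    # m = -1 rotation gives HYP_GAIN * (cosh z, sinh z); the gain cancels in the ratio.
    c, s, _ = cordic(1.0, 0.0, z, m=-1, mode="rotation")
    # m = 0 vectoring accumulates y / x = sinh z / cosh z in the angle register.
    return cordic(c, s, 0.0, m=0, mode="vectoring")[2]

def exp_cordic(z):
    # Pre-scale x0 by 1 / HYP_GAIN so that (x, y) ends near (cosh z, sinh z).
    c, s, _ = cordic(1.0 / HYP_GAIN, 0.0, z, m=-1, mode="rotation")
    return c + s

def sigmoid_cordic(z):
    # m = -1 for e^{-z}, then m = 0 vectoring divides 1 by (1 + e^{-z}).
    return cordic(1.0 + exp_cordic(-z), 1.0, 0.0, m=0, mode="vectoring")[2]
```

Note that `tanh_cordic` needs no gain compensation at all, because both outputs of the hyperbolic stage carry the same gain and the linear-vectoring stage only forms their ratio.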

(C) Scale-free CORDIC Algorithm

Many improvements to the CORDIC algorithm have been reported. To improve architectural performance and reduce cost, extensive work has been done on both algorithm design and architecture. Parallel and pipelined CORDIC architectures are preferred for higher throughput. The pipelined scaling-free CORDIC [10,11,12] is a major development in this area, and the Taylor-series-based scale-free CORDIC algorithm [17] is a significant improvement to the CORDIC algorithm. The rotation matrix for scaling-free CORDIC is given as:

$$ Rp = \left[ {\begin{array}{*{20}c} {1 - 2^{ - (2i + 1)} } & { - 2^{ - i} } \\ {2^{ - i} } & {1 - 2^{ - (2i + 1)} } \\ \end{array} } \right] $$
(11)

For the elementary angle \( \alpha_{i} = 2^{ - i} \), the sine and cosine are approximated as

$$ \sin \alpha_{i} \approx 2^{ - i} $$
(12)
$$ \cos\alpha_{i} \approx 1 - 2^{ - (2i + 1)} $$
(13)

The conventional CORDIC processor performs rotations in both directions, whereas the Taylor-series-based scale-free CORDIC processor rotates in only one direction. The Taylor series expansions of the sine and cosine terms are

$$ \cos \alpha_{i} = \sum_{n = 0}^{\infty} ( - 1)^{n} \frac{\alpha_{i}^{2n}}{(2n)!} = 1 - \frac{\alpha_{i}^{2}}{2!} + \frac{\alpha_{i}^{4}}{4!} - \cdots $$
(14)
$$ \sin \alpha_{i} = \sum_{n = 0}^{\infty} ( - 1)^{n} \frac{\alpha_{i}^{2n + 1}}{(2n + 1)!} = \alpha_{i} - \frac{\alpha_{i}^{3}}{3!} + \frac{\alpha_{i}^{5}}{5!} - \cdots $$
(15)
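To see why the matrix of Eq. (11) needs no scale-factor compensation, the short check below (a sketch; the function names are hypothetical) computes how much one rotation by R_p changes a vector's squared length. Algebraically, (1 − 2^−(2i+1))² + 2^−2i = 1 + 2^−(4i+2), so the gain error falls off as 2^−4i; the angle error shows the price paid for truncating the series of Eqs. (14) and (15).

```python
import math

def rp_entries(i):
    """Diagonal and off-diagonal entries of R_p in eq. (11)."""
    cos_approx = 1 - 2.0 ** -(2 * i + 1)   # eq. (13)
    sin_approx = 2.0 ** -i                 # eq. (12)
    return cos_approx, sin_approx

def rp_gain_sq(i):
    """Squared length change of any vector rotated once by R_p."""
    c, s = rp_entries(i)
    return c * c + s * s

def rp_angle_error(i):
    """Actual rotation angle of R_p minus the nominal elementary angle 2^-i."""
    c, s = rp_entries(i)
    return math.atan2(s, c) - 2.0 ** -i

for i in range(2, 9):
    print(i, rp_gain_sq(i) - 1, 2.0 ** -(4 * i + 2), rp_angle_error(i))
```

Because the approximated sine is slightly too large and the approximated cosine slightly too small, R_p always over-rotates a little (by about α_i³/6), which is why higher-order terms are added in Sect. 3.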

(D) Window Filtering Techniques

During spectral analysis, the input signals must be truncated to fit a finite observation window matching the length of the FFT processor. Direct truncation, i.e., applying a rectangular window, causes phenomena such as the picket-fence effect and spectral leakage in the frequency domain. Other window functions reduce these effects. Windowing is a popular process for limiting a signal to a short time segment in the desired domain. The most commonly used windows are the rectangular, Gaussian, Hamming, Hanning, Blackman-Harris and Kaiser windows. The choice of window depends on the spectral characteristics required by the application. The Hanning, Hamming and Blackman window functions are given below [18]:

$$ W_{Hann} (n) = 0.5 - 0.5\cos \left( {\frac{2\pi n}{(N - 1)}} \right) $$
(16)

where N is the window length

$$ W_{Hamm} (n) = \alpha - \beta \cos \left( {\frac{2\pi n}{(N - 1)}} \right) $$
(17)

where \( \upalpha +\upbeta = 1 \).

The values of α and β are chosen to maximize sidelobe cancellation. For the Hamming window, the coefficients are α = 25/46 ≈ 0.54 and β = 21/46 ≈ 0.46.

$$ W_{Blackman} (n) = \alpha_{0} - \alpha_{1} \cos \left( {\frac{2\pi n}{N}} \right) + \alpha_{2} \cos \left( {\frac{4\pi n}{N}} \right) $$
(18)

where α0 + α1 + α2 = 1. The Blackman window uses the coefficients \( \alpha_{0} = 0.42,\,\alpha_{1} = 0.5 \) and \( \alpha_{2} = 0.08 \).
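A direct floating-point evaluation of Eqs. (16)-(18) can serve as a golden reference when checking a hardware window generator. In this sketch the α1 term of the Blackman window is subtracted, matching the standard Blackman form, which has a zero at n = 0 for α0 = 0.42, α1 = 0.5, α2 = 0.08; the function names are illustrative. The `WINDOWS` table mirrors the (ws0, ws1) window-select encoding given in Sect. 4.

```python
import math

def hann(N):
    """Eq. (16)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hamming(N, alpha=25 / 46, beta=21 / 46):
    """Eq. (17) with the Hamming coefficients alpha = 25/46, beta = 21/46."""
    return [alpha - beta * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def blackman(N, a0=0.42, a1=0.5, a2=0.08):
    """Eq. (18) with the alpha_1 term subtracted; denominator N as written in eq. (18)."""
    return [a0 - a1 * math.cos(2 * math.pi * n / N) + a2 * math.cos(4 * math.pi * n / N)
            for n in range(N)]

# Window-select mapping from Sect. 4: (ws0, ws1) -> window family.
WINDOWS = {(0, 0): hann, (1, 0): hamming, (0, 1): blackman}

w = WINDOWS[(1, 0)](16)   # a 16-point Hamming window
```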

3 Design Aspects of Scale-Free CORDIC Processor

In conventional CORDIC, the angle of rotation is decomposed into micro-rotations as follows: (i) the elementary angles are defined as \( \tan^{ - 1} 2^{ - i} \), where i is the iteration number, and a ROM is used to store them; (ii) the micro-rotation corresponding to each elementary angle is performed either clockwise or anticlockwise; and (iii) each elementary angle is used only once. In the scale-free CORDIC processor, by contrast, the micro-rotations are performed in only one direction; the micro-rotation corresponding to the basic shift may be repeated several times, while the other shifts are applied without repetition.

(a) To eliminate the ROM used to store the elementary angles and to simplify the hardware, the elementary angles are defined [19] as \( \alpha_{i} = 2^{ - s_{i} } \), where \( s_{i} \) is the number of shifts for the ith iteration.

(b) The most significant one (MSO) location is the bit position of the first '1' in an input bit string, scanning from the most significant bit (MSB). The MSO location identifier (MSO-LI) generates an n-bit output for a \( 2^{n} \)-bit input string and is used to find the shift index \( s_{i} = N - M \), where N is the word length of the input data and M is the location of the most significant one in the N-bit input string.

(c) The order of the Taylor series approximation determines the largest elementary angle. For the third-order approximation, the basic shift and the largest elementary angle are:

$$ s_{b} = \left\lfloor \frac{l - \log_{2}(4!)}{4} \right\rfloor $$
(19)
$$ \alpha_{max} = 2^{ - s_{b} } $$
(20)

where l is the word length. For a 16-bit word length, \( s_{b} = \left\lfloor 2.854 \right\rfloor = 2 \). Depending on the desired accuracy, either \( s_{b} = 2 \) or \( s_{b} = 3 \) can be selected. Any rotation angle θ is then expressed as:

$$ \theta = n_{1}\,\alpha_{max} + \sum_{k = 1}^{n_{2}} \alpha_{s_{k}} $$
(21)

where \( s_{k} \ge s_{b} \) and \( n = n_{1} + n_{2} \) is the total number of iterations, which is constant. For the third-order Taylor series approximation, the number of iterations is seven.

To design the scale-free CORDIC processor and generate the micro-rotation sequence, let \( \theta_{i} \) be the input angle to be rotated and let ML denote the location of its most significant one bit, as given by the location identifier. If ML = 15, the elementary angle is α = 0.25 rad, the shift is \( s_{i} = 2 \) and \( \theta_{i + 1} = \theta_{i} - \alpha \). Otherwise, the shift is \( s_{i} = 16 - ML \) and \( \theta_{i + 1} \) equals \( \theta_{i} \) with bit \( \theta_{i}[ML] \) cleared to '0'.

Table 2 lists the elementary angles corresponding to the basic shift values.

Table 2 Elementary angles versus corresponding shifts [19]

The percentage error in sin and cos is negligible over the range \( \left( {0,\pi /4} \right) \). Therefore, the maximum angle of rotation handled by the micro-rotation sequence generator lies in the range \( \left( {0,\pi /4} \right) \).

The following points are important for the design of the micro-rotation sequence generator.

(i) If \( N - MSOB_{location} < s_{b} \), the shift index corresponding to the largest elementary angle, \( \alpha_{max} = 0.25 \) rad with shift index 2, is used.

(ii) If \( N - MSOB_{location} \ge s_{b} \), the largest elementary angle \( \alpha_{s_{i}} = 2^{ - s_{i}} \) with \( s_{i} = 16 - M \) is used for the CORDIC iteration.
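The rules above can be modelled in software as a sketch. Assumptions: the angle is held as a 16-bit unsigned fraction (bit k weighs 2^(k−16), so bit 15 weighs 0.5, which is consistent with the ML = 15 → α = 0.25 rule), bits are numbered from 0 at the LSB, and all names are hypothetical; this is not the authors' RTL. `basic_shift` evaluates Eq. (19).

```python
import math

WORD_LEN = 16  # N, the word length used in the text

def basic_shift(l):
    """Eq. (19): s_b = floor((l - log2(4!)) / 4)."""
    return math.floor((l - math.log2(math.factorial(4))) / 4)

def mso_location(v):
    """MSO location identifier: index (0 = LSB) of the most significant '1' in v > 0."""
    return v.bit_length() - 1

def micro_rotation_shifts(theta_q, s_b=2, n=WORD_LEN, max_iter=None):
    """Shift sequence for an angle stored as an n-bit unsigned fraction theta_q / 2**n.
    Hypothetical software model of rules (i) and (ii), not the authors' hardware."""
    shifts = []
    while theta_q > 0 and (max_iter is None or len(shifts) < max_iter):
        ml = mso_location(theta_q)
        if n - ml < s_b:
            # Rule (i): residue exceeds the largest elementary angle, use alpha_max = 2^-s_b
            s = s_b
            theta_q -= 1 << (n - s_b)
        else:
            # Rule (ii): s_i = n - ML; subtracting 2^-s_i just clears bit ML
            s = n - ml
            theta_q &= ~(1 << ml)
        shifts.append(s)
    return shifts

theta_q = round(0.6 * 2 ** WORD_LEN)   # 0.6 rad in this format -> 39322
shifts = micro_rotation_shifts(theta_q)
```

For θ = 0.6 rad the generator emits the shifts 2, 2, 4, 5, 8, 9, 12, 13, 15; passing `max_iter=7` stops after the first seven and leaves the remaining residue unrotated.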

The third-order Taylor series expansion of the sine and cosine functions gives the rotation matrix for the proposed architecture. To simplify the rotation-matrix equations in the complete scale-free CORDIC algorithm, the Taylor series coefficient 3! = 6 is approximated by \( 2^{3} = 8 \), so that the division becomes a 3-bit shift:

$$ R_{i} = \left[ \begin{array}{cc} 1 - 2^{ - (2s_{i} + 1)} & - \left( 2^{ - s_{i}} - 2^{ - (3s_{i} + 3)} \right) \\ 2^{ - s_{i}} - 2^{ - (3s_{i} + 3)} & 1 - 2^{ - (2s_{i} + 1)} \\ \end{array} \right] $$
(22)
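The sketch below models Eq. (22) in floating point under stated assumptions: the angle is split greedily into elementary angles 2^−s with s ≥ s_b (a floating-point analogue of the micro-rotation sequence generator), decomposition stops once the residue drops below 2^−16, and all names are hypothetical. Because the sine term uses 2^−3 in place of 1/3!, each basic-shift rotation with s_b = 2 over-rotates by roughly 6.6 × 10^−4 rad, so s_b = 3 gives noticeably better accuracy at the cost of more iterations.

```python
import math

def decompose(theta, s_b=2, word_len=16):
    """Greedy split of theta (rad, assumed in [0, pi/4]) into angles 2**-s with s >= s_b."""
    shifts, r = [], theta
    while r >= 2.0 ** -word_len:
        _, e = math.frexp(r)          # r = m * 2**e with 0.5 <= m < 1, so 2**(e-1) <= r
        s = max(1 - e, s_b)           # largest 2**-s <= r, clamped to the basic shift
        shifts.append(s)
        r -= 2.0 ** -s
    return shifts

def rotate_scale_free(x, y, shifts):
    """Apply eq. (22) once per shift: sin ~ 2^-s - 2^-(3s+3), cos ~ 1 - 2^-(2s+1)."""
    for s in shifts:
        c = 1 - 2.0 ** -(2 * s + 1)
        sn = 2.0 ** -s - 2.0 ** -(3 * s + 3)
        x, y = c * x - sn * y, sn * x + c * y   # no scale-factor compensation needed
    return x, y

def cos_sin_scale_free(theta, s_b=2):
    """Approximate (cos theta, sin theta) by rotating the unit vector (1, 0)."""
    return rotate_scale_free(1.0, 0.0, decompose(theta, s_b))
```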

Figure 1 shows the coordinate calculation unit, which computes the \( x_{i} \) and \( y_{i} \) values. The shift index \( s_{i} \) calculation unit is shown in Fig. 2; the shift index depends on the elementary angles. The elementary-angle calculation unit, i.e., the micro-rotation sequence generator, is shown in Fig. 3.

Fig. 1 Design of coordinate calculation unit

Fig. 2 Shift index \( s_{i} \) calculation unit

Fig. 3 Micro-rotation sequence generator

4 Architecture for Window Functions

The window functions are implemented using a pipelined architecture with a 16-bit output width. Two designs are considered: the first uses a circular CORDIC processor, a linear CORDIC processor and an angle generator circuit; the second uses a circular CORDIC processor and a window coefficient multiplier designed using a Booth multiplier. Figure 4 shows the block diagram of the angle generator unit, and Figs. 5 and 6 show the block diagrams for generating the different window functions. The window family is selected by the window-select pins: Hanning (ws0 = 0, ws1 = 0), Hamming (ws0 = 1, ws1 = 0) and Blackman (ws0 = 0, ws1 = 1) [3, 18]. The circuit combines an angle generator unit (AGU), a window coefficient multiplier (WCM), a circular CORDIC processor (CCP) and a first-in first-out (FIFO) register. The AGU generates two angles, \( \uptheta = \frac{2\pi n}{N} \) and \( 2\theta = \frac{4\pi n}{N} \). The window coefficients are multiplied either by a linear CORDIC based on the conventional CORDIC algorithm or by an optimized shift-add network designed using a Booth multiplier. A CORDIC processor operating in rotation mode on the circular trajectory produces the cosine terms used in the window function equations.
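The paper does not give the internals of the Booth-based coefficient multiplier, so the following is only an illustrative sketch of radix-2 Booth recoding applied to a constant window coefficient: each digit d_i = b_(i−1) − b_i lies in {−1, 0, +1}, and the product is formed from shifted copies of x that are added or subtracted. The 16-bit two's-complement coefficient width, the Q15 scaling of the Hamming coefficient α = 25/46 (giving 17809) and the function names are assumptions.

```python
def booth_digits(c, width=16):
    """Radix-2 Booth recoding of a width-bit two's-complement integer c.
    Returns digits d_i in {-1, 0, +1} such that c == sum(d_i * 2**i)."""
    digits, prev = [], 0                  # prev holds b_{i-1}; b_{-1} = 0
    for i in range(width):
        b = (c >> i) & 1                  # Python's >> on negatives is arithmetic
        digits.append(prev - b)           # d_i = b_{i-1} - b_i
        prev = b
    return digits

def shift_add_multiply(x, c, width=16):
    """Multiply integer x by the constant c using only shifts and adds/subtracts."""
    acc = 0
    for i, d in enumerate(booth_digits(c, width)):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc

ALPHA_Q15 = round(25 / 46 * 2 ** 15)    # hypothetical Q15 Hamming coefficient -> 17809
```

In hardware the recoded digits of a fixed coefficient are known in advance, so only the nonzero digits need an adder/subtractor stage.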

Fig. 4 Angle generator unit

Fig. 5 Block diagram for generating window functions using LCP and CCP

Fig. 6 Block diagram for generating window functions using shift-add network and CCP

The angle generator unit produces two angles for evaluating the window functions:

$$ \uptheta = \frac{2\pi n}{N}\;{\text{and}}\;2\theta = \frac{4\pi n}{N} $$
(23)

where N is a power of 2, i.e., \( N = 2^{M} \).

The difference between the consecutive values of \( \theta \) is given by

$$ \Delta \theta = \theta_{n + 1} - \theta_{n} ,\;\Delta\varvec{\theta}= \frac{2\pi }{(N - 1)} $$
(24)

For \( {\text{N}} = 2^{M} \)

$$ \Delta\varvec{\theta}= \frac{2\pi }{{\left( {2^{M} - 1 } \right)}} = \frac{{2\pi \left( {1 - 2^{ - M} } \right)^{ - 1} }}{{2^{M} }} $$
(25)

Expanding \( \left( 1 - 2^{ - M} \right)^{ - 1} \) with the binomial theorem and keeping the first three terms gives

$$ \Delta\theta \approx \frac{2\pi}{2^{M}} + \frac{2\pi}{2^{2M}} + \frac{2\pi}{2^{3M}} $$
(26)

Each term is 2π shifted right by a multiple of M bits, so Δθ can be generated with shifts and additions only.
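A small numeric check of Eqs. (24)-(26), assuming N = 2^M samples and a floating-point accumulator in place of the hardware adder (function names are hypothetical). With the three-term series, the relative error of Δθ is exactly 2^−3M, and after N − 1 steps the accumulated angle is 2π(1 − 2^−3M) instead of 2π.

```python
import math

def delta_theta_exact(M):
    """Eqs. (24)-(25): step 2*pi / (N - 1) with N = 2**M."""
    return 2 * math.pi / (2 ** M - 1)

def delta_theta_binomial(M, terms=3):
    """Eq. (26): 2*pi * (2^-M + 2^-2M + 2^-3M); each term is 2*pi shifted right by k*M bits."""
    return 2 * math.pi * sum(2.0 ** (-k * M) for k in range(1, terms + 1))

def angle_sequence(M, terms=3):
    """Hypothetical AGU model: accumulate the approximated step, emit (theta_n, 2*theta_n)."""
    step = delta_theta_binomial(M, terms)
    theta, out = 0.0, []
    for _ in range(2 ** M):
        out.append((theta, 2 * theta))
        theta += step
    return out
```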

The CCP unit is designed for target angles in the range \( \left[ {0, \pi /4} \right] \). This range is extended to the full circle using the octant symmetry shown in Fig. 7, and Table 3 lists the initial coordinate values used for this range extension.

Fig. 7 Octant symmetry

Table 3 Initial coordinate values for octant symmetry
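Table 3's specific initial coordinate values are not reproduced here; instead, the sketch below applies the standard octant mapping to the outputs of a core that only accepts angles in [0, π/4]. The parameter `core` is a hypothetical stand-in for the CCP, and `math.cos`/`math.sin` are used as a placeholder core.

```python
import math

def cos_sin_octant(phi, core=lambda a: (math.cos(a), math.sin(a))):
    """Extend a (cos, sin) core valid only on [0, pi/4] to any angle via octant symmetry."""
    phi = phi % (2 * math.pi)
    q = int(phi // (math.pi / 2)) % 4         # quadrant 0..3
    r = phi - q * (math.pi / 2)               # residual angle in [0, pi/2)
    if r <= math.pi / 4:
        c, s = core(r)
    else:
        s, c = core(math.pi / 2 - r)          # complementary angle: swap cos and sin
    # Rotate the first-quadrant result by q * 90 degrees.
    return [(c, s), (-s, c), (-c, -s), (s, -c)][q]
```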

5 FPGA Implementation Results

Two scale-free CORDIC algorithm-based window function architectures were designed: one using linear and circular CORDIC processors, and one using an add-shift network and a circular CORDIC processor. Both were described in VHDL using Xilinx 13.1 and mapped onto a Virtex-5 (XC5VLX20T-FF323) device. Table 4 shows that, for the 16-bit implementation, the first design consumes 1128 slices and 7674 4-input LUTs, with a maximum operating frequency of 70.598 MHz. Its total delay is 50.786 ns, of which 8.850 ns is logic delay and 41.936 ns is routing delay. The second design consumes 1098 slices and 8119 4-input LUTs, with a maximum operating frequency of 70.961 MHz. Its total delay is 2.862 ns, of which 2.54 ns is logic delay and 0.286 ns is routing delay.

Table 4 Complexity comparison: window function generator

5.1 Area

The design with linear and circular CORDIC processors uses seven 16-bit adder/subtractors and 261 registers. It also uses 94 latches, 803 comparators and 150 multiplexers, and 12,177 XOR gates.

Device usage summary: Selected Device: 5vlx20tff323-2

Slice Logic Utilization: (a) Number of Slice Registers: 2137 out of 12480 (17%); (b) Number of Slice LUTs: 7703 out of 12480 (61%); (c) Number used as Logic: 7703 out of 12480 (61%).

The design with the add-shift network and circular CORDIC processor uses 106 16-bit adder/subtractors and 98 32-bit adder/subtractors. It uses 1232 registers, 142 latches, 803 comparators and 4752 XOR gates.

Device usage summary: Selected Device: 5vlx20tff323-2

Slice Logic Utilization: (a) Number of Slice Registers: 3146 out of 12480 (25%); (b) Number of Slice LUTs: 8011 out of 12480 (64%); (c) Number used as Logic: 8011 out of 12480 (64%).

5.2 Latency and Delay

Both architectures have the same throughput of one output sample per clock cycle. The latency equals the number of iterations (pipeline stages) of the pipelined CORDIC processors, so it differs between the two architectures. The first design contains two circular and three linear CORDIC processors, giving 26 pipeline stages in total, whereas the second design has only 10 pipeline stages. The second architecture therefore has lower latency than the first, and its total delay is also lower.
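As a rough worked example, if each pipeline stage takes one clock cycle at the maximum operating frequency reported for each design (an assumption; the text does not state the clock actually used), the pipeline fill latency is the stage count divided by f_max. The helper name is hypothetical.

```python
def latency_ns(pipeline_stages, f_max_mhz):
    """Pipeline fill latency in ns, assuming one stage per clock cycle at f_max."""
    return pipeline_stages * 1e3 / f_max_mhz

design1 = latency_ns(26, 70.598)   # linear + circular CORDIC processors
design2 = latency_ns(10, 70.961)   # add-shift network + circular CORDIC processor
```

Under this assumption the first design needs about 368 ns and the second about 141 ns before the first output sample appears, after which both produce one sample per clock.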

6 Conclusion

In this paper, we presented a comparative study of two window function generators: one designed using a circular CORDIC processor and a linear CORDIC processor, and the other designed using a circular CORDIC processor and an add-shift network built with a Booth multiplier. We observe that the total delay is considerably smaller in the proposed architecture, i.e., the design with the CORDIC processor and the add-shift network. This is because the add-shift network reduces the number of pipeline stages, which in turn reduces the number of iterations and the latency. Furthermore, all multiplications can be performed directly by the add-shift network with the Booth multiplier.