# Accuracy Constraint Determination in Fixed-Point System Design


## Abstract

Most digital signal processing applications are specified and designed with floating-point arithmetic but are finally implemented using fixed-point architectures. Thus, the design flow requires a floating-point to fixed-point conversion stage, which optimizes the implementation cost under execution time and accuracy constraints. This accuracy constraint is linked to the application performance, and its determination is one of the key issues of the conversion process. In this paper, a method is proposed to determine the accuracy constraint from the application performance. The fixed-point system is modeled with an infinite precision version of the system and a single noise source located at the system output. Then, an iterative approach for optimizing the fixed-point specification under the application performance constraint is defined and detailed. Finally, the efficiency of our approach is demonstrated by experiments on an MP3 encoder.

### Keywords

Quantization Noise, Application Performance, Accuracy Constraint, Modified Discrete Cosine Transform, Quantization Noise Power

## 1. Introduction

In the digital image and signal processing domains, computation-oriented applications are widespread in embedded systems. To meet cost and power consumption challenges, fixed-point arithmetic is favored over floating-point arithmetic. In fixed-point architectures, memory and bus widths are smaller, leading to significantly lower cost and power consumption. Moreover, floating-point operators are more complex, since they have to deal with both the exponent and the mantissa; hence, their area and latency are greater than those of fixed-point operators. Nevertheless, digital signal processing (DSP) algorithms are usually specified and designed with floating-point data types. Therefore, a fixed-point conversion is required prior to implementation.

Finite precision computation modifies the application functionalities and degrades the desired performance. Fixed-point conversion must therefore maintain a sufficient level of accuracy. The unavoidable error due to fixed-point arithmetic can be evaluated through analytical or simulation-based approaches. In our case, the analytical approach has been favored to obtain reasonable optimization times for fixed-point design space exploration. In an analytical approach, the performance degradations are not analyzed directly in the conversion process; instead, an intermediate metric is used to measure the computational accuracy. Thus, the global conversion method is split into two main steps. First, a computational accuracy constraint is determined according to the application performance; second, the fixed-point conversion is carried out, during which the implementation cost is minimized under this accuracy constraint. Determining the computational accuracy constraint is a difficult and open problem, since this value cannot be defined directly: it has to be linked to the quality evaluation and to the performance of the application.

A fixed-point conversion method has been developed for software implementation in [1] and for hardware implementation in [2]. This method is based on an analytical approach to evaluate the fixed-point accuracy, and the implementation cost is optimized under an accuracy constraint. In this paper, an approach to determine this accuracy constraint from the application performance requirements is proposed. This accuracy constraint determination module completes the fixed-point design flow, so that the user only specifies the application performance requirements rather than an intermediate metric. The proposed approach is based on an iterative process that adjusts the accuracy constraint. The initial value of the accuracy constraint is determined through simulations and depends on the application performance requirements. The fixed-point system behavior is modeled with an infinite precision version of the system and a single noise source located at the system output. The accuracy constraint is thus determined as the maximal noise source power that maintains the desired application quality. Our noise model is valid for the rounding quantization law and for systems based on arithmetic operations (addition, subtraction, multiplication, division). This includes LTI and non-LTI systems, with or without feedback. In summary, the contributions of this paper are (i) a technique to determine the accuracy constraint according to the application performance requirements, (ii) a noise model to estimate the application performance according to the quantization noise level, and (iii) an iterative process to adjust the accuracy constraint.

The paper is organized as follows. After the description of the problem and related works in Section 2, our proposed fixed-point design flow is presented in Section 3. The noise model used to determine the fixed-point accuracy is detailed in Section 4. The case study of an MP3 coder is presented in Section 5. Different experiments and simulations have been conducted to illustrate the ability of our approach to model quantization effects and to predict the performance degradations due to fixed-point arithmetic.

## 2. Problem Description and Related Works

The aim of fixed-point design is to optimize the fixed-point specification by minimizing the implementation cost. Nevertheless, fixed-point arithmetic introduces an unavoidable quantization error, which modifies the application functionalities and degrades the desired performance. A minimum computational accuracy must be guaranteed to maintain the application performance. Thus, in the fixed-point conversion process, the implementation cost is minimized as long as the application performance requirements are fulfilled. For software implementations, the cost corresponds to the execution time, the memory size, or the energy consumption. For hardware implementations, the cost corresponds to the chip area, the critical path delay, or the power consumption.

One of the most critical parts of the conversion process is the evaluation of the application performance degradation due to fixed-point arithmetic. This degradation can be evaluated with two kinds of methods: analytical and simulation-based approaches. In the simulation-based method, fixed-point simulations are carried out to analyze the application performance [3]. These simulations can be done with system-level design tools such as CoCentric (Synopsys) [4] or Matlab-Simulink (Mathworks) [5]. C++ classes emulating the fixed-point mechanisms have also been developed, as in *SystemC* [4] or the *Algorithmic C data types* [6]. These techniques suffer from a major drawback: the time required for the simulation [7]. This becomes a severe limitation when these methods are used in the fixed-point specification optimization process, where multiple simulations are needed. This optimization process explores the design space of the different data word-lengths, and a new fixed-point simulation is required each time a fixed-point format is modified. The simulations are run on floating-point machines, and the extra code used to emulate the fixed-point mechanisms increases the execution time by one to two orders of magnitude compared to traditional simulations with floating-point data types [8]. Different techniques [7, 9, 10] have been investigated to reduce this emulation overhead. Moreover, an accurate estimation of the application performance requires a large number of simulated samples. For example, in the digital communication domain, measuring a bit error rate $p$ requires a number of samples much larger than $1/p$. This large number of samples combined with the fixed-point mechanism emulation leads to very long simulation times. For example, in our case, one fixed-point C code simulation of an MP3 coder required 480 seconds. Thus, fixed-point optimization based on simulation leads to excessively long execution times.

In analytical approaches, a mathematical expression of a metric is determined. Deriving an expression of the performance for every kind of application is generally not possible. Thus, the performance degradations are not analyzed directly in the conversion process, and an intermediate metric which measures the fixed-point accuracy must be used. This computational accuracy metric can be the quantization error bounds [11], the mean square error [12], or the quantization noise power [10, 13]. In the conversion process, the implementation cost is minimized as long as the fixed-point accuracy metric satisfies the accuracy constraint. The analytical expression of the fixed-point accuracy metric is determined first. Then, in the optimization process, this mathematical expression is evaluated to obtain the accuracy for a given fixed-point specification. This evaluation is much faster than a simulation-based approach. However, the accuracy constraint is difficult to determine, since it cannot be defined directly and has to be linked to the quality evaluation and performance of the application.

Most of the existing fixed-point conversion methods based on an analytical approach [1, 11, 13, 14, 15] evaluate the output noise level, but they do not predict the application performance degradations due to fixed-point arithmetic. In [12], an analytical expression is proposed to link the bit error rate and the mean square error. Nevertheless, to our knowledge, no general method has been proposed to link the computational accuracy constraint to an arbitrary application performance metric. In this paper, a global fixed-point design flow is presented to optimize the fixed-point specification under application performance requirements. A technique to determine the fixed-point accuracy constraint is proposed, and the associated noise model is detailed.

## 3. Proposed Fixed-Point Design Process

### 3.1. Global Process

A fixed-point datum $x$ of $b$ bits is made up of an integer part and a fractional part. The number of bits associated with each part does not change during the processing, leading to a fixed binary-point position. Let $m$ and $n$ be the binary-point position referenced, respectively, from the most significant bit (MSB) and the least significant bit (LSB). The terms $m$ and $n$ thus correspond, respectively, to the integer and fractional part word-lengths, and the word-length is $b = m + n$. The aim of the fixed-point conversion is to determine the number of bits of each part for each datum.
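As a minimal sketch of this format, assuming two's-complement coding with the sign bit counted in $m$ and round-to-nearest quantization (both assumptions, not stated above), a real value can be mapped to a format with $m$ integer bits and $n$ fractional bits as follows. The function name `quantize` is illustrative.

```python
import math


def quantize(x: float, m: int, n: int) -> float:
    """Round x to a two's-complement fixed-point format with m integer bits
    (sign bit included, an assumption) and n fractional bits, so b = m + n."""
    q = 2.0 ** (-n)                        # quantization step: weight of the LSB
    lo = -(2.0 ** (m - 1))                 # most negative representable value
    hi = 2.0 ** (m - 1) - q                # most positive representable value
    v = math.floor(x / q + 0.5) * q        # round to nearest (ties toward +inf)
    if not lo <= v <= hi:
        raise OverflowError(f"{x} does not fit in {m} integer bits")
    return v


# b = 4 bits with m = 1 and n = 3: representable values are -1.0, -0.875, ..., 0.875
print(quantize(0.7, 1, 3), quantize(-0.3, 1, 3))   # 0.75 -0.25
```

Choosing between rounding and truncation only changes the `math.floor(x / q + 0.5)` line; the rest of the document assumes rounding.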

### 3.2. Accuracy Constraint Determination

The first step corresponds to the determination of the initial accuracy constraint $P_e^{(0)}$, which is the maximal noise level satisfying the performance objective $\lambda_{\text{obj}}$. For example, in a digital communication receiver, the maximal quantization noise level is determined according to the desired bit error rate.

First, the application performance is predicted with the technique presented below. Let $\hat{\lambda}(P_e)$ be the function representing the predicted performance according to the noise level $P_e$. The initial accuracy constraint $P_e^{(0)}$ is the solution of the equation $\hat{\lambda}(P_e) = \lambda_{\text{obj}}$, which is solved graphically.
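The graphical resolution of $\hat{\lambda}(P_e) = \lambda_{\text{obj}}$ can be approximated numerically. The sketch below is a hypothetical helper, not the authors' tool: it assumes the predicted performance has been sampled at a few noise levels (in dB) and decreases as the noise level increases, as the ODG does, and it linearly interpolates between the two samples that bracket the objective.

```python
def initial_constraint(noise_db, perf, target):
    """Largest noise level (dB) whose linearly interpolated predicted
    performance still meets `target` (performance assumed to decrease
    when the noise level increases)."""
    pts = sorted(zip(noise_db, perf))
    if pts[0][1] < target:
        raise ValueError("objective not met even at the lowest noise level")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 < target <= y0:
            # crossing found between two consecutive samples
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return pts[-1][0]  # objective met over the whole explored range


# Hypothetical sampled curve (illustrative values, not measured data)
levels = [-110.0, -105.0, -100.0, -95.0]   # noise levels in dB
odg = [-0.5, -0.8, -1.1, -2.0]             # predicted ODG at each level
print(initial_constraint(levels, odg, -1.0))   # about -101.67
```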

#### 3.2.1. Performance Prediction

Most of the time, the floating-point simulation has already been developed during the application design step, and the application output samples can be reused directly. Therefore, the time required for exploring the noise power values is significantly reduced and becomes negligible with regard to the global implementation flow. Nevertheless, this technique cannot be applied to systems where a decision made on the output is used inside the system, as, for example, in decision-feedback equalization. In this case, a new floating-point simulation is required for each tested noise level.
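The prediction step described above can be sketched as follows, assuming the floating-point output samples are stored in an array and the application metric is available as a Python function. The added noise is Gaussian here (the balance coefficient 0 case used for hardware in Section 4), and the SNR metric in the usage example is a hypothetical stand-in for an application metric such as the ODG.

```python
import numpy as np


def predict_performance(y_float, noise_levels_db, metric, seed=0):
    """For each noise power P (dB), add a white noise of power P to the stored
    floating-point output samples and evaluate the application metric."""
    rng = np.random.default_rng(seed)
    y_float = np.asarray(y_float, dtype=float)
    results = []
    for p_db in noise_levels_db:
        sigma = np.sqrt(10.0 ** (p_db / 10.0))   # noise power P = sigma**2
        noisy = y_float + rng.normal(0.0, sigma, size=y_float.shape)
        results.append(metric(y_float, noisy))
    return results


def snr_db(ref, test):
    """Stand-in metric: signal-to-noise ratio in dB."""
    return 10.0 * np.log10(np.mean(ref ** 2) / np.mean((test - ref) ** 2))


t = np.arange(100_000)
y = np.sin(2 * np.pi * 0.01 * t)      # placeholder for stored simulation outputs
print(predict_performance(y, [-60.0, -40.0], snr_db))
```

Only the noise generation runs per tested level; the costly floating-point simulation is not repeated, which is why this step stays cheap unless a decision on the output feeds back into the system.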

### 3.3. Fixed-Point Conversion Process

The first part corresponds to the determination of the integer part word-length of each datum. The number of bits $m$ of this integer part must allow the representation of all the values taken by the datum and is obtained from the data bound values. Thus, the dynamic range is first evaluated for each datum. Then, these results are used to determine, for each datum, the binary-point position which minimizes the integer part word-length while avoiding overflow. Moreover, scaling operations are inserted in the application to adapt the fixed-point format of a datum to its dynamic range or to align the binary points of the addition inputs.
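Once the dynamic range of a datum is known, the smallest integer part word-length follows directly. This sketch assumes two's-complement coding with the sign bit counted in $m$, ignores the one-LSB asymmetry of the positive bound, and restricts itself to $m \ge 1$ for simplicity; the helper name is hypothetical.

```python
def integer_wordlength(x_min: float, x_max: float) -> int:
    """Smallest m >= 1 (sign bit included) such that the two's-complement
    range [-2**(m-1), 2**(m-1)) covers the data bounds [x_min, x_max]."""
    if x_min > x_max:
        raise ValueError("x_min must not exceed x_max")
    m = 1
    while x_min < -(2.0 ** (m - 1)) or x_max >= 2.0 ** (m - 1):
        m += 1          # one more integer bit doubles the covered range
    return m


print(integer_wordlength(-1.0, 0.99), integer_wordlength(-3.2, 5.0))  # 1 4
```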

The vector $\mathbf{b}_k$ is the optimized fixed-point specification obtained for the constraint value $P_e^{(k)}$ at iteration $k$ of the process.

The data word-length determination corresponds to an optimization problem where the implementation cost and the application accuracy must be evaluated. The major challenge is to evaluate the fixed-point accuracy. To obtain reasonable optimization times, analytical approaches to evaluate the accuracy have been favored. The computational accuracy is evaluated using the quantization noise power. The mathematical expression of this noise power is computed for systems based on arithmetic operations with the technique presented in [16]. This mathematical expression is determined only once and is used for the different iterations of the fixed-point conversion process and for the different iterations of the global design flow.
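To illustrate why an analytical accuracy expression makes the optimization fast, the sketch below assumes an LTI system in which each rounding noise source $i$ has variance $q_i^2/12$ with $q_i = 2^{-n_i}$ and reaches the output with a known power gain $G_i$ (the sum of the squared impulse-response samples). The word-lengths are then reduced with a simple greedy descent that gives every bit the same cost. This search strategy and the function names are illustrative assumptions, not the optimization algorithm of [1, 2] or the noise-power technique of [16].

```python
def output_noise_power(n_frac, gains):
    """Analytical output noise power: sum_i G_i * q_i**2 / 12, q_i = 2**-n_i."""
    return sum(g * (2.0 ** (-n)) ** 2 / 12.0 for n, g in zip(n_frac, gains))


def greedy_wordlengths(gains, p_max_db, n_start=32, n_min=0):
    """Remove fractional bits one at a time, each time where the resulting
    noise power is smallest, while the constraint P <= P_max still holds."""
    p_max = 10.0 ** (p_max_db / 10.0)
    n = [n_start] * len(gains)
    if output_noise_power(n, gains) > p_max:
        raise ValueError("constraint cannot be met with n_start bits")
    while True:
        best = None                      # (noise power, index) of best move
        for i in range(len(n)):
            if n[i] <= n_min:
                continue
            trial = list(n)
            trial[i] -= 1
            p = output_noise_power(trial, gains)
            if p <= p_max and (best is None or p < best[0]):
                best = (p, i)
        if best is None:                 # no bit can be removed any more
            return n
        n[best[1]] -= 1


# Hypothetical power gains of three noise sources, constraint of -60 dB
print(greedy_wordlengths([1.0, 4.0, 0.25], -60.0))
```

Each candidate specification costs one evaluation of a closed-form sum instead of one fixed-point simulation, which is the point made in the paragraph above.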

### 3.4. Performance Evaluation and Accuracy Constraint Adjustment

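The iterative process stops when the performance $\lambda_k$ measured at iteration $k$ is close enough to the objective, that is, when $|\lambda_k - \lambda_{\text{obj}}| \le \Delta_\lambda$, where the term $\Delta_\lambda$ is the tolerance on the objective value.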

At the first iteration, only one measured point is available, and the accuracy constraint is either incremented or decremented. The choice to increment or decrement depends on the sign of the slope of $\hat{\lambda}$ and on the sign of the difference between $\lambda_0$ and $\lambda_{\text{obj}}$. For the next iterations, two or more measured points are available. The two consecutive points of abscissa $P_e^{(i)}$ and $P_e^{(j)}$ such that $\lambda_{\text{obj}}$ lies between $\lambda_i$ and $\lambda_j$ are selected, and let $f_k$ be the linear function linking the two points $(P_e^{(i)}, \lambda_i)$ and $(P_e^{(j)}, \lambda_j)$. The adjusted accuracy constraint $P_e^{(k+1)}$ used for the next iteration is defined such that $f_k(P_e^{(k+1)}) = \lambda_{\text{obj}}$. The adjustment process is illustrated in Section 5.3 through an example.
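The adjustment rule can be sketched as follows. The helper names are hypothetical, and the tolerance value is left to the user. The usage example replays the first two measured points of the MP3 case study in Section 5.3 and recovers the next constraint to within the rounding of the reported values.

```python
def next_constraint(history, target):
    """history: (noise level P_e^(i) in dB, measured performance lambda_i) pairs.
    Among consecutive abscissae, pick the pair whose performances bracket the
    target and return where the line through them equals the target."""
    pts = sorted(history)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 != y1 and min(y0, y1) <= target <= max(y0, y1):
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("measured points do not bracket the objective")


def converged(perf, target, tol):
    """Stop criterion |lambda_k - lambda_obj| <= tol."""
    return abs(perf - target) <= tol


# Iterations 0 and 1 of the MP3 case study (Section 5.3), objective ODG = -1
history = [(-99.35, -1.2), (-105.35, -0.77)]
print(next_constraint(history, -1.0))   # about -102.14 (reported value: -102.2)
```

Adding the point measured at iteration 2, (−102.2, −0.86), makes the bracketing pair (−102.2, −99.35), and the same function then returns about −101.03, close to the reported −101.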

## 4. Noise Model

### 4.1. Noise Model Description

#### 4.1.1. Quantization Noise Model

The use of fixed-point arithmetic introduces an unavoidable quantization error when a signal is quantized. A common model for signal quantization was proposed by Widrow in [17] and refined in [18]. The quantization of a signal is modeled by the sum of this signal and a random variable $e$, which represents the quantization noise. This additive noise $e$ is a uniformly distributed white noise that is uncorrelated with the signal and independent of the other quantization noises. In this study, rounding is used rather than truncation. For convergent rounding, the quantization error has a zero mean. For classical rounding, the mean can be assumed to be zero as soon as several bits (more than 3) are eliminated in the quantization process. The expressions of the statistical parameters of the noise sources can be found in [16]. If $q$ is the quantization step (accuracy), the noise values lie in the interval $[-q/2, q/2]$.
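The statistical properties of this model are easy to check empirically. The sketch below rounds a finely resolved random signal to $n$ fractional bits and compares the error moments with those of a uniform noise on $[-q/2, q/2]$ (zero mean, variance $q^2/12$). NumPy's `np.round` rounds halves to even, i.e. convergent rounding. The signal and the value of $n$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200_000)   # double precision stands in for infinite precision
n = 8                                 # fractional bits kept (arbitrary example)
q = 2.0 ** -n                         # quantization step
e = np.round(x / q) * q - x           # quantization error, expected in [-q/2, q/2]
print(f"mean={e.mean():.2e}  var={e.var():.3e}  q^2/12={q**2 / 12:.3e}")
```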

#### 4.1.2. Noise Model for Fixed-Point System

The output quantization noise is the sum of the contributions of the different noise sources $e_i$, each one filtered by $g_i$, the impulse response of the system having $y$ as output and $e_i$ as input. In the case of linear time-invariant (LTI) systems, the terms $g_i$ are constant; for non-LTI systems, they are time-varying. In this context, two extreme cases can be distinguished. In the first case, one quantization noise $e_j$ predominates in terms of variance over the other noise sources. A typical example is an extensive reduction of the number of bits at the system output compared to the other fixed-point formats. In this case, the level of this output quantization noise exceeds the other noise source levels. Thus, the probability density function of the output quantization noise is very close to that of the predominant noise source and can be approximated by a uniform distribution. In the second case, a large number of independent noise sources have similar statistical parameters and no noise source predominates. All the noise sources are uniformly distributed and independent of each other. By the central limit theorem, the sum of the different noise sources can then be modeled by a zero-mean normally distributed noise.
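The two extreme cases can be illustrated with the kurtosis (fourth standardized moment), which equals 1.8 for a uniform distribution and 3 for a Gaussian one. In this sketch, the output noise is simply the sum of $N$ independent, equal-variance uniform sources (unit gains are an assumption made for illustration). The theoretical kurtosis of such a sum is $3 - 1.2/N$, so it moves from 1.8 toward 3 as $N$ grows, in line with the central limit theorem.

```python
import numpy as np


def kurtosis(v):
    """Fourth standardized moment (3 for a Gaussian, 1.8 for a uniform law)."""
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2


rng = np.random.default_rng(2)
kurt = {}
for n_sources in (1, 2, 10, 50):
    # sum of independent uniform quantization noises with identical variance
    s = rng.uniform(-0.5, 0.5, size=(50_000, n_sources)).sum(axis=1)
    kurt[n_sources] = kurtosis(s)
print({k: round(v, 2) for k, v in kurt.items()})
```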

The weight $\alpha$ (balance coefficient) is set in the interval $[0, 1]$ and allows the representation of the different intermediate cases between the two extreme cases presented above. The weight $K$ fixes the global noise variance.
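Since equation (5) itself is not reproduced here, the following is only a hypothetical reading of the model: the output noise is generated as a weighted sum of a unit-variance uniform noise and a unit-variance Gaussian noise, with $\alpha$ as the balance coefficient, and then scaled so that its variance equals the requested power (the role attributed to $K$). A mixture of the two densities would be another possible reading.

```python
import numpy as np


def model_noise(n, power, alpha, rng=None):
    """Hypothetical output noise model: weighted sum of uniform and Gaussian
    white noises, scaled to the requested power.
    alpha = 1 -> purely uniform, alpha = 0 -> purely Gaussian."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)   # unit-variance uniform
    g = rng.standard_normal(n)                         # unit-variance Gaussian
    mix = alpha * u + (1.0 - alpha) * g
    # var(mix) = alpha**2 + (1 - alpha)**2, so this scaling sets var = power
    return np.sqrt(power / (alpha ** 2 + (1.0 - alpha) ** 2)) * mix


e = model_noise(100_000, 1e-8, 0.5, np.random.default_rng(0))
print(e.var())
```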

#### 4.1.3. Choice of Noise Model Parameters

The noise $e_y$ is assumed to be white. Nevertheless, the spectral density of the real quantization noise depends on the system and, most of the time, is not white. If the application performance is sensitive to the noise spectral characteristics, this assumption degrades the performance prediction. However, the imperfections of the noise model are compensated by the iterative process, which adapts the accuracy constraint: their effect is an increase in the number of iterations required to converge to the optimized solution.

To take the noise spectral characteristics into account, the initial accuracy constraint $P_e^{(0)}$ can be determined in a two-step process. The accuracy constraint $P_e^{(0)}$ is first determined assuming that the noise $e_y$ is white. Then, the fixed-point conversion is carried out and the fixed-point specification $\mathbf{b}_0$ is simulated. The spectral characteristics of the real output quantization noise $e_{y,r}$ are measured. Afterwards, the accuracy constraint $P_e^{(0)}$ is determined a second time, assuming that the noise $e_y$ has the same spectral characteristics as the real quantization noise $e_{y,r}$.
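One possible way to carry out this second step is sketched below, assuming the spectral characteristics are captured by an averaged periodogram of the measured noise. White Gaussian noise is then colored in the frequency domain to follow that spectrum and rescaled to the noise power under test. The segment length, the rectangular window and the function names are illustrative choices, and the moving-average filtered noise in the usage example is only a stand-in for a real measured noise $e_{y,r}$.

```python
import numpy as np


def measured_psd(noise, seg=256):
    """Averaged periodogram (rectangular window, no overlap), seg//2 + 1 bins."""
    k = len(noise) // seg
    frames = np.asarray(noise[: k * seg], dtype=float).reshape(k, seg)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)


def shaped_noise(n, power, psd, rng):
    """Gaussian noise whose spectrum follows `psd`, rescaled to `power`
    (n assumed even so that both frequency grids span 0 to 0.5)."""
    w = np.fft.rfft(rng.standard_normal(n))
    f_psd = np.linspace(0.0, 0.5, len(psd))      # normalized frequencies of psd
    f_w = np.linspace(0.0, 0.5, len(w))          # normalized frequencies of w
    e = np.fft.irfft(w * np.sqrt(np.interp(f_w, f_psd, psd)), n)
    return e * np.sqrt(power / np.mean(e ** 2))


rng = np.random.default_rng(5)
real_noise = np.convolve(rng.standard_normal(65_536), np.ones(4) / 4, mode="same")
e = shaped_noise(65_536, 1e-10, measured_psd(real_noise), rng)
p = measured_psd(e)
print(np.mean(e ** 2), p[:32].sum() / p[-32:].sum())
```

The generated noise keeps the requested power exactly and concentrates its energy at low frequencies, like the low-pass stand-in it was fitted to.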

As for the spectral characteristics, the weight $\alpha$ is initially set to an arbitrary value depending on the kind of implementation. Then, after the first iteration, the value of $\alpha$ is adjusted using the value measured on the real output quantization noise $e_{y,r}$.

In most processors, the architecture is based on double-precision computation. Inside the processing unit, most computations are carried out without loss of information, and truncation occurs when the data are stored in memory. This tends to produce a predominant noise source at the system output. Thus, for software implementation, the weight $\alpha$ is fixed to 1. For hardware implementation, the optimization of the operator word-lengths leads to a fixed-point system where no noise source is predominant, since the optimization distributes the noise over the operations. Thus, for hardware implementation, the weight $\alpha$ is fixed to 0.

### 4.2. Validation of the Proposed Model

#### 4.2.1. Validation Methodology

The aim of this section is to analyze the accuracy of our model against real quantization noises, which are obtained through simulations. The output quantization noise is the difference between the system outputs obtained with a fixed-point and a floating-point simulation. The floating-point simulation, which uses double-precision types, is considered to be the reference. Indeed, in this case, the error due to floating-point arithmetic is much smaller than the error due to fixed-point arithmetic and can thus be neglected.

This test statistic follows a $\chi^2$ distribution whose number of degrees of freedom depends on the number of classes used in the test. Therefore, if the distance $D$ is higher than the threshold $\chi^2_{\alpha_s}$, the hypothesis $H_0$ ($e_{y,r}$ follows the probability density function $f$ of the model) is rejected. The significance level $\alpha_s$ of the test is the probability of rejecting $H_0$ when the hypothesis is true; choosing a value for this level sets the threshold distance of the test. According to [20], the significance level $\alpha_s$ should lie within a recommended interval; the values 0.05 and 0.001 are used in Table 1.

The real quantization noise can be modeled with (5) if the optimized (minimal) distance $D_{\min}$ is lower than the threshold $\chi^2_{\alpha_s}$.
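The fit of the balance coefficient can be sketched as a Pearson chi-square test, under a hypothetical reading of model (5) as a weighted sum of uniform and Gaussian noises scaled to the measured power. Classes are taken at the empirical quantiles of the noise, the expected class probabilities come from the closed-form CDF of a uniform plus a Gaussian variable, and $\alpha$ is searched on a grid. The class count, the grid and the function names are assumptions. The returned $D_{\min}$ is then compared with $\chi^2_{\alpha_s}$: for example, with 10 classes and two estimated parameters ($\alpha$ and the power), there are 7 degrees of freedom and the threshold at $\alpha_s = 0.05$ is about 14.07.

```python
import math
import numpy as np


def _norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def _norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def model_cdf(x, power, alpha):
    """CDF of c*(alpha*u + (1-alpha)*g) with u ~ U[-sqrt(3), sqrt(3)],
    g ~ N(0, 1) and c chosen so that the variance equals `power`."""
    if math.isinf(x):
        return 1.0 if x > 0 else 0.0
    c = math.sqrt(power / (alpha ** 2 + (1.0 - alpha) ** 2))
    a, s = c * alpha * math.sqrt(3.0), c * (1.0 - alpha)
    if s == 0.0:                                   # purely uniform
        return min(max((x + a) / (2.0 * a), 0.0), 1.0)
    if a == 0.0:                                   # purely Gaussian
        return _norm_cdf(x / s)

    def big_phi(t):                                # antiderivative of the normal CDF
        return t * _norm_cdf(t) + _norm_pdf(t)

    return s / (2.0 * a) * (big_phi((x + a) / s) - big_phi((x - a) / s))


def chi2_fit(noise, n_classes=10, alphas=None):
    """Minimise the Pearson chi-square distance over alpha; return (alpha, D_min)."""
    if alphas is None:
        alphas = np.linspace(0.0, 1.0, 21)
    noise = np.asarray(noise, dtype=float)
    power = float(np.mean(noise ** 2))             # zero-mean noise assumed
    inner = np.quantile(noise, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    observed = np.bincount(np.searchsorted(inner, noise, side="right"),
                           minlength=n_classes)
    edges = [-math.inf, *inner, math.inf]
    best = (None, math.inf)
    for alpha in alphas:
        cdf = np.array([model_cdf(x, power, float(alpha)) for x in edges])
        expected = np.maximum(len(noise) * np.diff(cdf), 1e-12)
        d = float(np.sum((observed - expected) ** 2 / expected))
        if d < best[1]:
            best = (float(alpha), d)
    return best


rng = np.random.default_rng(3)
print(chi2_fit(rng.uniform(-1e-3, 1e-3, 50_000)))   # alpha expected near 1
print(chi2_fit(rng.normal(0.0, 1e-3, 50_000)))      # alpha expected near 0
```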

#### 4.2.2. FIR Filter Example

The output of the $N$-tap FIR filter is $y(n) = \sum_{i=0}^{N-1} h_i\, x(n-i)$, where $x$ is the filter input and $h_i$ are the filter coefficients.

The word-lengths of the input signal ($b_x$) and of the coefficients ($b_h$) are equal to 16 bits. If no bit is eliminated during the multiplication, the multiplier output word-length $b_{\text{mult}}$ is equal to 32 bits. The adder input and output word-lengths are equal to $b_{\text{add}}$. At the filter output, the data are stored in memory with a word-length $b_y$ equal to 16 bits.

The adder word-length $b_{\text{add}}$ varies between 16 and 32 bits, while the system output is always quantized on 16 bits.
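The FIR experiment can be reproduced in spirit with the following sketch. Several choices are assumptions not stated in the text: Q1.15 input and coefficient formats, random input and coefficient values, a 32-tap filter, and two integer bits in the adder format (so $b_{\text{add}} - 2$ fractional bits), with no overflow. Each exact 32-bit product is rounded to the adder format, the sum is rounded to the 16-bit output format, and the output noise power is measured against a double-precision reference.

```python
import numpy as np


def rnd(x, n_frac):
    """Round to n_frac fractional bits (NumPy rounds halves to even)."""
    q = 2.0 ** -n_frac
    return np.round(x / q) * q


def fir_noise_power(b_add, taps=32, n_samples=50_000, m_add=2, seed=4):
    """Measured output noise power of a fixed-point FIR filter whose adder
    inputs/outputs use b_add bits (m_add integer bits), output on 16 bits."""
    rng = np.random.default_rng(seed)
    x = rnd(rng.uniform(-0.5, 0.5, n_samples), 15)       # 16-bit input (Q1.15)
    h = rnd(rng.uniform(-0.1, 0.1, taps), 15)            # 16-bit coefficients
    y_ref = np.convolve(x, h)[:n_samples]                # floating-point reference
    padded = np.concatenate((np.zeros(taps - 1), x))
    # row k holds x(k), x(k-1), ..., x(k-taps+1)
    delayed = np.lib.stride_tricks.sliding_window_view(padded, taps)[:, ::-1]
    products = rnd(delayed * h, b_add - m_add)           # products -> adder format
    y = rnd(products.sum(axis=1), 15)                    # 16-bit output (Q1.15)
    return float(np.mean((y - y_ref) ** 2))


for b_add in (16, 20, 24, 32):
    print(b_add, round(10 * np.log10(fir_noise_power(b_add)), 1), "dB")
```

Under these assumptions, the output noise is dominated by the 32 product roundings for small $b_{\text{add}}$ and by the single output rounding for large $b_{\text{add}}$, which correspond to the two extreme cases of Section 4.1.2.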

#### 4.2.3. Benchmarks

To validate our noise model, different DSP application benchmarks have been tested, and the adequacy between our model and the real noises has been measured. For each application, different output noises have been obtained by evaluating several fixed-point specifications and different application parameters. The number of output noises analyzed for one application is denoted $N_t$. For these applications based on arithmetic operations, the input and output word-lengths are fixed to 16 bits. The different fixed-point specifications are obtained by modifying the adder input and output word-lengths; eight values are tested for the adder word-length.

Table 1: Adequacy between our model and real noises for different DSP applications.

| Applications | Number of tests $N_t$ | Success rate (significance level 0.05) | Success rate (significance level 0.001) |
|---|---|---|---|
| FFT (16 and 32 samples) | 16 | 100% | 100% |
| IIR 8 direct form I | 192 | 98% | 99% |
| IIR 8 direct form II | 192 | 100% | 100% |
| IIR 8 transposed form | 192 | 97% | 99% |
| Adaptive APA filter | 8 | 87% | 100% |
| Volterra filter | 8 | 100% | 100% |
| WCDMA receiver | 16 | 100% | 100% |
| MP3 | 28800 | 78% | 87% |

The success rate corresponds to the ratio of output noises for which a weight $\alpha$ can be found to model the noise probability density function with (5).

The different applications used to test our approach are presented in this paragraph. A fast Fourier transform (FFT) has been performed on vectors of 16 or 32 samples. Linear time-invariant (LTI) recursive systems have been tested through an eighth-order infinite impulse response (IIR) filter. This filter is implemented in a cascaded form based on four second-order cells. For this cascaded eighth-order IIR filter, 24 permutations of the second-order cells can be tested, leading to very different output noise characteristics [21]. Three forms have been tested: *Direct-Form I*, *Direct-Form II*, and *Transposed-Form*. An adaptive filter based on the affine projection algorithm (APA) [22] has been tested. This filter is made up of eight taps, and the observation vector length is equal to five. A nonlinear nonrecursive filter has been tested using a second-order Volterra filter. Our benchmarks do not include nonlinear systems with memory and thus do not validate this specific class of algorithms.

More complex applications have been studied through a WCDMA receiver and an MP3 coder. The MP3 coder is presented in Section 5.1. For third-generation mobile communication systems based on the WCDMA technique, the receiver is mainly made up of an FIR receiving filter and a rake receiver including synchronization mechanisms [23]. The rake receiver is made up of three parts corresponding to transmission channel estimation, synchronization, and symbol decoding. The synchronization of the code with the received signal is realized with a delay-locked loop (DLL). The noises are observed at the output of the symbol decoding part.

The results in Table 1 show that our noise model can be applied to most of the real noises obtained for the different applications. For some applications, such as the FFT, the FIR filter, the WCDMA receiver, and the Volterra filter, a balance coefficient $\alpha$ can always be found. These four applications are nonrecursive, and the FFT, the FIR filter, and the WCDMA receiver are LTI systems.

For the eighth-order IIR filter, almost all the noises (97%–100%) can be modeled with our approach. For these filters, 90% of the output quantization noises are modeled with a balance coefficient $\alpha$ equal to 0; the output noise is then a purely normally distributed noise. In an LTI system, the output noise $e_{y,i}$ due to the noise $e_i$ corresponds to the convolution of $e_i$ with $g_i$, the impulse response of the transfer function between the noise source and the output. Thus, the output noise is a weighted sum of delayed versions of the noise $e_i$. Since $e_i$ is a uniformly distributed white noise, its delayed versions are uncorrelated. These samples are uncorrelated but not necessarily independent, so the central limit theorem cannot be applied directly. Nevertheless, even if only one noise source is located in the filter, the output noise is a sum of uncorrelated noises and tends to have a Gaussian distribution. For the MP3 coder, the test is successful about 87% of the time when the significance level $\alpha_s$ is 0.001 (78% when $\alpha_s$ is 0.05).
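The argument can be checked numerically. In this sketch, a single uniform white noise source is filtered by the impulse response $g(k) = 0.95^k$, truncated to 200 samples, an arbitrary first-order recursive example rather than one of the benchmark filters. For independent input samples, the kurtosis of the output is $3 - 1.2\,\sum_k g^4(k) / \big(\sum_k g^2(k)\big)^2$, about 2.94 here, so it rises from 1.8 at the input to close to the Gaussian value 3.

```python
import numpy as np


def kurtosis(v):
    """Fourth standardized moment (1.8 for uniform, 3 for Gaussian)."""
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2


rng = np.random.default_rng(6)
e_i = rng.uniform(-0.5, 0.5, 1_000_000)   # one uniform white noise source
g = 0.95 ** np.arange(200)                # example impulse response to the output
e_y = np.convolve(e_i, g, mode="valid")   # output noise: weighted sum of delayed e_i
print(round(kurtosis(e_i), 2), round(kurtosis(e_y), 2))
```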

For most applications, the success rate is close to 100%. These results show that our model is suitable for modeling the output quantization noise of fixed-point systems.

## 5. Case Study: MP3 Coder

The application used to illustrate our approach and to underline its efficiency comes from audio compression and corresponds to an MP3 coder. First, the application and the associated quality criteria are briefly described. Then, the ability of our noise model to predict application performance is evaluated. Finally, a case study to obtain an optimized fixed-point specification which ensures the desired performances is detailed.

### 5.1. Application Presentation

#### 5.1.1. MP3 Coder Description

The BLADE [25] coder has been used with a 192 kbit/s constant bit rate. This coder achieves good compression quality with floating-point data types. A sample group of audio data has been defined for the experiments. This group contains various kinds of sounds, each of which can lead to different problems during encoding (harmonic purity, high or low dynamic range, etc.). Ten different input tracks have been selected and tested.

#### 5.1.2. Quality Criteria

In the case of an MP3 coder, the output noise power cannot be used directly as a compression quality criterion. Indeed, the compression is based on adding quantization noise where it is imperceptible, or at least barely audible. The compression quality has been tested using EAQUAL [26], which stands for Evaluation of Audio QUALity. It is an objective measurement tool very similar to ITU-R Recommendation BS.1387, which is based on the PEAQ technique. Such a tool is required because listening tests are difficult to formalize. In EAQUAL, the degradations due to compression are measured with the objective difference grade (ODG) metric. This metric varies from 0 (imperceptible degradation) to −4 (very annoying degradation). The level of −1 is the threshold beyond which the degradation becomes annoying to the ear. The ODG is used here to measure the degradation due to fixed-point computation. Thus, for the fixed-point design, the aim is to obtain the fixed-point specification of the coder that minimizes the implementation cost while maintaining an ODG greater than or equal to −1 for the different audio tracks of the sample group.

### 5.2. Performance Prediction

The efficiency of our approach depends on the quality of the noise model used to determine the accuracy constraint. To validate this model, its ability to model real quantization noises and its capability to predict the application performance according to the quantization noise level are analyzed through experiments.

The predicted and real performances are very close, except for two noise levels, equal to −98.5 dB and −88 dB. In these cases, the differences between the predicted function $\hat{\lambda}$ and the measured function $\lambda$ are equal to 0.2 and 0.4, respectively. In the other cases, the difference is less than 0.1. It must be underlined that when the ODG is lower than −1.5, the ODG curve is steeper, and a slight difference in the noise level leads to a large difference in the ODG. The case where only the polyphase filter uses fixed-point data types (the MDCT being computed with floating-point data types) has also been tested. In this case, the difference between $\hat{\lambda}$ and $\lambda$ is less than 0.13. Thus, our approach allows an accurate prediction of the application performance.

### 5.3. Fixed-Point Optimization under Performance Constraint

In this section, the design process to obtain an optimized fixed-point specification that guarantees a given level of performance is detailed. The ODG objective $\lambda_{\text{obj}}$ is fixed to −1, corresponding to the acceptable degradation limit.

First, to determine the initial value of the accuracy constraint, the application performance is predicted. This initial value determination corresponds to the first stage of the design flow presented in Figure 1. The function $\hat{\lambda}(P_e)$ is determined for different noise levels with a balance coefficient varying from 0 to 1. Two results can be underlined. First, the influence of the balance coefficient $\alpha$ has been tested and is relatively low: between the extreme values of $\alpha$, the ODG variation is on average less than 0.1. In these experiments, a hardware implementation is considered; thus, the balance coefficient $\alpha$ is fixed to 0.

Second, the ODG evolution is strongly linked to the kind of input track used for the compression.

In these experiments, 10 different input tracks have been tested. The noise levels required to obtain an ODG equal to −1 differ by up to 33 dB between the tracks. These results underline the necessity of using inputs which are representative of the different audio tracks encountered in the real world. In the rest of the study, the audio sample which leads to the minimal ODG is used as the reference. The results obtained in this case are shown in Figure 8(a), and a zoom is given in Figure 8(b).

Values obtained for the different iterations. The noise level is the accuracy constraint used for the fixed-point conversion at each iteration; the measured performance is the ODG obtained after the fixed-point conversion.

| Iteration | Noise level (dB) | Measured performance (ODG) |
|---|---|---|
| 0 | −99.35 | −1.2 |
| 1 | −105.35 | −0.77 |
| 2 | −102.2 | −0.86 |
| 3 | −101 | −0.97 |
| 4 | −100.85 | −0.98 |
| 5 | −100.7 | −0.99 |
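The table above shows the constraint being tightened and then relaxed until the measured ODG settles just above the −1 objective. The sketch below mimics that behaviour with a hypothetical update rule: it first brackets the objective by moving the noise level in fixed steps, then applies regula falsi (false-position) interpolation between the last acceptable and the last unacceptable point. The callable `measure_odg` is a placeholder for one fixed-point conversion under the given constraint followed by one fixed-point simulation. The step size, tolerance and toy performance model are assumptions, and this update is not claimed to be the exact rule used by the authors.

```python
def refine_constraint(measure_odg, p_init, odg_target=-1.0, step_db=6.0,
                      tol=0.02, max_iter=20):
    """Search for the largest noise level (dB) whose measured ODG meets odg_target.

    Returns (noise level, history); history lists every (level, ODG) pair tried.
    Assumes the ODG decreases monotonically as the noise level increases.
    """
    history = []

    def run(p):
        g = measure_odg(p)  # placeholder: fixed-point conversion + one simulation
        history.append((p, g))
        return g

    g = run(p_init)
    ok = (p_init, g) if g >= odg_target else None   # meets the objective
    bad = (p_init, g) if g < odg_target else None   # violates the objective
    step = step_db if ok else -step_db              # relax if ok, tighten otherwise

    # Phase 1: bracket the objective with fixed steps.
    p = p_init
    while (ok is None or bad is None) and len(history) < max_iter:
        p += step
        g = run(p)
        if g >= odg_target:
            ok = (p, g)
        else:
            bad = (p, g)

    # Phase 2: false-position updates until the ODG is just above the objective.
    while ok and bad and ok[1] - odg_target > tol and len(history) < max_iter:
        (p_ok, g_ok), (p_bad, g_bad) = ok, bad
        p = p_ok + (odg_target - g_ok) * (p_bad - p_ok) / (g_bad - g_ok)
        g = run(p)
        if g >= odg_target:
            ok = (p, g)
        else:
            bad = (p, g)

    if ok is None:
        raise RuntimeError("no noise level meeting the objective was found")
    return ok[0], history


def toy_odg(noise_db):
    """Hypothetical smooth performance model crossing ODG = -1 at -100 dB."""
    return -0.3 - 0.7 * 2.0 ** ((noise_db + 100.0) / 3.0)
```

With this toy model, `refine_constraint(toy_odg, -99.35)` starts from an unacceptable level, tightens the constraint by 6 dB, and returns a level slightly below −100 dB after four measurements. Each measurement stands for one costly fixed-point simulation, which is why keeping the iteration count small matters.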

#### 5.3.1. Execution Time

For our approach, the execution time of the iterative process has been measured. First, the initial accuracy constraint is determined. The floating-point simulations have already been carried out during the algorithm design process, so their execution time is not counted. For each tested noise level, the noise is added to the MDCT output and the ODG is then computed. The global execution time of this first stage is 420 seconds, and this stage is carried out only once.
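The prediction stage only needs floating-point data plus an additive noise source at the MDCT output. A minimal sketch of that injection step follows, assuming the noise level is a power in dB (10 log10 of the variance) and that the noise is white and zero-mean. The Gaussian and uniform options are illustrative choices only: the noise PDF used in the paper, with its balance coefficient, is defined earlier in the article and is not reproduced here. Computing the ODG itself (for instance with a perceptual quality tool such as EAQUAL [26]) is outside the sketch, and the function names are hypothetical.

```python
import math
import random


def add_output_noise(coeffs, noise_level_db, rng, pdf="gaussian"):
    """Return coeffs plus white zero-mean noise whose power is noise_level_db (dB)."""
    sigma = math.sqrt(10.0 ** (noise_level_db / 10.0))  # variance = 10**(dB/10)
    if pdf == "gaussian":
        return [c + rng.gauss(0.0, sigma) for c in coeffs]
    if pdf == "uniform":
        half_width = math.sqrt(3.0) * sigma  # variance of U(-a, a) is a**2 / 3
        return [c + rng.uniform(-half_width, half_width) for c in coeffs]
    raise ValueError(f"unknown pdf: {pdf}")


def power_db(samples):
    """Mean power of a sequence, in dB."""
    return 10.0 * math.log10(sum(x * x for x in samples) / len(samples))


if __name__ == "__main__":
    rng = random.Random(0)
    mdct_out = [0.0] * 100_000  # stand-in for the MDCT coefficients of one track
    noisy = add_output_noise(mdct_out, -100.0, rng)
    noise = [n - c for n, c in zip(noisy, mdct_out)]
    print(round(power_db(noise), 1))  # close to -100.0
```

Because only one noise source is injected and the rest of the encoder stays in floating point, sweeping the noise level needs no fixed-point code at all.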

For the fixed-point conversion, the analytical model used to estimate the noise level is determined at the first iteration. This process takes 120 seconds, a small amount of time that is spent only once. The model is then used during fixed-point optimization, where the fixed-point accuracy is obtained instantaneously by evaluating a mathematical expression. For this example, each fixed-point conversion takes only 0.7 seconds thanks to the analytical approach.
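The speed of each conversion comes from replacing simulation by the evaluation of a closed-form noise expression. One common form of such a model, sketched below under stated assumptions, treats each rounding quantizer i with step q_i = 2^(−b_i) (b_i fractional bits) as a white noise source of power q_i²/12, and sums these powers weighted by gains K_i describing how each source propagates to the output (for a linear time-invariant path, the energy of the corresponding impulse response). The gains are hypothetical placeholders for what the one-time model-building step would compute; the expression used in the paper may contain additional terms, and truncation would add a bias not modeled here.

```python
import math


def output_noise_db(frac_bits, gains):
    """Analytical output noise power (dB) for rounding quantizers.

    frac_bits[i]: number of fractional bits b_i of quantizer i (step q_i = 2**-b_i).
    gains[i]: noise-power gain K_i from quantizer i to the output (assumed precomputed).
    """
    power = sum(k * (2.0 ** -b) ** 2 / 12.0 for b, k in zip(frac_bits, gains))
    return 10.0 * math.log10(power)


def meets_constraint(frac_bits, gains, constraint_db):
    """True if the word-length choice satisfies the accuracy constraint."""
    return output_noise_db(frac_bits, gains) <= constraint_db


# Hypothetical gains for three quantization sources.
GAINS = [0.5, 2.0, 1.0]

if __name__ == "__main__":
    print(round(output_noise_db([16, 16, 16], GAINS), 2))
    print(meets_constraint([16, 16, 16], GAINS, -100.0))
```

Each candidate word-length vector costs only a short sum to evaluate, so an optimizer can test many candidates per second, which is consistent with the sub-second conversion time reported above.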

In this fixed-point design process, most of the time is spent in fixed-point simulation (stage 3). This simulation is carried out with C++ code using optimized fixed-point data types. For this example, each fixed-point simulation takes 480 seconds. However, only one fixed-point simulation is required per iteration, and only a small number of iterations is needed.
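By contrast, a fixed-point simulation has to push every sample through bit-true arithmetic. The paper uses C++ fixed-point data types for this; the Python sketch below only illustrates the principle with a hypothetical round-to-nearest quantizer with saturation, and measures the resulting error power so that it can be compared with the q²/12 value an analytical rounding model predicts. The word-length parameters, function names and test signal are assumptions.

```python
import math
import random


def quantize(x, frac_bits, int_bits):
    """Round x to the nearest multiple of 2**-frac_bits, then saturate.

    The signed range is [-2**int_bits, 2**int_bits - q], with q = 2**-frac_bits.
    """
    q = 2.0 ** -frac_bits
    lo = -(2.0 ** int_bits)
    hi = 2.0 ** int_bits - q
    y = math.floor(x / q + 0.5) * q  # round half up
    return min(max(y, lo), hi)


def measured_noise_db(signal, frac_bits, int_bits):
    """Error power (dB) between the quantized and the original signal."""
    err = [quantize(x, frac_bits, int_bits) - x for x in signal]
    return 10.0 * math.log10(sum(e * e for e in err) / len(err))


if __name__ == "__main__":
    rng = random.Random(2)
    sig = [rng.uniform(-0.9, 0.9) for _ in range(100_000)]
    print(round(measured_noise_db(sig, 12, 0), 2))
    print(round(10.0 * math.log10(2.0 ** -24 / 12.0), 2))  # analytical q**2/12
```

The two printed values should agree closely, but obtaining the first one requires processing every sample, which is what makes each simulation of the full encoder expensive.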

The global execution time therefore depends on the number of iterations required to obtain the optimized fixed-point specification for a given ODG constraint. In this example, six iterations are needed to obtain an optimized fixed-point specification leading to an ODG of −0.99, and three iterations are needed to obtain an ODG of −0.95. Thus, the global execution time is 49 minutes for an ODG of −0.99 and 33 minutes for an ODG of −0.95.

For comparison, the execution time of a classical simulation-based approach is driven by the number of iterations of its optimization process, each iteration requiring one fixed-point simulation.

An optimization algorithm based on the *bits procedure* of [27] is used; this algorithm limits the number of iterations in the optimization process. To restrict the fixed-point design search space, the number of variables in the optimization process has been limited to 9. In this case, 388 iterations are required. Since each fixed-point simulation takes 480 seconds, the global execution time becomes huge, reaching 51 hours.

With our approach, the optimization time is substantially lower. For this real application, a fixed-point simulation takes several minutes, and the analytical approach reduces the execution time by a factor of 63. Moreover, the fixed-point design space is very large and cannot be explored with classical techniques based on fixed-point simulations.
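The factor of 63 can be checked from the figures reported in this section: 388 simulation-based iterations at 480 seconds each, against the 49-minute global time of the proposed flow for an ODG of −0.99. The short computation below does only that arithmetic and adds no new measurement; the constant names are illustrative.

```python
SIM_SECONDS = 480        # one fixed-point simulation
SIM_ITERATIONS = 388     # iterations of the simulation-based optimization
PROPOSED_MINUTES = 49    # global time of the proposed flow (ODG = -0.99)

simulation_based_s = SIM_ITERATIONS * SIM_SECONDS
simulation_based_h = simulation_based_s / 3600
speedup = simulation_based_s / (PROPOSED_MINUTES * 60)

if __name__ == "__main__":
    print(simulation_based_s)            # 186240
    print(round(simulation_based_h, 1))  # 51.7
    print(round(speedup, 1))             # 63.3
```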

## 6. Conclusion

In embedded systems, fixed-point arithmetic is favored, but application performance is degraded by finite-precision computation. During the fixed-point optimization process, performance degradation is not analyzed directly; instead, an intermediate metric is used to measure computational accuracy. In this paper, a technique to determine the accuracy constraint associated with a global noise model has been proposed. The probability density function of the noise model has been detailed and the choice of its parameters discussed. The experiments show that our model predicts the application performance with sufficient accuracy as a function of the noise level. The proposed technique for determining the accuracy constraint, together with the iterative process used to adjust it, yields an optimized fixed-point specification that guarantees a minimum performance. The optimization time is substantially lower, reduced by a factor of 63 compared with the simulation-based approach. Our future work focuses on quantization by truncation.

### References

1. Menard D, Chillet D, Sentieys O: **Floating-to-fixed-point conversion for digital signal processors.** *EURASIP Journal on Applied Signal Processing* 2006, **2006:**-19.
2. Herve N, Menard D, Sentieys O: **Data wordlength optimization for FPGA synthesis.** *Proceedings of the IEEE Workshop on Signal Processing Systems (SIPS '05), November 2005, Athens, Greece* 623-628.
3. Belanovic P, Rupp M: **Automated floating-point to fixed-point conversion with the fixify environment.** *Proceedings of the 16th IEEE International Workshop on Rapid System Prototyping (RSP '05), June 2005, Montreal, Canada* 172-178.
4. Berens F, Naser N: **Algorithm to System-on-Chip Design Flow that Leverages System Studio and SystemC 2.0.1.** Synopsys Inc., March 2004.
5. Mathworks: **Fixed-Point Blockset User's Guide (ver. 2.0).** 2001.
6. Mentor Graphics: **Algorithmic C Data Types.** Mentor Graphics, version 1.2 edition, May 2007.
7. De Coster L, Adé M, Lauwereins R, Peperstraete J: **Code generation for compiled bit-true simulation of DSP applications.** *Proceedings of the 11th IEEE International Symposium on System Synthesis (ISSS '98), December 1998, Hsinchu, Taiwan* 9-14.
8. Keding H, Willems M, Coors M, Meyr H: **FRIDGE: a fixed-point design and simulation environment.** *Proceedings of the Conference on Design, Automation and Test in Europe (DATE '98), February 1998, Paris, France* 429-435.
9. Keding H, Coors M, Luthje O, Meyr H: **Fast bit-true simulation.** *Proceedings of the 38th Design Automation Conference (DAC '01), June 2001, Las Vegas, Nev, USA* 708-713.
10. Kim S, Kum K-I, Sung W: **Fixed-point optimization utility for C and C++ based digital signal processing programs.** *IEEE Transactions on Circuits and Systems II* 1998, **45**(11):1455-1464. doi:10.1109/82.735357
11. Özer E, Nisbet AP, Gregg D: *Stochastic bit-width approximation using extreme value theory for customizable processors.* Trinity College, Dublin, Ireland; October 2003.
12. Shi C, Brodersen RW: **Floating-point to fixed-point conversion with decision errors due to quantization.** *Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Canada* **5:**41-44.
13. Keding H, Hurtgen F, Willems M, Coors M: **Transformation of floating-point into fixed-point algorithms by interpolation applying a statistical approach.** *Proceedings of the 9th International Conference on Signal Processing Applications and Technology (ICSPAT '98), September 1998, Toronto, Canada*.
14. Constantinides GA: **Perturbation analysis for word-length optimization.** *Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '03), April 2003, Napa, Calif, USA* 81-90.
15. Lopez JA, Caffarena G, Carreras C, Nieto-Taladriz O: **Fast and accurate computation of the roundoff noise of linear time-invariant systems.** *IET Circuits, Devices & Systems* 2008, **2**(4):393-408. doi:10.1049/iet-cds:20070198
16. Rocher R, Menard D, Sentieys O, Scalart P: **Analytical accuracy evaluation of fixed-point systems.** *Proceedings of the 15th European Signal Processing Conference (EUSIPCO '07), September 2007, Poznań, Poland*.
17. Widrow B: **Statistical analysis of amplitude quantized sampled-data systems.** *Transactions of the American Institute of Electrical Engineers–Part II* 1960, **79:**555-568.
18. Sripad A, Snyder D: **A necessary and sufficient condition for quantization errors to be uniform and white.** *IEEE Transactions on Acoustics, Speech and Signal Processing* 1977, **25**(5):442-448. doi:10.1109/TASSP.1977.1162977
19. Knuth DE: *The Art of Computer Programming, Addison-Wesley Series in Computer Science and Information*. 2nd edition. Addison-Wesley, Boston, Mass, USA; 1978.
20. Menezes AJ, van Oorschot PC, Vanstone SA: *Handbook of Applied Cryptography*. CRC Press, Boca Raton, Fla, USA; 1996.
21. Rocher R, Menard D, Herve N, Sentieys O: **Fixed-point configurable hardware components.** *EURASIP Journal on Embedded Systems* 2006, **2006:**-13.
22. Ozeki K, Umeda T: **An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties.** *Electronics and Communications in Japan* 1984, **67**(5):19-27.
23. Ojanperä T, Prasad R: *WCDMA: Towards IP Mobility and Mobile Internet, Artech House Universal Personal Communications Series*. Artech House, Boston, Mass, USA; 2002.
24. Pan D: **A tutorial on MPEG/audio compression.** *IEEE MultiMedia* 1995, **2**(2):60-74. doi:10.1109/93.388209
25. Jansson T: **BladeEnc MP3 encoder.** 2002.
26. Lerch A: **EAQUAL–Evaluation of Audio QUALity.** Software repository: January 2002, http://sourceforge.net/projects/eaqual
27. Cantin M-A, Savaria Y, Lavoie P: **A comparison of automatic word length optimization procedures.** *Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '02), May 2002, Scottsdale, Ariz, USA* **2:**612-615.

## Copyright information

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.