# A novel cost function to estimate parameters of oscillatory biochemical systems

## Abstract

Oscillatory pathways are among the most important classes of biochemical systems, with examples ranging from circadian rhythms to cell cycle maintenance. Mathematical modeling of these highly interconnected biochemical networks is needed to meet numerous objectives such as investigating, predicting and controlling the dynamics of these systems. Identifying the kinetic rate parameters is essential for fully modeling these and other biological processes. These kinetic parameters, however, are not usually available from measurements, and most of them have to be estimated by parameter fitting techniques. One of the issues with estimating kinetic parameters in oscillatory systems is the irregularity of the least squares (LS) cost function surface used to estimate these parameters, which is caused by the periodicity of the measurements. These irregularities result in numerous local minima, which limit the performance of even some of the most robust global optimization algorithms. We propose a parameter estimation framework that addresses these issues by integrating temporal information with the periodic information embedded in the measurements. This periodic information is used to build a cost function with better surface properties, leading to fewer local minima and better performance of global optimization algorithms. We verify for three oscillatory biochemical systems that our proposed cost function results in an increased ability to estimate accurate kinetic parameters compared to the traditional LS cost function. We combine this cost function with an improved noise removal approach that leverages periodic characteristics embedded in the measurements to effectively reduce noise. The results provide strong evidence of the efficacy of this noise removal approach over the commonly used wavelet hard-thresholding noise removal method.
This proposed optimization framework results in more accurate kinetic parameters that will eventually lead to biochemical models that are more precise, predictable, and controllable.

### Keywords

Cost function; Root mean square error; Circadian clock; Oscillatory system; Fundamental period

## 1 Introduction

Oscillatory biochemical pathways are an important class of biochemical systems [1, 2] that play significant roles in living systems. For instance, circadian rhythms are fundamental daily time-keeping mechanisms in a wide range of species, from unicellular organisms to complex eukaryotes [3]. One of their most important roles is in regulating physiological processes such as the sleep-wake cycle in mammals [4]. Cell cycles are another vital class of biochemical oscillations. The cell cycle is the sequence of events by which a growing cell replicates all its components and divides into two daughter cells [5]. Inappropriate cell proliferation due to malfunctioning cell cycle control mechanisms can cause the development of certain types of cancer [5]. There are also other classes of biochemical rhythms, such as cardiac rhythms [6], ovarian cycles [7] and cAMP oscillations [8], that have their own significance in systems biology.

A complete modeling of a biochemical system includes characterization of all nonlinear structures of the network along with the associated kinetic rates. In other words, without fully identifying all the kinetic parameter values, these models are still incomplete even if the full structure of the model has been determined. Few kinetic rates are available directly from experimentation or literature. Most of them, however, have to be estimated by parameter fitting techniques to complete the modeling of the biochemical pathway. Thus, a mathematical framework is needed to fit the kinetic parameters using the observables. Optimization frameworks that focus specifically on estimating parameters associated with biochemical pathways have received much attention in recent years [9, 10, 11, 12, 13, 14].

Two main issues in estimating kinetic parameters in biochemical systems are data-related issues and computational issues [14]. The measurement datasets used to fit these parameters are usually noisy and incomplete. They are also affected by uncertainties related to experimental conditions such as temperature and light [14]. Much study has been done recently to reduce noise in different biochemical signals [15, 16, 17]. Mostacci et al. [15] proposed a denoising method for mass spectrometry data by integrating wavelet soft thresholding and principal component analysis. Weng et al. [16] suggested a noise removal approach for oscillatory ECG signals based on a recently developed method known as empirical mode decomposition. Ren et al. [17] developed a method of denoising biochemical spectra by introducing a new thresholding function integrated with the "translation invariant" approach to lower the root mean square error (RMSE) in the measurements in comparison to the traditional soft and hard thresholding methods.

The computational issues include the challenges optimization algorithms face when identifying an optimal fit to measurement data. Optimization methods suffer from problems such as slow convergence toward global optima, complicated error surfaces and a lack of convergence proofs [14]. Much study has been done to address these issues in parameter estimation for biochemical systems [12, 13, 18, 19, 20, 21]. Zhan et al. proposed a method to reduce the computational time of each trial by integrating spline function theory with nonlinear programming to eliminate the need to solve the system of ordinary differential equations (ODEs) [21]. Rodriguez-Fernandez et al. [12] suggested a hybrid optimization method to speed up convergence toward the global optimum. A variety of different algorithms have also been adapted to perform the inverse problem. A comprehensive list of such studies is provided in [14].

Furthermore, heuristic approaches have been developed to address the optimization problem of fitting parameters in oscillatory systems [9, 10, 11]. These methods improved the optimization by constructing error functions based on features extracted from the data. Locke et al. [11] proposed a cost function based on the comparison of entrained period, phase and strength of oscillation for the circadian clock in *Arabidopsis thaliana*. Also, Zeilinger et al. [10] performed another parameter estimation approach for the *A. thaliana* model by investigating the amplitudes of some species in dark/light cycles, the periods under dark and light conditions and the period of one mutant phenotype under constant light. In [9], Bagheri et al. built an optimization process to model the *Drosophila melanogaster* circadian clock by defining three cost functions based on free running period, light/dark entrained period, differences in amplitude and differences in the phase of the components in the system. These methods are more applicable for problems where characteristics in the system and/or data can be exploited to improve the performance of the parameter estimation. These methods, however, require more information about the system than purely data-driven comparison methods. For instance, the cost function proposed in [9] needs the period information of both the light and dark cycles of the investigated model, which requires a greater level of first principles knowledge. These methods are also model specific, which makes it difficult to apply them to general oscillatory systems. For example, the dark/light cycle characteristics that were introduced in the parameter fitting problem of [10] may not be a suitable feature for parameter fitting of non-circadian biorhythms.

This article focuses on the problem of estimating the kinetic parameters of oscillatory biochemical systems. We show that periodicity in the measurements of oscillatory systems results in irregular surface properties of the LS cost function, leading to numerous local minima. These multiple local optima cause premature convergence of even robust optimization algorithms, which eventually results in incorrect estimates, poor predictions of dynamics, and incorrect acceptance of functional hypotheses. Compounded with uncertainties or noisy measurements, this makes for a difficult estimation problem.

We develop a parameter estimation framework to address these issues by integrating information about oscillatory systems into the modeling process (parameter estimation and denoising). This periodic information is used to build a cost function with better surface properties. Our proposed cost function takes advantage of the basic properties of these oscillatory systems, which allows us to generalize it to a variety of biochemical systems with sustained oscillations. The proposed cost function also needs less first principles knowledge than the previous methods developed for oscillatory systems [9, 10, 11]. We verified for three oscillatory biochemical systems that our proposed cost function results in an increased ability to estimate accurate kinetic parameters as compared to the traditional LS cost function. We combined this cost function with an improved denoising method that also leverages periodic characteristics embedded in the measurements to effectively reduce noise. The results provide strong evidence of the efficacy of this noise removal approach over the previous commonly used wavelet hard-thresholding noise removal method. This proposed optimization framework results in more accurate kinetic parameters that will eventually lead to biochemical models that are more accurate, predictable, and controllable.

## 2 Methodology

The dynamics of a biochemical pathway can be described by a system of ordinary differential equations (ODEs), as shown in (1):

$$\frac{d\mathbf{x}(t)}{dt}=\mathbf{f}\left(\mathbf{x}(t),\mathbf{p}\right),\qquad \mathbf{x}\left(t_{0}\right)=\mathbf{x}_{0} \qquad (1)$$

Here, **x** ∈ ℝ^{m×1} is the state vector of the *m* components of the pathway, **p** ∈ ℝ^{n×1} is the vector of *n* kinetic parameters, **f**: ℝ^{m×1} → ℝ^{m×1} is a nonlinear vector function, **x**_{0} ∈ ℝ^{m×1} is the vector of the initial component concentrations at time *t*_{0}, and *t*_{0} < *t* < *t*_{e} is the time interval of interest.
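As a concrete, runnable instance of this model form, the sketch below integrates the two-state Brusselator (one of the benchmark systems used later in this article) with SciPy. The rate expressions are the standard Brusselator equations; the variable and function names are ours, not the paper's.

```python
import numpy as np
from scipy.integrate import solve_ivp

def brusselator(t, x, p):
    """f(x, p) for the Brusselator: a minimal instance of dx/dt = f(x, p)."""
    A, B = p
    x1, x2 = x
    return [A + x1**2 * x2 - (B + 1.0) * x1,   # d[x1]/dt
            B * x1 - x1**2 * x2]               # d[x2]/dt

p = (1.0, 3.0)            # kinetic parameters; B > 1 + A^2 yields a limit cycle
x0 = [1.0, 1.0]           # initial concentrations x(t0)
t = np.linspace(0.0, 50.0, 2001)
sol = solve_ivp(brusselator, (t[0], t[-1]), x0, args=(p,),
                t_eval=t, max_step=0.05)

# After the initial transient, the trajectory settles onto sustained oscillations,
# which we detect by counting mean-crossings of the first state.
x1 = sol.y[0]
crossings = int(np.sum(np.diff(np.sign(x1[400:] - x1[400:].mean())) != 0))
print(crossings)  # many crossings -> sustained oscillation
```

For this parameter choice the system is inside the oscillatory regime; moving *B* below 1 + *A*² would instead give a decaying (non-oscillatory) trajectory, the distinction that matters for the cost function proposed later.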

The parameter estimation problem is to identify the kinetic parameters (**p**) of the system described in (1) that cannot be measured directly, using a set of experimental data. The criterion for verifying the quality of the estimates is often an error function Φ, as shown in (2). This function quantifies the ability of the estimates to reproduce the same results as the measurements. The objective function is minimized such that $\mathbf{p}=\widehat{\mathbf{p}}$ results in the minimum value of Φ:

$$\widehat{\mathbf{p}}=\underset{\mathbf{p}}{\arg\min}\;\Phi\left(\mathbf{p}\right) \qquad (2)$$

In that way, $\widehat{\mathbf{p}}$ is called the estimated point.

A common choice of Φ is the *least squares (LS) estimator* [22]. This estimator is based on the sum of the squares of the point-by-point errors between the measured experimental data and the simulated measurements from the estimated model, as described in (3):

$$\Phi_{LS}=\sum_{i=1}^{N_x}\sum_{j=1}^{N_m}{\left({x}_{ij}-{\widehat{x}}_{ij}\right)}^{2} \qquad (3)$$

Here, *x*_{ ij } is the measurement at time *j* of the *i* th state of the system, ${\widehat{x}}_{ij}$ is the reproduced data at time *j* for the *i* th state of the system given some parameter vector **p**, *N*_{ m } is the number of time points at which measurements are obtained and *N*_{ x } is the number of measured outputs (in this manuscript, they are considered to be the measured states of the system).

### 2.1 Fundamental frequency estimation

A signal *x*(*t*) is periodic if there exists a constant *T* such that (4) holds for all *t*:

$$x\left(t+T\right)=x\left(t\right) \qquad (4)$$

The smallest value of *T* ≠ 0 for which (4) is valid is the *"fundamental period"* of oscillation. The inverse of the fundamental period is the fundamental frequency (*f*_{0}). Several approaches have been proposed to estimate *f*_{0} [24]. We used a frequency-based method called the *component frequency ratio* [24] to extract the fundamental frequency of the measured data, because time-series methods may not be adequate for biochemical measurements given their low sampling rate and low temporal resolution. This method starts by transforming the data to the frequency domain using the Fourier transform. The locations of the peaks in the spectrum are then identified; these peaks are the harmonics of the fundamental frequency. The final step is to find the greatest common factor of the frequencies at which the peaks occur.
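The three steps above can be sketched as follows. This is a minimal illustration of the component-frequency-ratio idea, not the authors' implementation: the simple peak picker and the GCD-over-bin-indices shortcut assume the record spans an integer number of fundamental periods, and all function names are ours.

```python
import numpy as np
from math import gcd
from functools import reduce

def fundamental_period(x, fs, rel_height=0.1):
    """Estimate the fundamental period via the component-frequency-ratio idea:
    locate spectral peaks and take the GCD of their harmonic bin indices."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x - x.mean()))          # step 1: Fourier transform
    thresh = rel_height * spec.max()
    # step 2: peak bins = local maxima above a fraction of the tallest peak
    peaks = [k for k in range(1, len(spec) - 1)
             if spec[k] > thresh and spec[k] >= spec[k - 1] and spec[k] >= spec[k + 1]]
    # step 3: greatest common factor of the harmonic frequencies (as bin indices)
    k0 = reduce(gcd, peaks)
    return n / (k0 * fs)                              # fundamental period = 1 / f0

fs = 1.0                                  # 1 sample/hour
t = np.arange(0, 240, 1.0 / fs)           # 10 days of hourly samples
x = np.sin(2*np.pi*t/24) + 0.4*np.sin(2*np.pi*2*t/24)   # 24 h rhythm + 2nd harmonic
print(fundamental_period(x, fs))          # -> 24.0
```

Because the second harmonic (bin 20) and the fundamental (bin 10) share the common factor 10, the GCD recovers the fundamental even when the fundamental peak is not the tallest one.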

#### 2.1.1 Effect of noise on estimation of *f*_{0}

We investigated the effect of measurement noise on the estimation of *f*_{0}. We considered three model systems identified from the literature: the two-state Tyson model [25], the two-state Brusselator model [26] and the five-state Goldbeter model [27]. We considered the measurements of the states of these models with a sampling rate equal to 1 (sample/hour). Then, we added AWGN noise with various SNRs to these signals and estimated their fundamental period using the *component frequency ratio* method. Figure 2 shows the absolute error between the estimated and the nominal fundamental period of the three models for various amounts of additive noise.

Figure 2 shows that the method used to estimate the fundamental period is robust to the additive noise.
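This robustness can be checked numerically on a synthetic 24-hour rhythm. The dominant-peak estimator below is a simplified stand-in for the component frequency ratio method (it uses only the tallest harmonic), and the AWGN is scaled according to the usual signal-to-noise power ratio definition; the setup is our own illustration, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, T_true = 1.0, 24.0                    # 1 sample/hour, 24 h nominal period
t = np.arange(0, 240, 1.0 / fs)
clean = np.sin(2 * np.pi * t / T_true)

def estimate_period(x, fs):
    """Dominant-spectral-peak period estimate (single-harmonic shortcut)."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    k = int(np.argmax(spec[1:])) + 1      # skip the DC bin
    return len(x) / (k * fs)

errors = {}
for snr_db in (30, 20, 10):
    # scale AWGN so that 10*log10(P_signal / P_noise) = snr_db
    p_sig = np.mean(clean**2)
    sigma = np.sqrt(p_sig / 10**(snr_db / 10))
    noisy = clean + rng.normal(0.0, sigma, clean.shape)
    errors[snr_db] = abs(estimate_period(noisy, fs) - T_true)

print(errors)  # period error stays small even at SNR = 10 dB
```

With 240 samples the fundamental concentrates in a single spectral bin, so moderate broadband noise rarely displaces the dominant peak, which is the intuition behind the robustness seen in Figure 2.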

### 2.2 Removing noise

#### 2.2.1 Improving the hard-thresholding method in oscillatory systems

1. Partition the measurements *X*(*nT*_{ s }) based on their calculated fundamental period into the sets *X*_{ k } according to (5):

$${X}_{k}\left(n{T}_{s}\right)=X\left(n{T}_{s}\right),\qquad k\mathcal{T}\le n{T}_{s}<\left(k+1\right)\mathcal{T},\qquad 0\le k<m \qquad (5)$$

   Here, *T*_{ s } is the sampling period and $\mathcal{T}$ is the fundamental period of the measurements *X*.

2. Shift each *x*_{ k } by the value of $k\mathcal{T}$. This results in a single period of the measurements with higher resolution. The shifted versions of the *x*_{ k }'s are combined according to (6):

$$\bigcup _{k=0}^{N-1}{x}_{k}\left(n{T}_{s}-k\mathcal{T}\right) \qquad (6)$$

Figure 5a shows an example signal *x*(*t*) = sin(2*πt*) sampled at a rate of 2 (samples/sec), and Figure 5b shows its shifted version. Figure 5b contains only the first period of the sine function, but at higher resolution.

Figure 6a shows noisy *in-silico* measurements of the [*M*] component, with a sampling rate of 1 (sample/hour), in the model of the circadian clock in *D. melanogaster* proposed by Tyson [25]. The noise in the measurements is additive white Gaussian noise (AWGN) with SNR = 20 dB. Figure 6b shows the shifted version of the noisy measurements of Figure 6a using a calculated fundamental period of 24.21 hours.

Wavelet decomposition, thresholding and reconstruction are then applied to this "shifted version" of the noisy data. MATLAB was used to implement a three-level wavelet decomposition using the *Daubechies 6* wavelet and a threshold value of 0.3. The wavelet type, number of levels, and the threshold value were chosen empirically and may vary from system to system. The results are shown in Figure 6c. The final step is to reconstruct the original signal by shifting the samples back to their respective periods (Figure 6d).
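A runnable sketch of this fold/denoise/unfold pipeline is given below. To stay dependency-free it substitutes a one-level Haar hard-threshold for the paper's three-level Daubechies-6 decomposition, so it illustrates the period-shifting idea rather than reproducing the MATLAB implementation; all names are ours.

```python
import numpy as np

def fold_order(x, ts, T):
    """Steps 1-2: map each sample to its phase within one fundamental period
    and order by phase -> one period at higher effective resolution."""
    phase = (np.arange(len(x)) * ts) % T
    return np.argsort(phase, kind="stable")

def haar_hard_threshold(y, thr):
    """Minimal one-level Haar stand-in for the paper's 3-level db6 wavelet step."""
    n = len(y) // 2 * 2
    a = (y[0:n:2] + y[1:n:2]) / np.sqrt(2)   # approximation coefficients
    d = (y[0:n:2] - y[1:n:2]) / np.sqrt(2)   # detail coefficients
    d = np.where(np.abs(d) > thr, d, 0.0)    # hard thresholding
    out = y.astype(float).copy()
    out[0:n:2] = (a + d) / np.sqrt(2)
    out[1:n:2] = (a - d) / np.sqrt(2)
    return out

def denoise_by_folding(x, ts, T, thr=0.3):
    order = fold_order(x, ts, T)
    smooth = haar_hard_threshold(x[order], thr)  # denoise the folded period
    out = np.empty_like(x, dtype=float)
    out[order] = smooth                           # final step: shift samples back
    return out

rng = np.random.default_rng(1)
t = np.arange(240.0)                              # 1 sample/hour
clean = np.sin(2 * np.pi * t / 24.21)             # period close to the Tyson value
noisy = clean + rng.normal(0, 0.1, t.shape)
den = denoise_by_folding(noisy, 1.0, 24.21, thr=0.3)

rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
print(rmse(noisy, clean), rmse(den, clean))       # denoised RMSE should be lower
```

The key point is that after folding, neighboring samples are phase-adjacent, so the clean signal is very smooth in the folded domain and its detail coefficients are small, while noise details are large and get thresholded away.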

To quantify the improvement, we generated noise-free samples of the [*P*_{t}] component with a sampling rate of 1 (sample/hour) from the Tyson model of the circadian clock in *D. melanogaster* [25]. Then, we added AWGN noise with SNR = 20 dB to the dataset in 200 trials. We then removed noise using two approaches: the traditional wavelet hard-thresholding method [29] and our proposed method. Figure 7 compares three errors for each of the 200 trials: (1) the RMSE between the noisy data and the original dataset (the original error), (2) the RMSE between the denoised data resulting from the traditional thresholding method and the original dataset (Approach 1), and (3) the RMSE between the denoised data resulting from the proposed denoising method and the original dataset (Approach 2). This figure shows that our proposed method of denoising is more effective at removing noise than the wavelet hard-thresholding method, consistently lowering the RMSE between the original signal and the denoised signal.

#### 2.2.2 The effect of error in estimating *f*_{0} on the proposed denoising method

Figure 8 shows that the proposed denoising method yields lower RMSEs than the traditional wavelet thresholding when the errors in the estimated fundamental period are small. However, if the fundamental period is estimated with an error greater than approximately 0.25 for these models, the proposed method no longer yields lower RMSEs. Figure 2 shows, though, that the error in fundamental period estimation due to noise is much smaller than the order of error considered in Figure 8.

### 2.3 Optimization

#### 2.3.1 Forming cost function

To illustrate the surface properties of the LS cost function for oscillatory signals, consider the sinusoid in (7):

$$x\left(t\right)=\mathrm{sin}\left(2\pi ft+\phi \right) \qquad (7)$$

where *f* = 1 is the frequency and *ϕ* = 0 is the initial phase. Figure 9 illustrates the surface of the LS cost function (3) for ranges of the signal parameters *f* and *ϕ*.

Figure 9 shows significant rippling, especially along the *f* direction of the LS cost function. This happens due to the varying degree of overlap between the periods of the two oscillatory signals in the LS objective function as *f* changes. This potentially results in numerous local basins of attraction that hinder the optimizer's ability to find the global optimum. These ripples are fundamental characteristics of the LS cost function for systems with oscillatory dynamics. This phenomenon can be observed for a large class of oscillatory systems, especially along the parameter axes to which the fundamental frequency is most sensitive.
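This rippling is easy to reproduce numerically for the sinusoidal example: sweep the simulated frequency *f* against a fixed measurement with *f* = 1 and count the interior local minima of the LS cost. The record length and grid below are arbitrary choices of ours.

```python
import numpy as np

t = np.arange(0.0, 10.0, 0.1)            # 100 samples of the reference signal
x = np.sin(2 * np.pi * 1.0 * t)          # "measurement" with f = 1, phi = 0

freqs = np.arange(0.5, 2.0, 0.005)
# point-by-point LS cost (Eq. (3)) between measurement and simulated sinusoid
cost = np.array([np.sum((x - np.sin(2 * np.pi * f * t)) ** 2) for f in freqs])

# count interior local minima of the LS cost along the f axis
is_min = (cost[1:-1] < cost[:-2]) & (cost[1:-1] < cost[2:])
print(int(is_min.sum()))                 # several basins besides the global one at f = 1
```

The number of ripple minima grows with the record length (roughly one per 1/*T* step in *f*), which is why long oscillatory records make the LS surface progressively harder for global optimizers.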

The dynamic behavior of an oscillatory system over a range of a kinetic parameter *k* is characterized by the fundamental period of the oscillations. A plot describing this is shown in Figure 10. All the parameter values for *k* in "area 1" produce sustained oscillations. This figure shows that the fundamental period of the sustained oscillation may change over this range. The values in "area 2", on the other hand, lead to dynamics that are not sustained oscillations.

If the simulated data are periodic, we introduce only the samples of one period of the data into the cost function. Likewise, only the samples of one period of the measurements are incorporated into this cost function. If the fundamental period of the measured data is not equal to the fundamental period of the simulated data, the signal with the smaller period is padded with zeros until the lengths of the signals are equal. This results in monotonic changes in error with respect to changes in the fundamental period of the simulated data.

The proposed cost function is defined in (8):

$$\Phi_{\mathrm{prop}}=\sum_{i=1}^{N_x}\sum_{j=1}^{{N}_{{z}_{i}}}{\left({z}_{ij}-{\widehat{z}}_{ij}\right)}^{2} \qquad (8)$$

For periodic ${\widehat{x}}_{i}$, *z*_{ ij } and ${\widehat{z}}_{ij}$ are obtained by truncating *x*_{ ij } and ${\widehat{x}}_{ij}$ to one fundamental period and zero-padding the signal with the smaller period; for non-periodic ${\widehat{x}}_{i}$, *z*_{ ij } and ${\widehat{z}}_{ij}$ are simply *x*_{ ij } and ${\widehat{x}}_{ij}$.

Here, *x*_{ ij } is the measurement at time *t*_{ j } of the *i* th state of the system, ${\widehat{x}}_{ij}$ is the simulated data at time *t*_{ j } for the *i* th state of the system. *z*_{ ij } and ${\widehat{z}}_{ij}$ are the truncated and zero padded *x*_{ ij } and ${\widehat{x}}_{ij}$, respectively, for the oscillatory ${\widehat{x}}_{i}$. For non oscillatory ${\widehat{x}}_{i}$, *z*_{ ij } , and ${\widehat{z}}_{ij}$ are equal to *x*_{ ij } and ${\widehat{x}}_{ij}$, respectively. ${N}_{{z}_{i}}$ is the length of the *z*_{ i } and ${\widehat{z}}_{i}$. *N*_{ x } is the number of states of the system, *T*_{ i } is the fundamental period of the measurements (*x*_{ i } ), which was computed using the component frequency ratio approach and ${\widehat{T}}_{i}$ is the fundamental period of the simulated data $\left({\widehat{x}}_{i}\right)$, which is estimated for each candidate parameter value. ${\widehat{T}}_{i}$ was estimated using the YIN approach [30], which is a modified version of the time-domain autocorrelation method.
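The truncate-and-zero-pad construction for a single periodic state can be sketched as follows. This is our simplified reading of the scheme (it assumes uniformly sampled signals and known periods), not the authors' code; the point it demonstrates is that the error grows monotonically as the simulated period moves away from the measured 24 h, unlike the rippled LS surface.

```python
import numpy as np

def one_period_cost(x, x_hat, T, T_hat, ts=1.0):
    """Proposed-style error for one state: compare a single fundamental period
    of measurement and simulation, zero-padding the shorter one."""
    z = x[: int(round(T / ts))].astype(float)          # one period of the data
    z_hat = x_hat[: int(round(T_hat / ts))].astype(float)  # one simulated period
    n = max(len(z), len(z_hat))
    z = np.pad(z, (0, n - len(z)))                     # zero-pad the shorter signal
    z_hat = np.pad(z_hat, (0, n - len(z_hat)))
    return float(np.sum((z - z_hat) ** 2))

ts = 1.0
t = np.arange(0.0, 240.0, ts)
x = np.sin(2 * np.pi * t / 24.0)                       # measurement, T = 24 h

# error grows monotonically as the simulated period moves away from 24 h
costs = [one_period_cost(x, np.sin(2 * np.pi * t / T_hat), 24.0, T_hat, ts)
         for T_hat in (24.0, 26.0, 30.0, 36.0)]
print(costs)
```

Because only one period of each signal enters the sum, the varying overlap between repeated periods that produces the LS ripples never arises; the zero-padding penalty instead increases steadily with the period mismatch.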

The surface of the proposed cost function was computed for the same sinusoidal example with *f* = 1 and *ϕ* = 0, similar to the LS cost function of (3) shown in Figure 9. Visual inspection of these two figures shows that the surface of the proposed cost function is smoother than the surface of the LS cost function for the example of (7). We hypothesize that this improvement of the cost function surface will improve the performance of the optimization search algorithm.

##### The effect of error in estimating *f*_{0} on the performance of the cost function

The performance of the proposed cost function (8) is not significantly affected by errors in the estimation of the fundamental frequency of the measurements. This is because the measurements used in (8) have a fixed sampling rate: if the error of the estimated fundamental period is small with respect to the sampling period, it will not change the number of samples that lie within one fundamental period of the data. Moreover, adding or removing one sample in the summation of (8) does not change the value of the proposed cost function dramatically.

#### 2.3.2 The optimization method

The optimization of the proposed cost function was performed using a hybrid approach. Hybrid methods, i.e. combinations of global and local search methods, have been shown to yield results with smaller errors than global searches alone [12, 23]. The global search algorithm that we adopted in this study is the genetic algorithm, a widely used member of the class of global search methods called *evolutionary strategies* [31]. We used two consecutive local search methods of MATLAB [32] in this research. The first was the derivative-based, constrained routine fmincon, and the second was the derivative-free routine fminsearch, which is based on the simplex algorithm [33].
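A Python analog of this hybrid pipeline can be sketched with SciPy, assuming differential evolution as the global (GA-like) stage and two local refinements in place of fmincon and fminsearch; the rippled one-dimensional cost below mimics the period-mismatch surface discussed earlier, and all names are ours.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def cost(p):
    """Rippled 1-D LS-style cost: fit the frequency of a sinusoid."""
    f = p[0]
    t = np.arange(0.0, 10.0, 0.1)
    return float(np.sum((np.sin(2*np.pi*t) - np.sin(2*np.pi*f*t))**2))

bounds = [(0.5, 2.0)]

# Stage 1: global, population-based search (stand-in for MATLAB's ga)
stage1 = differential_evolution(cost, bounds, seed=0)
# Stage 2: gradient-based, bounded local refinement (stand-in for fmincon)
stage2 = minimize(cost, stage1.x, bounds=bounds)
# Stage 3: derivative-free simplex polish (stand-in for fminsearch)
stage3 = minimize(cost, stage2.x, method="Nelder-Mead")

print(stage3.x)  # close to the true frequency f = 1
```

The division of labor mirrors the paper's three steps: the population-based stage escapes the ripple minima, and the two local stages sharpen the estimate inside the global basin.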

## 3 Results

Each optimization run consisted of three consecutive steps:

1. The global optimization (MATLAB ga routine).
2. The first local optimization (MATLAB fmincon routine).
3. The second local optimization (MATLAB fminsearch routine).

### 3.1 Comparison of the two cost functions

Here, *N*_{ x }, *x*_{ ij } , and ${\widehat{x}}_{ij}$ are defined as in (3), and ${N}_{{T}_{i}}$ is the number of samples in the interval 0 < *t* < *T*_{ i } , where *T*_{ i } is the fundamental period of *x*_{ i } .

### 3.2 Parameter estimation results for two-state Tyson model

The first model we investigated is the two-state Tyson model [25] of the circadian clock in *D. melanogaster*. This organism has circadian clocks similar to those of mice and bread molds. This model, shown in (12), consists of two states and nine kinetic parameters. The nominal values of the parameters of this system are shown in Table 1.

Figure 13 shows the surfaces of the LS cost function and the proposed cost function for the parameters *k*_{ m }, *J*_{ p }, *k*_{p 3} and *P*_{ crit } over specific ranges. Characteristics of these parameters are representative of the characteristics of all kinetic parameters of the Tyson model (results are not shown). The values of the remaining parameters are held constant at their nominal values in all figures.

Figure 14 shows the cross-sections of the cost functions above (dashed lines) together with the fundamental period of the data (solid line) along *k*_{ m }, *k*_{p 3}, *P*_{ crit }, and *J*_{ p }, for ranges of values in the same order of magnitude as the nominal value.

Figure 14a shows that the system produces sustained oscillations only for *k*_{ m } in the range [0.03 0.44]. The fundamental period of the sustained oscillations falls from 58 to 6.6 along this range. This radical change in the fundamental period produces irregularities in the LS cost function over this interval. However, the proposed cost function maintains good surface properties in spite of this extreme change in the fundamental period of the system. This emphasizes that our proposed cost function addresses the issue of surface irregularities of the LS cost function caused by introducing multiple periods of the data in calculating the error. Figure 14b shows similar results.

Figure 14c, d shows that the fundamental period for different values of *P*_{ crit } lies between 15.4 and 25.4, a smaller change in fundamental period than those shown in Figure 14a, b. The LS cost function still shows varying levels of surface irregularities, particularly along the *P*_{ crit } axis. The proposed cost function again shows smoother surface characteristics under these conditions as well.

#### 3.2.1 Results of parameter estimation

We generated noisy measurements of the [*M*] and [*P*_{ t }] components at a rate of one sample per hour with AWGN noise of SNR = 20 dB. We removed the noise using the proposed approach before the optimization step. The RMSE between the noisy samples and their true values was 0.0989, which was suppressed to 0.0413 after denoising. The population size was set to 200 and the number of generations to 50 for the ga routine. We calculated ${N}_{{T}_{i}}$ in (11) to be 24 for the Tyson model. The computed scores for the estimated parameters from the 15 runs of optimization are shown in Figure 15 at the three steps of the hybrid optimization process. The mean, median and minima of the computed scores at each level for the two cost functions are also shown in Table 2. Figure 15 and Table 2 show visually and numerically that the optimization routine performs better using the proposed cost function than the LS cost function at all steps. These results are also consistent with our visual inspections of the cost functions in Figures 13 and 14.

The results of optimization with minimum score for Tyson model.

| Parameter | Nominal value | Estimation of the proposed cost function | Estimation of the LS cost function |
|---|---|---|---|
| | 1 | | 0.9472 |
| | 0.1 | 0.1049 | 0.1097 |
| | 0.5 | 0.4668 | 0.4740 |
| | 10 | | |
| | 0.03 | | |
| | 0.1 | | |
| | 200 | | |
| | 0.1 | 0.1076 | |
| | 0.05 | 0.0511 | |
| Score | 0.1378 | 0.1084 | 0.2441 |

Statistics of optimization results for Tyson model.

| | Step 1 (Proposed) | Step 1 (LS) | Step 2 (Proposed) | Step 2 (LS) | Step 3 (Proposed) | Step 3 (LS) |
|---|---|---|---|---|---|---|
| Mean | 2.4497 | 3.5838 | 1.4465 | 1.7760 | 0.3131 | 1.1116 |
| Median | 1.9998 | 3.5731 | 1.4117 | 1.3595 | 0.2354 | 0.7788 |
| Min | 0.5706 | 0.7788 | 0.1118 | 0.2589 | 0.1084 | 0.2441 |

The optimized results with the lowest score out of 15 runs for the LS cost function and the proposed cost function are shown in Table 1.

The estimate with the lowest score using noise-free measurements produces six of nine kinetic parameters with less than 10% error (results not shown). Table 1 shows that the noisy case results in four of nine estimated parameters with more than 10% error. In both cases, the proposed cost function yields more accurate results in comparison to the LS cost function. The larger number of inaccuracies for the noisy case is more a result of system sloppiness than of inaccuracies in the estimation procedure [35, 36], as sloppiness allows a wide range of parameters to produce similar system dynamics. It is evident that our proposed cost function was able to produce better overall system dynamics than the traditional LS cost function, which is clearly conveyed by the lower overall error. Our proposed method, similar to the LS cost function, only takes into account the accuracy of dynamics. Thus, the sloppiness can result in a moderate level of parameter accuracy. Recently, Apgar et al. proposed an experiment design framework to improve estimates of sloppy parameters in biochemical models [37]. This, however, is beyond the scope of this article.

### 3.3 Parameter estimation for two-state Brusselator model

Figure 16a shows that the fundamental period of sustained oscillation falls from 45.9 to 4.3 for *k*_{1} in the range [0.7 2.8]. This change in the fundamental period again produces irregularities in the LS cost function over this interval. The proposed cost function, on the other hand, maintains good surface properties in spite of this change in the fundamental period of the system. This further verifies that the proposed cost function is able to address the irregularities of the LS cost function resulting from the sustained oscillations embedded in the data used to evaluate the cost function.

#### 3.3.1 Results of parameter estimation

The results of optimization with minimum score for Brusselator model.

| Parameter | Nominal value | Estimation of the proposed cost function | Estimation of the LS cost function |
|---|---|---|---|
| | 1 | 0.9912 | |
| | 1 | 0.9112 | |
| | 1 | 0.9526 | |
| | 1 | 0.9335 | |
| Score | 0.3128 | 0.2763 | 0.7619 |

Statistics of optimization results for Brusselator model.

| | Step 1 (Proposed) | Step 1 (LS) | Step 2 (Proposed) | Step 2 (LS) | Step 3 (Proposed) | Step 3 (LS) |
|---|---|---|---|---|---|---|
| Mean | 0.8857 | 14.8167 | 0.8688 | 9.4746 | 0.8177 | 1.5517 |
| Median | 0.7336 | 1.0179 | 0.7116 | 1.0179 | 0.7107 | 1.0179 |
| Min | 0.5207 | 0.7323 | 0.4879 | 0.7323 | 0.2763 | 0.7619 |

The results with the lowest score out of 15 runs for the LS cost function and the proposed cost function are shown in Table 3.

Table 3 shows that the resulting overall error for the proposed cost function is lower than that of the LS cost function. All four parameters were estimated incorrectly using the LS cost function, while they were estimated nearly exactly using the proposed cost function.

### 3.4 Parameter estimation results for five-state Goldbeter model

The *D. melanogaster* circadian model of Goldbeter [27] was investigated in the third study. This model is also available in the BioModels database [34] (BIOMD0000000016). Here, the circadian oscillations of PER are modeled with five states: PER mRNA [M], PER protein [P0], the mono-phosphorylated form [P1], the bi-phosphorylated form [P2] and nuclear PER [PN]. This five-state model has 18 kinetic parameters. The ODE model of the system is shown in (14). The nominal values of the 18 kinetic parameters of this system are available in Table 5.

It can be seen in all figures that the changes in the period of the oscillation do not produce significant irregularities in the LS cost function surface, in contrast to the previous examples. Figure 18b, for instance, shows the changes of period for *k*_{2} in the interval [0.4 2]. However, there are no multiple basins of attraction along the *k*_{2} direction in spite of these changes in fundamental period. This is because the LS cost function changes over orders of magnitude along this parameter direction, so the produced ripples have little effect on its monotonicity. This extreme change in the LS cost function (approximately from 400 to 2200 for *k*_{2} over the interval [0.4 2]) happens because the peak-to-peak magnitude of the sustained oscillations of the simulated data also increases by orders of magnitude along this parameter direction. For example, the peak of [*P*_{2}] increases from 0.25 to 1.5 for *k*_{2} over the interval [0.4 2].

The proposed cost function still shows good surface characteristics, although these are not much different from the already favorable characteristics of the LS cost function. Thus, both cost functions are expected to perform almost identically in the optimization process.

#### 3.4.1 Parameter estimation results

We generated noisy measurements of the [*M*], [*P*_{0}], [*P*_{1}], [*P*_{2}], and [*P*_{ N }] components with a sampling rate of 1 (sample/hour) and AWGN noise with SNR = 20 dB. We suppressed the noise using the proposed denoising approach. The RMSE between the noisy samples and their true values was 0.1012, which was suppressed to 0.04906 after denoising. We calculated ${N}_{{T}_{i}}=23$ for the score in (11). The results of 15 optimization runs are shown in Figure 19 and Table 6. They show that the performances of the LS cost function and the proposed cost function are almost the same at all steps. These results are also consistent with our visual inspections of the cost functions in Figure 18.

The results of optimization with minimum score for the Goldbeter model.

| Parameter | Nominal value | Estimation of the proposed cost function | Estimation of the LS cost function |
|---|---|---|---|
| | 0.76 | 0.6980 | |
| | 1 | 0.9996 | 0.9400 |
| | 4 | | |
| | 0.65 | 0.5972 | |
| | 0.5 | 0.5056 | |
| | 0.38 | 0.3677 | 0.3732 |
| | 3.2 | | 3.1552 |
| | 2 | | 1.8288 |
| | 1.58 | | |
| | 2 | | |
| | 5 | | 4.5100 |
| | 2 | | 2.0120 |
| | 2.5 | | |
| | 2 | | |
| | 0.95 | 0.9713 | 0.9614 |
| | 0.2 | | |
| | 1.9 | 1.7541 | |

Statistics of optimization results for Goldbeter model.

| | Step 1 (Proposed) | Step 1 (LS) | Step 2 (Proposed) | Step 2 (LS) | Step 3 (Proposed) | Step 3 (LS) |
|---|---|---|---|---|---|---|
| Mean | 14.7428 | 18.8913 | 0.5812 | 0.4939 | 0.1914 | 0.1585 |
| Median | 12.3454 | 22.7683 | 0.2456 | 0.2553 | 0.1778 | 0.1446 |
| Min | 1.5970 | 3.5281 | 0.1282 | 0.1183 | 0.1255 | 0.1183 |

The results with the minimum score out of the 15 runs for the LS cost function and the proposed cost function are shown in Table 5.

Table 5 shows that 8 of the 18 parameters were estimated within 10% of their nominal values with the proposed cost function, as opposed to 7 of 18 with the LS cost function. This indicates that a wide range of parameter values produces similar dynamics, a consequence of the system sloppiness already noted for the Tyson model. Like the LS cost function, our proposed cost function evaluates the accuracy of the simulated dynamics; because of the sloppiness, accurate dynamics may therefore correspond to only moderately accurate parameter values.

## 4 Conclusions

This article addresses the issue of kinetic parameter estimation in oscillatory biochemical systems. We showed that, for oscillatory systems, the LS cost function has surface characteristics that can hinder the performance of the optimization routines used to estimate kinetic parameters. We therefore proposed a new cost function with more favorable surface properties, leading to improved parameter estimation results. This cost function integrates temporal information with the periodic information embedded in the measurements used to estimate the parameters. It is also more general, requiring less first-principles knowledge to construct than previously developed methods for oscillatory systems. We tested our cost function on three benchmark oscillatory biochemical pathways, comparing it with the traditional LS cost function over several optimization runs with noisy measurements. The results verified that optimization performed more effectively with our proposed cost function than with the traditional LS cost function. Furthermore, we introduced a modified wavelet hard-thresholding approach for noise removal that suppresses noise in oscillatory data better than the traditional wavelet thresholding approach. Together with the proposed objective function, this will yield more accurate kinetic parameters and, ultimately, biochemical models that are more precise, predictive, and controllable. There remain, however, unsolved issues with the sloppiness of biochemical pathways [35, 36], which require further investigation, especially for oscillatory biochemical pathways.

## References

- 1. Goldbeter A: *Biochemical Oscillations and Cellular Rhythms: The Molecular Bases of Periodic and Chaotic Behaviour*. Cambridge University Press, Cambridge; 1996.
- 2. Fall C, Marland E, Tyson J: *Computational Cell Biology*. Springer, New York; 2002.
- 3. Perez-Martin J: Growth and development in eukaryotes. *Curr Opin Microbiol* 2010, 13(6):661-662. doi:10.1016/j.mib.2010.10.007
- 4. Yan J, Wang H, Liu Y, Shao C: Analysis of gene regulatory networks in the mammalian circadian rhythm. *PLoS Comput Biol* 2008, 4(10):e1000193. doi:10.1371/journal.pcbi.1000193
- 5. Collins K, Jacks T, Pavletich N: The cell cycle and cancer. *Proc Natl Acad Sci USA* 1997, 94(7):2776-2778. doi:10.1073/pnas.94.7.2776
- 6. Boullin J, Morgan JM: The development of cardiac rhythm. *Heart* 2005, 91(7):874-875. doi:10.1136/hrt.2004.047415
- 7. Perry J: *The Ovarian Cycle of Mammals*. Oliver and Boyd, Edinburgh; 1971.
- 8. Zaccolo M, Pozzan T: cAMP and Ca2+ interplay: a matter of oscillation patterns. *Trends Neurosci* 2003, 26(2):53-55. doi:10.1016/S0166-2236(02)00017-6
- 9. Bagheri N, Lawson M, Stelling J, Doyle F: Modeling the Drosophila melanogaster circadian oscillator via phase optimization. *J Biol Rhythms* 2008, 23(6):525-537. doi:10.1177/0748730408325041
- 10. Zeilinger M, Farre E, Taylor S, Kay S, Doyle F: A novel computational model of the circadian clock in Arabidopsis that incorporates PRR7 and PRR9. *Mol Syst Biol* 2006, 2:58.
- 11. Locke J, Millar A, Turner M: Modelling genetic networks with noisy and varied experimental data: the circadian clock in Arabidopsis thaliana. *J Theor Biol* 2005, 234(3):383-393. doi:10.1016/j.jtbi.2004.11.038
- 12. Rodriguez-Fernandez M, Mendes P, Banga J: A hybrid approach for efficient and robust parameter estimation in biochemical pathways. *BioSystems* 2005, 83:248-265.
- 13. Vyshemirsky V, Girolami M: Bayesian ranking of biochemical system models. *Bioinformatics* 2008, 24(6):833-839. doi:10.1093/bioinformatics/btm607
- 14. Chou IC, Voit E: Recent developments in parameter estimation and structure identification of biochemical and genomic systems. *Math Biosci* 2009, 219(2):57-83. doi:10.1016/j.mbs.2009.03.002
- 15. Mostacci E, Truntzer C, Cardot H, Ducoroy P: Multivariate denoising methods combining wavelets and principal component analysis for mass spectrometry data. *Proteomics* 2010, 10(14):2564-2572. doi:10.1002/pmic.200900185
- 16. Tang G, Qin A: ECG de-noising based on empirical mode decomposition. *The 9th International Conference for Young Computer Scientists (ICYCS)* 2008, 903-906.
- 17. Ren Z, Liu G, Zeng L, Huang Z, Huang S: Research on biochemical spectrum denoising based on a novel wavelet threshold function and an improved translation-invariance method. *Proc SPIE* 2008, 7280:72801Q.
- 18. Sugimoto M, Kikuchi S, Tomita M: Reverse engineering of biochemical equations from time-course data by means of genetic programming. *Biosystems* 2005, 80(2):155-164. doi:10.1016/j.biosystems.2004.11.003
- 19. Gonzalez O, Kuper C, Jung K, Naval JP, Mendoza E: Parameter estimation using simulated annealing for S-system models of biochemical networks. *Bioinformatics* 2007, 23(4):480-486. doi:10.1093/bioinformatics/btl522
- 20. Flaherty P, Radhakrishnan M, Dinh T, Rebres R, Roach T, Jordan M, Arkin A: A dual receptor crosstalk model of G-protein-coupled signal transduction. *PLoS Comput Biol* 2008, 4(9):e1000185. doi:10.1371/journal.pcbi.1000185
- 21. Zhan C, Yeung L: Parameter estimation in systems biology models using spline approximation. *BMC Syst Biol* 2011, 5:14.
- 22. Marquardt D: An algorithm for least squares estimation of nonlinear parameters. *SIAM J Appl Math* 1963, 11(2):431-441. doi:10.1137/0111030
- 23. Renders J, Flasse S: Hybrid methods using genetic algorithms for global optimization. *IEEE Trans Syst Man Cybern B* 1996, 26(2):243-258. doi:10.1109/3477.485836
- 24. Gerhard D: Pitch extraction and fundamental frequency: history and current techniques. Technical report, Department of Computer Science, University of Regina, Regina, Canada; 2003.
- 25. Tyson J, Hong C, Thron D, Novak B: A simple model of circadian rhythm based on dimerization and proteolysis of PER and TIM. *Biophys J* 1999, 77:2411-2417. doi:10.1016/S0006-3495(99)77078-5
- 26. Kondepudi D, Prigogine I: *Modern Thermodynamics: From Heat Engines to Dissipative Structures*. Wiley, Chichester; 1998.
- 27. Goldbeter A: A model for circadian oscillations in the Drosophila period protein (PER). *Proc R Soc B Biol Sci* 1995, 261(1362):319-324. doi:10.1098/rspb.1995.0153
- 28. Mallat S: *A Wavelet Tour of Signal Processing*. Academic Press, San Diego; 1998.
- 29. Mallat S: A theory for multiresolution signal decomposition: the wavelet representation. *IEEE Trans Pattern Anal Mach Intell* 1989, 11(7):674-693. doi:10.1109/34.192463
- 30. de Cheveigné A, Kawahara H: YIN, a fundamental frequency estimator for speech and music. *J Acoust Soc Am* 2002, 111(4):1917-1930. doi:10.1121/1.1458024
- 31. Moles C, Mendes P, Banga J: Parameter estimation in biochemical pathways: a comparison of global optimization methods. *Genome Res* 2003, 13(11):2467-2474. doi:10.1101/gr.1262503
- 32. The MathWorks Inc: MATLAB, version 7.6.0. Natick, Massachusetts; 2008.
- 33. Lagarias J, Reeds J, Wright M, Wright P: Convergence properties of the Nelder-Mead simplex method in low dimensions. *SIAM J Optim* 1998, 9:112-147. doi:10.1137/S1052623496303470
- 34. Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, Hucka M: BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. *Nucleic Acids Res* 2006, 34(suppl 1):D689-D691.
- 35. Gutenkunst R, Waterfall J, Casey F, Brown K, Myers C, Sethna J: Universally sloppy parameter sensitivities in systems biology models. *PLoS Comput Biol* 2007, 3(10):1871-1878.
- 36. Waterfall J, Casey F, Gutenkunst R, Brown K, Myers C, Brouwer P, Elser V, Sethna J: Sloppy-model universality class and the Vandermonde matrix. *Phys Rev Lett* 2006, 97(15):150601.
- 37. Apgar J, Witmer D, White F, Tidor B: Sloppy models, parameter uncertainty, and the role of experimental design. *Mol BioSyst* 2010, 6(10):1890-1900. doi:10.1039/b918098b

## Copyright information

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.