A novel cost function to estimate parameters of oscillatory biochemical systems
Oscillatory pathways are among the most important classes of biochemical systems, with examples ranging from circadian rhythms to cell cycle maintenance. Mathematical modeling of these highly interconnected biochemical networks is needed to investigate, predict and control the dynamics of these systems. Identifying the kinetic rate parameters is essential for fully modeling these and other biological processes. These kinetic parameters, however, are usually not available from measurements, and most of them have to be estimated by parameter fitting techniques. One difficulty in estimating kinetic parameters of oscillatory systems is the irregularity of the least squares (LS) cost function surface used to estimate these parameters, which is caused by the periodicity of the measurements. These irregularities result in numerous local minima, which limit the performance of even some of the most robust global optimization algorithms. We propose a parameter estimation framework that addresses these issues by integrating temporal information with the periodic information embedded in the measurements. This periodic information is used to build a cost function with better surface properties, leading to fewer local minima and better performance of global optimization algorithms. We verified for three oscillatory biochemical systems that the proposed cost function improves the ability to estimate accurate kinetic parameters compared to the traditional LS cost function. We combine this cost function with an improved noise removal approach that leverages the periodic characteristics embedded in the measurements to effectively reduce noise. The results provide strong evidence for the efficacy of this noise removal approach over the commonly used wavelet hard-thresholding noise removal methods.
This proposed optimization framework results in more accurate kinetic parameters that will eventually lead to biochemical models that are more precise, predictable, and controllable.
Keywords: Cost Function, Root Mean Square Error, Circadian Clock, Oscillatory System, Fundamental Period
Oscillatory biochemical pathways are an important class of biochemical systems [1, 2] that play significant roles in living systems. For instance, "circadian rhythms" are fundamental daily time-keeping mechanisms in a wide range of species from unicellular organisms to complex eukaryotes. One of their most important roles is in regulating physiological processes such as the sleep-wake cycle in mammals. "Cell cycles" are another vital class of biochemical oscillations. The cell cycle is the sequence of events by which a growing cell replicates all its components and divides into two daughter cells. Inappropriate cell proliferation due to malfunctioning cell cycle control mechanisms can cause the development of certain types of cancer. There are also other classes of biochemical rhythms, such as cardiac rhythms, ovarian cycles and cAMP oscillations, that have their own significance in systems biology.
Complete modeling of a biochemical system includes characterization of all nonlinear structures of the network along with the associated kinetic rates. In other words, a model remains incomplete until all kinetic parameter values have been identified, even if its full structure has been determined. Few kinetic rates are available directly from experimentation or the literature. Most of them have to be estimated by parameter fitting techniques to complete the modeling of the biochemical pathway. Thus, a mathematical framework is needed to fit the kinetic parameters using the observables. Optimization frameworks that focus specifically on estimating parameters associated with biochemical pathways have received much attention in recent years [9, 10, 11, 12, 13, 14].
The two main issues in estimating kinetic parameters in biochemical systems are data-related issues and computational issues. The measurement datasets used to fit these parameters are usually noisy and incomplete. They are also affected by uncertainties related to experimental conditions such as temperature and light. Much recent work has aimed to reduce noise in different biochemical signals [15, 16, 17]. Mostacci et al. proposed a denoising method for mass spectrometry data that integrates wavelet soft thresholding and principal component analysis. Weng et al. suggested a noise removal approach for oscillatory ECG signals based on a recently developed method known as empirical mode decomposition. Ren et al. developed a method for denoising biochemical spectra by introducing a new thresholding function integrated with the "translation invariant" approach, which lowers the root mean square error (RMSE) in the measurements compared to the traditional soft and hard thresholding methods.
The computational issues include the challenges optimization algorithms face when identifying an optimal fit to measurement data. Optimization methods suffer from problems such as slow convergence toward global optima, complicated error surfaces and a lack of convergence proofs. Much work has been done to address these issues in parameter estimation for biochemical systems [12, 13, 18, 19, 20, 21]. Zhan et al. proposed a method that reduces the computational time of each trial by integrating spline function theory with nonlinear programming, which eliminates the need to solve the system of ordinary differential equations (ODEs). Rodriguez-Fernandez et al. suggested a hybrid optimization method to speed up convergence toward the global optimum. A variety of other algorithms has also been adapted to solve the inverse problem, and comprehensive lists of such studies are available in the literature.
Furthermore, heuristic approaches have been developed to address the optimization problem of fitting parameters in oscillatory systems [9, 10, 11]. These methods improved the optimization by constructing error functions based on features extracted from the data. Locke et al. proposed a cost function based on the comparison of entrained period, phase and strength of oscillation for the circadian clock in Arabidopsis thaliana. Zeilinger et al. performed another parameter estimation for the A. thaliana model by investigating the amplitudes of some species in dark/light cycles, the periods under dark and light conditions, and the period of one mutant phenotype under constant light. Bagheri et al. built an optimization process to model the Drosophila melanogaster circadian clock by defining three cost functions based on the free running period, the light/dark entrained period, differences in amplitude and differences in the phase of the components in the system. These methods are most applicable to problems where characteristics of the system and/or data can be exploited to improve the performance of the parameter estimation. They require, however, more information about the system than purely data-driven comparison methods. For instance, a cost function that uses the period information of both the light and dark cycles of the investigated model requires a greater level of first principles knowledge. These methods are also model specific, which makes it difficult to apply them to general oscillatory systems. For example, the dark/light cycle characteristics used in circadian parameter fitting problems may not be suitable features for parameter fitting of non-circadian biorhythms.
This article focuses on the problem of estimating the kinetic parameters in oscillatory biochemical systems. We show that periodicity in the measurements of oscillatory systems results in irregular surface properties of the LS cost function, leading to numerous local minima. These multiple local optima cause premature convergence of even robust optimization algorithms. This eventually results in incorrect estimates, poor predictions of dynamics, and incorrect acceptance of functional hypotheses. Compounded with uncertainties or noisy measurements, this makes the estimation problem difficult to solve.
We develop a parameter estimation framework that addresses these issues by integrating information about oscillatory systems into the modeling process (parameter estimation and denoising). This periodic information is used to build a cost function with better surface properties. Our proposed cost function takes advantage of the basic properties of these oscillatory systems, which allows us to generalize it to a variety of biochemical systems with sustained oscillations. The proposed cost function also needs less first principles knowledge than the previous methods that were developed for oscillatory systems [9, 10, 11]. We verified for three oscillatory biochemical systems that our proposed cost function results in an increased ability to estimate accurate kinetic parameters compared to the traditional LS cost function. We combined this cost function with an improved denoising method that also leverages the periodic characteristics embedded in the measurements to effectively reduce noise. The results provide strong evidence for the efficacy of this noise removal approach over the commonly used wavelet hard-thresholding noise removal method. This proposed optimization framework results in more accurate kinetic parameters that will eventually lead to biochemical models that are more accurate, predictable, and controllable.
Here, $x \in \mathbb{R}^{m \times 1}$ is the state vector of the $m$ components of the pathway, $p \in \mathbb{R}^{n \times 1}$ is the vector of $n$ kinetic parameters, $f: \mathbb{R}^{m \times 1} \to \mathbb{R}^{m \times 1}$ is a nonlinear vector function, $x_0 \in \mathbb{R}^{m \times 1}$ is the vector of the initial component concentrations at time $t_0$, and $t_0 < t < t_e$ represents the time of interest.
Here, $x_{ij}$ is the measurement at time $j$ of the $i$th state of the system, $\hat{x}_{ij}$ is the reproduced data at time $j$ for the $i$th state of the system given some parameter $p$, $N_m$ is the number of time points where measurements are obtained, and $N_x$ is the number of measured outputs (in this manuscript, they are considered to be the measured states of the system).
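As a reading aid, the two sets of definitions above are consistent with the following standard forms. This is a sketch under assumptions: the model is taken to be an initial value problem whose right-hand side depends on the parameters $p$, the LS cost is taken to be an unweighted sum of squared residuals, and $\hat{x}_{ij}(p)$ and $E_{LS}$ are notation chosen here. Any weighting or normalization used in the original equations is not shown.

```latex
\frac{dx}{dt} = f(x, p), \qquad x(t_0) = x_0, \qquad t_0 < t < t_e

E_{LS}(p) = \sum_{i=1}^{N_x} \sum_{j=1}^{N_m} \left( x_{ij} - \hat{x}_{ij}(p) \right)^2
```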
2.1 Fundamental frequency estimation
The smallest value of T ≠ 0 for which (4) is valid is the "fundamental period" of oscillation. The inverse of the fundamental period is the fundamental frequency (f0). Several approaches have been proposed to estimate f0. We used a frequency-based method called the component frequency ratio to extract the fundamental frequency of the measured data, because time-series methods may not be adequate for biochemical measurements given their low sampling rate and low temporal resolution. This method starts by transforming the data to the Fourier domain. The locations of the peaks in the spectrum are then identified. These peaks are the harmonics of the fundamental frequency. The final step is to find the greatest common divisor of the frequencies at which the peaks occur.
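The steps of this frequency-domain approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the relative peak threshold of 0.1, the tolerance of half a frequency bin and the limit of 20 candidate divisors are all hypothetical choices, and a naive DFT is used only to keep the sketch dependency-free (an FFT routine such as `numpy.fft.rfft` would normally be preferred).

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N//2 (an FFT is the usual choice)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    return [abs(sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered)))
            for k in range(n // 2 + 1)]

def spectral_peaks(samples, fs, rel_height=0.1):
    """Frequencies of local maxima above rel_height times the largest magnitude."""
    mags = magnitude_spectrum(samples)
    threshold = rel_height * max(mags)  # assumed threshold, tune per dataset
    n = len(samples)
    return [k * fs / n for k in range(1, len(mags) - 1)
            if mags[k] > threshold
            and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]

def approximate_gcd(freqs, tol, max_divisor=20):
    """Largest f such that every frequency is within tol of a multiple of f."""
    base = min(freqs)
    for divisor in range(1, max_divisor + 1):
        candidate = base / divisor
        if all(abs(f - round(f / candidate) * candidate) <= tol for f in freqs):
            return candidate
    raise ValueError("no common divisor found within tolerance")

def estimate_f0(samples, fs):
    """Fundamental frequency as the approximate GCD of the spectral peaks."""
    peaks = spectral_peaks(samples, fs)
    return approximate_gcd(peaks, tol=0.5 * fs / len(samples))
```

Because the greatest common divisor is taken over all detected peaks, the estimate does not require the fundamental itself to be the strongest (or even a present) component: a signal containing only harmonics at 1.0 and 1.5 still yields 0.5.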
2.1.1 Effect of noise on estimation of f0
Figure 2 shows that the method used to estimate the fundamental period is robust to additive noise.
2.2 Removing noise
2.2.1 Improving the hard-thresholding method in oscillatory systems
- 1. Partition the measurements $X(nT_s)$, based on their calculated fundamental period, into the sets $x_k$ according to (5).
- 2. Shift each $x_k$ so that all sets fall within a single period. This results in a single period of the measurements with higher resolution. The shifted versions of the $x_k$ are calculated according to (6).
Wavelet decomposition, thresholding and reconstruction are then applied to this "shifted version" of the noisy data. MATLAB was used to implement a three-level wavelet decomposition using the "Daubechies 6" wavelet and a threshold value of 0.3. The wavelet type, number of levels, and threshold value were chosen empirically and may vary from system to system. The results are shown in Figure 6c. The final step is to reconstruct the original signal by shifting the samples back to their respective periods (Figure 6d).
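The fold, denoise and unfold structure of this procedure can be sketched as follows. The sketch makes several assumptions: folding is done by taking each sample time modulo the fundamental period, which has the same effect as shifting each period back onto the first one; a circular moving average is swapped in for the three-level Daubechies 6 wavelet hard thresholding purely to avoid external dependencies; and all function names, the window length of 9, and the example signal are hypothetical.

```python
import math
import random

def fold_to_one_period(times, values, period):
    """Map every sample to its phase within one period and sort by phase.

    Returns the sorted phases, the values in that order, and the original
    index of each sample so that the fold can be undone."""
    order = sorted(range(len(times)), key=lambda i: times[i] % period)
    phases = [times[i] % period for i in order]
    folded = [values[i] for i in order]
    return phases, folded, order

def circular_moving_average(values, window=9):
    """Stand-in denoiser (odd window); wraps around since the fold is periodic."""
    n, half = len(values), window // 2
    return [sum(values[(i + d) % n] for d in range(-half, half + 1)) / window
            for i in range(n)]

def unfold(denoised, order):
    """Send each denoised sample back to its original time position."""
    restored = [0.0] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = denoised[position]
    return restored

def denoise_periodic(times, values, period, denoiser=circular_moving_average):
    """Fold onto one period, denoise the dense single period, then unfold."""
    _, folded, order = fold_to_one_period(times, values, period)
    return unfold(denoiser(folded), order)

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical example: a sine with period 10.3 sampled once per time unit.
rng = random.Random(0)
period = 10.3
times = [float(i) for i in range(206)]
clean = [math.sin(2 * math.pi * t / period) for t in times]
noisy = [c + rng.gauss(0.0, 0.2) for c in clean]
denoised = denoise_periodic(times, noisy, period)
```

Any denoiser that maps a list to a list of the same length can be passed as `denoiser`, so a wavelet hard-thresholding routine could replace the moving average without changing the folding logic.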
2.2.2 The effect of error in estimating f0 on the proposed denoising method
Figure 8 shows that, for small errors in the estimated fundamental period, the proposed denoising method yields lower RMSEs than traditional wavelet thresholding. If the fundamental period is estimated with an error larger than approximately 0.25 for these models, however, the proposed method does not yield lower RMSEs. Figure 2 shows that the error in fundamental period estimation due to noise is much smaller than the errors considered in Figure 8.
2.3.1 Forming cost function
Figure 9 shows significant rippling of the LS cost function, especially along the f direction. This happens because the degree of overlap between the periods of the two oscillatory signals in the LS objective function varies along the f axis. This potentially results in numerous local basins of attraction that hinder the optimizer's ability to find the global optimum. These ripples are a fundamental characteristic of the LS cost function for systems with oscillatory dynamics. The phenomenon can be observed for a large class of oscillatory systems, especially along the parameter axes to which the fundamental frequency is most sensitive.
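The rippling can be reproduced with a toy example. The sketch below is not one of the biochemical models studied here: it assumes a hypothetical unit-amplitude sine measured over ten periods and scans the LS cost against sines of other frequencies, counting the local minima that appear along the frequency axis.

```python
import math

def ls_cost(measured, simulated):
    """Unweighted sum of squared differences between two sampled signals."""
    return sum((m - s) ** 2 for m, s in zip(measured, simulated))

def sine(freq, times):
    return [math.sin(2 * math.pi * freq * t) for t in times]

def local_minima(values):
    """Indices of strict interior local minima of a sampled curve."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]

times = [0.05 * i for i in range(200)]           # ten periods of the measurement
measured = sine(1.0, times)                      # hypothetical data with f0 = 1
freqs = [0.5 + 0.005 * k for k in range(201)]    # candidate frequencies 0.5 to 1.5
costs = [ls_cost(measured, sine(f, times)) for f in freqs]
minima = local_minima(costs)
best_freq = freqs[costs.index(min(costs))]
```

Here `minima` contains several indices besides the one at the true frequency, so a local search started at the wrong frequency can stall in one of the side basins, while `best_freq` is found only because the whole axis was scanned.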
If the simulated data are periodic, we introduce only the samples of one period of the data into the cost function. Likewise, only the samples of one period of the measurements are incorporated into this cost function. If the fundamental period of the measured data is not equal to that of the simulated data, the signal with the smaller period is padded with zeros until the lengths of the signals are equal. This results in monotonic changes in error with respect to changes in the fundamental period of the simulated data.
Here, $x_{ij}$ is the measurement at time $t_j$ of the $i$th state of the system and $\hat{x}_{ij}$ is the simulated data at time $t_j$ for the $i$th state of the system. $z_{ij}$ and $\hat{z}_{ij}$ are the truncated and zero-padded $x_{ij}$ and $\hat{x}_{ij}$, respectively, for oscillatory simulated data. For non-oscillatory simulated data, $z_{ij}$ and $\hat{z}_{ij}$ are equal to $x_{ij}$ and $\hat{x}_{ij}$, respectively. The common length of $z_i$ and $\hat{z}_i$ also enters (8). $N_x$ is the number of states of the system, $T_i$ is the fundamental period of the measurements ($x_i$), which was computed using the component frequency ratio approach, and $\hat{T}_i$ is the fundamental period of the simulated data $\hat{x}_i$, which is estimated for each candidate parameter value. $\hat{T}_i$ was estimated using the YIN approach, which is a modified version of the time-domain autocorrelation method.
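The truncation and zero-padding described above can be sketched for a single state as follows. This is only an illustration under assumptions: the exact form of (8) is not reproduced here, so the sketch simply averages the squared differences over the common length, and it omits any additional terms or weights that (8) may contain. The function names are hypothetical, both signals are assumed to be uniformly sampled from t = 0, and a simulated period of `None` is used to mark a non-oscillatory simulation.

```python
def one_period(samples, period, sampling_interval):
    """Keep only the samples that fall inside the first fundamental period."""
    count = int(period // sampling_interval)
    return list(samples[:count])

def zero_pad(samples, length):
    """Append zeros until the signal has the requested length."""
    return list(samples) + [0.0] * (length - len(samples))

def period_aware_cost(measured, simulated, measured_period, simulated_period,
                      sampling_interval):
    """Mean squared difference between single periods of two signals."""
    if simulated_period is None:
        # Non-oscillatory simulation: compare the full signals, as in plain LS.
        z, z_hat = list(measured), list(simulated)
    else:
        z = one_period(measured, measured_period, sampling_interval)
        z_hat = one_period(simulated, simulated_period, sampling_interval)
    length = max(len(z), len(z_hat))
    z, z_hat = zero_pad(z, length), zero_pad(z_hat, length)
    return sum((a - b) ** 2 for a, b in zip(z, z_hat)) / length
```

For example, a measurement `[0, 1, 0, -1]` repeated twice (period 4) compared with a simulation `[0, 2]` repeated four times (period 2) is reduced to `[0, 1, 0, -1]` against `[0, 2, 0, 0]`, so a mismatch in period shows up directly as error in the padded region.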
The effect of error in estimating f0 on the performance of the cost function
The performance of the proposed cost function (8) is not affected significantly by errors in the estimation of the fundamental frequency of the measurements. This is because the measurements used in (8) have a fixed sampling rate. If the error of the estimated fundamental period is small relative to the sampling interval, it will not affect the number of samples that lie in one fundamental period of the data. Moreover, adding or removing one sample in the summation of (8) will not change the performance of the proposed cost function dramatically.
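A small arithmetic check makes this argument concrete. The numbers below are hypothetical (a period near 24 time units sampled every 0.5 time units), and the sample count is taken to be the number of whole sampling intervals that fit in one period, which is an assumption about how the truncation is done.

```python
def samples_in_period(period, sampling_interval):
    """Number of whole sampling intervals that fit in one fundamental period."""
    return int(period // sampling_interval)

# Period estimates that differ by less than one sampling interval (0.5)
# map to the same sample count, or to a count that differs by one.
counts = {T: samples_in_period(T, 0.5) for T in (23.9, 24.0, 24.1, 24.4)}
```

Estimates of 24.0, 24.1 and 24.4 all give 48 samples, while 23.9 gives 47, so an error smaller than the sampling interval changes the summation in the cost function by at most one term.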
2.3.2 The optimization method
The optimization of the proposed cost function was performed using a hybrid approach. Hybrid methods, i.e. combinations of global and local search methods, have been shown to yield results with smaller errors than global searches alone [12, 23]. The global search algorithm that we adopt in this study is the "Genetic Algorithm", a widely used member of the class of global search methods called evolutionary strategies. We used two consecutive local search methods from MATLAB in this research. The first was the derivative-based, constrained routine fmincon, and the second was the derivative-free routine fminsearch, which is based on the simplex algorithm.
The global optimization (MATLAB ga routine).
The first local optimization (MATLAB fmincon routine)
The second local optimization (MATLAB fminsearch routine)
3.1 Comparison of two different cost functions
Here, $N_x$, $x_{ij}$ and $\hat{x}_{ij}$ are defined as in (3), and the number of samples is the number extracted in $0 < t < T_i$, where $T_i$ is the fundamental period of $x_i$.
3.2 Parameter estimation results for two-state Tyson model
Figure 14a shows that the system produces sustained oscillations only for $k_m$ in the range [0.03, 0.44]. The fundamental period of the sustained oscillations falls from 58 to 6.6 along this range. This radical change in the fundamental period produces irregularities in the LS cost function over this interval. The proposed cost function, however, maintains good surface properties in spite of this extreme change in the fundamental period of the system. This emphasizes that our proposed cost function addresses the surface irregularities of the LS cost function that are caused by introducing multiple periods of the data when calculating the error. Figure 14b shows similar results.
Figure 14c, d shows that the fundamental period for different values of $P_{crit}$ is between 15.4 and 25.4, which is a smaller change in fundamental period than that shown in Figure 14a, b. The LS cost function still shows varying levels of surface irregularities, particularly along the $P_{crit}$ axis. The proposed cost function again shows smoother surface characteristics under these conditions.
3.2.1 Results of parameter estimation
Table 1 The results of optimization with minimum score for the Tyson model (columns: estimation of the proposed cost function; estimation of the LS cost function).
Statistics of optimization results for Tyson model.
The optimized results with the lowest score out of 15 runs for the LS cost function and the proposed cost function are shown in Table 1.
The estimate with the lowest score using noise-free measurements produces six of nine kinetic parameters with less than 10% error (results not shown). Table 1 shows that the noisy case results in four of nine estimated parameters with more than 10% error. In both cases, the proposed cost function yields more accurate results than the LS cost function. The large number of inaccuracies for the noisy case is more a result of system sloppiness than of inaccuracies in the estimation procedure [35, 36]: sloppiness means that a wide range of parameters produces similar system dynamics. Our proposed cost function was able to produce better overall system dynamics than the traditional LS cost function, as conveyed by the lower overall error. Our proposed method, like the LS cost function, only takes into account the accuracy of the dynamics. Thus, sloppiness can result in a moderate level of parameter accuracy. Recently, Apgar et al. proposed an experiment design framework to improve estimates of sloppy parameters in biochemical models. This, however, is beyond the scope of this article.
3.3 Parameter estimation for two-state Brusselator model
Figure 16a shows that the fundamental period of sustained oscillation falls from 45.9 to 4.3 for k1 in the range [0.7, 2.8]. This change in the fundamental period again produces irregularities in the LS cost function over this interval. The proposed cost function, on the other hand, maintains good surface properties in spite of this change in the fundamental period of the system. This further verifies that the proposed cost function is able to address the irregularities of the LS cost function that result from the sustained oscillations in the dynamics used to evaluate the cost function.
3.3.1 Results of parameter estimation
Table 3 The results of optimization with minimum score for the Brusselator model (columns: estimation of the proposed cost function; estimation of the LS cost function).
Statistics of optimization results for Brusselator model.
The derived results with the lowest score out of 15 runs for the LS cost function and the proposed cost function are shown in Table 3.
Table 3 shows that the resulting overall error for the proposed cost function is lower than that of the LS cost function. All four parameters were estimated incorrectly using the LS cost function, while they were estimated almost accurately using the proposed cost function.
3.4 Parameter estimation results for five-state Goldbeter model
It can be seen in all figures that the changes in the period of the oscillation do not produce significant irregularities in the LS cost function surface, which differs from the previous examples. Figure 18b, for instance, shows the changes of period for k2 in the interval [0.4, 2]. However, there are no multiple basins of attraction along the k2 direction in spite of these changes in fundamental period. This is because the LS cost function changes so strongly along this parameter direction that the ripples have little effect on its monotonicity. This extreme change in the LS cost function (approximately from 400 to 2200 for k2 over the interval [0.4, 2]) happens because the peak-to-peak magnitude of the sustained oscillations of the simulated data also increases severalfold along this parameter direction. For example, the peak of [P2] increases from 0.25 to 1.5 for k2 over the interval [0.4, 2].
The proposed cost function still shows good surface characteristics, although they are not much different from the already favorable characteristics of the LS cost function. Thus, both cost functions are expected to perform similarly in the optimization process.
3.4.1 Parameter estimation results
Table 5 The results of optimization with minimum score for the Goldbeter model (columns: estimation of the proposed cost function; estimation of the LS cost function).
Statistics of optimization results for Goldbeter model.
The derived results with the minimum score out of 15 runs for the LS cost function and the proposed cost function are shown in Table 5.
Table 5 shows that 8 out of 18 parameters were estimated within 10% of their nominal value for the proposed cost function, as opposed to 7 out of 18 for the LS cost function. This shows that a wide range of parameters produces similar dynamics, which is due to the system sloppiness that was also mentioned for the Tyson model. Our proposed cost function, like the LS cost function, takes into account only the accuracy of the dynamics. Therefore, sloppiness may result in moderate accuracy in the parameter values.
This article addresses the issue of kinetic parameter estimation in oscillatory biochemical systems. We showed that the LS cost function for oscillatory systems has surface characteristics that potentially hinder the performance of the optimization routines used to estimate kinetic parameters. We therefore suggested a new cost function with more favorable surface properties, which leads to improved parameter estimation results. This cost function integrates temporal information with the periodic information embedded in the measurements used to estimate these parameters. This generalized cost function also needs less first principles knowledge than the previously developed methods for oscillatory systems. We tested our cost function on three benchmark oscillatory biochemical pathways and compared it with the traditional LS cost function in several optimization runs using noisy measurements. The comparison verified that the optimization performed more effectively using our proposed cost function than using the traditional LS cost function. Furthermore, we introduced a modified wavelet hard-thresholding approach for noise removal. This modified approach is able to suppress noise in oscillatory data better than the traditional wavelet thresholding approach. Together with the proposed objective function, this will result in more accurate kinetic parameters that will eventually lead to biochemical models that are more precise, predictable and controllable. There are, however, unsolved issues with the sloppiness of biochemical pathways [35, 36], which require further investigation, especially for oscillatory biochemical pathways.
- 7. Perry J: The Ovarian Cycle of Mammals. Oliver and Boyd, Edinburgh; 1971.
- 10. Zeilinger M, Farre E, Taylor S, Kay S, Doyle F: A novel computational model of the circadian clock in Arabidopsis that incorporates PRR7 and PRR9. Mol Syst Biol 2006, 2(58).
- 21. Zhan C, Yeung L: Parameter estimation in systems biology models using spline approximation. BMC Syst Biol 2011, 5(14).
- 24. Gerhard D: Pitch extraction and fundamental frequency: history and current techniques. Department of Computer Science, University of Regina, Regina, Canada; 2003.
- 32. The MathWorks Inc: MATLAB: version 7.6.0. Natick, Massachusetts; 2008.
- 34. Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, Hucka M: BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res 2006, 34(suppl 1):D689-D691.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.