Abstract
The gene regulatory network (GRN) is a complex control system that plays a fundamental role in the physiological and developmental processes of living cells. Focusing on the ordinary differential equation (ODE) modeling approach, we propose a novel pipeline for constructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. The pipeline consists of five steps: detection of temporally differentially expressed genes, clustering of genes into functional modules, identification of the network structure, refinement of the parameter estimates, and functional enrichment analysis. It combines a series of cutting-edge statistical techniques to efficiently reduce the dimension of the problem and to account for the correlations between measurements from the same gene. In the key step of identifying the network structure, we employ advanced parameter estimation and statistical inference methods to perform model selection for the ODE models. The proposed pipeline is a computationally efficient, data-driven tool that bridges experimental data with mathematical modeling and statistical analysis. Applying the pipeline to time course gene expression data from influenza-infected mouse lungs has led to some interesting findings about the immune process in mice and illustrates the usefulness of the proposed methods.
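The structure-identification and refinement steps can be illustrated with a minimal sketch. It is not the chapter's implementation and rests on several assumptions that the abstract does not state: the module-level dynamics are taken to be linear, dX/dt = A X, with each state being the mean expression curve of one gene module; derivatives are approximated by finite differences rather than by smoothing; and a thresholded least-squares fit stands in for the penalized model selection. The function names `simulate_linear_ode` and `estimate_network` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def simulate_linear_ode(A, x0, times, substeps=20):
    """Integrate dx/dt = A @ x with classical RK4 and return x at each time."""
    x = np.asarray(x0, dtype=float).copy()
    states = [x.copy()]
    for t0, t1 in zip(times[:-1], times[1:]):
        h = (t1 - t0) / substeps
        for _ in range(substeps):
            k1 = A @ x
            k2 = A @ (x + 0.5 * h * k1)
            k3 = A @ (x + 0.5 * h * k2)
            k4 = A @ (x + h * k3)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        states.append(x.copy())
    return np.array(states)  # shape: (n_times, n_modules)


def estimate_network(X, times, threshold=0.05):
    """Two-stage estimate of A in dX/dt = A X (assumed linear model).

    Stage 1: approximate derivatives by finite differences and fit all
             coefficients by least squares.
    Stage 2: drop coefficients below `threshold` (a crude stand-in for
             penalized selection), then refit each row on its support.
    """
    dX = np.gradient(X, times, axis=0, edge_order=2)
    coef, *_ = np.linalg.lstsq(X, dX, rcond=None)  # dX ~ X @ coef
    A_ls = coef.T
    n_modules = X.shape[1]
    A_hat = np.zeros((n_modules, n_modules))
    for i in range(n_modules):
        support = np.abs(A_ls[i]) >= threshold
        if support.any():
            row, *_ = np.linalg.lstsq(X[:, support], dX[:, i], rcond=None)
            A_hat[i, support] = row
    return A_hat


if __name__ == "__main__":
    # Hypothetical 3-module network; entry (i, j) is the effect of j on i.
    A_true = np.array([[-0.1, -1.0, 0.0],
                       [1.0, -0.1, 0.0],
                       [0.0, 0.5, -0.8]])
    times = np.linspace(0.0, 10.0, 201)
    X = simulate_linear_ode(A_true, [1.0, 0.0, 0.5], times)
    print(np.round(estimate_network(X, times), 3))
```

With noisy, sparsely sampled expression data the finite-difference step would be unreliable, so a smoothing step and a proper penalized regression would be needed in place of the shortcuts used here.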
Acknowledgements
This research was partially supported by NIH grants HHSN 272201000055C and AI087135, and by the University of Rochester CTSI pilot award (UL1RR024160) from the National Center for Research Resources.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Wu, S., Liu, ZP., Qiu, X., Wu, H. (2013). High-Dimensional Ordinary Differential Equation Models for Reconstructing Genome-Wide Dynamic Regulatory Networks. In: Hu, M., Liu, Y., Lin, J. (eds) Topics in Applied Statistics. Springer Proceedings in Mathematics & Statistics, vol 55. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7846-1_15
DOI: https://doi.org/10.1007/978-1-4614-7846-1_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7845-4
Online ISBN: 978-1-4614-7846-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)