Empirical Copula in the Detection of Batch Effects

  • Melih Ağraz
  • Vilda Purutçuoğlu
Part of the Nonlinear Systems and Complexity book series (NSCH, volume 24)


The activation of complex biological systems is described by different mathematical expressions, called models, under various assumptions. One of the common modeling types is the steady-state approach. Under this approach, we assume that the stochastic behavior of the system may not be observed at constant volume and temperature, and that the mean change in the states of the system's components is larger than the variation of the states. Because this representation requires less information about the actual biological activation, and the majority of collected data is better suited to it than to its stochastic alternatives, it is the most common modeling type for representing biological networks. In this study, we deal specifically with the steady-state type of model and suggest a preprocessing step for the raw data based on a transformation via the empirical copula. Here, we use the empirical copula, also called the normal copula, to eliminate batch effects in the measurements so that the transformed data can fit the multivariate normal distribution. We then implement both parametric and nonparametric models to describe the transformed measurements. We choose the Gaussian graphical model as the parametric approach and select the probabilistic Boolean model and the lasso-based MARS model as its nonparametric counterparts. Finally, we evaluate the performance of all suggested models and the effect of the empirical copula based on various accuracy measures via Monte Carlo studies.
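
The preprocessing idea described above can be illustrated with a minimal sketch. It assumes the transformation is a rank-based normal-scores map: each variable is passed through its empirical distribution function and then through the inverse standard normal distribution function. The abstract does not state the exact procedure, so the function names `empirical_cdf_scores` and `normal_scores`, the `n + 1` rescaling, and the average-rank handling of ties are illustrative assumptions rather than the chapter's method.

```python
from statistics import NormalDist


def empirical_cdf_scores(column):
    """Return rank / (n + 1) for each value, using average ranks for ties.

    Dividing by n + 1 (an assumption of this sketch) keeps every score
    strictly inside (0, 1), so the inverse normal CDF stays finite.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


def normal_scores(data):
    """Map each column of a row-major data matrix to normal scores."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    columns = list(zip(*data))
    transformed = [
        [std_normal.inv_cdf(u) for u in empirical_cdf_scores(list(col))]
        for col in columns
    ]
    # Transpose back to one row per observation.
    return [list(row) for row in zip(*transformed)]


# Hypothetical measurements: 3 observations of 2 variables on very different scales.
data = [[1.0, 10.0], [2.0, 1000.0], [3.0, 100.0]]
scores = normal_scores(data)
```

Because the map depends only on the ranks within each variable, any monotone distortion of a variable's scale leaves its scores unchanged, and the marginals of the transformed data are approximately standard normal.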


Keywords: Empirical Copula; Batch Effects; Multivariate Adaptive Regression Splines (MARS); Gaussian Graphical Models (GGM); MARS Model



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Statistics, Middle East Technical University, Ankara, Turkey
