Monte-Carlo Simulation of Correlated Binary Responses

  • Trent L. Lalonde
Part of the ICSA Book Series in Statistics book series (ICSABSS)


Simulation studies can provide powerful conclusions for correlated or longitudinal response data, particularly for relatively small samples for which asymptotic theory does not apply. For the case of logistic modeling, it is necessary to have appropriate methods for simulating correlated binary data along with associated predictors. This chapter presents a discussion of existing methods for simulating correlated binary response data, including comparisons of various methods for different data types, such as longitudinal versus clustered binary data generation. The purposes and issues associated with generating binary responses are discussed. Simulation methods are divided into four main approaches: using a marginally specified joint probability distribution, using mixture distributions, dichotomizing non-binary random variables, and using a conditionally specified distribution. Approaches using a completely specified joint probability distribution tend to be more computationally intensive and require determination of distributional properties. Mixture methods can involve mixtures of discrete variables only, mixtures of continuous variables only, and mixtures involving both continuous and discrete variables. Methods that involve discretizing non-binary variables most commonly use normal or uniform variables, but some use count variables such as Poisson random variables. Approaches using a conditional specification of the response distribution are the most general, and allow for the greatest range of autocorrelation to be simulated. The chapter concludes with a discussion of implementations available using R software.
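
As an illustration of the mixture approach mentioned in the abstract, the following is a minimal sketch rather than the chapter's own algorithm. It assumes a common marginal success probability, a non-negative exchangeable within-cluster correlation, and mixing of discrete variables only. The function name `simulate_exchangeable_binary` and all of its parameters are hypothetical, chosen for this example. Each response copies a Bernoulli variable shared by the whole cluster with probability sqrt(rho) and otherwise takes its own independent Bernoulli draw, which gives marginal mean p and pairwise correlation rho.

```python
import numpy as np


def simulate_exchangeable_binary(n_clusters, cluster_size, p, rho, rng):
    """Draw clustered binary responses with marginal mean p and
    exchangeable within-cluster correlation rho (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this construction only supports 0 <= rho <= 1")

    gamma = np.sqrt(rho)  # probability of copying the shared variable

    # One shared Bernoulli(p) variable per cluster.
    shared = rng.binomial(1, p, size=(n_clusters, 1))
    # One independent Bernoulli(p) variable per observation.
    own = rng.binomial(1, p, size=(n_clusters, cluster_size))
    # Switch indicators: 1 means "copy the shared variable".
    switch = rng.binomial(1, gamma, size=(n_clusters, cluster_size))

    # Mixture of the shared and independent components.
    return np.where(switch == 1, shared, own)


rng = np.random.default_rng(2017)
y = simulate_exchangeable_binary(20000, 4, p=0.3, rho=0.25, rng=rng)

# Empirical checks: overall mean and pairwise correlation matrix.
empirical_mean = y.mean()
empirical_corr = np.corrcoef(y, rowvar=False)
print(round(float(empirical_mean), 3))
print(np.round(empirical_corr, 3))
```

With a large number of clusters, the printed mean should be close to p and the off-diagonal correlations close to rho. The sketch cannot produce negative correlations or correlations that decay with time lag, which is one reason the abstract describes conditionally specified approaches as more general.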


Keywords: Correlation Structure; Success Probability; Binary Data; Binary Outcome; Marginal Probability



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Department of Applied Statistics and Research Methods, University of Northern Colorado, Greeley, USA
