# Monte-Carlo Simulation of Correlated Binary Responses

## Abstract

Simulation studies can support strong conclusions about correlated or longitudinal response data, particularly for relatively small samples, for which asymptotic theory does not apply. Logistic modeling in particular requires appropriate methods for simulating correlated binary data along with associated predictors. This chapter discusses existing methods for simulating correlated binary response data and compares them across data types, such as longitudinal versus clustered binary data. The purposes of generating binary responses, and the issues that arise in doing so, are discussed. Simulation methods are divided into four main approaches: using a marginally specified joint probability distribution, using mixture distributions, dichotomizing non-binary random variables, and using a conditionally specified distribution. Approaches using a completely specified joint probability distribution tend to be more computationally intensive and require determination of distributional properties. Mixture methods can involve mixtures of discrete variables only, mixtures of continuous variables only, or mixtures of both continuous and discrete variables. Methods that discretize non-binary variables most commonly use normal or uniform variables, but some use count variables such as Poisson random variables. Approaches using a conditional specification of the response distribution are the most general and allow the greatest range of autocorrelation to be simulated. The chapter concludes with a discussion of implementations available in R.
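As a rough illustration of the dichotomization approach mentioned above, the following Python sketch thresholds correlated normal latent variables to produce clustered binary responses with chosen marginal probabilities. It is not taken from the chapter or from any R package: the function name `simulate_clustered_binary`, its parameters, and the exchangeable (shared random component) latent structure are assumptions made for illustration. Note that `latent_rho` is the correlation of the latent normals; the correlation of the resulting binary responses is generally smaller, and methods that target a specified binary correlation must solve for the latent value.

```python
import math
import random
from statistics import NormalDist, mean


def simulate_clustered_binary(n_clusters, probs, latent_rho, seed=None):
    """Hypothetical helper: threshold exchangeable normal latent variables.

    probs      -- marginal success probability for each member of a cluster
    latent_rho -- correlation (0 <= rho < 1) between the latent normals,
                  NOT the correlation of the resulting binary responses
    """
    if not 0.0 <= latent_rho < 1.0:
        raise ValueError("latent_rho must lie in [0, 1)")
    rng = random.Random(seed)
    # Cut-points c_j with P(Z_j <= c_j) = p_j for a standard normal Z_j.
    cuts = [NormalDist().inv_cdf(p) for p in probs]
    data = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, 1.0)  # component common to the whole cluster
        row = []
        for c in cuts:
            # Each latent variable is standard normal; any two in the same
            # cluster have correlation latent_rho through the shared term.
            z = (math.sqrt(latent_rho) * shared
                 + math.sqrt(1.0 - latent_rho) * rng.gauss(0.0, 1.0))
            row.append(1 if z <= c else 0)
        data.append(row)
    return data


# Usage: two responses per cluster with marginal probabilities 0.3 and 0.6.
y = simulate_clustered_binary(20000, [0.3, 0.6], latent_rho=0.5, seed=1)
print([round(mean(col), 3) for col in zip(*y)])  # empirical marginal probabilities
```

The shared-component construction keeps the sketch within the standard library, at the cost of allowing only non-negative, exchangeable latent correlation; an arbitrary correlation structure would call for a general multivariate normal generator instead.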

## Keywords

Correlation Structure, Success Probability, Binary Data, Binary Outcome, Marginal Probability