Abstract
We propose a Multivariate Logistic Distance (MLD) model for the analysis of multiple binary responses in the presence of predictors. The MLD model can be used to simultaneously assess the dimensional/factorial structure of the data and to study the effect of the predictor variables on each of the response variables. To enhance interpretation, the results of the proposed model can be represented graphically in a biplot showing the predictor variable axes, the categories of the response variables, and the subjects’ positions. The biplot is interpreted using a distance rule. The MLD model belongs to the family of marginal models for multivariate responses, as opposed to latent variable models and conditionally specified models. When the distance between the two categories is constrained to be equal for every response variable, the MLD model becomes equivalent to a marginal model for multivariate binary data estimated using a GEE method. In that case the MLD model can be fitted using existing statistical packages with a GEE procedure, e.g., the GENMOD procedure in SAS or the geepack package in R. Without the equality constraint, the MLD model is a general model that can be fitted in its own right. We applied the proposed model to empirical data to illustrate its advantages.
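The distance rule mentioned above can be illustrated with a minimal sketch. The abstract does not state the functional form, so the version below is an assumption: the probability of a category is taken to be proportional to the exponential of the negative squared Euclidean distance between the subject's position and that category's point. The function names and the example coordinates are hypothetical and chosen only for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def category_probability(subject, cat0, cat1):
    """Probability of category 1 for one binary response.

    ASSUMPTION (not taken from the abstract): the probability of a category
    is proportional to exp(-squared distance) between the subject's position
    and the category point, so the closer category is the more probable one.
    """
    d0 = squared_distance(subject, cat0)
    d1 = squared_distance(subject, cat1)
    # exp(-d1) / (exp(-d0) + exp(-d1)) simplifies to a logistic function
    # of the difference in squared distances.
    return 1.0 / (1.0 + math.exp(d1 - d0))


# Hypothetical two-dimensional biplot coordinates.
subject = (0.5, 0.0)
cat0 = (-1.0, 0.0)
cat1 = (1.0, 0.0)

p = category_probability(subject, cat0, cat1)
p_equidistant = category_probability((0.0, 3.0), cat0, cat1)
```

Under this assumed form, the difference in squared distances is linear in the subject's coordinates, so each response follows an ordinary logistic curve along the line joining its two category points. A subject equidistant from both category points has probability 0.5 for either category.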
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Worku, H.M., de Rooij, M. A Multivariate Logistic Distance Model for the Analysis of Multiple Binary Responses. J Classif 35, 124–146 (2018). https://doi.org/10.1007/s00357-018-9251-4
DOI: https://doi.org/10.1007/s00357-018-9251-4