The distance-based regression model is widely used for multivariate response regression in fields such as ecology, genomics, genetics, human microbiomics, and neuroimaging. It yields a pseudo F test statistic that assesses the relation between the pairwise distances (dissimilarities) of the subjects and the predictors of interest. Despite the model's popularity in recent decades, the statistical properties of the pseudo F test statistic have not, to our knowledge, been established. This study derives the asymptotic properties of the pseudo F test statistic using spectral decomposition under the matrix normal assumption, when the dissimilarity measure is the Euclidean or Mahalanobis distance. The pseudo F test statistic with the Euclidean distance has the same distribution as the quotient of two Chi-squared-type mixtures. The numerator and denominator of the quotient are each approximated by a random variable of the form \(\xi\chi_d^2+\eta\), and a bound on the approximation error is given. The pseudo F test statistic with the Mahalanobis distance follows an F distribution. In simulation studies, the approximate distribution closely matched the “exact” distribution obtained by the permutation procedure. The derived distribution was further validated on H1N1 influenza data, aging human brain data, and embryonic imprint data.
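For orientation, the following is a minimal Python sketch of how a pseudo F statistic and its permutation p-value can be computed from a distance matrix. The abstract does not state the formula, so the sketch assumes the formulation common in the distance-based regression literature: a Gower-centred matrix \(G = -\tfrac{1}{2} C D^{(2)} C\) built from the squared distances, a hat matrix \(H\) from the design with an intercept, and the ratio of \(\mathrm{tr}(HGH)/m\) to \(\mathrm{tr}((I-H)G(I-H))/(n-m-1)\). The function names `euclidean_distances`, `pseudo_f`, and `permutation_pvalue` are hypothetical and chosen for illustration; this is not the authors' implementation, and the asymptotic approximation derived in the paper is not reproduced here.

```python
import numpy as np


def euclidean_distances(Y):
    """Pairwise Euclidean distances between the rows of Y (n x q)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pseudo_f(D, X):
    """Pseudo F statistic from an n x n distance matrix D and predictors X.

    Assumed formulation (not taken from the abstract):
    F = [tr(HGH) / m] / [tr((I-H) G (I-H)) / (n - m - 1)],
    where m is the number of predictors excluding the intercept.
    """
    n = D.shape[0]
    # Gower centring of the squared distances: G = -1/2 * C D^2 C
    C = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * C @ (D ** 2) @ C
    # Hat matrix of the design matrix with an intercept column
    Z = np.column_stack([np.ones(n), X])
    H = Z @ np.linalg.pinv(Z)
    m = Z.shape[1] - 1  # assumes Z has full column rank
    R = np.eye(n) - H
    numerator = np.trace(H @ G @ H) / m
    denominator = np.trace(R @ G @ R) / (n - m - 1)
    return numerator / denominator


def permutation_pvalue(D, X, n_perm=999, seed=0):
    """Permutation p-value: shuffle the rows of X while keeping D fixed."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(D, X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(D.shape[0])
        if pseudo_f(D, X[perm]) >= f_obs:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero
    return f_obs, (exceed + 1) / (n_perm + 1)
```

With the Euclidean distance, the Gower-centred matrix equals the centred cross-product matrix of the responses, so the statistic above coincides with the ratio of explained to residual sums of squares from an ordinary multivariate regression. A call such as `permutation_pvalue(euclidean_distances(Y), X)` returns the observed statistic together with the permutation p-value that the abstract refers to as the “exact” reference.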
Keywords: distance-based regression, Euclidean, pseudo F test statistic, Mahalanobis
This work was supported by the National Natural Science Foundation of China (Grant No. 11722113). The authors thank the anonymous reviewers for their insightful comments, which improved the manuscript substantially.