Abstract
High-throughput sequencing has often been used in pedigree-based studies to identify genetic risk factors associated with complex traits. The genotype data in such studies exhibit complex correlations attributable to both familial relatedness and linkage disequilibrium. Accounting for these genotypic correlations can improve power for assessing the contribution of multiple genomic loci. However, due to model restrictions, existing multiple variant association testing methods cannot make efficient use of this correlation information. Recognizing this limitation, we develop PC-ABT, a novel principal-component-based adaptive-weight burden test for gene-based association mapping of quantitative traits. This method uses a retrospective score test to incorporate genotypic correlations, and employs “data-driven” weights to obtain a maximized test statistic. In addition, by adjusting the number of principal components, which essentially reflects the effective number of tests in the target gene region, PC-ABT is able to reduce the degrees of freedom of the null distribution and thereby improve power. Simulation studies show that PC-ABT is generally more powerful than other multiple variant tests that allow related individuals. We illustrate the application of PC-ABT with a gene-based association analysis of systolic blood pressure using data from the NHLBI “Grand Opportunity” Exome Sequencing Project.
References
Asimit, J., Zeggini, E.: Rare variant association analysis methods for complex traits. Ann. Rev. Genet. 44, 293–308 (2010)
Berthelot, C.C., et al.: Changes in PTGS1 and ALOX12 gene expression in peripheral blood mononuclear cells are associated with changes in arachidonic acid, oxylipins, and oxylipin/fatty acid ratios in response to Omega-3 fatty acid supplementation. PLoS One 10(12), e0144996 (2015)
Chen, H., Meigs, J.B., Dupuis, J.: Sequence kernel association test for quantitative traits in family samples. Genet. Epidemiol. 37(2), 196–204 (2013)
Cui, J.S., Hopper, J.L., Harrap, S.B.: Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension 41(2), 207–210 (2003)
Derkach, A., Lawless, J.F., Sun, L.: Assessment of pooled association tests for rare variants within a unified framework. Stat. Sci. 29(2), 302–321 (2013)
Fang, S., Zhang, S., Sha, Q.: Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families. Ann. Hum. Genet. 77(6), 524–534 (2014)
Fuentes, M.: Testing for separability of spatial-temporal covariance functions. J. Stat. Plan. Inference. 136, 447–466 (2006)
Gauderman, W.J., Murcray, C., Gilliland, F., Conti, D.V.: Testing association between disease and multiple SNPs in a candidate gene. Genet. Epidemiol. 31(5), 383–395 (2007)
Han, F., Pan, W.: A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 70(1), 42–54 (2010)
Jakobsdottir, J., McPeek, M.S.: MASTOR: Mixed-model association mapping of quantitative traits in samples with related individuals. Am. J. Hum. Genet. 92, 652–666 (2013)
Jiang, D., McPeek, M.S.: Robust rare variant association testing for quantitative traits in samples with related individuals. Genet. Epidemiol. 38(1), 1–20 (2013)
Ladouceur, M., Dastani, Z., Aulchenko, Y.S., Greenwood, C.M., Richards, J.B.: The empirical power of rare variant association methods: Results from Sanger sequencing in 1998 individuals. PLoS Genet. 8(2), e1002496 (2012)
Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M., Lin, X.: Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224–237 (2012)
Lee, S., Wu, M.C., Lin, X.: Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4), 762–775 (2013)
Li, Q.H., Lagakos, S.W.: On the relationship between directional and omnibus statistical tests. Scand. J. Stat. 33, 239–246 (2006)
Li, B., Leal, S.M.: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008)
Li, M.X., Gui, H.S., Kwan, J.S., Sham, P.C.: GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am. J. Hum. Genet. 88, 283–293 (2011)
Lin, D.Y., Tang, Z.Z.: A general framework for detecting disease associations with rare variants in sequencing studies. Am. J. Hum. Genet. 89, 354–367 (2011)
Liu, D.J., Leal, S.M.: A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet. 6, e1001156 (2010)
Ma, L., Clark, A.G., Keinan, A.: Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet. 9, e1003321 (2013)
Madsen, B.E., Browning, S.R.: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 5, e1000384 (2009)
Maier, K.G., Ruhle, B., Stein, J.J., Gentile, K.L., Middleton, F.A., Gahtan, V.: Thrombospondin-1 differentially regulates microRNAs in vascular smooth muscle cells. Mol. Cell. Biochem. 412(1–2), 111–117 (2016)
Manolio, T.A.: Genomewide association studies and assessment of the risk of disease. N. Engl. J. Med. 363(2), 166–176 (2010)
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P., Hirschhorn, J.N.: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 9(5), 356–369 (2008)
McPeek, M.S.: BLUP genotype imputation for case control association testing with related individuals and missing data. J. Comp. Biol. 19(6), 756–765 (2012)
McPeek, M.S., Wu, X., Ober, C.: Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics 60, 359–367 (2004)
Morgenthaler, S., Thilly, W.G.: A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat. Res. 615, 28–56 (2007)
Neale, B.M., Sham, P.C.: The future of association studies: Gene-based analysis and replication. Am. J. Hum. Genet. 75, 353–362 (2004)
Price, A.L., Kryukov, G.V., de Bakker, P.I., Purcell, S.M., Staples, J., Wei, L.J., Sunyaev, S.R.: Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010)
Price, A.L., Zaitlen, N.A., Reich, D., Patterson, N.: New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet. 11(7), 459–463 (2011)
Schaid, D.J., McDonnell, S.K., Sinnwell, J.P., Thibodeau, S.M.: Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genet. Epidemiol. 37(5), 409–418 (2013)
Schifano, E.D., Epstein, M.P., Bielak, L.F., Jhun, M.A., Kardia, S.L., Peyser, P.A., Lin, X.: SNP set association analysis for familial data. Genet. Epidemiol. 36(8), 797–810 (2012)
Sha, Q., Wang, X., Wang, X., Zhang, S.: Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet. Epidemiol. 36(6), 561–571 (2012)
Sha, Q., Zhang, S.: A novel test for testing the optimally weighted combination of rare and common variants based on data of parents and affected children. Genet. Epidemiol. 38(2), 135–143 (2014)
Splansky, G.L., et al.: The third generation cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am. J. Epidemiol. 165(11), 1328–1335 (2007)
Srivastava, M.S., von Rosen, T., von Rosen, D.: Models with a Kronecker product covariance structure: estimation and testing. Math. Methods Stat. 17(4), 357–370 (2008)
The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)
Thornton, T., McPeek, M.S.: Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am. J. Hum. Genet. 81, 321–337 (2007)
Thornton, T., McPeek, M.S.: ROADTRIPS: Case-control association testing with partially or completely unknown population and pedigree structure. Am. J. Hum. Genet. 86, 172–184 (2010)
Tobin, M.D., Sheehan, N.A., Scurrah, K.J., Burton, P.R.: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat. Med. 24, 2911–2935 (2005)
Wang, Y., Chen, Y.H., Yang, Q.: Joint rare variant association test of the average and individual effects for sequencing studies. PLoS One 7, e32485 (2012)
Wang, X., Morris, N.J., Zhu, X., Elston, R.C.: A variance component based multi-marker association test using family and unrelated data. BMC Genet. 14, 17 (2013)
Wang, X., Lee, S., Zhu, X., Redline, S., Lin, X.: GEE-based SNP set association test for continuous and discrete traits in family based association studies. Genet. Epidemiol. 37(8), 778–786 (2014)
Weisinger, G., Limor, R., Marcus-Perlman, Y., Knoll, E., Kohen, F., Schinder, V., Firer, M., Stern, N.: 12S-lipoxygenase protein associates with alpha-actin fibers in human umbilical artery vascular smooth muscle cells. Biochem. Biophys. Res. Commun. 356(3), 554–560 (2007)
Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., Lin, X.: Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. 86, 929–942 (2010)
Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., Lin, X.: Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011)
Zhu, Y., Xiong, M.: Family-based association studies for next-generation sequencing. Am. J. Hum. Genet. 90, 1028–1045 (2012)
Acknowledgements
This research was funded by 4-VA, a collaborative partnership for advancing the Commonwealth of Virginia.
Appendices
Appendix 1: Description of MASTOR and Theoretical Justification of the Null Distribution of \(S_{\text{ABT}}\)
MASTOR (Jakobsdottir and McPeek 2013) is a retrospective, quasi-likelihood score test for single-variant association with a quantitative trait in samples with related individuals. Considering a biallelic genetic variant \(\boldsymbol{X}\) of interest (an example in the general setting described in Sect. 4.2.1 is to let \(\boldsymbol{X} = \boldsymbol{G}_j\), \(1 \le j \le m\)), the MASTOR statistic (for complete data) takes the form
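Assuming it matches the statistic of Jakobsdottir and McPeek (2013), with \(\boldsymbol{\varPhi}\) read as the kinship matrix and \(S_{\text{MASTOR}}\) a label introduced here for convenience, this form would be

\[
S_{\text{MASTOR}} = \frac{(\boldsymbol{V}^T\boldsymbol{X})^2}{\widehat{\sigma}_X^2\,\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}.
\]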
In this expression, \(\boldsymbol {V}=\widehat {\boldsymbol {\varSigma }}_0^{-1}(\boldsymbol {Y}-\boldsymbol {Z}\widehat {\boldsymbol {\beta }}_0)\) is the transformed phenotypic residual obtained from the null model \(\boldsymbol{Y}=\boldsymbol{Z}\boldsymbol{\beta}_0+\boldsymbol{\epsilon}\), \(\boldsymbol{\epsilon}\sim N(\boldsymbol{0},\boldsymbol{\varSigma}_0)\), where \(\boldsymbol{\beta}_0\) is the vector of coefficients from regressing the quantitative trait \(\boldsymbol{Y}\) on the non-genetic covariates \(\boldsymbol{Z}\), and \(\boldsymbol{\varSigma}_0\) is the trait covariance matrix under the null, usually of the variance-component form \(\sigma _e^2\boldsymbol {I}+\sigma _a^2\boldsymbol {\varPhi }\). The variance of variant \(\boldsymbol{X}\) is denoted by \(\sigma _X^2\). When Hardy-Weinberg equilibrium is assumed for this variant, \(\sigma _X^2\) can be estimated by \(\widehat {\sigma }_X^2=\widehat {p}(1-\widehat {p})/2\), where \(\widehat {p}=(\boldsymbol {1}^T\boldsymbol {\varPhi }^{-1}\boldsymbol {1})^{-1}\boldsymbol {1}^T\boldsymbol {\varPhi }^{-1}\boldsymbol {X}\) is the best linear unbiased estimator (McPeek et al. 2004) of the allele frequency \(p\) of \(\boldsymbol{X}\), and \(\boldsymbol{1}\) denotes a vector with every element equal to 1.
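The allele-frequency and variance estimators above can be illustrated with a short numerical sketch. It rests on assumptions that are not stated in this appendix: genotypes are coded as allele proportions (0, 0.5, 1), which is consistent with the variance \(p(1-p)/2\), and \(\boldsymbol{\varPhi}\) is a kinship matrix with ones on the diagonal for outbred individuals. The function names are hypothetical.

```python
import numpy as np

def blue_allele_frequency(X, Phi):
    """BLUE of the allele frequency: (1' Phi^{-1} 1)^{-1} 1' Phi^{-1} X.

    Assumes X is coded as allele proportions (0, 0.5, 1) and Phi is a
    symmetric positive-definite kinship matrix (illustrative assumptions).
    """
    ones = np.ones(len(X))
    # Phi is symmetric, so 1' Phi^{-1} equals (Phi^{-1} 1)'.
    phi_inv_ones = np.linalg.solve(Phi, ones)
    return float(phi_inv_ones @ X / (phi_inv_ones @ ones))

def variant_variance(p_hat):
    """Hardy-Weinberg variance estimate p_hat * (1 - p_hat) / 2."""
    return p_hat * (1.0 - p_hat) / 2.0

# Toy sample: a sib pair (individuals 1 and 2) plus two unrelated individuals.
Phi = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.5, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
X = np.array([1.0, 1.0, 0.0, 0.0])

p_hat = blue_allele_frequency(X, Phi)
sigma2_hat = variant_variance(p_hat)
```

In this toy sample the two siblings carry partly redundant information, so they are down-weighted: the estimate is 0.4 rather than the plain sample mean of 0.5.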
In Sect. 4.2.2, we obtained the ABT statistic
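Assuming the quadratic form implied by the weight matrix \((\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\) of Eq. (4.6), with the same denominator \(\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}\) as the MASTOR statistic of Jakobsdottir and McPeek (2013), this statistic would read

\[
S_{\text{ABT}} = \frac{\boldsymbol{V}^T\boldsymbol{G}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\boldsymbol{G}^T\boldsymbol{V}}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}.
\]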
Let \(\widetilde {\boldsymbol {G}}=\boldsymbol {G}(\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1/2}\) be a decorrelated version of the genotype matrix in which the across-column covariance has been transformed to identity, and let \(\widetilde {\boldsymbol {G}}_j\) be the jth column of \(\widetilde {\boldsymbol {G}}\). By linear algebra,
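A sketch of this step, under the assumption that \(S_{\text{ABT}}\) is the quadratic form in \(\boldsymbol{G}^T\boldsymbol{V}\) with weight matrix \((\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\) and denominator \(\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}\): because \(\widetilde{\boldsymbol{G}}\widetilde{\boldsymbol{G}}^T=\boldsymbol{G}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\boldsymbol{G}^T\),

\[
S_{\text{ABT}}
= \frac{\boldsymbol{V}^T\boldsymbol{G}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\boldsymbol{G}^T\boldsymbol{V}}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}
= \frac{\boldsymbol{V}^T\widetilde{\boldsymbol{G}}\widetilde{\boldsymbol{G}}^T\boldsymbol{V}}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}
= \sum_{j=1}^{m}\frac{(\boldsymbol{V}^T\widetilde{\boldsymbol{G}}_j)^2}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}.
\]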
This is essentially the sum of m independent MASTOR statistics (given the uncorrelatedness and joint normality of the \(\boldsymbol {V}^T\widetilde {\boldsymbol {G}}_j\)), each formed from a transformed variant \(\widetilde {\boldsymbol {G}}_j\) (note that the variance estimate is 1 after transformation). Hence \(S_{\text{ABT}}\) follows a \(\chi _m^2\) distribution under the null hypothesis.
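The decomposition into per-variant terms is a purely algebraic identity, so it can be checked numerically. The sketch below is an illustration only: it assumes \(S_{\text{ABT}}\) is the quadratic form with weight matrix \((\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\) divided by \(\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}\), and it uses simulated stand-ins for \(\boldsymbol{G}\), \(\boldsymbol{V}\), \(\widehat{\boldsymbol{D}}\), \(\boldsymbol{R}\), and \(\boldsymbol{\varPhi}\) rather than real genotype data.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

rng = np.random.default_rng(0)
n, m = 50, 4

# Simulated stand-ins (assumptions for illustration, not real data).
G = rng.normal(size=(n, m))            # genotype matrix
V = rng.normal(size=n)                 # transformed phenotypic residual
Phi = np.eye(n)                        # kinship matrix of unrelated individuals
lag = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
R = 0.5 ** lag                         # hypothetical LD correlation matrix
D = np.diag(rng.uniform(0.2, 0.5, m))  # hypothetical standard deviations
C = D @ R @ D

denom = V @ Phi @ V

# Quadratic-form version: V' G (D R D)^{-1} G' V / (V' Phi V).
S_quadratic = (V @ G) @ np.linalg.solve(C, G.T @ V) / denom

# Sum of per-variant terms built from the decorrelated genotypes.
G_tilde = G @ inv_sqrt_sym(C)
S_sum = np.sum((V @ G_tilde) ** 2) / denom

print(np.isclose(S_quadratic, S_sum))
```

The check confirms only the algebra; the \(\chi_m^2\) null distribution additionally relies on the uncorrelatedness and joint normality noted above.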
Appendix 2: Additional Simulation Results Show That the Data-Driven Weights \(\boldsymbol{W}^{*}\) Are Adaptive to the Direction of True Genetic Effects
To understand how the data-driven weights \(\boldsymbol{W}^{*}\) (defined in Eq. (4.5) of the main text) help gain power in association testing, we compare the signs of \(\boldsymbol{W}^{*}\) with those of the genetic effects \(\boldsymbol{\gamma}\) using the simulated data sets from the power analysis. Panels a–d of Fig. 4.5 present boxplots of the weights \(\boldsymbol{W}^{*}\) based on 5000 simulated data replicates in Scenario S2 with genetic effect Setting III, for LD Configurations C1–C4, respectively. In this setting, the first 30% of the components of \(\boldsymbol{\gamma}\) are positive, the next 30% are negative, and the remaining 40% are zero. The boxplots clearly demonstrate that, on average, the weights \(\boldsymbol{W}^{*}\) track the direction of the true genetic effects, and thus result in a stronger association for the weighted-sum genetic score.
Appendix 3: Additional Simulation Results Show the Relation Between the ABT Statistic and the famSKAT Statistic
We show in Fig. 4.6, Panels a–d, scatter plots of the numerator of the ABT statistic against the famSKAT statistic based on 5000 simulated data replicates in Scenario S3 with genetic effect Setting II, for LD Configurations C1–C4, respectively. When the LD correlation is negligible (Panel a), the numerator of the ABT statistic behaves similarly to the famSKAT statistic because, in Eq. (4.6) of the main text, \((\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1}\) is equivalent to the Madsen-Browning weights used in calculating the famSKAT statistic. As the LD correlation increases (Panels b, c, and d), the two statistics become less and less consistent: in the famSKAT statistic, the Madsen-Browning weights depend only on individual variants, whereas in the ABT statistic, the weight of an individual variant statistic is also affected by other variants at linked sites, as seen from the weight matrix \((\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1}\) in Eq. (4.6) of the main text.
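The no-LD case can be made explicit with a short worked equation. It assumes that \(\widehat{\boldsymbol{D}}=\mathrm{diag}(\widehat{\sigma}_1,\ldots,\widehat{\sigma}_m)\) holds the estimated standard deviations of the m variants, with \(\widehat{\sigma}_j^2=\widehat{p}_j(1-\widehat{p}_j)/2\) under Hardy-Weinberg equilibrium, and that \(\boldsymbol{R}=\boldsymbol{I}\) when LD is negligible; the exact definitions are those of the main text.

\[
(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}
= \widehat{\boldsymbol{D}}^{-2}
= \mathrm{diag}\!\left(\frac{2}{\widehat{p}_1(1-\widehat{p}_1)},\ldots,\frac{2}{\widehat{p}_m(1-\widehat{p}_m)}\right).
\]

Under these assumptions each variant is weighted in proportion to \(1/\{\widehat{p}_j(1-\widehat{p}_j)\}\), which depends only on that variant's own allele frequency. Once \(\boldsymbol{R}\) has nonzero off-diagonal entries, the inverse is no longer diagonal and the weights mix information across linked variants.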
Appendix 4: Additional Simulation Results to Validate the Asymptotic Null Distribution of \(S_{\text{PC-ABT}}\) via a Permutation-Based Approach
We perform 1000 permutations on the simulated data under Scenario S1 (unrelated individuals and common variants) and Configuration C3 (strong LD with \(\eta = 0.7\)). Figure 4.7 shows the asymptotic null distributions of \(S_{\text{PC-ABT}}\) for the number of principal components q = 1, 25, and 50, together with the corresponding empirical CDFs obtained via permutation. Two different asymptotic distributions are shown in this figure: one is \(\chi _q^2\), and the other is a mixture of \(\chi _1^2\) distributions, obtained by applying the adaptive weights \(\boldsymbol{W}^{\#}\) in the famSKAT method. In Fig. 4.8, Panels a, b, and c, we compare on the log scale the empirical p-values from the permutation-based approach against the p-values from the asymptotic distribution (mixture of \(\chi _1^2\)) for q = 1, 25, and 50, respectively. Panel d of Fig. 4.8 further reports the correlation between \(-\log_{10}\)(empirical p-values via permutation) and \(-\log_{10}\)(p-values based on the asymptotic distribution) for \(q = 1, 2, \ldots, 50\).
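The permutation step can be sketched as follows. This is a minimal illustration and not the chapter's implementation: `toy_stat` is a hypothetical stand-in for \(S_{\text{PC-ABT}}\), the data are simulated noise, and permuting the trait values is assumed to be valid because the individuals in Scenario S1 are unrelated.

```python
import numpy as np

def toy_stat(y, G):
    """Hypothetical stand-in statistic: sum of squared per-variant scores."""
    score = (y - y.mean()) @ G
    return float(score @ score)

def permutation_pvalue(stat_fn, y, G, n_perm=1000, seed=0):
    """Empirical p-value from permuting the trait vector y.

    Returns the p-value and the permutation null statistics, whose
    empirical CDF can be compared with an asymptotic null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(y, G)
    null_stats = np.array(
        [stat_fn(rng.permutation(y), G) for _ in range(n_perm)]
    )
    # The +1 terms keep the estimate strictly positive.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return p_value, null_stats

# Simulated example under the null: trait independent of genotypes.
rng = np.random.default_rng(42)
G = rng.binomial(2, 0.3, size=(200, 10)).astype(float)
y = rng.normal(size=200)

p_value, null_stats = permutation_pvalue(toy_stat, y, G, n_perm=1000)
```

With 1000 permutations the smallest attainable p-value is 1/1001, which limits how far into the tail the empirical and asymptotic p-values can be compared.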
Appendix 5: Additional Simulation Results for Type I Error Evaluation
We provide additional simulation results for type I error evaluation. Table 4.4 lists the empirical type I error rates of five testing methods (FBT, famSKAT, ABT, MONSTER, and PC-ABT) for the combinations of four scenarios (S1, S2, S3, and S4) and four LD configurations (C1, C2, C3, and C4), based on 20,000 simulated data replicates. Figures 4.9, 4.10, 4.11, and 4.12 show the Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenarios S1, S2, S3, and S4, respectively. The number of principal components is chosen so that the total percent variance explained (PVE) exceeds 90%.
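The PVE rule for choosing the number of principal components can be sketched as below. It assumes the components come from an eigendecomposition of an across-variant correlation matrix; the matrix actually decomposed by PC-ABT is defined in the main text, and the function name is hypothetical.

```python
import numpy as np

def n_components_for_pve(R, threshold=0.90):
    """Smallest q whose leading q eigenvalues explain more than `threshold`
    of the total variance of the symmetric matrix R."""
    eigvals = np.linalg.eigvalsh(R)[::-1]   # sort in descending order
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negatives
    pve = np.cumsum(eigvals) / eigvals.sum()
    # argmax returns the index of the first True entry.
    return int(np.argmax(pve > threshold) + 1)

# Independent variants: every component is needed.
q_independent = n_components_for_pve(np.eye(4))

# Perfectly correlated variants: a single component suffices.
q_redundant = n_components_for_pve(np.ones((3, 3)))
```

The two toy cases show the range of behaviour: with no LD the rule keeps all m components, and with complete LD it keeps one, in line with the idea that q reflects the effective number of tests in the region.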
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wu, X. (2019). A Powerful Retrospective Multiple Variant Association Test for Quantitative Traits by Borrowing Strength from Complex Genotypic Correlations. In: Zhang, L., Chen, DG., Jiang, H., Li, G., Quan, H. (eds) Contemporary Biostatistics with Biopharmaceutical Applications. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-15310-6_4
DOI: https://doi.org/10.1007/978-3-030-15310-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15309-0
Online ISBN: 978-3-030-15310-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)