A Powerful Retrospective Multiple Variant Association Test for Quantitative Traits by Borrowing Strength from Complex Genotypic Correlations

Wu, Xiaowei

doi:10.1007/978-3-030-15310-6_4

Chapter
First Online: 09 July 2019

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

High-throughput sequencing is often used in pedigree-based studies to identify genetic risk factors associated with complex traits. The genotype data in such studies exhibit complex correlations attributable to both familial relationships and linkage disequilibrium. Accounting for these genotypic correlations can improve power for assessing the contribution of multiple genomic loci. However, due to model restrictions, existing multiple variant association testing methods cannot make efficient use of this correlation information. Recognizing this limitation, we develop PC-ABT, a novel principal-component-based adaptive-weight burden test for gene-based association mapping of quantitative traits. This method uses a retrospective score test to incorporate genotypic correlations, and employs “data-driven” weights to obtain a maximized test statistic. In addition, by adjusting the number of principal components, which essentially reveals the effective number of tests in the target gene region, PC-ABT is able to reduce the degrees of freedom of the null distribution to improve power. Simulation studies show that PC-ABT is generally more powerful than other multiple variant tests that allow related individuals. We illustrate the application of PC-ABT with a gene-based association analysis of systolic blood pressure using data from the NHLBI “Grand Opportunity” Exome Sequencing Project.

References

Asimit, J., Zeggini, E.: Rare variant association analysis methods for complex traits. Ann. Rev. Genet. 44, 293–308 (2010)
Berthelot, C.C., et al.: Changes in PTGS1 and ALOX12 gene expression in peripheral blood mononuclear cells are associated with changes in arachidonic acid, oxylipins, and oxylipin/fatty acid ratios in response to Omega-3 fatty acid supplementation. PLoS One 10(12), e0144996 (2015)
Chen, H., Meigs, J.B., Dupuis, J.: Sequence kernel association test for quantitative traits in family samples. Genet. Epidemiol. 37(2), 196–204 (2013)
Cui, J.S., Hopper, J.L., Harrap, S.B.: Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension 41(2), 207–210 (2003)
Derkach, A., Lawless, J.F., Sun, L.: Assessment of pooled association tests for rare variants within a unified framework. Stat. Sci. 29(2), 302–321 (2013)
Fang, S., Zhang, S., Sha, Q.: Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families. Ann. Hum. Genet. 77(6), 524–534 (2014)
Fuentes, M.: Testing for separability of spatial-temporal covariance functions. J. Stat. Plan. Inference. 136, 447–466 (2006)
Gauderman, W.J., Murcray, C., Gilliland, F., Conti, D.V.: Testing association between disease and multiple SNPs in a candidate gene. Genet. Epidemiol. 31(5), 383–395 (2007)
Han, F., Pan, W.: A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 70(1), 42–54 (2010)
Jakobsdottir, J., McPeek, M.S.: MASTOR: Mixed-model association mapping of quantitative traits in samples with related individuals. Am. J. Hum. Genet. 92, 652–666 (2013)
Jiang, D., McPeek, M.S.: Robust rare variant association testing for quantitative traits in samples with related individuals. Genet. Epidemiol. 38(1), 1–20 (2013)
Ladouceur, M., Dastani, Z., Aulchenko, Y.S., Greenwood, C.M., Richards, J.B.: The empirical power of rare variant association methods: Results from Sanger sequencing in 1998 individuals. PLoS Genet. 8(2), e1002496 (2012)
Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M., Lin, X.: Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224–237 (2012)
Lee, S., Wu, M.C., Lin, X.: Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4), 762–775 (2013)
Li, Q.H., Lagakos, S.W.: On the relationship between directional and omnibus statistical tests. Scand. J. Stat. 33, 239–246 (2006)
Li, B., Leal, S.M.: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008)
Li, M.X., Gui, H.S., Kwan, J.S., Sham, P.C.: GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am. J. Hum. Genet. 88, 283–293 (2011)
Lin, D.Y., Tang, Z.Z.: A general framework for detecting disease associations with rare variants in sequencing studies. Am. J. Hum. Genet. 89, 354–367 (2011)
Liu, D.J., Leal, S.M.: A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet. 6, e1001156 (2010)
Ma, L., Clark, A.G., Keinan, A.: Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet. 9, e1003321 (2013)
Madsen, B.E., Browning, S.R.: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 5, e1000384 (2009)
Maier, K.G., Ruhle, B., Stein, J.J., Gentile, K.L., Middleton, F.A., Gahtan, V.: Thrombospondin-1 differentially regulates microRNAs in vascular smooth muscle cells. Mol. Cell. Biochem. 412(1–2), 111–117 (2016)
Manolio, T.A.: Genomewide association studies and assessment of the risk of disease. N. Engl. J. Med. 363(2), 166–176 (2010)
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P., Hirschhorn, J.N.: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 9(5), 356–369 (2008)
McPeek, M.S.: BLUP genotype imputation for case control association testing with related individuals and missing data. J. Comp. Biol. 19(6), 756–765 (2012)
McPeek, M.S., Wu, X., Ober, C.: Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics 60, 359–367 (2004)
Morgenthaler, S., Thilly, W.G.: A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat. Res. 615, 28–56 (2007)
Neale, B.M., Sham, P.C.: The future of association studies: Gene-based analysis and replication. Am. J. Hum. Genet. 75, 353–362 (2004)
Price, A.L., Kryukov, G.V., de Bakker, P.I., Purcell, S.M., Staples, J., Wei, L.J., Sunyaev, S.R.: Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010)
Price, A.L., Zaitlen, N.A., Reich, D., Patterson, N.: New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet. 11(7), 459–463 (2011)
Schaid, D.J., McDonnell, S.K., Sinnwell, J.P., Thibodeau, S.M.: Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genet. Epidemiol. 37(5), 409–418 (2013)
Schifano, E.D., Epstein, M.P., Bielak, L.F., Jhun, M.A., Kardia, S.L., Peyser, P.A., Lin, X.: SNP set association analysis for familial data. Genet. Epidemiol. 36(8), 797–810 (2012)
Sha, Q., Wang, X., Wang, X., Zhang, S.: Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet. Epidemiol. 36(6), 561–571 (2012)
Sha, Q., Zhang, S.: A novel test for testing the optimally weighted combination of rare and common variants based on data of parents and affected children. Genet. Epidemiol. 38(2), 135–143 (2014)
Splansky, G.L., et al.: The third generation cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am. J. Epidemiol. 165(11), 1328–1335 (2007)
Srivastava, M.S., von Rosen, T., von Rosen, D.: Models with a Kronecker product covariance structure: estimation and testing. Math. Methods Stat. 17(4), 357–370 (2008)
The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)
Thornton, T., McPeek, M.S.: Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am. J. Hum. Genet. 81, 321–337 (2007)
Thornton, T., McPeek, M.S.: ROADTRIPS: Case-control association testing with partially or completely unknown population and pedigree structure. Am. J. Hum. Genet. 86, 172–184 (2010)
Tobin, M.D., Sheehan, N.A., Scurrah, K.J., Burton, P.R.: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat. Med. 24, 2911–2935 (2005)
Wang, Y., Chen, Y.H., Yang, Q.: Joint rare variant association test of the average and individual effects for sequencing studies. PLoS One 7, e32485 (2012)
Wang, X., Morris, N.J., Zhu, X., Elston, R.C.: A variance component based multi-marker association test using family and unrelated data. BMC Genet. 14, 17 (2013)
Wang, X., Lee, S., Zhu, X., Redline, S., Lin, X.: GEE-based SNP set association test for continuous and discrete traits in family based association studies. Genet. Epidemiol. 37(8), 778–786 (2014)
Weisinger, G., Limor, R., Marcus-Perlman, Y., Knoll, E., Kohen, F., Schinder, V., Firer, M., Stern, N.: 12S-lipoxygenase protein associates with alpha-actin fibers in human umbilical artery vascular smooth muscle cells. Biochem. Biophys. Res. Commun. 356(3), 554–560 (2007)
Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., Lin, X.: Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. 86, 929–942 (2010)
Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., Lin, X.: Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011)
Zhu, Y., Xiong, M.: Family-based association studies for next-generation sequencing. Am. J. Hum. Genet. 90, 1028–1045 (2012)

Acknowledgements

This research was funded by 4-VA, a collaborative partnership for advancing the Commonwealth of Virginia.

Author information

Authors and Affiliations

Department of Statistics, Virginia Tech, Blacksburg, VA, USA
Xiaowei Wu

Corresponding author

Correspondence to Xiaowei Wu.

Editor information

Editors and Affiliations

Data and Statistical Sciences, AbbVie Inc., North Chicago, IL, USA
Lanju Zhang
School of Social Work, University of North Carolina, Chapel Hill, NC, USA
Ding-Geng (Din) Chen
Department of Statistics, Northwestern University, Evanston, IL, USA
Hongmei Jiang
R&D, Janssen Pharmaceuticals, Raritan, NJ, USA
Gang Li
Sanofi US, Bridgewater, NJ, USA
Hui Quan

Appendices

Appendix 1: Description of MASTOR and Theoretical Justification of the Null Distribution of $S_{ABT}$

MASTOR (Jakobsdottir and McPeek 2013) is a retrospective quasi-likelihood score test for single-variant association with a quantitative trait in samples with related individuals. For a biallelic genetic variant $\boldsymbol{X}$ of interest (in the general setting described in Sect. 4.2.1, for example, $\boldsymbol{X}=\boldsymbol{G}_j$, $1\le j\le m$), the MASTOR statistic (for complete data) takes the form

$$\displaystyle \begin{aligned} S_{MAS}=\frac{(\boldsymbol{V}^T\boldsymbol{X})^2}{(\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V})\widehat{\sigma}_X^2}. \end{aligned}$$

In this expression, $\boldsymbol{V}=\widehat{\boldsymbol{\varSigma}}_0^{-1}(\boldsymbol{Y}-\boldsymbol{Z}\widehat{\boldsymbol{\beta}}_0)$ is the transformed phenotypic residual obtained from the null model $\boldsymbol{Y}=\boldsymbol{Z}\boldsymbol{\beta}_0+\boldsymbol{\epsilon}$, $\boldsymbol{\epsilon}\sim N(\boldsymbol{0},\boldsymbol{\varSigma}_0)$, where $\boldsymbol{\beta}_0$ is the vector of coefficients from regressing the quantitative trait $\boldsymbol{Y}$ on the non-genetic covariates $\boldsymbol{Z}$, and $\boldsymbol{\varSigma}_0$ is the trait covariance matrix under the null, usually of the variance component form $\sigma_e^2\boldsymbol{I}+\sigma_a^2\boldsymbol{\varPhi}$. The variance of variant $\boldsymbol{X}$ is denoted by $\sigma_X^2$. When Hardy-Weinberg equilibrium is assumed for this variant, $\sigma_X^2$ can be estimated by $\widehat{\sigma}_X^2=\widehat{p}(1-\widehat{p})/2$, where $\widehat{p}=(\boldsymbol{1}^T\boldsymbol{\varPhi}^{-1}\boldsymbol{1})^{-1}\boldsymbol{1}^T\boldsymbol{\varPhi}^{-1}\boldsymbol{X}$ is the best linear unbiased estimator (McPeek et al. 2004) of the allele frequency $p$ of $\boldsymbol{X}$, and $\boldsymbol{1}$ denotes a vector with every element equal to 1.
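As a rough illustration of the formula above, the following Python sketch evaluates $S_{MAS}$ for a single variant. It is not the MASTOR software: the function name `mastor_statistic` is hypothetical, the variance components $\sigma_e^2$ and $\sigma_a^2$ are assumed to be supplied rather than estimated, $\boldsymbol{\beta}_0$ is assumed to be estimated by generalized least squares, and the genotype is assumed to be coded as 0, 1/2, 1 so that $\widehat{p}(1-\widehat{p})/2$ is the matching variance estimate.

```python
import numpy as np

def mastor_statistic(Y, Z, X, Phi, sigma_e2, sigma_a2):
    """Evaluate the complete-data statistic S_MAS for one variant.

    Assumptions (not taken from the chapter): sigma_e2 and sigma_a2 are
    given, beta_0 is fitted by generalized least squares, and X is coded
    as 0, 0.5, 1 (allele count divided by two).
    """
    n = len(Y)
    # Null trait covariance: sigma_e^2 * I + sigma_a^2 * Phi
    Sigma0 = sigma_e2 * np.eye(n) + sigma_a2 * Phi
    Sigma0_inv = np.linalg.inv(Sigma0)
    # GLS estimate of beta_0 under the null model Y = Z beta_0 + eps
    beta0 = np.linalg.solve(Z.T @ Sigma0_inv @ Z, Z.T @ Sigma0_inv @ Y)
    # Transformed phenotypic residual V
    V = Sigma0_inv @ (Y - Z @ beta0)
    # Best linear unbiased estimator of the allele frequency
    ones = np.ones(n)
    Phi_inv_ones = np.linalg.solve(Phi, ones)
    p_hat = (Phi_inv_ones @ X) / (Phi_inv_ones @ ones)
    # Variance estimate under Hardy-Weinberg equilibrium
    sigma_X2 = p_hat * (1.0 - p_hat) / 2.0
    return (V @ X) ** 2 / ((V @ Phi @ V) * sigma_X2)
```

With $\boldsymbol{\varPhi}=\boldsymbol{I}$, $\sigma_a^2=0$, and an intercept-only $\boldsymbol{Z}$, the function reduces to a squared covariance between the centered trait and the genotype, scaled by the two variance terms, which is a convenient sanity check.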

In Sect. 4.2.2, we obtained the ABT statistic

$$\displaystyle \begin{aligned} S_{ABT}=\frac{\boldsymbol{V}^T\boldsymbol{G}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\boldsymbol{G}^T\boldsymbol{V}}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}. \end{aligned}$$

Let $\widetilde {\boldsymbol {G}}=\boldsymbol {G}(\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1/2}$ be a decorrelated version of the genotype matrix in which the across-column covariance has been transformed to identity, and let $\widetilde {\boldsymbol {G}}_j$ be the jth column of $\widetilde {\boldsymbol {G}}$. By linear algebra,

$$\displaystyle \begin{aligned} S_{ABT}=\sum_{j=1}^m\frac{\left(\boldsymbol{V}^T\widetilde{\boldsymbol{G}}_j\right)^2}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}. \end{aligned}$$

This is essentially the sum of $m$ independent MASTOR statistics (by the uncorrelatedness and joint normality of the $\boldsymbol{V}^T\widetilde{\boldsymbol{G}}_j$), each formulated from a transformed variant $\widetilde{\boldsymbol{G}}_j$ (note that the variance estimate is 1 after the transformation). Hence $S_{ABT}$ follows a $\chi_m^2$ distribution under the null hypothesis.
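The linear algebra step can be checked numerically. The sketch below, with hypothetical helper names, computes $S_{ABT}$ once as a quadratic form and once as a sum over the columns of $\widetilde{\boldsymbol{G}}$. It assumes only that $\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}}$ is symmetric positive definite and uses the symmetric inverse square root, which is one possible choice of $(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}$.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    w, U = np.linalg.eigh(A)
    # U diag(w^{-1/2}) U^T: scale each eigenvector column, then rotate back
    return (U / np.sqrt(w)) @ U.T

def abt_two_ways(V, G, Phi, D, R):
    """Return S_ABT as (quadratic form, sum over decorrelated variants)."""
    C = D @ R @ D
    denom = V @ Phi @ V
    # Quadratic form V^T G C^{-1} G^T V / (V^T Phi V)
    quad = V @ G @ np.linalg.solve(C, G.T @ V) / denom
    # Sum of squared scores of the decorrelated columns G C^{-1/2}
    G_tilde = G @ inv_sqrt_sym(C)
    summed = np.sum((V @ G_tilde) ** 2) / denom
    return quad, summed
```

The two values agree up to floating-point error for any valid inputs, because $(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}=(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}$.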

Appendix 2: Additional Simulation Results Showing That the Data-Driven Weights $\boldsymbol{W}^{*}$ Are Adaptive to the Direction of True Genetic Effects

In order to understand how the data-driven weights $\boldsymbol{W}^{*}$ (defined in Eq. (4.5) of the main text) help gain power in association testing, we compare the signs of $\boldsymbol{W}^{*}$ to those of the genetic effects $\boldsymbol{\gamma}$ using the simulated data sets from the power analysis. Figure 4.5, Panels a–d, presents boxplots of the weights $\boldsymbol{W}^{*}$ based on 5000 simulated data replicates in Scenario S2 with genetic effect Setting III, for LD Configurations C1–C4, respectively. We note that, in this setting, the first 30% of the components of $\boldsymbol{\gamma}$ are set to be positive, the next 30% are negative, and the remaining 40% are zero. The boxplots clearly demonstrate that, on average, the weights $\boldsymbol{W}^{*}$ are able to track the direction of the true genetic effects, thus resulting in a stronger association with the weighted-sum genetic score.

Appendix 3: Additional Simulation Results Show the Relation Between the ABT Statistic and the famSKAT Statistic

We show in Fig. 4.6, Panels a–d, scatter plots of the numerator of the ABT statistic vs. the famSKAT statistic based on 5000 simulated data replicates in Scenario S3 with genetic effect Setting II, for LD Configurations C1–C4, respectively. We observe that, when the LD correlation is negligible (Panel a), the numerator of the ABT statistic behaves similarly to the famSKAT statistic because, in Eq. (4.6) of the main text, $(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}$ is equivalent to the Madsen-Browning weights used in calculating the famSKAT statistic. As the LD correlation increases (Panels b, c, and d), the two statistics become less and less consistent: in the famSKAT statistic, the Madsen-Browning weights depend only on individual variants, whereas in the ABT statistic, the weight of an individual variant statistic is also affected by other variants at linked sites, as seen from the weight matrix $(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}$ in Eq. (4.6) of the main text.
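The negligible-LD case can be written out as a short worked step. This is an illustration under stated assumptions rather than a result quoted from the chapter: it assumes that $\widehat{\boldsymbol{D}}=\mathrm{diag}(\widehat{\sigma}_1,\ldots,\widehat{\sigma}_m)$ is the diagonal matrix of estimated genotype standard deviations, with $\widehat{\sigma}_j^2=\widehat{p}_j(1-\widehat{p}_j)/2$ as in Appendix 1, and that negligible LD corresponds to $\boldsymbol{R}=\boldsymbol{I}$. Then

$$\displaystyle \begin{aligned} \boldsymbol{V}^T\boldsymbol{G}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\boldsymbol{G}^T\boldsymbol{V}=\boldsymbol{V}^T\boldsymbol{G}\widehat{\boldsymbol{D}}^{-2}\boldsymbol{G}^T\boldsymbol{V}=\sum_{j=1}^m\frac{\left(\boldsymbol{V}^T\boldsymbol{G}_j\right)^2}{\widehat{\sigma}_j^2}=\sum_{j=1}^m\frac{2\left(\boldsymbol{V}^T\boldsymbol{G}_j\right)^2}{\widehat{p}_j(1-\widehat{p}_j)}. \end{aligned}$$

Under these assumptions, the squared score of variant $j$ is weighted by a quantity proportional to $1/\{\widehat{p}_j(1-\widehat{p}_j)\}$, which depends on that variant alone and has the form of a squared Madsen-Browning-type weight. When $\boldsymbol{R}\neq\boldsymbol{I}$, the inverse $(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}$ has nonzero off-diagonal entries, so cross terms $(\boldsymbol{V}^T\boldsymbol{G}_j)(\boldsymbol{V}^T\boldsymbol{G}_k)$ enter the numerator and the two statistics diverge.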

Appendix 4: Additional Simulation Results to Validate the Asymptotic Null Distribution of $S_{PC\text{-}ABT}$ via a Permutation-Based Approach

We perform 1000 permutations of the simulated data under Scenario S1 (unrelated individuals and common variants) and Configuration C3 (strong LD with $\eta = 0.7$). Figure 4.7 shows the asymptotic null distributions of $S_{PC\text{-}ABT}$ for the number of principal components $q = 1$, 25, and 50, together with the corresponding empirical CDFs obtained via permutation. Note that two different asymptotic distributions are shown in this figure: one is $\chi_q^2$, and the other is a mixture of $\chi_1^2$ distributions, obtained by applying the adaptive weights $\boldsymbol{W}^{\#}$ in the famSKAT method. In Fig. 4.8, Panels a, b, and c, we compare on the log scale the empirical p-values from the permutation-based approach against the p-values from the asymptotic distribution (mixture of $\chi_1^2$) for $q = 1$, 25, and 50, respectively. Panel d of Fig. 4.8 further reports the correlation between $-\log_{10}$(empirical p-values via permutation) and $-\log_{10}$(p-values based on the asymptotic distribution) for $q = 1, 2, \ldots, 50$.
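A generic permutation p-value of the kind used for such a comparison can be sketched as follows. The chapter does not state what is permuted, so this sketch assumes that the phenotypic residual vector is shuffled across individuals, which is only sensible for exchangeable (unrelated) samples such as Scenario S1. The names `permutation_pvalue` and `stat_fn` are hypothetical, and the add-one correction is a common convention rather than something taken from the text.

```python
import numpy as np

def permutation_pvalue(stat_fn, V, G, n_perm=1000, seed=0):
    """Empirical p-value of stat_fn(V, G) under permutation of V.

    stat_fn is any user-supplied test statistic for which larger values
    indicate stronger association (an assumed interface).
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(V, G)
    # Recompute the statistic after shuffling residuals across individuals
    perm_stats = np.array(
        [stat_fn(rng.permutation(V), G) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimate strictly positive
    return (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
```

Plotting $-\log_{10}$ of these empirical p-values against $-\log_{10}$ of the asymptotic p-values over many replicates gives the type of comparison described for Fig. 4.8.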

Appendix 5: Additional Simulation Results for Type I Error Evaluation

We provide additional simulation results for type I error evaluation. Table 4.4 lists the empirical type I error rates of five testing methods (FBT, famSKAT, ABT, MONSTER, and PC-ABT) for the combinations of four scenarios (S1, S2, S3, and S4) and four LD configurations (C1, C2, C3, and C4), based on 20,000 simulated data replicates. Figures 4.9, 4.10, 4.11, and 4.12 show the Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenarios S1, S2, S3, and S4, respectively. The number of principal components is chosen so that the total percent variance explained (PVE) exceeds 90%.
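The two quantities in this paragraph, the number of principal components needed to pass a PVE threshold and the empirical type I error rate, can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the PVE is computed from the eigenvalues of a symmetric matrix such as the LD correlation matrix $\boldsymbol{R}$; the chapter's main text should be consulted for the matrix actually decomposed in PC-ABT.

```python
import numpy as np

def n_components_for_pve(R, threshold=0.90):
    """Smallest q whose leading eigenvalues explain more than `threshold`
    of the total variance of the symmetric matrix R (assumed input)."""
    eigvals = np.linalg.eigvalsh(R)[::-1]          # sort descending
    pve = np.cumsum(eigvals) / np.sum(eigvals)     # cumulative PVE
    # argmax returns the first index at which the condition holds
    return int(np.argmax(pve > threshold) + 1)

def empirical_type1_error(pvalues, alpha=0.05):
    """Fraction of null-replicate p-values falling below alpha."""
    return float(np.mean(np.asarray(pvalues) < alpha))
```

For example, a 4 × 4 equicorrelation matrix with off-diagonal value 0.7 has eigenvalues 3.1, 0.3, 0.3, 0.3, so three components are needed before the cumulative PVE (0.775, 0.85, 0.925) exceeds 90%.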

Table 4.4 Empirical type I error of five testing methods

About this chapter

Cite this chapter

Wu, X. (2019). A Powerful Retrospective Multiple Variant Association Test for Quantitative Traits by Borrowing Strength from Complex Genotypic Correlations. In: Zhang, L., Chen, DG., Jiang, H., Li, G., Quan, H. (eds) Contemporary Biostatistics with Biopharmaceutical Applications. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-15310-6_4

DOI: https://doi.org/10.1007/978-3-030-15310-6_4
Published: 09 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15309-0
Online ISBN: 978-3-030-15310-6
eBook Packages: Mathematics and Statistics (R0)