A Powerful Retrospective Multiple Variant Association Test for Quantitative Traits by Borrowing Strength from Complex Genotypic Correlations

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

High-throughput sequencing has often been used in pedigree-based studies to identify genetic risk factors associated with complex traits. The genotype data in such studies exhibit complex correlations attributable to both familial relationships and linkage disequilibrium. Accounting for these genotypic correlations can improve power for assessing the contribution of multiple genomic loci. However, due to model restrictions, existing multiple variant association testing methods cannot make efficient use of the correlation information. Recognizing this limitation, we develop PC-ABT, a novel principal-component-based adaptive-weight burden test for gene-based association mapping of quantitative traits. This method uses a retrospective score test to incorporate genotypic correlations and employs “data-driven” weights to maximize the test statistic. In addition, by adjusting the number of principal components, which essentially reflects the effective number of tests in the target gene region, PC-ABT is able to reduce the degrees of freedom of the null distribution to improve power. Simulation studies show that PC-ABT is generally more powerful than other multiple variant tests that allow for related individuals. We illustrate the application of PC-ABT through a gene-based association analysis of systolic blood pressure using data from the NHLBI “Grand Opportunity” Exome Sequencing Project.

References

  • Asimit, J., Zeggini, E.: Rare variant association analysis methods for complex traits. Ann. Rev. Genet. 44, 293–308 (2010)

  • Berthelot, C.C., et al.: Changes in PTGS1 and ALOX12 gene expression in peripheral blood mononuclear cells are associated with changes in arachidonic acid, oxylipins, and oxylipin/fatty acid ratios in response to Omega-3 fatty acid supplementation. PLoS One 10(12), e0144996 (2015)

  • Chen, H., Meigs, J.B., Dupuis, J.: Sequence kernel association test for quantitative traits in family samples. Genet. Epidemiol. 37(2), 196–204 (2013)

  • Cui, J.S., Hopper, J.L., Harrap, S.B.: Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension 41(2), 207–210 (2003)

  • Derkach, A., Lawless, J.F., Sun, L.: Assessment of pooled association tests for rare variants within a unified framework. Stat. Sci. 29(2), 302–321 (2013)

  • Fang, S., Zhang, S., Sha, Q.: Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families. Ann. Hum. Genet. 77(6), 524–534 (2014)

  • Fuentes, M.: Testing for separability of spatial-temporal covariance functions. J. Stat. Plan. Inference. 136, 447–466 (2006)

  • Gauderman, W.J., Murcray, C., Gilliland, F., Conti, D.V.: Testing association between disease and multiple SNPs in a candidate gene. Genet. Epidemiol. 31(5), 383–395 (2007)

  • Han, F., Pan, W.: A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 70(1), 42–54 (2010)

  • Jakobsdottir, J., McPeek, M.S.: MASTOR: Mixed-model association mapping of quantitative traits in samples with related individuals. Am. J. Hum. Genet. 92, 652–666 (2013)

  • Jiang, D., McPeek, M.S.: Robust rare variant association testing for quantitative traits in samples with related individuals. Genet. Epidemiol. 38(1), 1–20 (2013)

  • Ladouceur, M., Dastani, Z., Aulchenko, Y.S., Greenwood, C.M., Richards, J.B.: The empirical power of rare variant association methods: Results from Sanger sequencing in 1998 individuals. PLoS Genet. 8(2), e1002496 (2012)

  • Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M., Lin, X.: Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224–237 (2012)

  • Lee, S., Wu, M.C., Lin, X.: Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4), 762–775 (2013)

  • Li, Q.H., Lagakos, S.W.: On the relationship between directional and omnibus statistical tests. Scand. J. Stat. 33, 239–246 (2006)

  • Li, B., Leal, S.M.: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008)

  • Li, M.X., Gui, H.S., Kwan, J.S., Sham, P.C.: GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am. J. Hum. Genet. 88, 283–293 (2011)

  • Lin, D.Y., Tang, Z.Z.: A general framework for detecting disease associations with rare variants in sequencing studies. Am. J. Hum. Genet. 89, 354–367 (2011)

  • Liu, D.J., Leal, S.M.: A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet. 6, e1001156 (2010)

  • Ma, L., Clark, A.G., Keinan, A.: Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet. 9, e1003321 (2013)

  • Madsen, B.E., Browning, S.R.: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 5, e1000384 (2009)

  • Maier, K.G., Ruhle, B., Stein, J.J., Gentile, K.L., Middleton, F.A., Gahtan, V.: Thrombospondin-1 differentially regulates microRNAs in vascular smooth muscle cells. Mol. Cell. Biochem. 412(1–2), 111–117 (2016)

  • Manolio, T.A.: Genomewide association studies and assessment of the risk of disease. N. Engl. J. Med. 363(2), 166–176 (2010)

  • McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P., Hirschhorn, J.N.: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 9(5), 356–369 (2008)

  • McPeek, M.S.: BLUP genotype imputation for case control association testing with related individuals and missing data. J. Comp. Biol. 19(6), 756–765 (2012)

  • McPeek, M.S., Wu, X., Ober, C.: Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics 60, 359–367 (2004)

  • Morgenthaler, S., Thilly, W.G.: A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat. Res. 615, 28–56 (2007)

  • Neale, B.M., Sham, P.C.: The future of association studies: Gene-based analysis and replication. Am. J. Hum. Genet. 75, 353–362 (2004)

  • Price, A.L., Kryukov, G.V., de Bakker, P.I., Purcell, S.M., Staples, J., Wei, L.J., Sunyaev, S.R.: Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010)

  • Price, A.L., Zaitlen, N.A., Reich, D., Patterson, N.: New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet. 11(7), 459–463 (2011)

  • Schaid, D.J., McDonnell, S.K., Sinnwell, J.P., Thibodeau, S.M.: Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genet. Epidemiol. 37(5), 409–418 (2013)

  • Schifano, E.D., Epstein, M.P., Bielak, L.F., Jhun, M.A., Kardia, S.L., Peyser, P.A., Lin, X.: SNP set association analysis for familial data. Genet. Epidemiol. 36(8), 797–810 (2012)

  • Sha, Q., Wang, X., Wang, X., Zhang, S.: Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet. Epidemiol. 36(6), 561–571 (2012)

  • Sha, Q., Zhang, S.: A novel test for testing the optimally weighted combination of rare and common variants based on data of parents and affected children. Genet. Epidemiol. 38(2), 135–143 (2014)

  • Splansky, G.L., et al.: The third generation cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am. J. Epidemiol. 165(11), 1328–1335 (2007)

  • Srivastava, M.S., von Rosen, T., von Rosen, D.: Models with a Kronecker product covariance structure: estimation and testing. Math. Methods Stat. 17(4), 357–370 (2008)

  • The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)

  • Thornton, T., McPeek, M.S.: Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am. J. Hum. Genet. 81, 321–337 (2007)

  • Thornton, T., McPeek, M.S.: ROADTRIPS: Case-control association testing with partially or completely unknown population and pedigree structure. Am. J. Hum. Genet. 86, 172–184 (2010)

  • Tobin, M.D., Sheehan, N.A., Scurrah, K.J., Burton, P.R.: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat. Med. 24, 2911–2935 (2005)

  • Wang, Y., Chen, Y.H., Yang, Q.: Joint rare variant association test of the average and individual effects for sequencing studies. PLoS One 7, e32485 (2012)

  • Wang, X., Morris, N.J., Zhu, X., Elston, R.C.: A variance component based multi-marker association test using family and unrelated data. BMC Genet. 14, 17 (2013)

  • Wang, X., Lee, S., Zhu, X., Redline, S., Lin, X.: GEE-based SNP set association test for continuous and discrete traits in family based association studies. Genet. Epidemiol. 37(8), 778–786 (2014)

  • Weisinger, G., Limor, R., Marcus-Perlman, Y., Knoll, E., Kohen, F., Schinder, V., Firer, M., Stern, N.: 12S-lipoxygenase protein associates with alpha-actin fibers in human umbilical artery vascular smooth muscle cells. Biochem. Biophys. Res. Commun. 356(3), 554–560 (2007)

  • Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., Lin, X.: Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. 86, 929–942 (2010)

  • Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., Lin, X.: Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011)

  • Zhu, Y., Xiong, M.: Family-based association studies for next-generation sequencing. Am. J. Hum. Genet. 90, 1028–1045 (2012)

Acknowledgements

This research was funded by 4-VA, a collaborative partnership for advancing the Commonwealth of Virginia.

Author information

Correspondence to Xiaowei Wu.

Appendices

Appendix 1: Description of MASTOR and Theoretical Justification of the Null Distribution of \(S_{ABT}\)

MASTOR (Jakobsdottir and McPeek 2013) is a retrospective, quasi-likelihood score test of single-variant association with a quantitative trait in samples with related individuals. For a biallelic genetic variant \(\boldsymbol{X}\) of interest (in the general setting described in Sect. 4.2.1, for example, \(\boldsymbol{X}=\boldsymbol{G}_j\), \(1\le j\le m\)), the MASTOR statistic (for complete data) takes the form

$$\displaystyle \begin{aligned} S_{MAS}=\frac{(\boldsymbol{V}^T\boldsymbol{X})^2}{(\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V})\widehat{\sigma}_X^2}. \end{aligned}$$

In this expression, \(\boldsymbol {V}=\widehat {\boldsymbol {\varSigma }}_0^{-1}(\boldsymbol {Y}-\boldsymbol {Z}\widehat {\boldsymbol {\beta }}_0)\) is the transformed phenotypic residual obtained from the null model \(\boldsymbol{Y}=\boldsymbol{Z}\boldsymbol{\beta}_0+\boldsymbol{\epsilon}\), \(\boldsymbol{\epsilon}\sim N(\boldsymbol{0},\boldsymbol{\varSigma}_0)\), where \(\boldsymbol{\beta}_0\) is the vector of coefficients from regressing the quantitative trait \(\boldsymbol{Y}\) on the non-genetic covariates \(\boldsymbol{Z}\), and \(\boldsymbol{\varSigma}_0\) is the trait covariance matrix under the null, usually of the variance component form \(\sigma _e^2\boldsymbol {I}+\sigma _a^2\boldsymbol {\varPhi }\). The variance of variant X is denoted by \(\sigma _X^2\). When Hardy-Weinberg equilibrium is assumed for this variant, \(\sigma _X^2\) can be estimated by \(\widehat {\sigma }_X^2=\widehat {p}(1-\widehat {p})/2\), where \(\widehat {p}=(\boldsymbol {1}^T\boldsymbol {\varPhi }^{-1}\boldsymbol {1})^{-1}\boldsymbol {1}^T\boldsymbol {\varPhi }^{-1}\boldsymbol {X}\) is the best linear unbiased estimator (McPeek et al. 2004) of the allele frequency p of X, and \(\boldsymbol{1}\) denotes a vector with every element equal to 1.
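
A minimal NumPy sketch of the single-variant statistic above, for illustration only; it is not the MASTOR software. It assumes the null variance components \(\sigma_e^2\) and \(\sigma_a^2\) have already been estimated (they are simply passed in), estimates \(\boldsymbol{\beta}_0\) by generalized least squares given \(\boldsymbol{\varSigma}_0\), codes genotypes as 0, 1/2, 1 so that \(\sigma_X^2=p(1-p)/2\) under Hardy-Weinberg equilibrium, and uses a toy sibling-pair \(\boldsymbol{\varPhi}\). The function names are hypothetical.

```python
import numpy as np


def blue_allele_freq(X, Phi):
    """BLUE of the allele frequency: (1' Phi^{-1} 1)^{-1} 1' Phi^{-1} X."""
    ones = np.ones(len(X))
    phi_inv_ones = np.linalg.solve(Phi, ones)  # Phi^{-1} 1 (Phi is symmetric)
    return float(phi_inv_ones @ X / (phi_inv_ones @ ones))


def mastor_stat(Y, Z, X, Phi, sigma_e2, sigma_a2):
    """Single-variant MASTOR-type statistic for complete data.

    Assumptions of this sketch: the null variance components are given,
    Z contains an intercept column, and X is coded 0, 1/2, 1 so that its
    variance under HWE is p(1 - p)/2.
    """
    n = len(Y)
    Sigma0 = sigma_e2 * np.eye(n) + sigma_a2 * Phi
    # GLS estimate of beta_0 under the null model Y = Z beta_0 + eps
    sinv_Z = np.linalg.solve(Sigma0, Z)
    beta0 = np.linalg.solve(Z.T @ sinv_Z, sinv_Z.T @ Y)
    # Transformed phenotypic residual V = Sigma0^{-1} (Y - Z beta_0)
    V = np.linalg.solve(Sigma0, Y - Z @ beta0)
    p_hat = blue_allele_freq(X, Phi)
    sigma2_X = p_hat * (1.0 - p_hat) / 2.0
    return float((V @ X) ** 2 / ((V @ Phi @ V) * sigma2_X))


# Toy example: 20 sibling pairs, relationship 0.5 between siblings (assumed structure)
rng = np.random.default_rng(0)
n = 40
Phi = np.kron(np.eye(n // 2), np.array([[1.0, 0.5], [0.5, 1.0]]))
Z = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
Y = rng.normal(size=n)
X = rng.integers(0, 3, size=n) / 2.0  # genotypes coded 0, 1/2, 1
print(mastor_stat(Y, Z, X, Phi, sigma_e2=1.0, sigma_a2=0.5))
```

Including an intercept in \(\boldsymbol{Z}\) matters here: the GLS normal equations give \(\boldsymbol{Z}^T\boldsymbol{V}=\boldsymbol{0}\), hence \(\boldsymbol{1}^T\boldsymbol{V}=0\), so \(\boldsymbol{V}^T\boldsymbol{X}\) has mean zero under the retrospective null regardless of p.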

In Sect. 4.2.2, we obtained the ABT statistic

$$\displaystyle \begin{aligned} S_{ABT}=\frac{\boldsymbol{V}^T\boldsymbol{G}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}\boldsymbol{G}^T\boldsymbol{V}}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}. \end{aligned}$$

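Let \(\widetilde {\boldsymbol {G}}=\boldsymbol {G}(\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1/2}\) be a decorrelated version of the genotype matrix in which the across-column covariance has been transformed to the identity, and let \(\widetilde {\boldsymbol {G}}_j\) be the jth column of \(\widetilde {\boldsymbol {G}}\). Taking \((\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}\) to be the symmetric inverse square root, we have \((\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1}=(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}\), so the numerator of \(S_{ABT}\) equals \(\boldsymbol{V}^T\widetilde{\boldsymbol{G}}\widetilde{\boldsymbol{G}}^T\boldsymbol{V}=\sum_{j=1}^m(\boldsymbol{V}^T\widetilde{\boldsymbol{G}}_j)^2\). Hence,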

$$\displaystyle \begin{aligned} S_{ABT}=\sum_{j=1}^m\frac{\left(\boldsymbol{V}^T\widetilde{\boldsymbol{G}}_j\right)^2}{\boldsymbol{V}^T\boldsymbol{\varPhi}\boldsymbol{V}}. \end{aligned}$$

This is essentially the sum of m MASTOR statistics, each formulated from a transformed variant \(\widetilde {\boldsymbol {G}}_j\) (note that the variance estimate equals 1 after transformation). Because the terms \(\boldsymbol {V}^T\widetilde {\boldsymbol {G}}_j\) are uncorrelated and jointly normal, these m statistics are independent; hence \(S_{ABT}\) follows a \(\chi _m^2\) distribution under the null hypothesis.
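
The decorrelation argument can be checked numerically. The sketch below, with hypothetical function names, (i) confirms that the quadratic form and the sum of m single-variant terms agree when \((\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}})^{-1/2}\) is the symmetric inverse square root, and (ii) draws null replicates for unrelated individuals (\(\boldsymbol{\varPhi}=\boldsymbol{I}\)), holding \(\boldsymbol{V}\) fixed as in a retrospective test and generating Gaussian stand-in genotypes whose across-column covariance is a known matrix playing the role of \(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}}\). The Gaussian genotypes and the AR(1)-type LD structure are simplifying assumptions; under them the statistic is exactly \(\chi_m^2\).

```python
import numpy as np


def inv_sqrt_spd(A):
    """Symmetric inverse square root A^{-1/2} of a symmetric positive-definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U / np.sqrt(w)) @ U.T  # U diag(w^{-1/2}) U'


def s_abt_quadratic(V, G, DRD, Phi):
    """S_ABT = V'G (DRD)^{-1} G'V / (V' Phi V)."""
    GtV = G.T @ V
    return float(GtV @ np.linalg.solve(DRD, GtV) / (V @ Phi @ V))


def s_abt_sum(V, G, DRD, Phi):
    """The same statistic as a sum of m single-variant terms on G_tilde = G (DRD)^{-1/2}."""
    G_tilde = G @ inv_sqrt_spd(DRD)
    return float(np.sum((V @ G_tilde) ** 2) / (V @ Phi @ V))


def simulate_null(n=200, m=4, reps=5000, rho=0.6, seed=1):
    """Null draws of S_ABT with unrelated individuals and Gaussian stand-in genotypes."""
    rng = np.random.default_rng(seed)
    idx = np.arange(m)
    C = rho ** np.abs(idx[:, None] - idx[None, :])  # assumed AR(1)-type LD, plays D R D
    L = np.linalg.cholesky(C)
    V = rng.normal(size=n)  # residual held fixed: the test conditions on the trait
    Phi = np.eye(n)
    stats = np.empty(reps)
    for r in range(reps):
        G = rng.normal(size=(n, m)) @ L.T  # each row ~ N(0, C)
        stats[r] = s_abt_quadratic(V, G, C, Phi)
    return stats


stats = simulate_null()
print(stats.mean(), stats.var())  # should be near 4 and 8, the chi^2_4 mean and variance
```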

Appendix 2: Additional Simulation Results Show That the Data-Driven Weights W Are Adaptive to the Direction of True Genetic Effects

To understand how the data-driven weights W (defined in Eq. (4.5) of the main text) help gain power in association testing, we compare the signs of W with those of the genetic effects γ using the simulated data sets from the power analysis. Figure 4.5, Panels a–d, present boxplots of the weights W based on 5000 simulated data replicates in Scenario S2 with genetic effect Setting III, for LD Configurations C1–C4, respectively. In this setting, the first 30% of the components of γ are set to be positive, the next 30% are negative, and the remaining 40% are zero. The boxplots demonstrate that, on average, the weights W track the direction of the true genetic effects, thus resulting in a stronger association with the weighted-sum genetic score.
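
A sketch of the sign comparison described above, assuming the weights W from Eq. (4.5) of the main text are already available as a replicates × m array (the formula for W is not reproduced in this appendix). The function names are hypothetical, rounding 30% of m to a whole number of variants is an assumption, and the toy weight matrix is illustrative only, not simulation output.

```python
import numpy as np


def setting_iii_signs(m):
    """Signs of gamma in Setting III: first 30% positive, next 30% negative, rest zero.

    Rounding 30% of m to a whole number of variants is an assumption of this sketch.
    """
    k = int(round(0.3 * m))
    signs = np.zeros(m, dtype=int)
    signs[:k] = 1
    signs[k:2 * k] = -1
    return signs


def sign_agreement(W, gamma_signs):
    """Summarize how well weights W (replicates x m) track the direction of gamma.

    Returns the fraction of (replicate, variant) pairs with sign(W_j) == sign(gamma_j)
    for risk and protective variants, and the mean weight of neutral variants.
    """
    W = np.asarray(W, dtype=float)
    match = np.sign(W) == gamma_signs[None, :]
    return {
        "risk": float(match[:, gamma_signs == 1].mean()),
        "protective": float(match[:, gamma_signs == -1].mean()),
        "neutral_mean_weight": float(W[:, gamma_signs == 0].mean()),
    }


signs = setting_iii_signs(10)
W_toy = np.array([[0.5, 0.2, 0.1, -0.3, -0.1, 0.2, 0.0, 0.1, -0.1, 0.0]])  # illustrative numbers only
print(sign_agreement(W_toy, signs))
```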

Fig. 4.5

Boxplot of W based on 5000 simulated data replicates in Scenario S2 with genetic effect Setting III. The adaptive weights of risk, protective, and neutral variants are shown in red, green, and white, respectively. Panel a: Configuration C1; Panel b: Configuration C2; Panel c: Configuration C3; Panel d: Configuration C4

Appendix 3: Additional Simulation Results Show the Relation Between the ABT Statistic and the famSKAT Statistic

We show in Fig. 4.6, Panels a–d, scatter plots of the numerator of the ABT statistic vs. the famSKAT statistic based on 5000 simulated data replicates in Scenario S3 with genetic effect Setting II, for LD Configurations C1–C4, respectively. When the LD correlation is negligible (Panel a), the numerator of the ABT statistic behaves similarly to the famSKAT statistic, because the weight matrix \((\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1}\) in Eq. (4.6) of the main text is then equivalent to the Madsen-Browning weights used in calculating the famSKAT statistic. As the LD correlation increases (Panels b, c, and d), the two statistics agree less and less: in the famSKAT statistic, the Madsen-Browning weights depend only on individual variants, whereas in the ABT statistic, the weight of an individual variant statistic is also affected by other variants at linked sites through the weight matrix \((\widehat {\boldsymbol {D}}\boldsymbol {R}\widehat {\boldsymbol {D}})^{-1}\) in Eq. (4.6) of the main text.
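
The following sketch checks this argument numerically on random toy inputs. It assumes \(\widehat{\boldsymbol{D}}=\mathrm{diag}(\widehat{\sigma}_1,\ldots,\widehat{\sigma}_m)\) holds per-variant standard deviations and \(\boldsymbol{R}\) is the LD correlation matrix; Eq. (4.6) is not reproduced here, so both are assumptions. With \(\boldsymbol{R}=\boldsymbol{I}\) the ABT numerator collapses to \(\sum_j(\boldsymbol{V}^T\boldsymbol{G}_j)^2/\widehat{\sigma}_j^2\), an inverse-variance weighting; since Madsen-Browning weights are proportional to \(1/\sqrt{\widehat{p}_j(1-\widehat{p}_j)}\) and \(\widehat{\sigma}_j^2\propto\widehat{p}_j(1-\widehat{p}_j)\) under Hardy-Weinberg equilibrium, this matches squared Madsen-Browning weighting up to a constant. With LD, the off-diagonal entries of \(\boldsymbol{R}\) make the two sums differ. Function names are hypothetical.

```python
import numpy as np


def abt_numerator(V, G, sd, R):
    """Numerator of S_ABT, V'G (D R D)^{-1} G'V, assuming D = diag(sd)."""
    D = np.diag(sd)
    GtV = G.T @ V
    return float(GtV @ np.linalg.solve(D @ R @ D, GtV))


def inverse_variance_weighted_sum(V, G, sd):
    """Sum over variants of (V'G_j)^2 / sd_j^2: a diagonal weighting that ignores LD."""
    scores = G.T @ V
    return float(np.sum(scores ** 2 / sd ** 2))


rng = np.random.default_rng(2)
n, m = 50, 5
V = rng.normal(size=n)
G = rng.normal(size=(n, m))
sd = rng.uniform(0.2, 0.6, size=m)
R_ld = 0.5 * np.ones((m, m)) + 0.5 * np.eye(m)  # exchangeable LD with correlation 0.5
for label, R in [("no LD", np.eye(m)), ("LD", R_ld)]:
    print(label, abt_numerator(V, G, sd, R), inverse_variance_weighted_sum(V, G, sd))
```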

Fig. 4.6

Comparison between \(S_{famSKAT}\) and the numerator of \(S_{ABT}\) based on 5000 simulated data replicates in Scenario S3 with genetic effect Setting II. Panel a: Configuration C1; Panel b: Configuration C2; Panel c: Configuration C3; Panel d: Configuration C4

Appendix 4: Additional Simulation Results to Validate the Asymptotic Null Distribution of \(S_{PC\text{-}ABT}\) via a Permutation-Based Approach

We perform 1000 permutations of the simulated data under Scenario S1 (unrelated individuals and common variants) and Configuration C3 (strong LD with η = 0.7). Figure 4.7 shows the asymptotic null distributions of \(S_{PC\text{-}ABT}\) for the number of principal components q = 1, 25, and 50, together with the corresponding empirical CDFs obtained via permutation. Two different asymptotic distributions are shown in this figure: the \(\chi _q^2\) distribution and a mixture of \(\chi _1^2\) distributions, the latter obtained by applying the adaptive weights \(\boldsymbol{W}^{\#}\) in the famSKAT method. In Fig. 4.8, Panels a, b, and c, we compare on a log scale the empirical p-values from the permutation-based approach against the p-values from the asymptotic distribution (mixture of \(\chi _1^2\)) for q = 1, 25, and 50, respectively. Panel d of Fig. 4.8 further reports the correlation between −log10(empirical p-values via permutation) and −log10(p-values based on the asymptotic distribution) for q = 1, 2, …, 50.
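
A minimal sketch of a permutation check of this kind for unrelated individuals, as in Scenario S1. Because \(S_{PC\text{-}ABT}\) is defined in the main text, the sketch uses \(S_{ABT}\) with \(\boldsymbol{\varPhi}=\boldsymbol{I}\), an intercept-only null model, and the sample covariance of the centered genotypes in place of \(\widehat{\boldsymbol{D}}\boldsymbol{R}\widehat{\boldsymbol{D}}\); these simplifications, the hypothetical function names, and the (1 + count)/(B + 1) convention for the empirical p-value are assumptions. The \(\chi_m^2\) tail probability is computed in closed form for integer degrees of freedom; the mixture-of-\(\chi_1^2\) distribution is not shown.

```python
import math

import numpy as np


def chi2_sf(x, k):
    """P(chi^2_k > x) for an integer number of degrees of freedom k >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term = total = math.exp(-x / 2)
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def score_stat(V, Gc, C):
    """S_ABT with Phi = I; the residual scale cancels between numerator and denominator."""
    GtV = Gc.T @ V
    return float(GtV @ np.linalg.solve(C, GtV) / (V @ V))


def permutation_vs_asymptotic(Y, G, n_perm=1000, seed=0):
    """Permutation and chi^2_m p-values for unrelated individuals (intercept-only null)."""
    rng = np.random.default_rng(seed)
    V = Y - Y.mean()  # null residual; Sigma_0 = sigma^2 I only rescales it
    Gc = G - G.mean(axis=0)  # center each variant
    C = Gc.T @ Gc / len(Y)  # sample covariance, standing in for D R D
    s_obs = score_stat(V, Gc, C)
    s_perm = np.array([score_stat(rng.permutation(V), Gc, C) for _ in range(n_perm)])
    p_perm = (1 + np.sum(s_perm >= s_obs)) / (n_perm + 1)
    return float(p_perm), chi2_sf(s_obs, G.shape[1])


rng = np.random.default_rng(3)
G = rng.binomial(2, 0.3, size=(300, 4)) / 2.0  # toy genotypes, 4 independent variants
Y = rng.normal(size=300)  # trait generated under the null
print(permutation_vs_asymptotic(Y, G, n_perm=2000, seed=4))
```

Under the null, the two p-values should be close; repeating this over many simulated data sets and correlating their −log10 values mirrors the comparison summarized in Panel d of Fig. 4.8.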

Fig. 4.7

Asymptotic null distribution and permutation-based empirical distribution of \(S_{PC\text{-}ABT}\) (for the number of principal components q = 1, 25, and 50) under Scenario S1 and LD Configuration C3

Fig. 4.8

Comparing empirical p-values via permutation and p-values based on the asymptotic distribution. Panel a: scatter plot in log scale for the number of principal components q = 1; Panel b: for q = 25; Panel c: for q = 50; Panel d: correlation between −log10(empirical p-values via permutation) and −log10(p-values based on the asymptotic distribution) for q = 1, 2, ⋯ , 50

Appendix 5: Additional Simulation Results for Type I Error Evaluation

We provide additional simulation results for type I error evaluation. Table 4.4 lists the empirical type I error rates of five testing methods (FBT, famSKAT, ABT, MONSTER, and PC-ABT) for the combinations of four scenarios (S1, S2, S3, and S4) and four LD configurations (C1, C2, C3, and C4), based on 20,000 simulated data replicates. Figures 4.9, 4.10, 4.11, and 4.12 show the Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenarios S1, S2, S3, and S4, respectively. The number of principal components is chosen such that the total percent variance explained (PVE) exceeds 90%.
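
The PVE rule and the type I error summaries can be sketched as follows. This assumes the PVE is computed from the eigenvalues of the matrix whose principal components PC-ABT uses (for instance, the estimated LD correlation matrix of the gene region); that matrix is defined in the main text, not here. The function names are hypothetical, the uniform draws stand in for null p-values, and the (i − 0.5)/N plotting positions for the Q-Q plot are one common convention, not necessarily the one used for Figs. 4.9–4.12.

```python
import numpy as np


def n_pcs_for_pve(eigenvalues, threshold=0.90):
    """Smallest q such that the q largest eigenvalues explain more than `threshold` of the total."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    pve = np.cumsum(ev) / ev.sum()
    return int(np.argmax(pve > threshold)) + 1


def empirical_type1(pvalues, alpha=0.05):
    """Empirical type I error: fraction of null replicates with p-value <= alpha."""
    return float(np.mean(np.asarray(pvalues, dtype=float) <= alpha))


def qq_points(pvalues):
    """(expected, observed) -log10 p-values for a uniform Q-Q plot, (i - 0.5)/N positions."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    N = len(p)
    expected = -np.log10((np.arange(1, N + 1) - 0.5) / N)
    return expected, -np.log10(p)


print(n_pcs_for_pve([6.0, 2.0, 1.5, 0.3, 0.2]))  # cumulative PVE 0.60, 0.80, 0.95 -> q = 3
null_p = np.random.default_rng(7).uniform(size=20000)  # stand-in for 20,000 null p-values
print(empirical_type1(null_p))
```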

Fig. 4.9

Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenario S1 based on 20,000 simulated data replicates. In each simulation, the number of principal components is chosen such that PVE >90%

Fig. 4.10

Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenario S2 based on 20,000 simulated data replicates. In each simulation, the number of principal components is chosen such that PVE >90%

Fig. 4.11

Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenario S3 based on 20,000 simulated data replicates. In each simulation, the number of principal components is chosen such that PVE >90%

Fig. 4.12

Q-Q plots of the PC-ABT p-values under the null hypothesis for Scenario S4 based on 20,000 simulated data replicates. In each simulation, the number of principal components is chosen such that PVE >90%

Table 4.4 Empirical type I error of five testing methods

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wu, X. (2019). A Powerful Retrospective Multiple Variant Association Test for Quantitative Traits by Borrowing Strength from Complex Genotypic Correlations. In: Zhang, L., Chen, DG., Jiang, H., Li, G., Quan, H. (eds) Contemporary Biostatistics with Biopharmaceutical Applications. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-15310-6_4
