Combining eQTL and SNP Annotation Data to Identify Functional Noncoding SNPs in GWAS Trait-Associated Regions

eQTL Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2082))

Abstract

We describe a statistical method for prioritizing candidate causal noncoding single nucleotide polymorphisms (SNPs) in regions of the genome that are detected as trait-associated in a population-based genome-wide association study (GWAS). Our method’s key step is to combine, within a naïve Bayes-like framework, three quantities for each SNP: (1) the p-value for the association test between the SNP’s genotype and the trait; (2) the p-value for the SNP’s cis-expression quantitative trait locus (cis-eQTL) association test; and (3) a model-based prediction score for the SNP’s potential to be a regulatory SNP (rSNP). The method is flexible with respect to the source of the model-based rSNP prediction score; we demonstrate it with scores from the previously published machine-learning-based rSNP prediction method, CERENKOV2. Because it requires only the GWAS trait association test p-value for each SNP, our method is applicable to GWAS secondary analysis in the common situation where only summary data (and not full genotype data) are readily available. We illustrate step by step how the method works.
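As a rough illustration of what combining the three per-SNP quantities in a naïve Bayes-like way can look like, the sketch below adds log-odds contributions from the GWAS p-value, the cis-eQTL p-value, and the rSNP score. It is not the chapter's actual formula, which is not reproduced in this preview. The Beta(a, 1) alternative for p-values, the shape parameter `a = 0.3`, the treatment of the rSNP score as a prior probability, and all function and SNP names are assumptions made for illustration.

```python
import math

def pvalue_log_lr(p, a=0.3):
    """Log likelihood ratio for a p-value.

    Assumption (not from the chapter): under the null, p ~ Uniform(0, 1)
    (density 1); under the alternative, p ~ Beta(a, 1) with 0 < a < 1
    (density a * p**(a - 1)), which favors small p-values.
    """
    p = min(max(p, 1e-300), 1.0)  # guard against log(0)
    return math.log(a) + (a - 1.0) * math.log(p)

def combined_log_odds(p_gwas, p_eqtl, rsnp_score, a_gwas=0.3, a_eqtl=0.3):
    """Naive Bayes-style combination: sum of independent log-odds terms.

    Assumption: rsnp_score lies in (0, 1) and is treated as a prior
    probability that the SNP is regulatory.
    """
    s = min(max(rsnp_score, 1e-9), 1.0 - 1e-9)
    prior_log_odds = math.log(s / (1.0 - s))
    return (prior_log_odds
            + pvalue_log_lr(p_gwas, a_gwas)
            + pvalue_log_lr(p_eqtl, a_eqtl))

def rank_snps(snps):
    """Rank (snp_id, p_gwas, p_eqtl, rsnp_score) tuples, best candidate first."""
    scored = [(snp_id, combined_log_odds(pg, pe, s))
              for snp_id, pg, pe, s in snps]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical SNPs in one trait-associated region (made-up values).
toy_region = [
    ("snp_A", 1e-8, 1e-6, 0.8),  # strong GWAS, strong eQTL, high rSNP score
    ("snp_B", 1e-8, 0.5, 0.8),   # strong GWAS, no eQTL signal
    ("snp_C", 0.5, 0.5, 0.1),    # no signal from any source
]
ranking = rank_snps(toy_region)
```

Only summary-level inputs appear here (two p-values and one score per SNP), which mirrors the abstract's point that full genotype data are not required.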

Notes

  1. For a human GWAS, the number n of individuals in the study typically ranges from a few hundred to a few hundred thousand.

  2. For a large fraction of triallelic SNPs, the second alternative allele is rare in the population [2].

  3. The tissue type can range from a precisely defined primary cell type to fairly coarse-grained complex tissue types, e.g., “muscle,” “heart,” or “adipose tissue.”

  4. For simplicity we will assume that they are imputed to the same set of SNPs as the GWAS marker SNPs.

References

  1. Bryzgalov LO, Antontseva EV, Matveeva MY, Shilov AG, Kashina EV, Mordvinov VA, Merkulova TI (2013) Detection of regulatory SNPs in human genome using ChIP-seq ENCODE data. PLOS One 8(10):e78833

  2. Cao M, Shi J, Wang J, Hong J, Cui B, Ning G (2015) Analysis of human triallelic SNPs by next-generation sequencing. Ann Hum Genet 79(4):275–281

  3. Chen M, Cho J, Zhao H (2011) Incorporating biological pathways via a Markov random field model in genome-wide association studies. PLOS Genet 7(4):e1001353

  4. Gao L, Uzun Y, Gao P, He B, Ma X, Wang J, Han S, Tan K (2018) Identifying noncoding risk variants using disease-relevant gene regulatory networks. Nat Commun 9(1):702

  5. GTEx Consortium (2015) Human genomics. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348(6235):648–660

  6. Gulko B, Hubisz MJ, Gronau I, Siepel A (2015) A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 47(3):276–283

  7. Ionita-Laza I, McCallum K, Xu B, Buxbaum JD (2016) A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet 48(2):214–220

  8. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46(3):310–315

  9. Krawczak M, Cooper DN (1997) The human gene mutation database. Trends Genet 13(3):121–122

  10. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR (2014) ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42(D1):D980–D985

  11. Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, Beer MA (2015) A method to predict the impact of regulatory variants from DNA sequence. Nat Genet 47(8):955–961, gkm-SVM

  12. Leslie R, O’Donnell CJ, Johnson AD (2014) GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 30(12):i185–i194

  13. Li MJ, Wang LY, Xia Z, Sham PC, Wang J (2013) GWAS3D: detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications. Nucleic Acids Res 41(W1):W150–W158

  14. Li MJ, Pan Z, Liu Z, Wu J, Wang P, Zhu Y, Xu F, Xia Z, Sham PC, Kocher JPA, Li M, Liu JS, Wang J (2016) Predicting regulatory variants with composite statistic. Bioinformatics 32(18):2729–2736

  15. Liu Z, Yao Y, Wei Q, Weeder B, Ramsey SA (2019) Res2s2aM: deep residual network-based model for identifying functional noncoding SNPs in trait-associated regions. In: Liu Z (ed) Proceedings of the 24th Pacific symposium on biocomputing

  16. Macintyre G, Bailey J, Haviv I, Kowalczyk A (2010) is-rSNP: a novel technique for in silico regulatory SNP detection. Bioinformatics 26(18):i524–i530

  17. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J et al (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337(6099):1190–1195

  18. Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJM (2006) ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics 22(5):637–640

  19. Montgomery SB, Griffith OL, Schuetz JM, Brooks-Wilson A, Jones SJM (2007) A survey of genomic properties for the detection of regulatory polymorphisms. PLOS Comput Biol 3(6):e106

  20. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ (2010) Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLOS Genet 6(4):e1000888

  21. Ongen H, Buil A, Brown AA, Dermitzakis ET, Delaneau O (2016) Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32(10):1479–1485

  22. Panagiotou OA, Ioannidis JPA, Genome-Wide Significance Project (2012) What should the genome-wide significance threshold be? Empirical replication of borderline genetic associations. Int J Epidemiol 41(1):273–286

  23. Peterson TA, Mort M, Cooper DN, Radivojac P, Kann MG, Mooney SD (2016) Regulatory single-nucleotide variant predictor increases predictive performance of functional regulatory variants. Hum Mutat 37(11):1137–1143

  24. Quang D, Xie X (2016) DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 44(11):e107

  25. Quang D, Chen Y, Xie X (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31(5):761–763

  26. Ritchie GRS, Dunham I, Zeggini E, Flicek P (2014) Functional annotation of noncoding sequence variants. Nat Methods 11(3):294–296

  27. Riva A (2012) Large-scale computational identification of regulatory SNPs with rSNP-MAPPER. BMC Genet 13(Suppl 4):S7

  28. Schaid DJ, Chen W, Larson NB (2018) From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet 19(8):491

  29. Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M (2012) Linking disease associations with regulatory information in the human genome. Genome Res 22(9):1748–1759

  30. Stranger BE, Stahl EA, Raj T (2011) Progress and promise of genome-wide association studies for human complex trait genetics. Genetics 187(2):367–383

  31. Torkamani A, Schork NJ (2008) Predicting functional regulatory polymorphisms. Bioinformatics 24(16):1787–1792

  32. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, Parkinson H (2014) The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42(D1):D1001–D1006. Accessed in 2016

  33. Xiao R, Scott LJ (2011) Detection of cis-acting regulatory SNPs using allelic expression data. Genet Epidemiol 35(6):515–525

  34. Xu H, Gregory SG, Hauser ER, Stenger JE, Pericak-Vance MA, Vance JM, Züchner S, Hauser MA (2005) SNPselector: a web tool for selecting SNPs for genetic association studies. Bioinformatics 21(22):4181–4186

  35. Yao Y, Liu Z, Singh S, Wei Q, Ramsey SA (2017) CERENKOV: computational elucidation of the regulatory noncoding variome. In: Proceedings of the 8th ACM international conference on bioinformatics, computational biology, and health informatics. ACM, New York, pp 79–88

  36. Yao Y, Liu Z, Wei Q, Ramsey SA (2019) CERENKOV2: improved detection of functional noncoding SNPs using data-space geometric features. BMC Bioinform 20:63. https://doi.org/10.1186/s12859-019-2637-4

  37. Zhou J, Troyanskaya OG (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12(10):931–934

Acknowledgements

This work was supported by the National Science Foundation (award numbers 1557605-DMS and 1553728-DBI to S.A.R.), the PhRMA Foundation (Informatics Grant to S.A.R.), and the Oregon State University Division of Health Sciences (Interdisciplinary Research Grant Award to S.A.R.).

Author information

Corresponding author

Correspondence to Stephen A. Ramsey.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Ramsey, S.A., Liu, Z., Yao, Y., Weeder, B. (2020). Combining eQTL and SNP Annotation Data to Identify Functional Noncoding SNPs in GWAS Trait-Associated Regions. In: Shi, X. (ed) eQTL Analysis. Methods in Molecular Biology, vol 2082. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0026-9_6

  • DOI: https://doi.org/10.1007/978-1-0716-0026-9_6

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0025-2

  • Online ISBN: 978-1-0716-0026-9

  • eBook Packages: Springer Protocols
