Increasing Power in Association Studies by Using Linkage Disequilibrium Structure and Molecular Function as Prior Information
The availability of various types of genomic data provides an opportunity to incorporate these data as prior information in genetic association studies. This information includes knowledge of linkage disequilibrium structure as well as knowledge of which regions are likely to be involved in disease. In this paper, we present an approach for incorporating this information by revisiting how we perform multiple hypothesis correction. In a traditional association study, in order to correct for multiple hypothesis testing, the same significance threshold, t, is applied at every marker and is set to control the total false positive rate. In our framework, we allow a different threshold t_i at each marker i and use these thresholds to incorporate prior information. We present a novel Multi-threshold Association Study Analysis (MASA) method for setting these thresholds to maximize the statistical power of the study in the context of the additional information. Intuitively, markers that are correlated with many polymorphisms will have higher thresholds than other markers. The simplest approach for encoding prior information is to assume a causal probability distribution. In this setting, we assume that exactly one polymorphism is causal and that it is drawn from this distribution. We refer to the probability that polymorphism i is causal as its causal probability, c_i. Given the causal probabilities, using the approach presented in this paper, we can numerically solve for the marker thresholds that maximize power. By taking advantage of this information, we show how our multi-threshold framework can significantly increase the power of association studies while still controlling the overall false positive rate, α, of the study as long as ∑ t_i = α. We present a numerical procedure for solving for thresholds that maximize association study power using prior information.
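To make the optimization concrete, the following is a minimal sketch of choosing per-marker thresholds t_i that maximize expected power ∑ c_i · power_i(t_i) subject to ∑ t_i = α. It is not the MASA implementation. It assumes, for illustration only, that markers are independent, that each marker is itself the candidate causal polymorphism, that each test is a two-sided z-test, and that a hypothetical non-centrality parameter (`ncp`) is known for each marker. Under these assumptions power is concave in t_i, so the optimum equalizes c_i · d(power_i)/dt_i across markers, and the shared value (a Lagrange multiplier) can be found by bisection. All function names are hypothetical.

```python
from math import acosh, exp
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def power(t, ncp):
    """Power of a two-sided z-test with threshold t and non-centrality ncp."""
    if t <= 0.0:
        return 0.0
    z = -_N.inv_cdf(t / 2.0)  # critical value for a two-sided test at level t
    return _N.cdf(ncp - z) + _N.cdf(-ncp - z)


def threshold_for_multiplier(mu, c, ncp):
    """Threshold at which c * d(power)/dt equals the multiplier mu.

    For a two-sided z-test, d(power)/dt = exp(-ncp^2 / 2) * cosh(ncp * z),
    where z is the critical value. Assumes ncp > 0.
    """
    if c <= 0.0:
        return 0.0  # a marker with no causal probability gets no budget
    arg = mu * exp(ncp * ncp / 2.0) / c
    if arg <= 1.0:
        return 1.0  # marginal gain never drops to mu; threshold saturates
    z = acosh(arg) / ncp
    return 2.0 * _N.cdf(-z)


def solve_thresholds(causal_probs, ncps, alpha, iters=200):
    """Bisect on log(mu) until the thresholds sum to alpha."""
    def total(log_mu):
        mu = exp(log_mu)
        return sum(threshold_for_multiplier(mu, c, k)
                   for c, k in zip(causal_probs, ncps))

    lo, hi = -50.0, 50.0  # bracket for log(mu); total() decreases as mu grows
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total(mid) > alpha:
            lo = mid
        else:
            hi = mid
    mu = exp(hi)  # the upper end keeps the sum at or just below alpha
    return [threshold_for_multiplier(mu, c, k)
            for c, k in zip(causal_probs, ncps)]


def expected_power(thresholds, causal_probs, ncps):
    """Power averaged over which marker is causal."""
    return sum(c * power(t, k)
               for t, c, k in zip(thresholds, causal_probs, ncps))
```

For example, `solve_thresholds([0.7, 0.1, 0.1, 0.1], [4.0] * 4, 0.05)` assigns the largest threshold to the first marker, and the resulting expected power can be compared against the uniform choice `[0.05 / 4] * 4` with `expected_power`. Accounting for linkage disequilibrium between markers and untyped polymorphisms, as the paper does, would change the power term for each marker but not the overall structure of the constrained problem.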
We present the results of benchmark simulation experiments using the HapMap data which demonstrate a significant increase in association study power under this framework.
Our optimization algorithm is very efficient, and we can obtain thresholds for whole genome association studies in minutes. We also present an efficient permutation procedure for correctly adjusting the false positive rate for correlated markers and show that this approach increases computational time only slightly relative to performing permutation tests for traditional association studies.
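The efficient permutation procedure itself is not specified here, so the following is only a generic sketch of how permutation can calibrate a set of per-marker thresholds when markers are correlated. It is an assumption-laden illustration, not the authors' algorithm. The idea: under a permuted phenotype, some marker is significant at scaled thresholds s · t_i exactly when min_i (p_i / t_i) falls below s, so a quantile of that minimum ratio over permutations gives a scale s that holds the overall false positive rate near α. The test statistic (a z-test on mean allele count) and all function names are hypothetical choices.

```python
import random
from statistics import NormalDist, mean, pvariance

_N = NormalDist()  # standard normal distribution


def marker_pvalue(genotypes, labels):
    """Two-sided p-value for a case/control difference in mean allele count."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    var = pvariance(genotypes)
    if var == 0.0:
        return 1.0  # monomorphic marker carries no signal
    se = (var * (1.0 / len(cases) + 1.0 / len(controls))) ** 0.5
    z = (mean(cases) - mean(controls)) / se
    return 2.0 * _N.cdf(-abs(z))


def permutation_scale(markers, labels, thresholds, alpha, n_perm=1000, seed=0):
    """Scale s such that calling marker i significant when p_i < s * t_i
    yields at most int(alpha * n_perm) significant permutations."""
    rng = random.Random(seed)
    shuffled = list(labels)
    min_ratios = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        min_ratios.append(min(marker_pvalue(m, shuffled) / t
                              for m, t in zip(markers, thresholds)))
    min_ratios.sort()
    k = int(alpha * n_perm)  # permutations allowed to be significant
    return min_ratios[k]     # strict comparison p_i < s * t_i keeps count <= k
```

Here `markers` is a list of per-marker genotype lists (allele counts per individual) and `labels` is a 0/1 case/control list. When markers are strongly correlated, the returned scale is typically above 1, reflecting that thresholds summing to α are conservative in that case.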
We provide a webserver for performing association studies using this method at http://masa.cs.ucla.edu/. On the website, we provide thresholds optimized for the Affymetrix 500k and Illumina HumanHap 550 chips.