Invited Keynote Talk: Set-Level Analyses for Genome-Wide Association Data
High-throughput genotyping platforms allow the investigation of hundreds of thousands of markers at a time, and this has led to a growing number of genome-wide association studies in which the entire human genome is mined for genes involved in the etiology of complex traits. This approach to the discovery of genetic risk factors has yielded promising results, but most analyses have focused on single-marker tests. In general, a method of analysis that treats the markers as if they were biologically unrelated discards the information contained in the structure of the genome.
In this paper, we propose a method for incorporating structural genomic information by grouping the markers into relevant units and assigning a measure of significance to these pre-defined sets of markers. The sets can be genes, conserved regions, or groups of genes such as pathways. Using the proposed methods and algorithms, evidence for association between a particular functional unit and disease status can be obtained not only from a strong signal at a single SNP within it, but also from the combination of several simultaneous weaker signals that are uncorrelated. The method combines evidence for association from both the genotyped and the untyped markers. The untyped markers are tested using haplotype predictors for their alleles, with the predictors trained on reference databases such as HapMap.
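The idea of combining several weak, uncorrelated signals into one set-level measure can be illustrated with a minimal sketch. The abstract does not specify the combination statistic, so the sketch below substitutes Fisher's method, which is valid only when the per-marker tests are independent. The function names, marker identifiers, and set labels are hypothetical and chosen for illustration only.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (an assumed stand-in,
    not necessarily the statistic used by the authors)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher statistic X = -2 * sum(log p_i); under the null it is
    # chi-square distributed with 2k degrees of freedom. We keep X / 2.
    half_stat = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def set_level_pvalues(marker_pvalues, marker_sets):
    """Assign one combined p-value to each pre-defined set of markers.

    marker_pvalues: dict mapping marker id -> single-marker p-value
    marker_sets:    dict mapping set name (gene, region, pathway) -> marker ids
    Sets with no tested markers are omitted from the result.
    """
    results = {}
    for name, markers in marker_sets.items():
        ps = [marker_pvalues[m] for m in markers if m in marker_pvalues]
        if ps:
            results[name] = fisher_combined_pvalue(ps)
    return results


if __name__ == "__main__":
    # Hypothetical single-marker results: no marker is striking on its own.
    pvals = {"rs1": 0.01, "rs2": 0.01, "rs3": 0.01, "rs4": 0.01}
    sets = {"GENE_A": ["rs1", "rs2", "rs3", "rs4"]}
    print(set_level_pvalues(pvals, sets))
```

In this toy input, four markers that each reach only p = 0.01 yield a combined set-level p-value well below 0.01, which is the sense in which weaker signals can add up to evidence for the functional unit. Markers in linkage disequilibrium violate the independence assumption, so a real analysis would need a statistic that accounts for correlation between markers.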
There are several advantages to this approach. First, the power to detect genes associated with disease increases, because moderately strong signals within a gene are combined to obtain a much stronger signal for the gene as a functional unit. Second, the results are easily combined across platforms that use different sets of SNPs. Lastly, the results are easy to interpret since they refer to functional regions, and they also provide targets for biological validation.