Introduction

An individual’s risk of developing insulin resistance and type 2 diabetes is determined in part by genetic factors [1]. The transition from normal glucose homeostasis to type 2 diabetes is thought to be due primarily to increasingly dysfunctional beta cells [2–6]. While estimates of the genetic contribution to beta cell response vary, there is little argument that discovering genes that account for variation in the way beta cells respond to glucose could identify important pathways for type 2 diabetes risk prediction, intervention and treatment.

Candidate gene studies and genome-wide linkage scans have been used to search for genes with modest to large effects on variation in beta cell response; however, these methods have not been highly successful. The genome-wide association scan (GWAS) has become a popular approach for detecting genes with small to modest effects. GWAS in type 2 diabetes have been especially productive and have had an extraordinary impact on current understanding of genetic susceptibility to the disease [7–13].

The vast majority of GWAS for type 2 diabetes have used cases and controls from European or European-derived populations. Although over a dozen genes/regions have been identified with robust statistical significance, it is unclear whether these genes/regions will also be found in populations of non-European origin [14, 15]. The type 2 diabetes genes identified to date in populations of European origin suggest that most, if not all, mediate their influence through the beta cell rather than through insulin resistance pathways.

In the current report, we evaluate a quantitative, directly assessed measure of beta cell response in a non-European population. Herein, we present results of a two-stage GWAS in Hispanic-Americans from the Insulin Resistance Atherosclerosis Family Study (IRAS FS). Through a high-density single nucleotide polymorphism (SNP) scan and follow-up genotyping, we identified a series of genes and regions that may contribute to variation in the acute insulin response of the beta cell to a glucose challenge.

Methods

IRAS FS participants

The study design, recruitment and phenotyping for the IRAS FS have been described previously [16]. Briefly, the IRAS FS is a multi-centre study designed to identify the genetic determinants of quantitative measures of glucose homeostasis in African-American and Hispanic-American populations in the USA. Members of large families of self-reported Hispanic ancestry (n = 1268 individuals in 92 pedigrees; San Antonio, TX; San Luis Valley, CO, USA) were recruited and used in this report. The Institutional Review Board of each participating clinical and analysis site approved the study protocol and all participants provided their written informed consent.

A clinical examination was performed that included an in-depth medical history interview, a frequently sampled intravenous glucose tolerance test (FSIGT), anthropometric measurements and collection of samples for blood chemistry and biomarker analysis. Measures of glucose homeostasis were derived using mathematical models [17] from glucose and insulin values obtained during the FSIGT [18–20]. These estimates of glucose homeostasis include insulin sensitivity (SI), glucose effectiveness (SG), disposition index (DI; with \( \mathrm{DI} = \mathrm{AIR_g} \times S_{\mathrm{I}} \)) and the acute insulin response to glucose (AIRg). This report focuses on the AIRg phenotype of glucose homeostasis.
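As a minimal illustration of the DI definition above, the sketch below computes DI from AIRg and SI and, conversely, the AIRg needed to hold DI constant when SI changes (the hyperbolic trade-off implied by the product form). The function names and numeric values are hypothetical, and units are whatever the minimal model reports.

```python
def disposition_index(airg: float, si: float) -> float:
    """DI = AIRg x SI, as defined in the text."""
    return airg * si


def airg_for_constant_di(di: float, si: float) -> float:
    """AIRg required to hold DI fixed at a given SI (rearranging DI = AIRg x SI)."""
    if si <= 0:
        raise ValueError("SI must be positive")
    return di / si


# Hypothetical values: halving SI doubles the AIRg needed to keep DI unchanged.
di = disposition_index(airg=500.0, si=2.0)
print(di)                                  # 1000.0
print(airg_for_constant_di(di, si=1.0))    # 1000.0
```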

A collection of IRAS FS DNA samples from Hispanic-American participants (229 individuals from 34 families) was chosen from the San Antonio study group for the first stage of the GWAS. These samples were from participants without type 2 diabetes who had complete data for glucose homeostasis and obesity phenotypes and whose age, BMI and sex distribution was consistent with that of the entire IRAS FS collection. Structure analysis [21] (http://pritch.bsd.uchicago.edu/structure.html, accessed October 2008) using microsatellite polymorphisms from an earlier genome-wide linkage scan [22, 23] suggested that the participants are representative of a relatively homogeneous population. DNA used in the high-throughput genotyping (317K SNPs) was obtained from Epstein–Barr virus (EBV)-transformed lymphoblastoid cell lines.

Genome-wide association scan

Genotyping was performed at Cedars–Sinai Medical Center using 1.5 μg of genomic DNA (15 μl of 100 ng/μl stock) with Illumina technology (Illumina Infinium II HumanHap 300 BeadChips; Illumina, San Diego, CA, USA) and assay protocol [24]. Genotypes were determined by clustering the raw intensity data for the two dyes using Illumina Bead Studio software. Genotyping consistency was checked using 18 repeat samples; the concordance rate was 100%. DNA samples with an overall call rate below 98% were re-genotyped once and rejected if the call rate did not improve. The average sample call rate was 99.76%. SNPs showing departure from Hardy–Weinberg equilibrium (p < 0.001), with a minor allele frequency (MAF) below 0.05 or with more than 5% missing genotypes were excluded from subsequent analysis. Genotypes with GenCall (Illumina) scores below 0.15 were set to missing (0.25%). For highly associated SNPs, clustering was repeated to exclude spurious significance. All genotypes were oriented to the forward strand. The risk of strand ambiguity is minimal, as no C/G or A/T polymorphisms are included on the Illumina 300K HumanHap panel.
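The SNP-level filters described above (HWE p < 0.001, MAF < 0.05, more than 5% missing) and the GenCall masking rule can be sketched as follows. The use of a 1-df Pearson chi-square HWE test is an assumption (the text does not say which test was used), and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    n_aa: int       # homozygotes for allele A
    n_ab: int       # heterozygotes
    n_bb: int       # homozygotes for allele B
    n_missing: int  # samples with no genotype call


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square (1 df) test of Hardy-Weinberg proportions.

    Assumption: an exact test is often preferred for small counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is undefined
    stat = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a 1-df chi-square: P(X > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))


def passes_qc(c: SnpCounts, hwe_alpha=1e-3, min_maf=0.05, max_missing=0.05) -> bool:
    """Keep a SNP only if it passes the missingness, MAF and HWE filters."""
    called = c.n_aa + c.n_ab + c.n_bb
    total = called + c.n_missing
    if called == 0 or c.n_missing / total > max_missing:
        return False
    freq_a = (2 * c.n_aa + c.n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chisq_p(c.n_aa, c.n_ab, c.n_bb) >= hwe_alpha


def mask_low_gencall(genotypes, scores, threshold=0.15):
    """Set calls with a GenCall score below the threshold to missing (None)."""
    return [g if s >= threshold else None for g, s in zip(genotypes, scores)]
```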

Validation genotyping in the entire IRAS FS Hispanic sample

SNPs with evidence of association in the GWAS were validated in the entire Hispanic cohort (excluding participants with type 2 diabetes). A total of 1,536 SNPs were chosen for genotyping on all Hispanic samples for which glucose homeostasis data were available (n = 1190). Genotyping was performed at Cedars–Sinai Medical Center using the Illumina Golden Gate Assay. SNPs with low call frequencies (<98%) were manually re-clustered (~15% of all SNPs). Of the 1,536 SNPs, 3.5% were excluded due to call frequency less than 0.7 and/or cluster separation less than 0.3. The average SNP call frequency was 99.48%. Duplicate genotyping of 12 samples yielded a 100% concordance rate. The minimum acceptable sample call rate was 95%; the average sample call rate was 99.5%. SNP selection for this second stage was based upon: (1) identification of the most strongly associated 50 to 100 SNPs for each glucose homeostasis and related phenotype (SI, SG, AIRg, DI) from the initial GWAS; (2) tag SNPs in genes with high evidence of association across more than one phenotype (tags for SNPs with MAF >0.1 using Haploview version 4 [25]); and (3) ancestry-informative markers (AIMs) for Hispanic populations [26, 27].

Follow-up locus-specific genotyping

Individual genes/regions with confirmed evidence of association from the GWAS and the validation genotyping were targeted for additional genotyping using tag SNPs. This genotyping was performed using iPLEX Gold SBE assays on a MassARRAY genotyping system (Sequenom, San Diego, CA, USA). Locus-specific primers were designed using the system’s software (MassARRAY Assay Design 3.0; Sequenom). Mass spectrograms were analysed using MassARRAY Typer software (Sequenom). The minimum acceptable call frequency was 98%; no SNP failed this criterion (average call frequency >99.14%). We included 51 blind duplicate samples to evaluate genotyping accuracy; the concordance rate was 99.94%. SNPs were selected to capture common variation within linkage disequilibrium (LD) haplotype blocks as defined in the CEU population (Utah residents with northern and western European ancestry from the Centre d’Etude du Polymorphisme Humain [CEPH] collection) of the International HapMap project [28]. Specifically, genotype data for the genomic interval spanning the candidate gene ±5 kb were exported from the HapMap database and imported into Haploview. For genes with few LD blocks (e.g. ETV7), SNPs were selected to tag the entire genic region with a mean r² = 0.80, with forced inclusion of previously genotyped SNPs. For larger genes (e.g. STK38), SNP selection focused on the LD block containing the SNP associated in the validation genotyping; additional SNPs were selected to tag the block with a mean r² = 0.80, again with forced inclusion of previously genotyped SNPs.
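Haploview's Tagger performed the tag-SNP selection described above. As a rough, hypothetical illustration of the idea (not Haploview's actual implementation), the greedy pairwise sketch below repeatedly picks the SNP that captures the most not-yet-captured SNPs at r² ≥ 0.8, honours forced inclusions, and reports the mean maximum r² to the chosen tags. Treating 0.8 as a per-pair threshold is an assumption; the text quotes it as a mean r².

```python
def greedy_tag(snps, r2, threshold=0.8, forced=()):
    """Greedy pairwise tag-SNP selection (simplified sketch).

    snps:   SNP identifiers, in a fixed order (used to break ties).
    r2:     dict mapping frozenset({a, b}) -> pairwise r^2.
    forced: SNPs that must be included (e.g. previously genotyped SNPs).
    Returns the tag list and the mean, over all SNPs, of the best r^2 to any tag.
    """
    def ld(a, b):
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    tags = list(forced)
    uncaptured = {s for s in snps if all(ld(s, t) < threshold for t in tags)}
    while uncaptured:
        # Choose the candidate that captures the most still-uncaptured SNPs;
        # max() keeps the first candidate in `snps` order when counts tie.
        best = max(snps, key=lambda c: sum(ld(c, s) >= threshold for s in uncaptured))
        tags.append(best)
        uncaptured = {s for s in uncaptured if ld(s, best) < threshold}
    mean_r2 = sum(max(ld(s, t) for t in tags) for s in snps) / len(snps)
    return tags, mean_r2
```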

Statistical methods

For quality control, each SNP was examined for Mendelian inconsistencies using PedCheck [29]. There were 1657 SNPs exhibiting inconsistencies and these genotypes were converted to ‘missing’. Maximum likelihood estimates of allele frequencies were computed using the largest set of unrelated Hispanic-American individuals (n = 34); SNP genotypes were tested for departures from Hardy–Weinberg equilibrium proportions. SNPs with no evidence of a difference in AIRg values between individuals with and without missing genotype data (p > 0.05), and with no evidence of departure from Hardy–Weinberg equilibrium (p > 0.001) were included in subsequent analyses.
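PedCheck performs Mendelian error checking across whole pedigrees. The sketch below shows only the simplest case, a parent–offspring trio with genotypes coded as allele counts, together with one possible policy (blanking the whole trio) for converting inconsistent genotypes to missing. The function names and the blanking policy are assumptions, not a description of PedCheck.

```python
from typing import Optional

Genotype = Optional[int]  # copies of the coded allele (0, 1 or 2); None = missing


def transmissible(g: int) -> set:
    """Alleles (0 or 1) that a parent with genotype g can pass on."""
    return {0, 1} if g == 1 else {g // 2}


def mendel_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    if None in (father, mother, child):
        return True  # cannot be checked with a missing genotype
    return child in {a + b for a in transmissible(father) for b in transmissible(mother)}


def blank_inconsistent_trios(trios):
    """Set all three genotypes of an inconsistent trio to missing (hypothetical policy)."""
    cleaned = []
    for f, m, c in trios:
        cleaned.append((f, m, c) if mendel_consistent(f, m, c) else (None, None, None))
    return cleaned
```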

To test for association between individual SNPs and AIRg, variance component measured genotype analyses were performed as implemented in SOLAR (Sequential Oligogenic Linkage Analysis Routines) software [30]. X-chromosome SNPs (used for ancestry adjustment) were not used in the primary analyses. For statistical testing, AIRg was transformed using the signed square root to best approximate the distributional assumptions of the test and minimise heterogeneity of variance. The additive genetic model was the primary basis for statistical inference, and SNPs were ranked by its p values. All tests and levels of significance were computed after adjustment for age, sex and BMI.
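The signed square-root transform and the additive genotype coding mentioned above can be sketched as follows. This covers only the data-preparation step; the variance component association model itself was fitted in SOLAR and is not reproduced here. The helper names and the two-character genotype strings are illustrative assumptions.

```python
import math


def signed_sqrt(x: float) -> float:
    """Signed square root: sign(x) * sqrt(|x|), which keeps the sign of negative values."""
    return math.copysign(math.sqrt(abs(x)), x)


def additive_code(genotype: str, coded_allele: str) -> int:
    """Number of copies (0, 1 or 2) of the coded allele in a genotype such as 'AG'."""
    if len(genotype) != 2:
        raise ValueError("expected a two-character genotype, e.g. 'AG'")
    return genotype.count(coded_allele)


airg = [400.0, 0.0, -25.0]   # hypothetical AIRg values
print([signed_sqrt(v) for v in airg])   # [20.0, 0.0, -5.0]
```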

Analyses of the validation and locus-specific genotyping data used the same modelling framework as the GWAS analysis, except that covariate adjustment also included terms for recruitment site (San Antonio, San Luis Valley) and admixture. To incorporate admixture into the combined analyses of GWAS and validation data, a collection of AIMs selected from the literature on studies performed in Hispanics [26, 27] was used. The GWAS panel included 80 AIMs (14 on the X chromosome) and the validation panel 149 AIMs (23 on the X chromosome). The 149 AIMs were available on 1,279 participants; these data were merged with HapMap data for the CEPH (n = 90) and Yoruba (n = 90) populations.

A principal components (PC) analysis was performed on the 149 AIMs as well as on the 80 AIMs common to the GWAS (317K SNP panel) and validation (1,536 SNP panel) experiments. The proportions of variance explained by the first three PCs with the 80 AIMs (PC1, 10.2%; PC2, 5.1%; PC3, 2.7%) differed little from those explained with the 149 AIMs (PC1, 10.3%; PC2, 4.8%; PC3, 1.9%). However, the two Hispanic-American sites differed overall with respect to PC2 (\( p = 2.35 \times 10^{-53} \)). Hispanic-Americans from the two sites also differed in two measures of glucose homeostasis (SG, \( p = 2.65 \times 10^{-17} \); DI, \( p = 6.22 \times 10^{-10} \)) and two measures of adiposity (BMI, \( p = 1.46 \times 10^{-12} \); visceral fat, \( p = 7.37 \times 10^{-5} \)), but not in AIRg (Spearman correlation 0.09, p = 0.09). For AIRg, the proportion of variance explained by the centre of ascertainment was 2.56%; thus, all results are presented with adjustment for admixture in addition to age, sex, BMI and centre of ascertainment.
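In its simplest unadjusted form, the proportion of variance explained by the centre of ascertainment is the between-group sum of squares divided by the total sum of squares (η²). The study's 2.56% figure comes from its own model, which may account for covariates and family structure; the function below and its toy data are hypothetical and ignore both.

```python
from statistics import fmean


def variance_explained_by_group(values, groups):
    """Between-group SS / total SS (eta squared), with no covariate adjustment."""
    grand = fmean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    members = {}
    for v, g in zip(values, groups):
        members.setdefault(g, []).append(v)
    between_ss = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in members.values())
    return between_ss / total_ss


# Toy data (hypothetical): a transformed trait at two recruitment centres.
trait = [1.0, 3.0, 5.0, 7.0]
centre = ["San Antonio", "San Antonio", "San Luis Valley", "San Luis Valley"]
print(variance_explained_by_group(trait, centre))   # 0.8
```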

Results

Characteristics of the study participants

This pilot GWAS studied 229 Hispanic-American participants from the San Antonio population who did not have type 2 diabetes, had complete data for glucose homeostasis and critical covariates, and had DNA obtained from EBV-transformed cell lines. A sample of 961 participants with DNA and baseline data was used for replication. The total study sample of 1,190 Hispanic-American participants had the following characteristics: 58.6% women, average age 42.8 years, mean AIRg 767 pmol/l and mean BMI 29.0 kg/m² (Table 1). There were no significant differences in these phenotypes between the GWAS and replication samples.

Table 1 Characteristics of the non-diabetic Hispanic-American sample in the IRAS FS GWAS

Genome-wide association scan for AIRg

A total of 309,200 SNPs met all quality control criteria and were evaluated for association with AIRg, ranked by p values from the additive genetic model. The quantile–quantile plot for the stage 1 GWAS showed that most SNPs had a −log10(p value) below 2 and that the observed distribution of p values matched the null expectation for the majority of the data (Electronic supplementary material [ESM] Fig. 1). There was some departure from the null distribution at \( p < 10^{-3} \), so this value was used as a rough cut-point for selecting SNPs for follow-up. The highest-ranking SNPs associated with AIRg were chosen for genotyping in all non-diabetic Hispanic-American participants in the IRAS FS (n = 1190). We included 672 SNPs with evidence of association with AIRg and other glucose homeostasis phenotypes (SI, SG, DI), or that tag genes associated with multiple phenotypes, on a custom 1,536 SNP panel (another 461 SNPs were included on the basis of association with adiposity and related phenotypes). For AIRg, 157 SNPs were chosen based on AIRg alone (125 SNPs) or on AIRg plus other traits (32 SNPs). Of these 157 SNPs, 149 passed clustering-related quality control.
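A minimal sketch of the QQ-plot coordinates and the rough \( p < 10^{-3} \) follow-up cut-point described above. Expected values use the (i + 0.5)/n plotting position under a uniform null, which is one common convention and an assumption here; the function names are hypothetical.

```python
import math


def qq_points(pvalues):
    """(expected, observed) -log10 p pairs for a QQ plot under the uniform null."""
    n = len(pvalues)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(sorted(pvalues))]


def select_for_followup(pvalues_by_snp, cutoff=1e-3):
    """SNP ids with p below the cut-point, most significant first."""
    hits = [snp for snp, p in pvalues_by_snp.items() if p < cutoff]
    return sorted(hits, key=pvalues_by_snp.get)
```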

Candidate genes/regions for AIRg

Results from the AIRg analyses (full admixture-adjusted analyses, initial GWAS and the independent replication analysis) for the most highly associated SNPs that reside in genes are shown in Table 2. SNP locations within and near genes were determined from dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/, dbSNP build 129, genome build 36.3; accessed 9 February 2009). The SNP most highly associated with AIRg was rs1566458 (\( p = 5.5 \times 10^{-4} \)) in ZDHHC7 on chromosome 16q24.1. A second gene, RGS6 (rs2238256, \( p = 8.6 \times 10^{-4} \)) on 14q24.3, encodes proteins that negatively regulate heterotrimeric G protein signalling and modulate neuronal, cardiovascular and lymphocytic activity. Three of the most associated genic SNPs for AIRg (rs1061632, rs2300064 and rs12190911) are located in the KCTD20/STK38 region on 6p21.3, within the human MHC and near the SNP at the NCR2 locus that was also associated with AIRg. Only two associated SNPs were in exons (rs1061632 in KCTD20 and rs3748400 in ZCCHC14).

Table 2 SNPs significantly associated with AIRg in the IRAS FS in genic regions ordered by p values obtained from combined first-stage (GWAS) and replication samples (admixture-adjusted)

The region on 6p21 (ESM Fig. 2) contains five LD blocks spanning approximately 200 kb, based upon the HapMap MEX sample (71 Mexican Americans from Los Angeles, CA, USA). This region is covered by a five-SNP haplotype (rs12190911/rs1061632/rs2300064/rs612399/rs7772334). A smaller LD block is tagged by rs12190911 (admixture-adjusted \( p = 1.4 \times 10^{-3} \)) and contains two genes: ETV7 (also known as TEL2), a member of the E26 transformation-specific family of transcription factors, and PXT1. A larger LD block is tagged by rs1061632 (\( p = 9.0 \times 10^{-4} \)), rs2300064 (\( p = 9.1 \times 10^{-4} \)), rs612399 (\( p = 6.2 \times 10^{-3} \)) and rs7772334 (\( p = 7.8 \times 10^{-4} \)), and contains the genes KCTD20, STK38 and SFRS3. For the entire region, 22 SNPs were chosen to capture common variation (r² = 0.98) for follow-up analyses of these candidate genes in the entire IRAS FS Hispanic sample. The AIRg effect in the 6p21 region appears to be composed of three sets of two adjacent SNPs spanning a 105 kb region bounded by rs12190911 (p = 0.001; between PXT1 and KCTD20) and rs614028 (p = 0.04; within STK38). Locations and genotypic means for these SNPs can be found in ESM Table 1.

A second cluster of associated SNPs lies on chromosome 20p11.2-p11.1 and includes the genes ENTPD6, PYGB, ABHD12, GINS1 and RP4-691N24.1 (also known as KIAA0980) (ESM Fig. 3). Of these, only ABHD12 lacked a SNP significantly associated with AIRg: ENTPD6 (rs2179638, p = 0.001), PYGB (rs6138553, p = 0.026), GINS1 (rs6076347, p = 0.0037; rs2500406, p = 0.003) and RP4-691N24.1 (rs16987806, p = 0.001; rs6083877, p = 0.035) all had associations with variation in AIRg. Similar results are shown for SNPs with no recognised gene within 10 kb (Table 3). The second most highly associated SNP in the entire dataset (rs7772334, \( p = 7.8 \times 10^{-4} \)) was 13.6 kb from NCR2 on 6p21.1, a gene involved in natural killer cell activity. Locations and genotypic means for these SNPs can be found in ESM Table 2.

Table 3 SNPs associated with AIRg not in genic regions ordered by p values from combined first-stage (GWAS) and replication samples (admixture-adjusted)

Discussion

Minimal model estimates of beta cell response and SI derived from the FSIGT are highly correlated with the corresponding measures obtained from euglycaemic– and hyperglycaemic–hyperinsulinaemic clamp studies [31, 32]. The hyperglycaemic clamp, unlike the euglycaemic clamp, provides direct estimates of beta cell function. Variation in these measures of glucose homeostasis, including AIRg, appears to be under stronger genetic control than variation in surrogate measures of SI such as fasting insulin or HOMA [33]. These variables may therefore be important phenotypes for predicting the overall and genetic risk of type 2 diabetes mediated through beta cell function.

To our knowledge, this is the first report of a GWAS of a direct measure of beta cell response, namely AIRg. In this analysis, multiple regions of the human genome were identified as likely to harbour genes that contribute to variation in AIRg. Two regions, 6p21 and 20p11, were notable in showing associations at more than one SNP. In each case, the region is complex, with multiple genes and extensive LD structure. The region on 6p21 contains KCTD20 (proposed to participate in potassium ion transport) and STK38 (a protein serine/threonine kinase with annotated ATP binding, magnesium ion binding and protein binding functions) [34, 35]. KCTD20 is of particular interest given the key role of potassium ion transport in glucose-stimulated insulin secretion [36]. However, the strongest signal on 6p21 is centred on a 1.8 kb interval without any known genes. It remains to be determined whether the observed SNP associations are due to a single causal variant or to multiple variants. Preliminary analyses in which each associated SNP was used in turn as a covariate suggest that at least two areas may contain independent variants, one associated with the PXT1/KCTD20/STK38 region and one with the TREM1/NCR2 region (ESM Table 3).
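The logic of these conditional analyses (adding one SNP as a covariate and asking whether another SNP's effect persists) can be sketched with ordinary least squares via the Frisch–Waugh–Lovell theorem: residualise both the trait and the test SNP on the conditioning SNP, then regress residual on residual. This is a simplification that ignores relatedness and the other covariates handled by the study's variance component model; the function names and data are hypothetical.

```python
from statistics import fmean


def residualize(y, x):
    """Residuals of y after OLS regression on an intercept and a single predictor x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def conditional_effect(trait, test_snp, conditioning_snp):
    """OLS effect of test_snp on trait after adjusting for conditioning_snp (FWL)."""
    ry = residualize(trait, conditioning_snp)
    rx = residualize(test_snp, conditioning_snp)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
```

If the test SNP's apparent effect is entirely due to LD with the conditioning SNP, its conditional effect shrinks towards zero; an effect that persists hints at an independent variant.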

In contrast, the region associated with AIRg on chromosome 20p11 contains several interesting candidate genes. As in the preliminary analyses of the chromosome 6 candidate region, conditioning on SNP rs1555286 led to little change in the significance of the GINS1 SNP rs6076347, suggesting that at least two independent variants may exist (ESM Table 4). ENTPD6 encodes a member of the E-type NTPase family (which also includes CD39); its annotated functions include purine and pyrimidine metabolism, calcium ion binding, hydrolase activity, magnesium ion binding and nucleoside-diphosphatase activity [37]. PYGB encodes a glycogen phosphorylase that catalyses the rate-determining step in glycogen degradation [38]. In addition, rs6076347 in GINS1 is a missense variant in exon 5 (Ile → Val). Exposure to genistein, which is known to modulate hepatic glucose- and lipid-regulating enzyme activities in C57BL/KsJ-db/db mice [39], increases GINS1 mRNA expression. However, it is not immediately apparent how the functions of these genes relate to beta cell response.

A recent meta-analysis of GWAS in type 2 diabetes uncovered at least 18 genes/regions harbouring variants that are common in the general population (MAF of the type 2 diabetes-associated variant greater than 1%) but have relatively small effects [40]. Several of the associated SNPs lie in genes (HHEX, CDKN2A/2B, CDKAL1) that may be implicated in beta cell development and function. In this study, the most highly associated SNPs in the initial GWAS for several glucose homeostasis and obesity traits were genotyped on the 1,536 SNP panel in an expanded sample of Hispanic-American ethnicity from San Antonio and San Luis Valley. Among the 18 type 2 diabetes-associated genes evaluated by GWAS in the IRAS FS sample, four SNPs in two genes were associated with AIRg at p < 0.01 under an additive model: THADA (rs7595299) and CAMK1D (rs2768367, rs2399866, rs1004247). These two genes appear to share a common pathway involved in regulating the number of insulin-producing cells in the pancreas. The remaining 16 type 2 diabetes-associated genes evaluated by GWAS were not associated with AIRg in the current study.

While the GWAS results in the IRAS FS Hispanic-American sample have provided interesting candidates, there are limitations to the interpretation of these results. First, as there are very few comparable studies with similarly defined phenotypes of beta cell response, the ability to replicate these findings is limited. Second, while the associations are ranked on the basis of rigorous statistical criteria, the admixture-adjusted p values do not reach genome-wide significance (e.g. \( p < 5 \times 10^{-7} \)). Third, the effect sizes reflected in the genotypic means of the associated SNPs average 25% of a standard deviation. In the full Hispanic-American sample of 1,269 individuals, for SNPs with MAF = 0.15, the power of this study to detect an effect of that size at a type I error rate (α) of \( 10^{-3} \) and \( 10^{-4} \) is 89% and 71%, respectively. The power to detect a similar effect at genome-wide significance is only 32%. Further examination of additional populations and correlated phenotypes is needed for replication.
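The power figures above can be approximated with a standard non-centrality calculation for a 1-df additive test. The sketch assumes unrelated individuals, a per-allele effect of 0.25 trait standard deviations and a two-sided test; it ignores the family structure modelled in the study, so it is not the authors' calculation. Under these simplifications it gives roughly 0.89, 0.74 and 0.31 at α = \( 10^{-3} \), \( 10^{-4} \) and \( 5 \times 10^{-7} \), close to, though not identical with, the values quoted above.

```python
from math import sqrt
from statistics import NormalDist


def additive_power(n: int, maf: float, effect_sd: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df additive test for a quantitative trait.

    Assumptions: n unrelated individuals and a per-allele effect of
    `effect_sd` trait standard deviations.
    """
    q2 = 2 * maf * (1 - maf) * effect_sd ** 2   # fraction of trait variance explained
    ncp = n * q2 / (1 - q2)                     # non-centrality of the 1-df test
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)              # two-sided critical value
    shift = sqrt(ncp)
    return (1 - std.cdf(z - shift)) + std.cdf(-z - shift)


for alpha in (1e-3, 1e-4, 5e-7):
    print(alpha, round(additive_power(1269, 0.15, 0.25, alpha), 2))
```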

In summary, a multistage GWAS for AIRg as a measure of beta cell response was performed in a Hispanic-American sample from the IRAS FS. Based upon these results, we identified numerous SNPs and regions of the genome that may contain variants accounting for common variation in AIRg. Two regions were examined with additional genotyping, with support observed for a region on chromosome 6p21 (containing PXT1, KCTD20 and STK38) whose strongest signal lies in a gene desert, and for several genes on chromosome 20p11 (ENTPD6, PYGB, GINS1 and RP4-691N24.1). The latter series of genes suggests a mechanism of action that could involve phosphorylation and ion transport. However, the power of the study is limited, the proportion of the observed heritability of AIRg explained by these genes is not substantial, and replication is needed. In conclusion, several candidate genes have been identified as possibly contributing to variation in AIRg, a predictor of type 2 diabetes.