Background

Wheat, the most widely grown grain crop, supplies the food requirements of about 35% of the global population, generates the largest total harvest, and is the most traded grain commodity [1,2,3]. Studying and understanding the phenotypes and genotypes of its agronomic traits may help improve its yield stability.

Single nucleotide polymorphisms (SNP), as third-generation molecular markers, are superior in automated genotyping [4,5,6]. There are many reports on the use of high-density Illumina iSelect 90 K SNP chips in generating linkage maps [7,8,9]. For example, Gao et al. [7] built a genetic linkage map of hexaploid wheat that included 5536 polymorphic SNP markers covering a genetic length of 3609.4 cM using the 90 K iSelect SNP array. Jin et al. [9] identified 46,961 polymorphic SNPs in a 176-RIL population derived from a Gaocheng 8901/Zhoumai 16 cross using the 90 K and 660 K SNP arrays, and they produced a genetic map with a total length of 4121 cM and marker density of 0.09 cM/marker in bread wheat.

In addition to genetic mapping, SNP markers have unique advantages for genome-wide association studies (GWAS) of yield-related traits in cereal crops, including rice [10], barley [11] and common wheat [12,13,14,15]. In particular, Yu et al. [10] detected genes linked to kernel type (GS3) and weight (GW5) associated with grain quality in rice by genome-wide SNP scanning and high-density genetic maps. Cormier et al. [12] investigated 28 nitrogen use-related traits in 240 European wheat varieties in a GWAS, detecting 1010 SNPs significantly associated with nitrogen utilization. Sukumaran et al. [14] scanned the whole genomes of 287 wheat varieties using the Illumina iSelect 90 K SNP array and identified loci significantly associated with yield traits. Specifically, four, one, and five loci were associated with grain yield (chromosomes 3B, 5A, 5B, 6A), kernel weight (6A), and maturity (2B, 3B, 4B, 4D, 6A), respectively.

Analysis of the breeding history of many crop species revealed the presence and roles of founder parents. Molecular markers were used to analyze the contributions of the genetic bases of founder parents in improvement of barley [16], sugarcane [17], rice [18,19,20], and wheat [21, 22]. For example, Li et al. [19] and Tan et al. [20] built genetic maps of rice showing that quantitative trait loci (QTLs) of kernel number per spike, thousand-grain weight, and yield in the founder parent Minghui 63 were transmitted to the progenies over generations. By pedigree tracking of the founder parent Beijing 8, Li et al. [21] found that the frequencies of alleles unique to Beijing 8 varied from 0 to 0.96 in its 51 descendants, suggesting that some of them underwent rigorous artificial selection. Jiang et al. [22] confirmed that Ningmai 9 could serve as a founder parent and found some significant chromosome regions that might be used in wheat breeding.

In this study, we genotyped 215 wheat cultivars, including 11 founder parents and 106 derivatives, using the iSelect 9 K SNP array. Based on multi-environmental trial data, we used GWAS to identify favorable alleles of yield-related traits through sequential generations of breeding. Favorable alleles identified in derivatives could be used to detect important chromosome regions inherited from the founder parents. This information might be used for marker-assisted selection (MAS) in wheat breeding.

Results

Phenotypic assessment

The average coefficients of variation for phenotypic traits in each environment ranged from 6.29 to 26.35%, indicating considerable phenotypic variation (Table 1). There were significant positive correlations between traits across environments (P < 0.01; Additional file 1: Table S1).

Table 1 Descriptive statistics of six phenotypic traits in different environments assessed in this study

The founder parents Funo, Bima 4, and Nanda 2419 and their derivatives over successive generations were compared in terms of yield-related traits, including plant height (PH), spike length (SL), spikelet number per spike (SNPS), kernel number per spike (KNPS), kernel weight per spike (KWPS), and thousand kernel weight (TKW). PH gradually declined and TKW increased with advancing generations, while SL, SNPS, KNPS, and KWPS showed no significant changes. This indicated continuing selective pressure on PH and TKW during breeding (Additional file 2: Table S2).

Allelic diversity and genetic structure

Genotyping of the 215 wheat cultivars using the 9 K SNP array identified 4138 polymorphic SNPs, of which 3792 were mapped to single chromosome positions. Among them, 1795 were present in the A genome chromosomes, 1787 in the B genome, and only 210 in the D genome (Additional file 3: Table S3). Genetic diversity was analyzed using the 3792 SNPs. Gene diversity and polymorphism information content (PIC) ranged from 0.009 to 0.500 and from 0.009 to 0.375, with averages of 0.319 and 0.256, respectively. Major allele frequencies reached a maximum of 0.995, with an average of 0.762 (Additional file 3: Table S3), indicating that the germplasm was highly diverse.
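
The upper bounds reported above (0.500 for gene diversity and 0.375 for PIC) are the theoretical maxima for a biallelic marker with both alleles at a frequency of 0.5. The following minimal sketch illustrates this using the standard textbook definitions of the two statistics; whether PowerMarker applies additional sample-size corrections is not stated here, so the function names and formulas below are assumptions for illustration only.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2) over allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


def biallelic_stats(major_allele_freq):
    """Return (gene diversity, PIC) for a SNP given its major allele frequency."""
    freqs = [major_allele_freq, 1.0 - major_allele_freq]
    return gene_diversity(freqs), pic(freqs)


# A SNP with both alleles at 0.5 gives the maximum of each statistic.
max_gd, max_pic = biallelic_stats(0.5)
```

For a biallelic SNP, both statistics shrink toward zero as the major allele frequency approaches 1, which is why loci with a major allele frequency near 0.995 sit at the bottom of the reported ranges.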

The number of subpopulations (K) was plotted against the ΔK calculated from the Structure output, and the peak of the line graph was observed at K = 2 (Fig. 1a, b). The neighbor-joining method was used to classify the 215 wheat cultivars based on Nei’s standard genetic distance [23], and they were divided into two groups (Fig. 1c). The first group (162) mainly consisted of Funo, Nanda 2419, and their derivatives, which mainly originated from Anhui, Henan, Hunan, Jiangsu, Shaanxi, and Sichuan provinces. The second group (53) mainly consisted of Bima 4 and its derivatives, which mainly originated from Beijing and Shandong. This further demonstrated that the population was basically divided into two subpopulations.
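
The ΔK statistic used to choose K can be sketched as follows. This is a minimal illustration that computes ΔK as the absolute second-order difference of the mean log-likelihood across replicate runs, divided by the standard deviation of the log-likelihood at K. The log-likelihood values are invented for demonstration and are not taken from this study, and the function name `delta_k` is hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of Ln P(D) values from replicate runs.
    Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    Requires at least two replicate runs per K with non-zero spread.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Illustrative (made-up) log-likelihoods from three replicate runs per K.
example_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-780.0, -790.0, -770.0],
    4: [-770.0, -760.0, -780.0],
}
dk = delta_k(example_runs)
best_k = max(dk, key=dk.get)
```

In this toy input the likelihood rises sharply from K = 1 to K = 2 and then flattens, so ΔK peaks at K = 2, the same qualitative pattern as the peak described above.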

Fig. 1

Population structure and neighbor-joining (NJ) tree of 215 cultivars based on 3792 unlinked SNP markers. (a) plot of ΔK against putative K ranging from 1 to 12; (b) stacked bar plot of ancestry relationships of 215 wheat cultivars; (c) NJ tree based on Nei’s distance. The two divergent groups are shown in black (162) and red (53), respectively

Associations between yield-related traits and SNPs

Of the 3792 SNP markers, 3271 had a frequency higher than 0.05. Association analyses between the six yield-related traits and SNP markers showed that there were 117 significantly associated signals (P < 3.06 × 10−4) among the 76 associated SNP loci, including 20, 35, 6, 23, 24, and 9 signals associated with PH, SL, SNPS, KNPS, KWPS, and TKW, respectively (Fig. 2). The phenotypic explanation rates (R²) ranged from 2.03 to 12.76%. The associated loci were located on 15 chromosomes (Table 2). Significant associations were found in two or more environments for 25 SNP loci; for example, wsnp_Ex_c49211_53875575-5A (SL) was significantly associated in all six environments, whereas others were significant in two to five environments (Table 2).

Fig. 2

Manhattan and Q-Q plots of six phenotypic traits with 3792 genome-wide SNP markers shown as dot plots of compressed MLM at P < 3.06 × 10−4. Red horizontal line corresponds to the threshold value for significant association. Green and orange colors separate different chromosomes. Significantly associated SNP markers are labeled with blue dotted lines. (a) SNPs wsnp_Ex_c12048_19288999 and wsnp_Ku_c99567_87349060 associated with PH were consistently detected in 5 and 3 environments, respectively; (b) SNPs wsnp_Ex_rep_c67779_66463916, wsnp_Ex_c3463_6348659 and wsnp_Ex_c49211_53875575 associated with SL were consistently detected in 4, 3 and 6 environments, respectively; (c) SNPs associated with SNPS were detected in fewer than two environments; (d) both wsnp_Ku_c29102_39008953 and wsnp_Ex_c13154_20784674 associated with KNPS were detected in 3 environments; (e) SNP wsnp_Ex_c12341_19693570 associated with KWPS was detected in 3 environments; (f) SNP wsnp_Ku_rep_c69511_68887456 associated with TKW was detected in 4 environments

Table 2 One hundred and seventeen significant association signals (P < 3.06 × 10−4) involving 76 associated SNP loci and six phenotypic traits

Phenotypic effects of yield-related alleles

The phenotypic effects of alleles were further analyzed (Table 3). Favorable alleles with larger genetic effects on PH, SL, SNPS, KNPS, KWPS and TKW were wsnp_Ku_c99567_87349060-5B CC (reduction of PH by 8.82 cm in 09YL, 6.91 cm in 09YZ, and 5.90 cm in 10YZ); wsnp_Ex_c1630_3105100-5B AA (1.34 cm in 10YL); wsnp_Ex_c7713_13153321-6B CC (1.48 cm in 09YL); wsnp_Ex_c13953_21831752-4A CC (increases in KNPS by 4.27 in 09YZ and 3.45 in 10YL); wsnp_Ex_c19467_28423197-6B AA (increases in TKW by 0.26 g in 09YL); and wsnp_Ku_rep_c69511_68887456-3A TT (increases in TKW by 1.41 g in 09TA, 1.01 g in 09YL, 1.48 g in 09YZ, and 1.33 g in BLUP), respectively. The frequencies of these alleles at associated loci ranged from 6.05 to 97.21%, and exceeded 50% for 64 alleles, indicating strong selection on those alleles in breeding.

Table 3 Favored alleles and genetic effects of 76 SNP loci significantly (P < 3.06 × 10−4) associated with six phenotypic traits

Transmission of favorable alleles from founder parents

All 76 alleles with a positive effect on yield-related traits identified in the association analysis were used to analyze the transmission rates of alleles from founder parents to progenies, as well as the frequencies of favorable alleles in later generations. Transmission rates from the first to the fifth generation of Funo derivatives fell from 81.88 to 65.48%, while the frequencies of favorable alleles changed from 71.99 to 78.21%. Transmission rates from the first to the fourth generation of Bima 4 derivatives fell from 79.94 to 64.38%, while the frequencies of favorable alleles increased from 74.79 to 79.49%. Likewise, transmission rates for first to fifth generation derivatives of Nanda 2419 fell from 64.25 to 50.72%, while the corresponding frequencies of favorable alleles increased from 68.91 to 78.21% (Fig. 3). Thus, although the transmission rates of alleles from founder parents decreased with the number of generations, the percentage of favorable alleles increased.

Fig. 3

Frequencies of favorable alleles and founder parent-derived alleles in three founder parents and their derivatives. Blue bars represent the frequencies of founder parent-derived alleles and the red dotted lines indicate the frequencies of favorable alleles in founder parents and their derivatives. a Funo; b Bima 4; c Nanda 2419

Overall analysis of chromosome regions involving 76 favorable alleles showed that among the 15 chromosomes with association signals for agricultural traits, only three regions, 95.5–97.8 cM on 3B, 136.2–144.1 cM on 4A, and 116.0–133.2 cM on 5A, had high frequencies for alleles with a positive influence on yield traits (Fig. 4a). In particular, the 3B region was associated with SL and PH (Fig. 4b), while the 4A region was associated with SL (Fig. 4c). Additionally, the 116.0–133.2 cM region on 5A was present in derivative cultivars with high frequency and was associated with KWPS (Figs. 4d and 5).

Fig. 4

Distributions of favorable alleles associated with agronomic traits in 215 cultivated varieties. Green indicates the favored allele at each locus, yellow indicates alternative alleles, white indicates missing data. a comparative distribution of favorable alleles associated with agronomic traits in 117 founder parents and derivatives and 98 independent varieties; b frequencies of favorable alleles associated with SL and PH in the chromosome 3B region 95.5–97.8 cM in the founder parents and derivatives and other cultivated varieties; c frequencies of favorable alleles associated with SL in the 4A region 136.2–144.1 cM after comparing founder parents and derivatives with an independent variety group; d frequencies of favorable alleles associated with KWPS in 5A region 116.0–133.2 cM after comparing founder parents and derivatives with cultivated varieties

Fig. 5

Distribution of favorable alleles associated with agronomic traits in Funo and its derivatives. Green indicates the favorable allele at each locus, yellow indicates the alternative allele, white indicates missing data, red histogram represents the frequencies of founder parent-derived alleles. (a) distribution of favorable alleles associated with agronomic traits in Funo and its derivatives; (b) Manhattan plot displaying the GWAS result for KWPS with 3792 SNPs in six environments

Discussion

Genetic diversity of founder parents and derivatives

One hundred and seventeen of the 215 cultivars investigated in this study were first to fifth generation derivatives of Funo, Nanda 2419 and Bima 4 that were bred in different provinces of China. The 215 cultivars were divided into two groups: the first (162 accessions, 75.3%) included Funo, Nanda 2419, and their derived offspring, while the second (53, 24.7%) included Bima 4 and many of its derived offspring. Pedigree analysis showed that the first generation derivatives of Funo, Sumai 2 and Sumai 3, as well as the second generation derivatives Ning’ai 8628 and Wu 7815–4-1, clustered together. Moreover, the first generation Funo derivatives obtained from a cross with Neixiang 5 (Zhengzhou 17, Kaifeng 10 and Xuchang 26), and the second generation derivatives obtained from crosses involving Zhengzhou 17 (Sudi 8112, Zhengzhou 741, Huapei 128–8 and Xiangmai 5) were also in the same cluster. Thus, different cultivars from the same original cross had high similarity, indicating little genetic difference in the traits analyzed. Moreover, among different clusters, genetically related lines mostly grouped in the same cluster, indicating that the results were consistent with the genealogy (Additional file 4: Table S4). However, a few lines with a direct pedigree relationship to a particular founder did not fall into the same group. For instance, 16 of the 17 first generation Funo lines belonged to the first group, but Linnong 14, which is 50% related to Funo, fell into the second group, showing that high performance offspring with large differences could be selectively bred from the same founder parent.

Dissection of founder parents by favorable alleles

Previous studies found that the genes controlling important traits tended to be clustered rather than randomly distributed on chromosomes [24,25,26,27]. For example, Huang et al. [25] identified QTLs for TKW and KNPS in the Xgwm334a-Xgwm1043 region on chromosome 6A, PH, KNPS, and TKW near Xgwm786 on chromosome 7D, and KNPS, spike weight, heading date, TKW, and PH in the Xgwm1220-Xgwm1002 region also on chromosome 7D. Li et al. [26] localized eight QTLs for TKW, spike number per square meter, sterile spikelet number per spike and fertile spikelet number per spike near markers Xwmc31, Xgdm67, and Xgwm428 on chromosome 7D.

We investigated favorable allele combinations carried by the founder parents and found that among the 76 associated loci, Bima 4, Funo, and Nanda 2419 carried 58, 56 and 48 favorable alleles, respectively. Among the 25 associated loci detected in multiple environments, Bima 4, Funo and Nanda 2419 carried 20, 19 and 14 favorable alleles, respectively. In particular, the wsnp_CAP8_c1419_836050 - wsnp_Ex_c6129_10723019 region on chromosome 3B was associated with both SL and KNPS, and the wsnp_Ex_c1563_2987002 - wsnp_Ex_c3463_6348659 region on chromosome 4A was associated with SL. Favorable alleles in these two segments were present with high frequency in Bima 4, Funo, and Nanda 2419, indicating that these varieties have potential for breeding programs. Similarly, the wsnp_Ku_c29102_39008953 - wsnp_Ex_c13154_20785032 region of chromosome 3B was associated with KNPS. This segment was linked to yield increase in both Bima 4 and Funo.

Implications for molecular wheat breeding

Shoemaker et al. [28] suggested that the process of plant breeding reflects how breeders “manipulate” traits to preferentially select for high yield, disease resistance, and high quality. In this study, 27 of 76 marker-trait associations (MTAs) co-localized with previously identified QTL or loci (Additional file 5: Table S5) [7, 13, 28,29,30,31,32]. SNPs wsnp_Ku_c29102_39008953, wsnp_Ex_c11837_18996495 and wsnp_Ex_c12781_20280445 located in region 123.3–126.6 cM on chromosome 3B affected KWPS, while genes for KNPS near locus BS00022025_51 associated with TKW [13]. QSL.caas-4AS for SL mapped to region 140.5–144.1 cM, which included loci wsnp_Ex_c28092_37240192, wsnp_Ex_rep_c67779_66463916, wsnp_Ex_c12_21212, and wsnp_Ex_c3463_6348659 identified in the current GWAS [7]. We found that favorable alleles associated with spike weight in founder parents and derivatives were clustered in the 116.0–133.2 cM region on chromosome 5A (Figs. 3 and 4). Gao et al. [7] also indicated that QTLs QNDVI-A.caas-5AL, QChl-A.caas-5AL and QChl-10.caas-5AL in this region might affect yield.

Twenty-five SNP loci were associated with yield-related traits in two or more of six environments. Among them, eight SNP loci co-localized with those found in previous studies (Table 2 and Additional file 5: Table S5) [7, 13, 28,29,30,31,32]. Thus, these favorable alleles, especially the one at locus wsnp_Ex_c49211_53875575-5A detected in all six environments, are of interest for future breeding programs.

Conclusions

Two hundred and fifteen wheat cultivars were genotyped by the 9 K SNP iSelect assay and all were phenotyped for six yield-related traits in six environments. Comparisons of yield-related traits in founder parents Funo, Bima 4, Nanda 2419, and their derivatives indicated that breeders applied a strong selective pressure on PH and TKW. MAF, PIC and gene diversity analysis using 3792 SNP markers showed high genetic diversity. Genome-wide association analysis of yield-related traits detected 117 significant associations at 76 SNP loci on 15 chromosomes. Twenty-five loci showed associations in two or more environments. Three regions with high frequencies of favorable alleles were identified at positions 95.5–97.8 cM on chromosome 3B, 136.2–144.1 cM on chromosome 4A, and 116.0–133.2 cM on chromosome 5A. The region on chromosome 5A associated with KWPS was highly distinctive in favorable alleles between founder and derived lines compared to other cultivars. Our findings partially identify the genetic basis of the role of founder parents in crop breeding, and provide information for future wheat improvement by marker-assisted selection.

Methods

Plant materials

The plant material was a collection of 215 wheat cultivars, comprising 11 founder parents, 106 derivatives and 98 other varieties (Additional file 4: Table S4). The first group comprised 11 founder parents that have made significant contributions to Chinese wheat breeding, such as Funo, Bima 4 and Nanda 2419 (Additional file 6: Figure S1) [33], together with 106 derivatives of those parents. The other 98 genotypes originated from Italy (2) and Chinese provinces including Anhui (4), Beijing (5), Fujian (5), Gansu (2), Guizhou (1), Hebei (4), Henan (9), Hubei (3), Hunan (8), Jiangsu (16), Jiangxi (1), Shaanxi (17), Shandong (12), Shanxi (3) and Sichuan (6). Details are provided in Additional file 4: Table S4.

Phenotyping

The whole germplasm set was planted at three locations (Taian in Shandong; Yangling in Shaanxi; and Yangzhou in Jiangsu) in two growing seasons (2008–2009 and 2009–2010). Field management followed local practices. The six irrigated environments were designated 09TA, 10TA, 09YL, 10YL, 09YZ and 10YZ. Field experiments were laid out in randomized block designs with three replications. Each line was planted in 2 m, five-row plots at 40 kernels per row, with a row spacing of 20 cm. The agronomic traits PH and TKW were measured at maturity. Thirty spikes of each line were randomly collected from the middle row and used for measurements of SL, SNPS, KNPS, and KWPS.

Genotyping and statistical analysis

Genomic DNA was extracted using the CTAB method [34]. Descriptive statistical analysis and analysis of variance (ANOVA) of phenotypic data were performed using SPSS 21.0 (http://www.brothersoft.com/ibm-spss-statistics-469577.html). The best linear unbiased prediction (BLUP) method was used to calculate the mean values of each trait [35,36,37].

SNP genotyping was performed on the BeadStation and iScan instruments and conducted at the Genome Center of the University of California at Davis according to the manufacturer’s protocols (Illumina, USA) [5]. Data correction, input and output were performed using GenomeStudio v2011.1 [38]. Information on chromosome location of polymorphic SNPs was obtained from Cavanagh et al. [5]. PowerMarker V3.25 was used to estimate genetic diversity of SNPs [39]. Population structure of the 215 cultivars was evaluated with 3792 SNP markers distributed on all 21 chromosomes using Structure 2.3.4 with a burn-in period of 50,000 iterations followed by 500,000 Markov Chain Monte Carlo (MCMC) replications [40]. For each value of the number of clusters K from 1 to 10, five independent runs were performed, leading to 50 Structure outputs. The number of populations was then estimated on the basis of the Evanno criterion [41]. GWAS was performed on the yield-related traits and SNP marker data using the Q + K model [42, 43] in TASSEL 5.0 software [31] (http://www.maizegenetics.net). After exclusion of SNP loci with frequencies < 0.05, a uniform suggestive genome-wide significance threshold (1/3271 = 3.06 × 10−4, i.e. P < 3.06 × 10−4, −log10 P > 3.51) was applied.
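
The suggestive threshold quoted above follows directly from the number of SNPs retained after frequency filtering. A minimal sketch of the arithmetic is given below; the function name `suggestive_threshold` is hypothetical and is used only for illustration.

```python
import math


def suggestive_threshold(n_markers):
    """Return (P threshold, -log10 P threshold) for a 1/n suggestive cutoff."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    p_threshold = 1.0 / n_markers
    return p_threshold, -math.log10(p_threshold)


# 3271 SNPs remained after excluding loci with frequencies below 0.05.
p_cut, neg_log_cut = suggestive_threshold(3271)
print(f"P < {p_cut:.2e}, -log10(P) > {neg_log_cut:.2f}")
```

A marker-trait association is then declared significant when its P value is below `p_cut`, or equivalently when its −log10 P exceeds `neg_log_cut`.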

The 215 wheat cultivars were grouped by the neighbor-joining method in MEGA 5.0 [32]. The transmission frequencies of alleles from founder parents to later generations as well as favorable alleles were computed in this study. The transmission rate was defined as the percentage of average numbers of alleles carried by one generation derived from the founder parent relative to the total number of alleles detected. The frequency of favorable alleles was defined as the percentage of average numbers of favorable alleles carried by one generation relative to the total number of favorable alleles detected.
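
The two definitions above can be illustrated with a short sketch. It reflects one reading of the definitions: for each line in a generation, count the loci at which its genotype matches the reference (the founder parent's alleles for the transmission rate, the favorable alleles for the favorable allele frequency), average the counts over the lines, and express the average as a percentage of the number of loci. The locus names, genotypes, and the treatment of missing calls as non-matching are assumptions made for illustration and are not taken from the study data.

```python
def percent_matching(lines, reference):
    """Average number of loci per line matching `reference`, as a percentage
    of the number of loci in `reference`.

    lines: list of dicts mapping locus name -> genotype call for each line.
    reference: dict mapping locus name -> reference genotype call.
    Loci missing from a line are counted as non-matching (an assumption).
    """
    if not lines or not reference:
        raise ValueError("lines and reference must be non-empty")
    counts = [
        sum(1 for locus, allele in reference.items() if line.get(locus) == allele)
        for line in lines
    ]
    return 100.0 * (sum(counts) / len(counts)) / len(reference)


# Toy example with four hypothetical loci and two lines in one generation.
founder = {"snp1": "AA", "snp2": "CC", "snp3": "TT", "snp4": "GG"}
favorable = {"snp1": "AA", "snp2": "TT", "snp3": "TT", "snp4": "CC"}
generation = [
    {"snp1": "AA", "snp2": "CC", "snp3": "TT", "snp4": "CC"},
    {"snp1": "AA", "snp2": "TT", "snp3": "CC", "snp4": "CC"},
]

transmission_rate = percent_matching(generation, founder)
favorable_frequency = percent_matching(generation, favorable)
```

In this toy generation the lines share fewer alleles with the founder (50%) than they carry favorable alleles (75%), showing how the two statistics can diverge when breeders select favorable alleles that the founder parent did not carry.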