Retrospective genomic analysis of sorghum adaptation to temperate-zone grain production
Sorghum is a tropical C4 cereal that recently adapted to temperate latitudes and mechanized grain harvest through selection for dwarfism and photoperiod-insensitivity. Quantitative trait loci for these traits have been introgressed from a dwarf temperate donor into hundreds of diverse sorghum landraces to yield the Sorghum Conversion lines. Here, we report the first comprehensive genomic analysis of the molecular changes underlying this adaptation.
We apply genotyping-by-sequencing to 1,160 Sorghum Conversion lines and their exotic progenitors, and map donor introgressions in each Sorghum Conversion line. Many Sorghum Conversion lines carry unexpected haplotypes not found in either presumed parent. Genome-wide mapping of introgression frequencies reveals three genomic regions necessary for temperate adaptation across all Sorghum Conversion lines, containing the Dw1, Dw2, and Dw3 loci on chromosomes 9, 6, and 7 respectively. Association mapping of plant height and flowering time in Sorghum Conversion lines detects significant associations in the Dw1 but not the Dw2 or Dw3 regions. Subpopulation-specific introgression mapping suggests that chromosome 6 contains at least four loci required for temperate adaptation in different sorghum genetic backgrounds. The Dw1 region fractionates into separate quantitative trait loci for plant height and flowering time.
Generating Sorghum Conversion lines has been accompanied by substantial unintended gene flow. Sorghum adaptation to temperate-zone grain production involves a small number of genomic regions, each containing multiple linked loci for plant height and flowering time. Further characterization of these loci will accelerate the adaptation of sorghum and related grasses to new production systems for food and fuel.
KeywordsGenotyping-by-sequencing introgression photoperiod flowering time dwarfism
single nucleotide polymorphism.
Cereals have been selected by humans for thousands of years, first during their domestication from wild grasses and subsequently for increased yield, uniformity, and adaptation to new environments and management practices [1, 2, 3]. Specific molecular pathways have recently proven useful for cereal adaptation to modern, high-input agriculture. For example, the Green Revolution exploited allelic variation in the gibberellin pathway in wheat and rice to produce semi-dwarf cultivars with increased harvest index and improved resistance to lodging [4, 5, 6, 7]. Similar phenotypic changes occurred during the creation of dwarf grain sorghum suitable for mechanized harvest at temperate latitudes. Understanding the genetic control of these changes is critical for the efficient transfer of useful alleles, both between tropical and temperate growing regions and between breeding programs for different end uses.
Sorghum is the fifth most important cereal crop worldwide  and is widely grown in temperate regions, but was domesticated in the African tropics . Temperate adaptation for grain production in sorghum requires photoperiod-insensitivity, for early maturity, and dwarfism, both of which involve at least four major loci . Of the major maturity loci (Ma1-Ma6), Ma1 has been identified as PRR37  and Ma3 as Phytochrome B . Of the major dwarfing loci (Dw1-Dw4), Dw3 has been identified as PGP1/PGP19, an auxin transporter orthologous to maize brachytic2 . Dw2 and Dw1 are uncloned, with the former closely-linked to Ma1  and the latter mapping to chromosome 9 [15, 16].
In this study, we use genotyping-by-sequencing (GBS) [18, 19] to generate genome-wide single nucleotide polymorphism (SNP) data for 580 pairs of EP and SC lines, for a total of 1,160 sorghum inbreds. We then employ a novel introgression mapping approach to identify loci required for temperate adaptation, and validate our results using both phenotype-genotype association and population differentiation (Fst) analyses.
Results and discussion
Genotyping-by-sequencing of SC lines and their exotic progenitors
To map elite donor introgressions in SC lines, we genotyped 580 pairs of SC and their corresponding EP lines (Additional File 1) at 54,034 SNPs using GBS. Briefly, we constructed reduced-representation DNA libraries using pairs of restriction enzymes , sequenced them in 96-plexes on the Illumina HiSeq, and processed the data using the TASSEL GBS pipeline . We found that combining two separate double digests nearly doubled the number of SNPs called per sample (Additional File 2). The full dataset contained 0.3% heterozygous genotypes. Partial imputation using the TASSEL GBS pipeline reduced the proportion of missing genotypes from 66% to 23%.
Three different seed sources of the elite donor line, BTx406, were used to construct 28 different genomic libraries. Three of these libraries originating from a single seed source of BTx406 showed low concordance and were removed from subsequent analyses (Additional File 3). This low concordance was likely due to laboratory error as it was confined to libraries prepared on a single day. The remaining 25 libraries from the elite donor contained clear, homozygous majority calls for 53,037 SNPs. The elimination of approximately 7,000 SNPs in complete linkage disequilibrium with another SNP less than 64 basepairs (bp) away resulted in a dataset of 46,137 SNPs for calling introgressions.
Each trio of homozygous genotypes for a given SNP across a SC line, its corresponding EP line, and the elite donor has four possible outcomes (Figure 1B), most common of which is a lack of polymorphism. Of the three remaining polymorphic combinations, shared genotypes between a SC line and its EP line provide evidence that introgression has not occurred, whereas shared genotypes between a SC line and the donor provide evidence that introgression has occurred. The fourth possibility is unexpected: a SC line has a genotype not found in either of its parents. Unexpected genotypes could result from laboratory error (mix-up or cross-contamination of seed or DNA samples in our laboratory), historical error during the SCP (pollen contamination or error in pedigree records), or uncharacterized heterozygosity and/or genetic drift during the maintenance of the EP, SC, or donor lines. We used the proportion of unexpected genotypes as a quality-control filter to prune both markers and individuals. First, we discarded 75 markers with >20% unexpected genotypes, of which 55 were on chromosome 6 and 44 were found between 30 Mb and 43 Mb on chromosome 6, a region that includes Ma1 and likely includes Dw2 . A possible explanation for the high proportion of unexpected genotypes in this region is that certain sources of the elite donor BTx406 used during the SCP differed from our BTx406 consensus genotype in this region. In support of this hypothesis, we note that the seed source of BTx406 derived from Lubbock, TX, very close to where the SCP was carried out, is heterozygous for many of the markers on chromosome 6 that were discarded due to having >20% unexpected genotypes. Second, we discarded 190 SC-EP pairs with >10% unexpected genotypes. The distribution of unexpected genotypes in some SC lines is clustered (for example, SC1104; Additional File 4), suggesting that genomic segments from a temperate donor other than BTx406 were introgressed. In other SC lines the unexpected genotypes are scattered, suggesting that genetic drift may have occurred between the EP line that was used as a recurrent parent and the EP line that was genotyped. For the 16 SC-EP pairs that have >33% unexpected genotypes, a clerical error of some kind - during transcription of pedigree records, seed packet labels, or DNA plates - is most likely. For the remainder of our analysis, we retained a set of 390 SC-EP pairs with <10% unexpected genotypes (Figure 1C), genotyped at 46,062 markers (Additional File 5).
Inferring elite donor introgressions in SC lines
Introgression maps were generated for each SC line (Figure 1E; Additional File 4). The long-range linkage disequilibrium in the SC lines was exploited to map unanchored contigs in the sorghum genome (Additional File 6). After setting non-polymorphic and unexpected genotypes as missing, missing data were inferred using flanking markers (Figure 1D). Introgression frequency was then calculated for each marker as the proportion of the 390 SC lines carrying a BTx406 introgression. The theoretical expectation of introgression frequency after four backcrosses in the absence of selection is roughly 3%. The standard deviation of this value in individual SC lines, in a species with 10 chromosomes and a map length of roughly 16 Morgans, is also roughly 3% , so that the introgression frequency in a sample of 390 SC lines is expected to range from 2% to 4% in the absence of selection. Because our dataset contains a substantial proportion of missing data, introgressions that are very small and very rare may be missed entirely. However, we find that every chromosome contains regions with introgression frequencies >4%, indicating linkage to a target of selection during the SCP.
Three genomic regions are associated with temperate adaptation in sorghum
The cloned Dw3 locus on chromosome 7 is tagged using three different methods
The Dw1 region on chromosome 9 fractionates into linked QTL
Chromosome 6 retains little functional variation in SC lines
Identification of subpopulation-specific introgression targets
The molecular analysis of parents and progeny provides the opportunity for pedigree verification. Our results show that almost one-third of SC lines contain a substantial proportion of unexpected genotypes (>10% of informative markers). We used three complementary approaches - introgression mapping, association mapping, and population differentiation (Fst) - to characterize the genetic architecture of adaptation to temperate-zone grain production in sorghum. Our novel introgression mapping strategy exploited recombination and selection previously imposed by plant breeders to map three major genomic regions, one of which no longer harbors functional variation in temperate-adapted SC lines. Association mapping confirmed that the Dw1 region contains separate QTL for plant height and flowering time. Significant differences in introgression frequency between subpopulations strongly suggest the existence of additional uncharacterized loci that affect plant height and flowering time in sorghum.
Linkage disequilibrium between at least four targeted loci on chromosome 6 has led to the introgression of a single elite haplotype across most of this chromosome in the majority of lines examined. Chromosome 6 contains roughly 10% of sorghum genes, for which very little functional diversity has been exploited for temperate sorghum breeding. This lack of diversity undoubtedly limits adaptive potential, especially for complex traits including resistance to abiotic and/or biotic stress. Increasing gene flow and recombination between tropical and temperate sorghum varieties and haplotypes will help unlock the genetic potential of this stress-tolerant crop to meet our rising demand for food, feed, and fuel in an era of increasing climatic volatility.
Plant materials, DNA extraction, and quantification
Seed for SC lines was obtained from the USDA-ARS Cropping Systems Research Laboratory (Lubbock, TX, USA) and seed for EP lines was obtained from the National Plant Germplasm System (NPGS ). Information on the geographic origins and morphological racial classification of each SC line were obtained from Texas A&M University (Additional File 1). Three independent seed sources of the elite donor BTx406 were obtained from the NPGS (PI 656020), the USDA-Cropping Systems Research Laboratory, and Texas A&M University. Genomic DNA was extracted from etiolated seedlings approximately 3 days after germination using a modified CTAB protocol  and quantified using PicoGreen (Invitrogen, NY, USA).
SNP library creation
Libraries were prepared using a protocol modified from Poland et al. 2012 . Genomic DNA (approximately 250 ng) was double digested with either PstI-HF and BfaI or PstI-HF and HinP1I at 37°C for 2 h with heat inactivation at 80°C for 20 min. Digested DNA was ligated to two separate adapters using T4 ligase with 1mM ATP. The first adapter contains the Illumina forward sequencing primer, one of 96 unique barcodes, and the PstI overhang. The second adapter contains the Illumina reverse sequencing primer and the overhang for either BfaI or HinP1I. The full list of adapters is shown in Additional File 11. Ligation reactions were held at 25°C for 2 h followed by heat inactivation at 65°C for 20 min. Pooled DNA from 96 barcoded libraries was cleaned using a 2:1 ratio of AmpureXP Beads (Beckman Coulter, CA, USA) to DNA solution using a Magnetic Particle Concentrator (Invitrogen, NY, USA) with two washes in 95% ethanol and resuspension in elution buffer (EB; 10mM Tris). Cleaned DNA pools were amplified using Illumina primers in a 2X PhusionHF Master Mix (New England Biolabs, MA, USA) with cycler conditions as follows: 98°C 30 s, 15 cycles (98°C 10 s, 68°C 30 s, 72°C 30 s), 72°C 5 min. Samples were run on agarose gels to confirm the presence of a genomic smear and cleaned a second time with AMPure beads. Amplified DNA sizes and relative concentrations were assessed using an Agilent Bioanalyzer 2100 and Agilent DNA1000 Kit (Agilent Technologies Inc., CA, USA) and PicoGreen. The two separately digested samples were combined in equimolar concentrations and diluted to 10 nM in library buffer (EB + 0.05% Tween-20) and submitted to the W.M. Keck Center at the University of Illinois for single-end sequencing on the Illumina HiSeq2000. The Keck Center performed an additional qPCR assay on each library to adjust concentrations before sequencing.
Genotype data analysis
SNPs were called from Illumina fastq files using the TASSEL GBS pipeline . Only 64 bp tags present at least 10 times in the dataset were considered. Alignment was performed using BWA  with the default settings. Inbred lines and SNPs with >95% missing data were discarded. SNPs were not filtered by minor allele frequency, as rare SNPs are especially useful for inferring introgression events between pairs of lines (Figure 1B). Heterozygous genotypes accounted for 0.3% of the total dataset. Partial imputation using the TASSEL GBS pipeline reduced the proportion of missing data from approximately 66% to approximately 20%. For the association and Fst analyses, the remaining missing data were imputed using BEAGLE. This yielded substantially fewer unexpected genotypes than direct imputation using BEAGLE without prior partial imputation (data not shown).
Mapping unanchored contigs in the sorghum genome
We defined a set of 213 SNPs from 31 unanchored contigs that had at least 20 introgression calls and an introgression frequency of at least 10%, and calculated linkage disequilibrium (r2) between introgression scores in the 213 unanchored SNPs and our complete set of 46,062 SNPs with introgression scores in the 390 SC-EP pairs that were placed on the sorghum physical map (V1.0 ). Most (181) of the unanchored SNPs mapped uniquely to a single chromosome, with a mean of 8.4 mapped SNPs tied for the highest r2 across a mean physical distance of 9.1 Mb (Additional File 6).
Calculation of introgression scores and frequencies
For each SNP, an introgression was scored as either present (1), when a genotype was shared between the SC line and the donor line, or absent (0), when a genotype was shared between the SC line and its EP line. Missing data for presence/ absence of introgressions were inferred as the mean of each flanking marker weighted by its physical distance (Figure 1D). Missing data proximal and distal to the first and last informative markers on a chromosome, respectively, were assigned the value of the closest informative marker. Once missing data were imputed, introgression frequencies were calculated at each SNP as the percentage of SC lines with an introgression.
Subpopulation assignment and permutations
Principal component analysis (PCA) was performed in EP lines in R  using the prcomp() function and a dataset of 22,203 SNPs with minor allele frequencies >10% in the set of 1,160 SC and EP lines (580 pairs). EP lines were assigned to subpopulations using values for PC1 and PC2 as follows: (1) lines with PC2 <-20 were assigned to the guinea/kafir (GK) group; (2) lines with PC2 >-20 and PC1 >0 were assigned to the caudatum (C) group; (3) lines with PC2 >-20 and PC1 <0 were assigned to the durra (D) group. Introgressed regions excluded from the analysis in Additional File 8 were defined as locations <55 Mb on chromosome 6, >50 Mb on chromosome 7, and >50 Mb on chromosome 9. Significance of subpopulation differences in introgression frequency was assessed by randomly assigning SC lines to subpopulations of equivalent size (137, 131, and 122 individuals) and calculating introgression frequencies across the three permuted subpopulations. For each permutation, the maximum range of introgression frequencies across the three subpopulations was recorded for each chromosome. Two hundred permutations were performed and α was set to 0.05.
Phenotypic data and association mapping
The 580 genotyped SC lines were grown in 6 m plots with 0.76 m row spacing in Urbana, IL in the summers of 2011 and 2012 and phenotyped for plant height and flowering time. Plant height was measured as the distance (cm) from the ground to the penultimate or 'pre-flag' leaf on one representative plant per row. Flowering time was measured as the time (days from planting) at which 50% of the plants in the row had initiated anthesis. Phenotypic data from each year were normalized and the mean normalized value across all years was used for association mapping. The GAPIT package in R  was used to conduct marker-trait associations using the default parameters. Markers included all SNPs discovered in this study with minor allele frequencies ≥10%. Missing SNP data were imputed using BEAGLE.
Raw genotyping-by-sequencing read data have been deposited in the Sequenced Read Archive [SRA: SRP022956]. Introgression scores have been included in a table as Additional File 12.
This project was supported by startup funding from the Energy Biosciences Institute to PB. Charlie Woodfin (USDA-ARS Lubbock, retired) and Bill Rooney (TAMU) provided seed. We also wish to acknowledge the stellar technical support provided by Alvaro Hernandez and staff at the University of Illinois' Keck Center.
- 1.Gepts P: Crop domestication as a long-term selection experiment. Plant Breeding Reviews. 2004, 24: 1-44.Google Scholar
- 6.Ashikari M, Sasaki A, Ueguchi-Tanaka M, Itoh H, Nishimura A, Datta S, Ishiyama K, Saito T, Kobayashi M, Khush GS: Loss-of-function of a rice gibberellin biosynthetic gene, GA20 oxidase (GA20ox-2), led to the rice 'green revolution'. Breeding Science. 2002, 52: 143-150. 10.1270/jsbbs.52.143.CrossRefGoogle Scholar
- 8.Food and Agriculture Organization of the United Nations. [http://www.fao.org/index_en.htm]
- 9.Smith CW, Frederiksen RA: Sorghum: Origin, History, Technology, and Production. 2000, Hoboken, NJ: John Wiley & SonsGoogle Scholar
- 10.Quinby JR: Sorghum improvement and the genetics of growth. 1974, College Station, TX: Texas A&M University PressGoogle Scholar
- 11.Murphy RL, Klein RR, Morishige DT, Brady JA, Rooney WL, Miller FR, Dugas DV, Klein PE, Mullet JE: Coincident light and clock regulation of pseudoresponse regulator protein 37 (PRR37) controls photoperiodic flowering in sorghum. Proc Natl Acad Sci USA. 2011, 108: 16469-16474. 10.1073/pnas.1106212108.CrossRefPubMedPubMedCentralGoogle Scholar
- 15.Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, Riera-Lizarazu O, Brown PJ, Acharya CB, Mitchell SE, Harriman J, Glaubitz JC, Buckler ES, Kresovich S: Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc Natl Acad Sci USA. 2013, 110: 453-458. 10.1073/pnas.1215985110.CrossRefPubMedGoogle Scholar
- 20.Buckler Lab for Maize Genetics and Diversity. [http://www.maizegenetics.net/]
- 25.Satish K, Srinivas G, Madhusudhana R, Padmaja PG, Nagaraja Reddy R, Murali Mohan S, Seetharama N: Identification of quantitative trait loci for resistance to shoot fly in sorghum [Sorghum bicolor (L.) Moench]. Theor Appl Genet. 2009, 119: 1425-1439. 10.1007/s00122-009-1145-8.CrossRefPubMedGoogle Scholar
- 27.National Plant Germplasm System. [http://www.ars-grin.gov/npgs/]
- 31.The Comprehensive R Archive Network. [http://cran.r-project.org/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.