Selection at a genomic region of major effect is responsible for evolution of complex life histories in anadromous steelhead
Disparity in the timing of biological events occurs across a variety of systems, yet understanding of the genetic basis underlying diverse phenologies remains limited. Variation in maturation timing occurs in steelhead trout and has been associated with greb1L, an oestrogen target gene. Previous techniques that identified this gene only accounted for about 0.5–2.0% of the genome and solely investigated coastal populations, leaving uncertainty about the genetic basis of this trait and its prevalence across a larger geographic scale.
We used a three-tiered approach to interrogate the genomic basis of complex phenology in anadromous steelhead. First, fine scale mapping with 5.3 million SNPs from resequencing data covering 68% of the genome confirmed a 309-kb region consisting of four genes on chromosome 28, including greb1L, to be the genomic region of major effect for maturation timing. Second, broad-scale characterization of candidate greb1L genotypes across 59 populations revealed unexpected patterns in maturation phenology for inland fish migrating long distances relative to those in coastal streams. Finally, genotypes from 890 PIT-tag tracked steelhead determined associations with early versus late arrival to spawning grounds that were previously unknown.
This study clarifies the genetic bases for disparity in phenology observed in steelhead, determining an unanticipated trait association with premature versus mature arrival to spawning grounds and identifying multiple candidate genes potentially contributing to this variation from a single genomic region of major effect. This illustrates how dense genome mapping and detailed phenotypic characterization can clarify genotype to phenotype associations across geographic ranges of species.
Keywords: Oncorhynchus, Adaptation, Genome evolution, Pooled sequencing, Sexual maturation, greb1L, Migration
abhd3: Abhydrolase Domain Containing 3
ANOVA: Analysis of Variance
FET: Fisher’s Exact Test
greb1L: Growth Regulation By Estrogen In Breast Cancer 1 Like
GT-seq: Genotyping-In-Thousands by sequencing
MAF: Minor Allele Frequency
mib1: Mindbomb E3 Ubiquitin Protein Ligase 1
PCR: Polymerase chain reaction
PIT: Passive Integrated Transponder
RAD-seq: Restriction Site Associated DNA Sequencing
rock1: Rho Associated Coiled-Coil Containing Protein Kinase 1
Variation in the temporal occurrence of life history events, or phenology, occurs among a vast number of plant and animal systems [1, 2, 3]. The timing of processes such as migration, hibernation, flowering, and breeding can directly influence survival because essential resources vary over both time and space. Therefore, maintaining intraspecific phenological variation is often essential for a species’ persistence, as the timing of biological events may be beneficial or detrimental depending on cyclical variation of biotic and abiotic factors. Thus, balancing selection can act to preserve variation in phenology [6, 7], which is often reflected by high genomic differentiation within multiple species [8, 9]. Consequently, discovery of phenology-related genomic variation is important both for understanding the genomic evolution of an organism and for managing populations with phenological variation in the wild.
Variation in the timing of migration occurs across a variety of taxa but is particularly consistent and predictable in multiple anadromous salmonid species, which migrate from the ocean to freshwater tributaries to spawn. Specifically, steelhead trout (Oncorhynchus mykiss) show distinct bimodal variation in the timing of entry into freshwater tributaries (freshwater entry maturation) [12, 13]. Steelhead that enter freshwater early are sexually premature (stream-maturing) and undergo maturation while in freshwater, whereas late migrating fish typically become sexually mature in the ocean prior to freshwater entry (ocean-maturing) [11, 14]. Despite these two distinct maturation-timing strategies, both stream- and ocean-maturing fish spawn at similar times, and admixture occurs in coastal streams where both strategies are present [13, 15]. These alternate phenotypes do not affect population structure, as fish with distinct maturation differences from the same geographic regions tend to be closely related across genome-wide markers [15, 16, 17]. In contrast to coastal populations, inland populations of steelhead are composed exclusively of the stream-maturing type and enter freshwater sexually premature several months in advance of spawning as they migrate long distances to spawning tributaries [14, 16]. Although inland populations of steelhead share similar maturation states at freshwater entry, it remains uncertain whether variation in maturation exists near freshwater spawning grounds. Other salmonid species demonstrate temporal variation in maturation and arrival timing to inland spawning tributaries [18, 19], suggesting that variation in spawning site maturation may occur in steelhead.
The genetic basis for freshwater entry maturation in steelhead has recently been attributed to a single locus, Growth regulation by estrogen in breast cancer-like (greb1L), an oestrogen target gene. Previous genomic reduction techniques (e.g., RAD-seq) identified multiple single-nucleotide polymorphisms (SNPs) within greb1L that successfully differentiated ocean- and stream-maturing fish [13, 15]. However, these techniques were relatively coarse and effectively evaluated only 0.5–2.0% of the steelhead genome, leaving uncertainty about the genetic basis of freshwater entry maturation. The recent availability of a high-quality O. mykiss genome assembly and full-genome resequencing techniques enables a more precise investigation of the genomic basis of maturation in steelhead. Further, access to passive integrated transponder (PIT) tagging data offers a proxy for measuring spawning site maturation that has not been explored in previous studies.
In this study we investigated three primary questions related to the evolution of complex phenology in anadromous steelhead. First, we evaluated whether dense genome scans could reveal additional candidate genes associated with stream versus ocean maturation phenotypes in steelhead from replicated streams. Second, we tested whether a previously identified candidate gene, greb1L, was consistently associated with freshwater entry maturation phenotypes in steelhead across a broad geographic range that included far-migrating inland populations in addition to previously studied coastal populations. Third, we examined spawning site arrival phenotypes from individual fish to test for genotype to phenotype associations of greb1L with spawning site maturation. Together, we synthesized results to explore models of selection that likely maintain genomic variation of complex maturation phenotypes across a broad distribution of steelhead.
Genomic resequencing for fine scale mapping of phenology traits
To acquire high-density coverage of the O. mykiss genome, we utilized a pooled-sequencing (Pool-seq) approach. Pool-seq involves combining individual DNA samples and sequencing the homogenized DNA mixture together. This method provides a dense representation of population allele frequencies across a reference genome. However, dense representation of the genome comes at the loss of individual genotypes, which are generally used to estimate metrics such as linkage disequilibrium and heterozygosity. We implemented Pool-seq in two independent spawning tributaries in the Columbia River Basin, the Klickitat and Kalama rivers, which both contain stream- and ocean-maturing fish. We targeted fish for each library by peak run-times for the two phenologies: stream-maturing in July and ocean-maturing in March, for a total of 193 individuals collected between 2003 and 2005 (Kalama) and 2014–2017 (Klickitat). Using non-invasive samples of fin clips from fish trapped at weirs, we pooled samples as follows: Kalama ocean-maturing samples (n = 50), Kalama stream-maturing samples (n = 46), Klickitat ocean-maturing samples (n = 47), Klickitat stream-maturing samples (n = 50) (Additional file 1: Table S1).
All four libraries were prepared using a modified NEBNext Ultra enzymatic fragmentation protocol (Additional file 1) [22, 23]. In summary, individual DNA was quantified with pico-green fluorescence on a Tecan M200 (Tecan, Männedorf, Switzerland) and normalized to within two standard deviations of the mean concentration to avoid over-representation of any given individual. Pooled samples were fragmented using NEBNext Ultra dsDNA fragmentase and cleaned using a Qiagen MinElute. After ligation of Illumina adaptors, we targeted sequences with a mean size of 500 bp, performed PCR amplification, then cleaned the PCR product using AMPure XP beads. All libraries were sequenced on an Illumina NextSeq 500 with a target of 500–800 million paired-end reads per library.
Sequenced libraries were processed with the PoolParty pipeline. As part of this pipeline, raw 150 bp paired-end reads were quality-trimmed at a base quality score threshold of 20, retaining trimmed reads of at least 50 bp, using the trim-fastq.pl script that is part of Popoolation. Trimmed reads were then aligned to the O. mykiss reference assembly (Omyk_1.0; GCA_002163495) using bwa mem with default parameters. PCR duplicates were removed using samblaster, and unpaired and unmapped reads were removed using the SAMtools view module. Filtered BAM files were then combined using the SAMtools mpileup module, which extracts SNP and coverage information for each pool. To remove false positive SNPs that often occur around insertion-deletions (indels), we used the identify-genomic-indel-regions.pl and filter-sync-by-gtf.pl scripts from Popoolation2 to remove any SNPs within 5 bp of indel regions. Only variant positions with a minimum of 15 X depth of coverage and a maximum of 250 X depth of coverage were retained; this eliminated regions that may be paralogs (high coverage) or regions that were likely overrepresented by a small number of individuals (low coverage). Alignment and coverage statistics for all libraries were calculated using the PPstats module of PoolParty.
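The depth-of-coverage filter described above can be illustrated with a minimal Python sketch. It is not the PoolParty or Popoolation2 implementation. It assumes that the 15 X and 250 X bounds are applied to every pool separately, which the text does not state explicitly, and the function names and example records are hypothetical.

```python
MIN_DEPTH = 15   # minimum depth of coverage stated in the text
MAX_DEPTH = 250  # maximum depth of coverage stated in the text


def passes_depth_filter(pool_depths, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH):
    """Return True when every pool's depth lies within [min_depth, max_depth].

    Assumption: the bounds are checked per pool rather than on summed depth.
    """
    return all(min_depth <= depth <= max_depth for depth in pool_depths)


def filter_sites(sites):
    """Keep (chromosome, position, depths) records that pass in all pools."""
    return [site for site in sites if passes_depth_filter(site[2])]


# Hypothetical variant positions with depths for the four pooled libraries.
example_sites = [
    ("chr28", 100, [40, 38, 51, 45]),   # retained
    ("chr28", 200, [12, 40, 39, 44]),   # dropped: one pool below 15 X
    ("chr28", 300, [60, 300, 58, 61]),  # dropped: one pool above 250 X
]
kept_sites = filter_sites(example_sites)
```

Checking each pool separately is the stricter choice; a filter on combined depth would retain more positions.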
To identify selective sweeps or genomic regions with significant differentiation, we implemented a sliding-window fixation index (FST), a local score technique to test for statistical association, and a Cochran–Mantel–Haenszel (CMH) test [24, 28]. Sliding-window FST between stream- and ocean-maturing pools was calculated using Popoolation2 with a window of 5 kb and a step size of 50 bp. Local score is an alternative to Fisher’s exact test (FET) for allele frequency differences that reduces false positives by incorporating linkage disequilibrium. Specifically, local score uses FET p-values to determine differentiated genomic regions while simultaneously considering linkage disequilibrium. Local score uses a score function related to -log10(p), which will vary based on window size. Rather than combining p-values within a fixed window size, local score considers the proximity of statistically significant p-values to determine window size iteratively. Finally, the CMH test identifies consistent differences in allele frequencies across biological replicates and computes significance between groups of interest. Thus, in our libraries, the CMH test identified SNPs with allele frequency changes that occurred between stream- and ocean-maturing fish from both the Kalama and Klickitat. We considered genomic regions to be significant if they showed statistical support for differentiation between stream- and ocean-maturing fish in both the local score analyses (analogous to a Bonferroni corrected α = 0.05) and the CMH test (Bonferroni corrected α = 0.05). Significant regions were then investigated for variant annotations using SnpEff, which predicts non-synonymous SNPs (nsSNPs) from a general feature format (GFF) annotation. Variants identified as nsSNPs are more likely to be under selection and to be causal SNPs for a given trait.
Population structure analyses were performed using PPanalyze from the PoolParty pipeline. We used PPanalyze to remove SNPs with minor allele frequency (MAF) < 0.05 and to create neighbor-joining trees based on Nei’s genetic distance and 10,000 bootstraps, using two SNP sets: all genomic SNPs, and SNPs within a 309-kb region of chromosome 28.
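The two quantities named above, a MAF filter and Nei’s genetic distance, can be sketched from pool allele frequencies as follows. This is an illustration rather than the PPanalyze code. It assumes biallelic SNPs, a MAF computed from the mean frequency across pools, and Nei’s 1972 standard distance; the text does not specify which of Nei’s distances was used, and the allele frequencies are invented.

```python
import math


def minor_allele_frequency(freqs):
    """MAF from the mean allele frequency across pools (an assumption)."""
    mean_p = sum(freqs) / len(freqs)
    return min(mean_p, 1.0 - mean_p)


def nei_standard_distance(px, py):
    """Nei's (1972) standard genetic distance for biallelic loci.

    px and py hold the frequency of the same allele at each locus
    in populations X and Y.
    """
    n_loci = len(px)
    jx = sum(p * p + (1 - p) ** 2 for p in px) / n_loci
    jy = sum(q * q + (1 - q) ** 2 for q in py) / n_loci
    jxy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(px, py)) / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical allele frequencies: rows are SNPs, columns are two pools.
snp_freqs = [(0.10, 0.90), (0.50, 0.55), (0.01, 0.02), (0.30, 0.35)]

# Keep SNPs with MAF >= 0.05, the threshold given in the text.
retained = [f for f in snp_freqs if minor_allele_frequency(f) >= 0.05]
pool_x = [f[0] for f in retained]
pool_y = [f[1] for f in retained]
distance = nei_standard_distance(pool_x, pool_y)
```

A matrix of such pairwise distances is the usual input to neighbor-joining tree construction, with bootstrapping done by resampling SNPs.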
Broad scale characterization of candidate genotype frequencies
To determine the association between maturation timing and genomic regions of major effect across a large geographic scale, we genotyped a single informative SNP within greb1L (greb1L-SNP) in an additional 59 collection localities across the Columbia River Basin (n = 2915; Additional file 1: Table S3). In previous studies, greb1L-SNP explained a large proportion of trait variation in relation to maturation and consistently differentiated stream- and ocean-maturing fish. Steelhead populations in North America are generally divided into coastal and inland genetic lineages [11, 31]. For example, the Columbia River Basin, which primarily encompasses the states of Oregon, Washington, and Idaho in the United States, consists of a coastal lineage west of the Cascade mountain range that is genetically distinct from an inland lineage east of the Cascades. These two lineages also differ with respect to maturation timing. The coastal lineage, which generally has shorter migration distances to spawning sites (50–380 km), consists of both stream- and ocean-maturing fish. The inland lineage, which requires longer travel to spawning sites (370–1500 km), only supports stream-maturing fish. Due to the apparent lack of variation in maturation timing in inland populations, previous studies have only investigated the genetic basis of steelhead migration and maturation in coastal populations [13, 15, 17]. Here, we plotted genotype frequencies of greb1L-SNP across 59 collections, including inland and coastal populations of steelhead, using ArcGIS 10.5 to represent a broad geographic range.
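The per-collection genotype frequencies that were mapped can be computed as in the following sketch. The AA/AG/GG coding follows the genotype labels given for greb1L-SNP in the individual-phenotype analysis of this paper; the collection names and genotype counts are invented for illustration.

```python
from collections import Counter


def summarize_collection(genotypes):
    """Return genotype frequencies and the A-allele frequency for one collection."""
    counts = Counter(genotypes)
    n = counts["AA"] + counts["AG"] + counts["GG"]
    if n == 0:
        raise ValueError("no called genotypes in this collection")
    genotype_freqs = {g: counts[g] / n for g in ("AA", "AG", "GG")}
    # Each AA fish carries two A alleles and each AG fish carries one.
    freq_a = (2 * counts["AA"] + counts["AG"]) / (2 * n)
    return genotype_freqs, freq_a


# Hypothetical collections; real data would come from the genotyping panel.
example_collections = {
    "coastal_example": ["AA"] * 12 + ["AG"] * 16 + ["GG"] * 22,
    "inland_example": ["AA"] * 30 + ["AG"] * 15 + ["GG"] * 5,
}
summary = {
    name: summarize_collection(genotypes)
    for name, genotypes in example_collections.items()
}
```

The resulting frequencies per locality are the kind of values that can be drawn as pie charts on a map.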
Individual phenotypes to refine genotype to phenotype associations
To determine an association between greb1L-SNP and spawning tributary arrival, we downloaded array detection dates from the Columbia Basin PIT Tag Information System (PTAGIS; ptagis.org) for wild tagged fish between 2012 and 2016 in spawning tributaries across the Columbia River Basin. We retained data from sub-basins with known spawning tributaries for which we also had sufficient individual DNA tissues (N > 50), leading to six distinct sub-basins, primarily in the inland lineage, with 890 fish in total (Additional file 1: Table S4). DNA from each individual was genotyped with a panel of markers using Genotyping-in-Thousands by sequencing (GT-seq) to isolate and genotype a single greb1L-SNP (identified as 47080_54). For each of the six sub-basins, we tested for significant associations between individual genotype (either premature [AA], heterozygote [AG], or mature [GG]) and spawning tributary arrival week using a one-way analysis of variance (ANOVA) paired with Tukey’s range test. We additionally determined the variance explained by the genotype in each location using the ANOVA sums of squares.
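The variance explained by genotype can be read from the ANOVA sums of squares as the ratio of the between-genotype sum of squares to the total sum of squares. The sketch below shows that calculation for one sub-basin under that assumption. The arrival weeks are invented, and Tukey’s range test and the p-value of the F statistic are left out because they need a statistics library.

```python
def one_way_anova(groups):
    """One-way ANOVA from a dict mapping group label to a list of values.

    Returns the F statistic and the proportion of variance explained
    (between-group sum of squares divided by total sum of squares).
    """
    values = [v for group in groups.values() for v in group]
    n_total = len(values)
    grand_mean = sum(values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for group in groups.values():
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    variance_explained = ss_between / (ss_between + ss_within)
    return f_stat, variance_explained


# Hypothetical spawning tributary arrival weeks grouped by greb1L-SNP genotype.
arrival_weeks = {
    "AA": [10, 11, 12, 11],  # premature genotype
    "AG": [12, 13, 14, 13],  # heterozygote
    "GG": [15, 16, 14, 15],  # mature genotype
}
f_stat, variance_explained = one_way_anova(arrival_weeks)
```

In this toy example the genotype means are 11, 13, and 15 weeks, so most of the variance lies between genotypes; real arrival data are far noisier.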
Our study confirms a genomic region of major effect underlying phenological variation in anadromous steelhead through dense genome resequencing. Previous studies have mapped SNPs generated from restriction-site associated DNA sequencing (RAD-seq) to greb1L, yet have not explicitly explored additional genes of smaller effect upstream and downstream of greb1L [13, 15, 17]. Using dense mapping data, we illustrated that greb1L is part of a larger genomic region under selection consisting of genes and divergent intergenic regions. This discovery was made possible both by the advancement in quality of the O. mykiss genome assembly and by resequencing techniques (pooled sequencing) that provide adequate read coverage across the majority of the reference genome. While previous genome scans with RAD-seq yielded significant association of markers from the greb1L region, fine scale mapping provided a broader understanding of the genomic basis of this phenological trait in steelhead due to higher marker density covering a large portion of the genome.
The additional genes of smaller effect characterized in this study indicate that the genomic basis for maturation phenology in anadromous steelhead may encompass a larger genomic region on chromosome 28 than previously understood. greb1L has consistently shown the most compelling associations to maturation phenotypes [13, 15, 17]. Our results additionally highlight extreme differentiation in the upstream intergenic region of greb1L, which likely contains many regulatory components such as transcription factor binding sites, promoters, and enhancers. greb1L has obvious connections to sexual maturation since it mediates the interaction of oestrogen with other target proteins [13, 42]. Migrating fish that are either mature or nearing maturity have elevated levels of oestrogen in their bloodstream, which relates to multiple sexual characteristics such as egg formation and testicular development. Additionally, we showed that greb1L contains multiple non-conservative and non-synonymous mutations. These changes in protein structure are compelling candidates that are possibly under selection, and they provide additional evidence that greb1L is a key gene under selection for maturation phenology [30, 44]. rock1, the gene directly upstream of greb1L, also has obvious ties to maturation, as it has been connected to embryo development in zebrafish and testicular development in humans. Furthermore, it is a key regulator of actin-myosin contraction, which may be connected to the long distances anadromous steelhead need to swim to reach spawning grounds. abhd3, the gene immediately downstream from greb1L, was the least differentiated, yet is a physiological regulator of medium-chain phospholipids. Efficient fat disposition is essential in fish during long-distance migration [19, 49].
mib1 has less apparent connections to migration and maturity, as it primarily relates to cell apoptosis; however, mib1 does influence ventricle formation and can be connected to cardiac function and swimming performance [51, 52]. Finally, a large intergenic region between greb1L and rock1 is highly differentiated; it may simply contain a gene that currently lacks annotation, or it may consist of enhancers or promoters for the nearby genes, or of non-coding RNAs. If the latter, these intergenic SNPs may play a large regulatory role in expression level. Overall, this highly divergent region on chromosome 28 contains compelling candidate genes, which justifies further validation and marker development to investigate maturation phenotypes of steelhead in more detail. For example, informative SNPs from these candidate genes can be included in high-throughput amplicon sequencing panels to screen large numbers of individuals.
Our results indicated that the candidate region on chromosome 28 is most consistently associated with arrival timing on spawning grounds rather than arrival timing at freshwater entry across populations included in this study. Traditionally, arrival timing at freshwater entry has been a phenotypic proxy for maturation timing that has been used to characterize fish entering freshwater as either sexually mature or premature, which partly owes to the relative ease of recording this proxy trait for returning steelhead at many lower river collection sites including dams, weirs, and hatcheries [54, 55]. Coastal tributaries support both ocean- and stream-maturing fish, whereas steelhead returning to inland tributaries are all stream-maturing fish. Previous studies have solely investigated freshwater entry maturation of steelhead in coastal and lowland rivers [13, 15, 17], and their findings beget expectations that all inland fish may be fixed for “premature” greb1L genotypes. To the contrary, using spawning tributary arrival time, we show that variation in maturation phenology does occur in inland steelhead, which are all stream-maturing fish. Specifically, about 10% of the variance in time to arrival is explained by greb1L-SNP. While the strength of this pattern is not overwhelming, it is still compelling given that PIT-tag arrival times were used as a proxy and may be prone to some error, and that only a single informative SNP was used. In addition, it currently serves as the only explanation for variation in greb1L genotypes in stream-maturing fish. In summary, fish with “premature” greb1L-SNP genotypes generally arrive at spawning tributaries early, fish with “mature” genotypes arrive later, and heterozygous fish arrive in an intermediate timeframe.
This association suggests that inland fish tend to hold in larger freshwater tributaries for several months as premature fish, and then migrate to spawning grounds in headwater tributaries over a continuum of maturation states before all fish become sexually mature and spawn together. This admixture of fish with varying phenotypes shows no Wahlund effect heterozygote deficit, providing additional evidence that population structure is not directly influenced by this phenology. Inland steelhead with “premature” genotypes ascend to spawning grounds early (premature arrival) and continue to mature there, whereas fish with “mature” genotypes become sexually mature in freshwater downstream of spawning grounds, then move upstream to spawning grounds once they are mature (mature arrival). However, in coastal populations, mature arrival and premature arrival are likely analogous to the ocean-maturing and stream-maturing phenotypes (respectively) that are commonly observed as steelhead enter freshwater systems near the ocean. Both coastal and inland steelhead populations contain greb1L “mature” and “premature” genotypes; however, the inland greb1L “mature” and “premature” genotypes both exhibit an early and narrow range of freshwater entry timings, and only later diverge in their spawning tributary entry timing. In contrast, the greb1L “mature” and “premature” genotypes of the coastal steelhead populations show large disparity in freshwater entry timing from the ocean. This difference in freshwater entry timings of the greb1L mature genotypes of the coastal versus inland lineages may be due to the long distance that inland fish must swim to reach their spawning sites (300–1500 km). Regardless of their greb1L genotypes, all fish from inland populations must migrate early to approach spawning grounds before environmental conditions become unfavorable [56, 57], including passage through unfavorable migratory corridors.
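The Wahlund effect mentioned above refers to the heterozygote deficit that appears when genetically distinct groups are pooled and analyzed as one population. A minimal sketch of how such a deficit can be measured at a biallelic SNP is given below, using the inbreeding coefficient F = 1 − Ho/He with He = 2pq. The genotype counts are invented, and this is not presented as the test used in the study.

```python
def heterozygote_deficit(n_aa, n_ag, n_gg):
    """Return F = 1 - Ho/He for one biallelic SNP.

    Ho is the observed heterozygote frequency and He = 2pq is the
    frequency expected under Hardy-Weinberg proportions.
    F near 0 indicates no deficit; F > 0 indicates a deficit.
    """
    n = n_aa + n_ag + n_gg
    p = (2 * n_aa + n_ag) / (2 * n)
    expected_het = 2 * p * (1 - p)
    observed_het = n_ag / n
    if expected_het == 0:
        return 0.0
    return 1 - observed_het / expected_het


# A single randomly mating group: no deficit expected.
f_single = heterozygote_deficit(25, 50, 25)

# Two hypothetical groups with A-allele frequencies 0.9 and 0.1, each in
# Hardy-Weinberg proportions (81/18/1 and 1/18/81), pooled into one sample.
f_pooled = heterozygote_deficit(81 + 1, 18 + 18, 1 + 81)
```

Pooling the two differentiated groups yields a strong deficit, whereas a sample with varying phenotypes but no underlying structure stays near zero, which is the pattern the text reports.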
Then, when near inland spawning sites, balancing selection may preserve the variation in greb1L genotypes which manifests as variation in spawning tributary arrival (i.e., premature or mature spawning tributary arrival). In some years, due to resources such as spawning habitat availability, and environmental conditions, it may be beneficial to arrive to a spawning tributary early (greb1L premature genotype), whereas in others it may be more beneficial to arrive late (greb1L mature genotype) [14, 58]. In addition, the benefits of tributary arrival may vary based on geographic localities. Thus, balancing selection likely maintains the genomic variation in chromosome 28, a phenomenon that may be confirmed with further studies that investigate variation in steelhead phenology in specific locations and throughout the species range.
As commonly illustrated, maintaining adaptive genetic variation is essential for protecting species at risk of extinction [59, 60]. We provide evidence that genomic variation linked to sexual maturation is present in inland steelhead populations, which were previously assumed to be fixed for greb1L premature genotypes similar to fish exhibiting stream-maturing phenotypes in coastal populations. This discovery illustrates that understanding complex phenology patterns is a difficult process due to the challenges of monitoring migrating fish through sexual development stages across large watersheds. However, this challenge can be overcome by monitoring efforts that apply tags and collect non-lethal tissue from migrating fish as they enter freshwater. A thorough understanding of complex life history traits can enable managers to maintain the necessary levels of neutral and adaptive genetic variation in wild populations to mitigate impacts of climate change on complex phenology traits [15, 61, 62].
Anadromous salmonids, a recreationally and culturally significant family of fishes, have complex life histories related to migration and maturation. Previous investigations into the genetic basis of maturation timing suggested that a single gene, greb1L, was related to sexually premature and mature entry of steelhead into freshwater systems. Using genome resequencing for fine scale mapping of maturation traits during migration, we identified a 309-kb genomic region of major effect for freshwater entry maturation that included four candidate genes: greb1L, rock1, mib1, and abhd3. This region also includes a highly significant intergenic region between greb1L and rock1, which may play a regulatory role in expression of these genes. Additionally, broad-scale SNP genotypes from greb1L in populations and individuals refined genotype to phenotype associations, revealing that candidate genotypes were more consistently associated with timing of arrival on spawning grounds than with freshwater entry maturation. These results suggest that there are fitness benefits to arriving at spawning tributaries prematurely or maturely, and that variation in this trait is likely to be maintained by balancing selection. Together, this study illustrates the importance of high precision genomic scans and detailed phenotypes to identify targets of selection.
We thank Stephanie Harmon, Amanda Matala, and Janae Cole for laboratory support. Samples were contributed by biologists from Nez Perce Tribe, Yakama Nation, Warm Springs, and Umatilla, and agencies such as NOAA Fisheries, U.S. Fish and Wildlife Service, Idaho Dept. of Fish and Game, Oregon Dept. of Fish and Wildlife, and Washington Dept. of Fish and Wildlife.
Availability of data and materials
Raw sequencing fastq files for the four pooled-sequencing libraries are provided in the NCBI sequence read archive (SRA; https://www.ncbi.nlm.nih.gov/sra/SRP151789) under project SRP151789.
SJM developed scripts, performed analyses, and wrote the manuscript. JSZ contributed to sample collection and experimental design. JEH contributed to SNP development and preliminary analyses. SRN assisted with analyses and experimental design. All authors participated in manuscript editing and revision. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Sampling activities were conducted in accordance with the terms and conditions of the US Endangered Species Act (ESA) Section 10a1A Permit 1379-6R to Columbia River Inter-Tribal Fish Commission and annual NOAA ESA Section 4d permits to the Yakima Indian Nation. Permits issued under the ESA are reviewed by expert committees to ensure ethical treatment of animals. This included collection of non-lethal samples and precautions to reduce stress and ensure adequate recovery after release.
Consent for publication
The authors declare that they have no competing interests.
- 11. Quinn TP. The behavior and ecology of Pacific salmon and trout. UBC Press; 2011.
- 17. Thompson TQ, Bellinger RM, O'Rourke SM, Prince DJ, Stevenson AE, Rodrigues AT, Banks MA. Anthropogenic habitat alteration leads to rapid loss of adaptive variation and restoration potential in wild salmon populations. bioRxiv. 2018;310714.
- 29. Mantel N. Chi-square tests with one degree of freedom; extensions of the Mantel-Haenszel procedure. J Am Stat Assoc. 1963;58(303):690–700.
- 31. Utter FM, Campton D, Grant S, Milner G, Seeb J, Wishard L. Population structures in indigenous salmonid species of the Pacific Northwest. In: McNeil WJ, Himsworth DC, editors. Salmonid ecosystems of the North Pacific. Corvallis: Oregon State University Press; 1980. p. 285–304.
- 44. Hartl DL, Clark AG. Principles of population genetics, vol. 116. Sunderland: Sinauer Associates; 1997.
- 62. Langin. Salmon spawn fierce debate over protecting endangered species, thanks to a single gene. Science: Biology Plants & Animals. 2018. https://doi.org/10.1126/science.aau0709.
- 63. United States Geological Survey (2007–2014). National Hydrography Dataset available on the World Wide Web (https://nhd.usgs.gov). Accessed 14 Feb 2018.
- 64. United States Geological Survey and United States Department of Agriculture, Natural Resources Conservation Service (2013). Federal standards and procedures for the National Watershed Boundary Dataset (WBD). 4th ed. 63 p. http://pubs.usgs.gov/tm/11/a3/.
- 65. United States Census Bureau. TIGER/line shapefiles (machine-readable data files) cartographic boundary shapefiles - nation. W. (2015).
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.