RAD-seq reveals genetic structure of the F2-generation of natural willow hybrids (Salix L.) and a great potential for interspecific introgression
Hybridization of species with porous genomes can eventually lead to introgression via repeated backcrossing. The potential for introgression between species is reflected by the extent of segregation distortion in later-generation hybrids. Here we studied a population of hybrids between Salix purpurea and S. helvetica that has emerged within the last 30 years on a glacier forefield in the European Alps following secondary contact of the parental species. We used 5758 biallelic SNPs produced by RAD sequencing to determine whether backcrosses (F1 hybrid x parent) or F2 hybrids (F1 hybrid x F1 hybrid) predominate among the hybrid offspring. Further, the SNPs were used to study segregation distortion in the second hybrid generation.
Analyses with structure and NewHybrids revealed that the natural population consisted of the parental species and F1 hybrids, whereas the hybrid offspring consisted mainly of backcrosses to either parental species and a few F2 hybrids. Although there was clear genetic differentiation between S. purpurea and S. helvetica (FST = 0.24), there was no significant segregation distortion in either the backcrosses or the F2 hybrids. The plant height of the backcrosses resembled that of the respective recurrent parental species, whereas the F2 hybrids were more similar to the subalpine S. helvetica.
The co-occurrence of the parental species and the hybrids on the glacier forefield, the high frequency of backcrossing, and the low resistance to gene flow via backcrossing make a scenario of introgression in this young hybrid population highly likely, potentially leading to the transfer of adaptive traits. We further suggest that this willow hybrid population may serve as a model for the evolutionary processes initiated by recent global warming.
Keywords: Population genomics; Hybrid evolution; Population structure; Sex chromosomes; Climate change
Natural hybridization due to secondary contact has been observed in many plant and animal species. Especially in North America and in Northern and Central Europe, a major driving force for secondary contact is the ongoing recolonization of areas after the retreat of glaciers. This process is amplified by human-induced global warming, which also causes rapid range shifts of species [2, 3, 4, 5], especially in mountain regions, leading to secondary contact of previously allopatric species. In the absence of strong pre- or postzygotic reproductive barriers, such contact may then lead to hybridization.
Many studies have investigated the evolutionary relevance of hybridization. Although there are some documented cases of homoploid hybrid speciation [7, 8, 9], speciation seems to be a rather rare outcome compared with the many reported cases of hybridization. Important requirements for homoploid hybrid speciation seem to be strong ecological or geographical barriers that restrict gene flow between the hybrids and the parental species [7, 11, 12, 13]. Thus, hybrid speciation is closely connected to the availability of novel or extreme habitats [14, 15]. Chromosomal rearrangements can also rapidly establish crossing barriers between parents and hybrids. Generally, interspecific hybridization seems more likely to result in introgression than in speciation [7, 14]. In extreme cases, introgression can lead to genetic swamping, which threatens species integrity and poses a severe problem especially in small populations or rare species. On the other hand, adaptive introgression can lead to the transfer of favourable alleles [10, 13, 17, 18, 19]. Introgression of favourable traits can increase a species’ genetic and phenotypic diversity, and hence its potential to adapt to novel environments [9, 20]. The outcome of an incipient hybridization event is difficult to predict because it depends on many factors, such as the fitness of the hybrids and their offspring, the impact of endogenous and exogenous selection, interactions of certain genotypes with the environment, and habitat availability.
To assess the evolutionary impact of a hybridization event, it is crucial to know the extent to which a genome is susceptible to the introgression of heterospecific alleles. Segregation distortion, the deviation from expected Mendelian segregation ratios, can be used as a measure of the resistance of the hybridizing species’ genomes to introgression [24, 25]. Further, distorted loci can be assumed to be linked to genes that affect the viability or fitness of hybrids or their gametes [25, 26]. Thus, segregation distortion is also connected to reproductive barriers and the suppression of interspecific gene flow [27, 28, 29]. Segregation distortion can also arise from low recombination rates on sex chromosomes or in sex-determining regions [30, 31]. In dioecious plants, female-biased sex ratios are associated with segregation distortion at distorter loci. The search for loci or regions under segregation distortion has therefore been used, even in nonmodel species, to draw conclusions about the potential underlying causes of reproductive barriers between species [33, 34, 35, 36, 37], or to identify loci responsible for environmental adaptation [29, 38].
In this study, we investigate hybridization in a zone of secondary contact between two willow species, Salix purpurea L. and S. helvetica Vill., which co-occur on the forefield of the Rhône Glacier in central Switzerland. Salix helvetica is a shrub that occurs naturally in the subalpine to alpine zone. Salix purpurea, on the other hand, is a widespread lowland species that has recently been able to colonize higher altitudes as a result of global warming and the subsequent glacier retreat. Secondary contact and hybridization of these species take place on glacier forefields that have recently become ice-free. These glacier forefields are covered with sparse vegetation and offer plenty of space and different niches for the settlement of pioneer species such as willows [39, 40, 41]. Hybridization, even across sections and between distantly related species, is a common phenomenon in Salix [42, 43]. Although S. purpurea and S. helvetica belong to different sections of the genus, they form natural hybrid zones in the European Alps. The composition of such hybrid zones provides important clues for assessing the evolutionary consequences of these hybridization events. A predominance of backcrosses would make introgression of genes between the parental species more likely, whereas a predominance of F2 hybrids (i.e. F1 hybrid x F1 hybrid) might indicate a potential for hybrid swarm formation and further hybrid evolution. In an earlier study on the willow population at the Rhône Glacier, we attempted to determine the exact class of the hybrids (F1, F2, backcrosses) on the glacier forefield by genotyping with microsatellite markers. These markers clearly separated the two parental species and confirmed the hybrid origin of phenotypically intermediate individuals, but their resolution was not sufficient for an unequivocal assignment of all individuals to a particular hybrid class. Thus, the precise composition of the hybrid zone is still uncertain.
However, we have previously shown that the hybrids between S. purpurea and S. helvetica are fertile and produce viable seeds in the natural population, confirming that hybridization can proceed beyond the F1 hybrid generation. The offspring raised from these naturally formed seeds offered the opportunity to study not only the progeny classes of second-generation hybrids, but also putative segregation distortion and phenotypic traits.
To overcome the limitations caused by a low number of markers, we used restriction-site associated DNA sequencing (RAD-seq) to generate a genome-wide set of thousands of single-nucleotide polymorphisms (SNPs) in these nonmodel species. High-quality biallelic SNPs were used (i) to determine the class of the hybrids on the glacier forefield and of the offspring produced by F1 hybrids, in order to predict the consequences of this hybridization event, and (ii) to search for deviations from expected segregation patterns at individual loci in F2 hybrids and backcrosses, to determine whether alleles of one parental species were favoured over those of the other. Population genomic analyses were accompanied by morphometric measurements (iii) to gain insight into the variation of a selected, potentially adaptive phenotypic character (plant height) in the respective second-generation hybrid classes.
RAD-seq and SNP calling
RAD-seq of S. purpurea, S. helvetica and their hybrids yielded an average of 7.5 × 10⁶ reads per individual (SD 2.4 × 10⁶). The average per-base sequence quality was very high, with a Phred score of 40 at all read positions in all samples. The average depth of read coverage was 59x (SD 18). The stacks pipeline initially generated 49,081 loci. After the application of all filters, 5758 nuclear loci remained; none of them matched the plastid genome. Of the filtered loci, 933 aligned to coding regions of the Populus trichocarpa genome. Because the results of the population genetic and progeny analyses did not change when the SNPs in coding regions were excluded, we performed all analyses with all 5758 SNPs.
Population genetic structure
The offspring of the F1 hybrids consisted of backcrosses to S. purpurea, backcrosses to S. helvetica, and of F2 hybrids (crosses between F1 hybrids). F2 hybrids were only observed among the offspring of one of the five mother-plants (Fig. 1b). All other plants produced backcrosses in both directions. Overall, significantly more backcrosses to S. purpurea (n = 40) than to S. helvetica (n = 16) were detected (binomial test, 2-sided, p = 0.001, n = 57). The different hybrid classes also formed well-separated clusters in the PCoA analysis (Fig. 2). The backcrosses clustered in the respective parental half without overlapping with the purebred individuals. The F2 hybrids formed a cluster of their own, clearly separated from the other individuals along the second axis. The F1 individual that clusters with the F2 hybrids is the mother of the F2 hybrids.
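The binomial test above asks whether F1 mothers backcross to the two parental species with equal probability. The following Python sketch of an exact two-sided binomial test illustrates the calculation. It is not the authors' script (the software used is not stated here), it simply takes the two backcross counts given above as input, and its output is not meant to reproduce the reported p-value exactly.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one; the small relative tolerance treats near-ties as ties.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    observed = binom_pmf(k, n, p)
    pmfs = [binom_pmf(i, n, p) for i in range(n + 1)]
    p_value = sum(q for q in pmfs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Backcross counts towards each parental species, taken from the text.
# H0: an F1 mother backcrosses to either species with probability 0.5.
bc_purpurea, bc_helvetica = 40, 16
p = binom_test_two_sided(bc_purpurea, bc_purpurea + bc_helvetica)
print(f"two-sided exact binomial p = {p:.4f}")
```

Under the symmetric null (p = 0.5), this definition of the two-sided p-value equals twice the smaller tail probability, capped at 1.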
Segregation distortion in the second generation hybrids
Figure: Distribution of the 334 species-specific SNPs detected in S. purpurea and S. helvetica across the 19 chromosomes of S. purpurea (y-axis: number of loci).
There were no significant deviations from the expected segregation patterns in either the backcrosses or the F2 hybrids. Although the distribution of homozygous and heterozygous genotypes was skewed at some loci in the F2 hybrids (Additional file 2: Table S3), the number of F2 individuals was probably too low for these deviations to reach significance.
Plant height in the second generation hybrids
Composition of the natural hybrid population
Our study confirms the existence of a natural secondary-contact hybrid zone of S. helvetica and S. purpurea on the Rhône Glacier forefield. RAD-seq data resolved the hybrid classes much better than the microsatellite markers applied in our previous study. While the results based on microsatellite genotyping had suggested that two-thirds of the hybrids were probably later-generation hybrids (F2 hybrids or backcrosses), the analysis of the same hybrids with RAD-seq data showed with maximum statistical support that all hybrid individuals sampled on the forefield of the Rhône Glacier were F1 hybrids. This discrepancy in hybrid classification is probably due to the low number of microsatellite loci used in the previous study, which had to be restricted to primers that amplified in both species. Further, the lack of species-specific alleles at the microsatellite loci made it difficult to determine from which species an allele was inherited in the admixed genotypes. Similar discrepancies in the classification of hybrids have been observed in studies of Populus hybrid zones. While no hybrids were classified as F1 hybrids based on microsatellite genotyping, subsequent genotyping-by-sequencing revealed that most hybrids belonged to the F1 generation. The authors attributed the different results to shortcomings of microsatellites, such as allele dropout. We consider the results based on RAD-seq data to be more reliable because the large number of loci yields a much higher resolution than the DNA fingerprinting techniques used so far. Further, all hybrids were assigned to their hybrid class with 100% posterior probability in the RAD-seq analysis, whereas the SSR analysis yielded posterior probabilities below 95% for many individuals. Altogether, RAD-seq clearly outperforms microsatellites in studies of interspecific hybridization.
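Why many diagnostic loci sharpen hybrid-class assignment can be illustrated with a simplified likelihood calculation. The Python sketch below is hypothetical and much simpler than NewHybrids, which estimates allele frequencies by MCMC and does not require fixed differences. It assumes loci with fixed differences between the species, independent loci, a uniform prior over six Mendelian genotype-frequency classes, and an assumed genotyping-error rate of 1%.

```python
import math

# Expected frequencies of the genotypes (AA, AB, BB) at a locus with a fixed
# difference between species A and B, for each genotype class. They follow
# from Mendelian inheritance.
CLASSES = {
    "pure A": (1.0, 0.0, 0.0),
    "pure B": (0.0, 0.0, 1.0),
    "F1": (0.0, 1.0, 0.0),
    "F2": (0.25, 0.5, 0.25),
    "BC to A": (0.5, 0.5, 0.0),
    "BC to B": (0.0, 0.5, 0.5),
}


def class_posteriors(genotypes, error=0.01):
    """Posterior probability of each class for one individual.

    genotypes: number of B alleles (0, 1 or 2) at each diagnostic locus.
    error: assumed genotyping-error rate; an erroneous call is equally likely
    to be any genotype, so no class ever gets a likelihood of exactly zero.
    Loci are treated as independent and the prior over classes as uniform.
    """
    log_like = {}
    for name, freqs in CLASSES.items():
        log_like[name] = sum(
            math.log((1 - error) * freqs[g] + error / 3) for g in genotypes
        )
    # Normalize on the log scale to avoid underflow with many loci.
    top = max(log_like.values())
    weights = {name: math.exp(ll - top) for name, ll in log_like.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# An F1 is heterozygous at every diagnostic locus, but F2 hybrids and
# backcrosses are heterozygous at about half of them, so a run of a few
# heterozygous loci is ambiguous; only many loci separate the classes.
for n_loci in (3, 10, 300):
    post = class_posteriors([1] * n_loci)
    print(f"{n_loci:>3} loci: P(F1) = {post['F1']:.4f}")
```

With only a handful of loci, a fully heterozygous individual retains substantial posterior weight on the F2 and backcross classes; with hundreds of loci, the F1 posterior approaches 1.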
We expected to find some later-generation hybrids on the glacier forefield because the analysis of offspring grown from seeds collected from naturally pollinated F1 hybrids on the forefield of the Rhône Glacier suggested that backcrosses and F2 offspring can form regularly in this population. However, the results revealed only F1 hybrids. We assume that this lack of second-generation hybrids is due to the sampling strategy. We included only material from adult individuals because the leaves, flowers and fruits of juvenile plants and seedlings did not yet show the typical phenotypic characteristics of adult willows. In a previous study, we concluded that the hybrid population on this recently emerged glacier forefield is probably no more than 20–30 years old. We therefore believe that second-generation hybrids were either absent from, or very rare among, the adult plants on the glacier forefield at the time of sampling. To clarify whether backcrosses and F2 hybrids have meanwhile grown up in the natural population, an extended sampling strategy that includes juvenile plants and a broad range of phenotypes would be required.
We have no reason to assume that backcrosses to either parent or F2 hybrids cannot establish on the glacier forefield. In a study on hybrid fertility, we found that the seed output of hybrids was reduced compared with that of the parents, but that the seeds showed high germination ability and the seedlings developed well. Thus, it seems unlikely that there are no backcrosses or F2 hybrids on the glacier forefield, although their numbers are probably still lower than those of F1 hybrids. Alternatively, habitat-mediated selection may act against the establishment of later-generation hybrids at the parental sites, as has been observed in Rhododendron hybrid zones. When the first F1 hybrids were formed, the glacier forefield was still at an early stage of succession with less vegetation cover, so conditions were more favourable for the establishment of willows, which are pioneer species. Later, when the backcrosses and F2 hybrids were produced, the vegetation may have become denser, making establishment more difficult. However, glacier retreat is ongoing, and open pioneer sites for colonization will continue to become available. Although we did not find later-generation hybrids in our present sampling, it may still be concluded that the hybrid population on the glacier forefield is able to develop beyond the F1 generation, so that hybridization may have further consequences, as discussed in the next section.
Classes of hybrid offspring
In contrast to the findings in the natural population on the glacier forefield, the offspring raised from seeds formed in the wild consisted of F2 hybrids (F1 x F1) and backcrosses to both parental species. Interestingly, F2 hybrids were found among the offspring of only one of the five mother plants. On the glacier forefield, this female plant stands less than three metres away from a male F1 hybrid (S. Gramlich, unpublished observation). It thus seems that F2 hybrids are produced in high numbers only when male and female F1 hybrids stand close together. This arrangement is rare on the glacier forefield; in most cases, female F1 hybrids are closer to male individuals of S. purpurea or S. helvetica (S. Gramlich, unpublished observation). The parental species and the hybrids are evenly dispersed over the whole area, so there is no spatial structure such as a clumped distribution or a cline from one parental species to the other. Female F1 hybrids are therefore more likely to be pollinated by one of the purebred species and thus to produce backcrosses. Accordingly, our results showed that the offspring of the sampled F1 hybrids consisted mainly of backcrosses to the parental species and only a few F2 hybrids. Similar findings were made in poplars, where purebred female plants produced exceptionally large numbers of backcross seedlings when they were surrounded by F1 hybrids. Overall, there were more backcrosses to S. purpurea than to S. helvetica in the sample, but the reason for this result is unclear. Possible causes include a greater overlap of flowering time between S. purpurea and the hybrids, postzygotic selection against backcrosses to S. helvetica, stochastic factors such as a closer proximity of male S. purpurea than of male S. helvetica to female F1 hybrids, or sampling bias due to the choice of mother plants, the limited number of progenies, or the conditions for pollination in the year of sampling. Alternatively, pollen limitation may be stronger for S. helvetica pollen than for S. purpurea pollen. Consistent with this, a reduced seed set was found in purebred S. helvetica compared with purebred S. purpurea. Seed set in willows is pollen-limited, and S. helvetica pollen might also be transported less efficiently. Pollen-pistil incongruences also represent a strong prezygotic crossing barrier in willows. Pollen tube growth could differ between S. purpurea and S. helvetica, as the former species has a much shorter style and a capsule without a beak, whereas the latter has long styles and beaked capsules. However, determining the exact causes of the observed pattern requires further research. Irrespective of the direction of backcrossing, it can be concluded that, at least in this early stage of secondary contact, the higher production of backcrosses than of F2 hybrids makes introgression a more likely future trajectory than hybrid speciation.
Second generation hybrids show no signs of segregation distortion
This is one of the first studies on nonmodel plant species in which segregation distortion was analyzed with RAD-seq data. The power of this marker system for detecting segregation distortion and linkage groups has been demonstrated, e.g., in hybrid fish and in white cypress pine. Because RAD-seq SNPs have only two alleles per locus, it is difficult to determine the species of origin of an allele in a hybrid individual, especially when both alleles occur at similar frequencies in both parental species. We therefore restricted our analyses to species-specific loci, i.e. loci with fixed differences between the parental species. We believe that this subsample is representative of genome-wide patterns of hybridization because the markers are located on all 19 chromosomes of the Salix genome.
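Selecting species-specific loci can be expressed as a simple filter. In this hypothetical Python sketch, genotypes are coded as counts of the alternative allele (0, 1, 2); a locus is kept only if the two species are fixed for different alleles and every F1 hybrid is heterozygous, the two criteria described in the Methods. The input format, locus names and function name are illustrative assumptions.

```python
def is_diagnostic(species_1, species_2, f1_hybrids):
    """True if a biallelic SNP is fixed for different alleles in the two
    species and heterozygous in every F1 hybrid.

    Genotypes are counts of the alternative allele (0, 1, 2); None marks a
    missing call and is ignored.
    """
    def called(genotypes):
        return [g for g in genotypes if g is not None]

    a, b, h = called(species_1), called(species_2), called(f1_hybrids)
    if not (a and b and h):
        return False  # at least one group has no genotype calls
    fixed_difference = (
        len(set(a)) == 1 and len(set(b)) == 1 and set(a) | set(b) == {0, 2}
    )
    return fixed_difference and all(g == 1 for g in h)


# Hypothetical loci: (S. purpurea, S. helvetica, F1 hybrids) genotypes.
loci = {
    "locus_1": ([0, 0, 0], [2, 2, None], [1, 1, 1]),  # diagnostic
    "locus_2": ([0, 1, 0], [2, 2, 2], [1, 1, 1]),  # not fixed in species 1
    "locus_3": ([0, 0, 0], [0, 0, 0], [0, 0, 0]),  # no difference
    "locus_4": ([2, 2, 2], [0, 0, 0], [1, 2, 1]),  # one F1 is homozygous
    "locus_5": ([2, 2, 2], [0, 0, 0], [1, 1, 1]),  # diagnostic (reversed)
}
diagnostic = [name for name, groups in loci.items() if is_diagnostic(*groups)]
print(diagnostic)
```

At loci that pass this filter, every allele in a hybrid can be traced to one parental species, which is what makes genotype counting for segregation tests unambiguous.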
We did not detect significant deviations from the expected Mendelian segregation ratios in the F2 hybrids or the backcrosses. The proportion of distorted loci is expected to correlate with the level of divergence between the parental species. In our earlier study based on microsatellite markers, the divergence between S. purpurea and S. helvetica turned out to be quite low (FST = 0.3). Divergence based on RAD-seq loci is also quite low, with an FST of 0.24 for completely unfiltered loci, but moderate for the filtered loci (FST = 0.53). This increase in FST is presumably due to the removal of loci that do not discriminate between the parental species. We therefore think that the low FST based on the unfiltered loci gives a more realistic estimate of population divergence. A further hint of the low divergence between S. purpurea and S. helvetica is that they hybridize easily although they belong to different, unrelated sections or clades of the genus Salix [44, 54]. Indeed, phylogenetic studies have generally shown low genetic divergence between species and sections in the genus Salix, especially in the shrub species [55, 56, 57]. The phylogeny of the whole subgenus comprising the shrub willows was resolved only recently using RAD sequencing, after more conservative markers had failed. Thus, low genetic divergence between the parental species seems a likely explanation for the absence of segregation distortion in the hybrids.
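The effect of the FST = 0 filter on the multilocus estimate can be illustrated with a simple Nei-style FST, computed as (HT − HS)/HT summed over loci. This Python sketch uses made-up allele frequencies and is not necessarily the estimator implemented in the stacks populations program. It only shows that loci with identical allele frequencies add to the denominator but not to the numerator, so removing them raises the estimate.

```python
def nei_fst(loci):
    """Multilocus FST = sum(HT - HS) / sum(HT) for two populations.

    loci: list of (p1, p2), the frequency of one allele in each population.
    HS is the mean within-population expected heterozygosity and HT the
    expected heterozygosity at the pooled (mean) allele frequency.
    """
    numerator = denominator = 0.0
    for p1, p2 in loci:
        hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        p_bar = (p1 + p2) / 2
        ht = 2 * p_bar * (1 - p_bar)
        numerator += ht - hs
        denominator += ht
    if denominator == 0:
        raise ValueError("no polymorphic loci")
    return numerator / denominator


# Hypothetical allele frequencies (species 1, species 2) at six loci.
loci = [(0.0, 1.0), (0.1, 0.7), (0.5, 0.5), (0.3, 0.3), (0.4, 0.5), (0.2, 0.2)]
# Loci with identical frequencies have a per-locus FST of 0.
informative = [(p1, p2) for p1, p2 in loci if p1 != p2]
print(f"all loci: {nei_fst(loci):.3f}, FST > 0 only: {nei_fst(informative):.3f}")
```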
Genetic incompatibilities that cause hybrid sterility or inviability, and thus act as postzygotic reproductive barriers, accumulate with the evolutionary divergence of the parental species [58, 59]. We observed that F1 hybrids produced fewer seeds than S. purpurea or S. helvetica, but that the seeds they produced were viable and developed as well as seedlings of the purebred species. Given the shallow genetic divergence of the parental species, there seems to be a certain degree of postzygotic (i.e. intrinsic) selection before seed maturation, during meiosis, pollination, fertilization or seed development, but the absence of segregation distortion suggests that heterospecific alleles are not selectively purged. Because large parts of the genome seem to be unaffected by segregation distortion, the genome can be assumed to be susceptible to introgression, as was also concluded for backcrosses in Iris.
Another interesting finding is that only two species-specific loci are located on chromosome XV, which carries the sex determination locus in Salix. Other studies found that sex chromosomes were highly divergent owing to suppressed recombination and the accumulation of species-specific differences. However, no heteromorphic sex chromosomes have yet been discovered in Salix or in Populus. Stölting et al. likewise did not identify fixed SNPs between Populus species on chromosome XIX, which carries the sex-determining locus in Populus [62, 63, 64]. They concluded that, contrary to predictions, the incipient sex chromosome of Populus is not resistant to gene flow and introgression. Accordingly, Macaya-Sanz et al. also detected gene flow on chromosome XIX in Populus. The low number of species-specific SNPs on chromosome XV detected in this study suggests that the same may hold for Salix.
In contrast to the genomic data, segregation was evident in phenotypic traits of one-year-old juvenile plants. Salix helvetica is a shrub that reaches a height of ca. 50–80 cm. Salix purpurea can reach up to 6 m in the lowlands, but on the glacier forefield the shrubs were ca. 160–180 cm high (S. Gramlich, unpublished observation). With respect to plant height, the backcrosses seem to retain the trait of the recurrent parent, as expected, whereas the F2 hybrids adopted the lower height of S. helvetica, even in the absence of external selection under uniform growth conditions. This is striking because reduced plant height (typically < 50 cm) is a typical feature of alpine shrubs, which are thereby better protected by snow cover during freezing periods [54, 67]. Growth height thus appears to be a promising candidate for studying an adaptive trait in this hybrid system.
Range shifts initiated by climate change will increase the likelihood of secondary-contact hybridization in some species. Comparing the outcomes of diverse hybridization events induced by climate change is important for assessing their effects on biodiversity, so that conservation measures can be initiated if necessary. What effect will this hybridization event have on the genetic diversity of the hybridizing willow species? Our results indicate that introgression is highly likely because intrinsic barriers against hybridization and gene flow between S. purpurea and S. helvetica are low. Further, hybrids and the purebred species occur in a mixed stand on the glacier forefield, leading to the continuing formation of F1 hybrids and backcrosses. Introgression might be asymmetric, because there were more backcrosses to S. purpurea in the sample. Owing to the isolated location, introgression might be highly localized, affecting mainly the gene pools of S. purpurea and S. helvetica individuals on the glacier forefield and the surrounding slopes. On the other hand, we have already discovered another population of hybrids between S. purpurea and S. helvetica at a higher altitude on the Morteratsch Glacier, and it can therefore be assumed that further hybrid populations will emerge at other locations in the European Alps as glaciers continue to retreat. Many localized, independent hybridization events would make a wider distribution of introgressed alleles more likely.
In general, hybridization appears to increase genetic and phenotypic variability in the offspring population. Interspecific exchange of genes via introgression is considered an important evolutionary force because it may lead to the transfer of adaptations. Adaptation is viewed as the most important process promoting divergence during speciation [13, 71]. In this way, introgression of adaptive traits could lead to the formation of ecotypes or even new species. However, the long generation time of shrubs and the time needed to establish populations make it difficult to predict the adaptive value of traits. Long-term monitoring of such hybrid populations is essential to draw final conclusions.
Willow hybrids may also serve as models for the evolutionary processes initiated by global warming. Owing to their properties as pioneer species, willows may shift their ranges and establish in novel habitats more rapidly than other species. The observations made in this model system may thus help to anticipate evolutionary processes that might affect species with lower dispersal rates much later.
Leaf samples were collected on the forefield of the Rhône Glacier in central Switzerland (46°34′03.0″N, 08°22′12.3″E) from a mixed stand of S. purpurea, S. helvetica, and their hybrids. All plant samples were collected with the permission of the Canton du Valais, Service des forêts et du paysage. All individuals had already been genotyped at nine microsatellite loci in a previous study, which also included some reference populations sampled outside the glacier forefield. For the present study, we sampled leaves from six individuals of Salix purpurea and nine individuals of S. helvetica from the glacier forefield. To extend the data for the purebred species, we also included four individuals of S. purpurea from three additional locations in Germany (51°18′53.0″N, 11°54′19.8″E; 51°44′32.2″N, 10°43′31.8″E; 49°21′13.0″N, 8°14′15.0″E) and one S. helvetica individual from Austria (46°49′21.6″N, 10°59′25.0″E).
We included a comprehensive sample of 45 hybrids from the glacier forefield. These plants had been classified as hybrids on the basis of both a phenotype intermediate between the parental species and a genetic analysis conducted in a previous study. All specimens were identified by S. Gramlich. Herbarium vouchers of each purebred and hybrid sample were deposited in the herbarium of the University of Göttingen (GOET).
Of these 45 hybrids, we selected five that had a > 95% probability of being an F1 hybrid in the NewHybrids analysis of our previous study. To investigate the second hybrid generation, seeds that had been collected from these five naturally pollinated F1 hybrids on the forefield of the Rhône Glacier were germinated under controlled conditions (for details see ). Seedlings were grown for 1 year under uniform conditions in climate growth chambers (see below). From hundreds of juvenile plants, a subset of 20–30 progenies per mother plant that represented the phenotypic diversity among the offspring was selected. From each of the five F1 mother plants, 13–14 progenies (n = 68 in total) were finally sampled for RAD-seq analysis. The five mother plants from the natural population at the Rhône Glacier were also included in the sample. The final dataset for RAD-seq analysis comprised 133 individuals.
DNA extraction, RAD-seq
DNA was extracted from silica-dried leaves using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. The DNA concentration was assessed with a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and the samples were normalized to a concentration of 30 ng/μl. Aliquots of 3 μg DNA were then submitted to Floragenex Inc. (Eugene, OR, USA) for library preparation and single-end RAD sequencing (following the protocol of ). The total genomic DNA was digested with the restriction enzyme PstI. The size selection of 300 bp – 500 bp with a Pippin Prep (Sage Science, Beverly, MA, USA) was followed by the ligation of sequencing adaptors and unique 10 bp barcodes for each sample. The samples were sequenced on an Illumina HiSeq 2500 Instrument (Illumina Inc., San Diego, CA, USA) and raw reads were delivered in FASTQ format trimmed to 100 bp.
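In single-end RAD-seq, each PstI site (CTGCA^G) yields two tags, one read outward on each side of the cut, so reads begin with the enzyme remnant TGCAG. A minimal Python sketch of this in silico digestion is shown below. The synthetic test sequence and the function names are hypothetical, and the default tag length of 90 bp corresponds to the read length after barcode removal.

```python
PSTI_SITE = "CTGCAG"  # PstI recognition sequence; the enzyme cuts CTGCA^G
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(sequence: str, read_len: int = 90) -> list[str]:
    """Return the RAD tags expected on both sides of every PstI site.

    Tags are given in read orientation, starting with the remnant TGCAG.
    A side is skipped if the site is too close to the sequence end to
    yield a full-length tag there.
    """
    sequence = sequence.upper()
    tags = []
    start = sequence.find(PSTI_SITE)
    while start != -1:
        # Downstream tag: forward strand, from the T after the leading C.
        downstream = sequence[start + 1 : start + 1 + read_len]
        if len(downstream) == read_len:
            tags.append(downstream)
        # Upstream tag: reverse strand, ending at the cut (after CTGCA).
        cut = start + 5
        if cut >= read_len:
            tags.append(reverse_complement(sequence[cut - read_len : cut]))
        start = sequence.find(PSTI_SITE, start + 1)
    return tags


# Hypothetical test sequence with a single PstI site in the middle.
toy = "A" * 100 + "CTGCAG" + "C" * 100
for tag in rad_tags(toy):
    print(tag[:12], len(tag))
```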
Bioinformatic analysis of raw data and SNP calling
The software stacks v. 1.44 [73, 74] was used for demultiplexing, SNP discovery, and genotyping. First, we demultiplexed the raw reads and removed low-quality reads using the process_radtags program of stacks with default parameters. In this step, the 10 bp barcodes were removed, shortening the reads to a final length of 90 bp. Afterwards, the quality of each sample was assessed with FastQC v. 0.11.4. Loci were assembled de novo using the denovo_map pipeline, which merges RAD tags into loci within each sample (ustacks), creates a catalog containing the merged loci from multiple samples (cstacks), and finally matches the loci from each sample against the catalog (sstacks). The first step of the pipeline is the creation of so-called stacks (sets of identical reads) from the raw reads of each individual. The stacks provide the basis for building loci. The minimum number of identical raw reads (minimum depth of coverage) required to create a stack (m) was set to 10. Thus, calling a heterozygote requires at least 10 reads of each allele. The maximum number of nucleotide mismatches allowed between two stacks was set to 5, both for merging stacks into loci within individuals (M) and between individuals when building the catalog (n). Appropriate values for M and n were determined in preliminary test runs, in which M and n were set to the same value, ranging from 2 to 7, and the total number of loci as well as the number of polymorphic loci was recorded. Following the recommendations of Viricel et al., we chose the parameter set at which both the total number of loci and the number of polymorphic loci reached an asymptote. Finally, we used the populations program of stacks to extract loci that were present in all three groups (S. purpurea, S. helvetica, hybrids) in at least 70% of the individuals of each group (r). Data analysis was restricted to the first SNP at each locus to obtain the unlinked SNPs required for population genetic analysis.
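Two of the steps above, the filter on the fraction of individuals genotyped per group (r = 0.7) and the restriction to the first SNP per locus, can be sketched in Python as follows. The input structures and function names are hypothetical and do not reproduce the stacks file formats or its internal implementation.

```python
def passes_r_filter(genotyped_by_group, r=0.7):
    """True if a locus was genotyped in at least a fraction r of the
    individuals of every group.

    genotyped_by_group: dict group name -> list of booleans, one per individual.
    """
    return all(
        len(flags) > 0 and sum(flags) / len(flags) >= r
        for flags in genotyped_by_group.values()
    )


def first_snp_per_locus(snps):
    """Keep only the first SNP (smallest column) of each RAD locus.

    snps: iterable of (locus_id, column) pairs; returns them sorted by locus.
    """
    first = {}
    for locus_id, column in snps:
        if locus_id not in first or column < first[locus_id]:
            first[locus_id] = column
    return sorted(first.items())


# Hypothetical locus genotyped in 80%, 90% and 100% of the three groups.
locus = {
    "S. purpurea": [True] * 8 + [False] * 2,
    "S. helvetica": [True] * 9 + [False],
    "hybrids": [True] * 10,
}
print(passes_r_filter(locus))
print(first_snp_per_locus([(1, 40), (1, 12), (2, 77), (3, 5), (3, 60)]))
```

Keeping one SNP per RAD locus is a simple way to avoid tightly linked markers, since SNPs within a single 90 bp tag are almost always inherited together.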
PLINK v. 1.0.7 was used to filter out SNPs with a minor allele frequency < 0.05 and SNPs that deviated from Hardy–Weinberg equilibrium at p < 0.05 in one or both parental populations. Further, individuals with a genotyping rate < 90% were excluded from the analysis, which led to the exclusion of two individuals of the hybrid offspring. We also applied further filters to remove potentially paralogous loci resulting from the recent ‘salicoid’ duplication event. Collapsed paralogous copies at such loci should be characterized by an excess of heterozygosity and an increased coverage depth. Thus, we removed loci with an observed heterozygosity (Ho) ≥ 0.6, FIS < 0, or a coverage depth more than two standard deviations above the mean. Further, we removed loci with an FST of 0 between S. purpurea and S. helvetica in order to restrict the analyses to loci with variation between the parental species. Ho, FIS, and FST values for each locus were generated by the populations program. The filtered loci were aligned to the Populus trichocarpa genome to detect SNPs in coding, putatively highly conserved regions, and to the plastome of Salix suchowensis (GenBank: KM983390.1) to remove maternally inherited plastid markers. However, since no match to the plastid genome was observed, all loci appear to belong to the nuclear genome. Alignments were performed in Geneious R 10.0.9. After filtering, 5758 loci remained for population genomic analysis.
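The per-locus filters against collapsed paralogs and uninformative loci can be sketched in Python. The sketch assumes that the depth criterion means "more than two standard deviations above the mean depth across loci" and takes per-locus Ho, FIS, depth and FST values (as produced by the populations program) from a hypothetical dictionary; the minor allele frequency, Hardy–Weinberg and genotyping-rate filters performed in PLINK are not reproduced.

```python
from statistics import mean, stdev


def filter_loci(stats, max_ho=0.6, n_sd=2.0):
    """Return the IDs of loci that pass the paralog and FST filters.

    stats: dict locus_id -> {"ho": ..., "fis": ..., "depth": ..., "fst": ...}.
    Loci are removed if Ho >= max_ho, FIS < 0, depth exceeds the mean depth
    across loci by more than n_sd standard deviations (assumed reading of
    the depth rule), or FST == 0.
    """
    depths = [s["depth"] for s in stats.values()]
    depth_cutoff = mean(depths) + n_sd * stdev(depths)
    kept = []
    for locus_id, s in stats.items():
        if s["ho"] >= max_ho or s["fis"] < 0:
            continue  # heterozygote excess: possibly collapsed paralogs
        if s["depth"] > depth_cutoff:
            continue  # unusually high coverage: possibly collapsed paralogs
        if s["fst"] == 0:
            continue  # does not discriminate the parental species
        kept.append(locus_id)
    return kept


# Hypothetical summary statistics for ten loci, four of which should fail.
example = {
    f"locus_{i}": {"ho": 0.3, "fis": 0.1, "depth": 50, "fst": 0.4}
    for i in range(10)
}
example["locus_1"]["ho"] = 0.7
example["locus_2"]["fis"] = -0.2
example["locus_3"]["fst"] = 0.0
example["locus_9"]["depth"] = 500
print(filter_loci(example))
```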
Genetic structure of the hybrid zone and the progenies
Pairwise genetic distances between individuals were calculated using the genetic distance measure of Smouse and Peakall  implemented in the R package PopGenReport . Based on these genetic distances, we performed a principal coordinates analysis (PCoA) implemented in the R package ade4 . We used the program structure v. 2.3.4  to confirm that all hybrids originated from crosses between S. purpurea and S. helvetica without the involvement of a third species. All 5758 loci were included in the structure analysis. We tested K values ranging from 1 to 7 without prior population information under the admixture model, assuming independent allele frequencies. Five runs were performed per K value, and the most likely K was determined using the method of Evanno et al. . Finally, we identified the parental plants and assigned each hybrid individual to a hybrid category (F1, F2, backcross) with the program NewHybrids . We designated only the hybrid × hybrid class as F2 hybrids, whereas the progeny as a whole is referred to as second-generation hybrids. Given the young age of the hybrid zone on the forefield (approximately 20–30 years), it can be assumed that all individuals still belong to early hybrid generations, so that an assignment to exact hybrid classes is possible. Because NewHybrids cannot handle large datasets, we restricted the dataset to the first 300 loci. To ensure that the patterns were consistent across different subsets of the genome, we also ran NewHybrids with 300 randomly selected loci. The results were the same, but we chose to use the first 300 loci to ensure reproducibility. Structure and NewHybrids were run with a burn-in period of 10,000 iterations followed by 50,000 MCMC iterations. Longer runs were tested on reduced datasets but did not change the results substantially.
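Evanno's ΔK for a given K is the absolute second-order rate of change of the log-likelihood, divided by the standard deviation of the log-likelihood across replicate runs: ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]. The Python sketch below is a simplification that uses the mean ln P(D) per K rather than averaging per-run second differences; the input dictionary and function name are hypothetical, and ΔK is undefined for the smallest and largest K tested (here 1 and 7).

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K whose neighbours K-1 and K+1 were also tested.

    ln_prob maps K to the ln P(D) values of its replicate structure runs
    (at least two runs per K are needed for the standard deviation).
    """
    means = {k: mean(v) for k, v in ln_prob.items()}
    delta: Dict[int, float] = {}
    for k in sorted(ln_prob):
        if k - 1 in means and k + 1 in means:
            # second-order rate of change of the mean log-likelihood
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(ln_prob[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

The K with the largest ΔK is taken as the most likely number of clusters; with five runs per K for K = 1 to 7, ΔK can be evaluated for K = 2 to 6.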
Analysis of segregation distortion in the second generation hybrids
For the analysis of segregation distortion, we selected loci showing fixed differences between S. purpurea and S. helvetica, so that the parental origin of each allele in a hybrid individual could be determined unequivocally. We further checked whether all 45 F1 hybrids were heterozygous at these loci. Overall, 396 loci met both criteria. The R package introgress  was used to count, for each hybrid individual at each locus, the number of alleles derived from each parental population. These counts served as the basis for detecting deviations from the expected segregation patterns. In the F2 hybrids, we tested each locus for deviation from the expected 1:2:1 ratio of homozygous S. purpurea, heterozygous, and homozygous S. helvetica genotypes. In the backcrosses, we tested for deviation from the expected 1:1 ratio of homozygous (recurrent-parent) and heterozygous genotypes. χ² goodness-of-fit tests were performed in R . In the analysis of the F2 hybrids, p-values were computed by Monte Carlo simulation because of the low number of individuals. We corrected for multiple testing using the false discovery rate (FDR) method of Benjamini and Hochberg  with α = 0.10.
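The tests described above can be sketched in Python using only the standard library: a χ² goodness-of-fit statistic for the 1:2:1 (F2) and 1:1 (backcross) expectations, closed-form upper-tail probabilities for 1 and 2 degrees of freedom, a Monte Carlo p-value in the spirit of R's chisq.test(simulate.p.value = TRUE), and the Benjamini–Hochberg step-up procedure at α = 0.10. This is an illustrative sketch, not the authors' R code; function names, the seed, and the number of simulations are assumptions.

```python
import math
import random
from typing import List, Sequence


def chi_square(observed: Sequence[int], ratios: Sequence[float]) -> float:
    """Pearson chi-square statistic against an expected Mendelian ratio."""
    n = sum(observed)
    total = sum(ratios)
    expected = [n * r / total for r in ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability (closed forms for df = 1 and 2)."""
    if df == 1:                      # backcross test, 1:1
        return math.erfc(math.sqrt(x / 2))
    if df == 2:                      # F2 test, 1:2:1
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 are needed here")


def monte_carlo_p(observed: Sequence[int], ratios: Sequence[float],
                  n_sim: int = 2000, seed: int = 1) -> float:
    """Simulated p-value: share of multinomial samples at least as extreme."""
    rng = random.Random(seed)
    n = sum(observed)
    stat = chi_square(observed, ratios)
    categories = range(len(ratios))
    hits = 0
    for _ in range(n_sim):
        draws = rng.choices(categories, weights=ratios, k=n)
        simulated = [draws.count(c) for c in categories]
        if chi_square(simulated, ratios) >= stat - 1e-12:
            hits += 1
    return (hits + 1) / (n_sim + 1)


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.10) -> List[bool]:
    """Flag tests that remain significant at FDR level alpha (step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff_rank = rank       # largest rank meeting its threshold
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff_rank
    return significant
```

For example, an F2 locus with hypothetical counts of 7, 8, and 5 individuals in the three genotype classes would be tested with monte_carlo_p([7, 8, 5], [1, 2, 1]), and a backcross locus with chi2_sf(chi_square(counts, [1, 1]), 1). The simulated p-value avoids relying on the χ² approximation when expected class counts are small.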
We performed a BLAST search of all RAD loci containing species-specific SNPs against the S. purpurea genome using Phytozome 12  to determine the chromosomes on which the loci were located. An alignment was accepted when the read showed > 98% identity over the whole read length of 90 bp.
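Assuming the BLAST hits are exported in the standard BLAST+ tabular format (-outfmt 6), the acceptance rule (> 98% identity over the full 90-bp read) could be applied with a Python sketch like the following. Whether Phytozome's BLAST interface exports exactly this column order is an assumption, as are the function name and the choice to keep the first (top-scoring) accepted hit per locus.

```python
import csv
from typing import Dict, Iterable

READ_LENGTH = 90      # bp, after barcode removal
MIN_IDENTITY = 98.0   # percent identity; hits must exceed this value


def accepted_hits(lines: Iterable[str], read_length: int = READ_LENGTH,
                  min_identity: float = MIN_IDENTITY) -> Dict[str, str]:
    """Map each RAD locus to the subject sequence of its first accepted hit.

    Expects tabular columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                              # skip blank and comment lines
        qseqid, sseqid, pident = row[0], row[1], float(row[2])
        qstart, qend = int(row[6]), int(row[7])
        covers_read = qstart == 1 and qend == read_length
        if pident > min_identity and covers_read and qseqid not in hits:
            hits[qseqid] = sseqid                 # e.g. a chromosome name
    return hits
```

Checking the query start and end positions, rather than the alignment length alone, ensures that the alignment really spans the whole read even when it contains gaps.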
Evaluation of plant height
The seedlings derived from five naturally pollinated F1 hybrids at the forefield of the Rhône Glacier  were raised in climate chambers for about 1 year at 18 °C with a 16-h light period (c. 250 μmol m⁻² s⁻¹) under identical soil and watering conditions. At this age, the plants had leaves typical of adults (for examples, see Additional file 1: Figure S2) but did not yet produce flowers. Before these juvenile plants were transferred to pots for outdoor cultivation, the length of the longest shoot was measured to the nearest 0.5 cm. A one-way ANOVA was performed to test for differences between groups (as assigned by the NewHybrids analysis: F2, backcross to S. purpurea, backcross to S. helvetica), followed by the Games–Howell post hoc test. The type I error rate was set to α = 0.05. The ANOVA was performed in SPSS version 24 (IBM Corp., Armonk, NY).
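The test statistics behind the height comparison can be sketched in Python: the one-way ANOVA F ratio and, for each pair of groups, the Games–Howell statistic q = |x̄ᵢ − x̄ⱼ| / √((sᵢ²/nᵢ + sⱼ²/nⱼ)/2) with Welch–Satterthwaite degrees of freedom. This is not the SPSS implementation, and p-values are not computed because they require the F and studentized range distributions (available, for example, in SciPy as scipy.stats.f and scipy.stats.studentized_range). Group names and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups.values() for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def games_howell(groups: Dict[str, List[float]]
                 ) -> Dict[Tuple[str, str], Tuple[float, float, float]]:
    """Mean difference, q statistic and Welch df for every pair of groups.

    The p-value would be P(Q >= q) under the studentized range distribution
    with k groups and the Welch df; it is not computed here.
    """
    results = {}
    for a, b in combinations(groups, 2):
        na, nb = len(groups[a]), len(groups[b])
        va = variance(groups[a]) / na        # squared standard error, group a
        vb = variance(groups[b]) / nb        # squared standard error, group b
        diff = mean(groups[a]) - mean(groups[b])
        q = abs(diff) / math.sqrt((va + vb) / 2)
        df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
        results[(a, b)] = (diff, q, df)
    return results
```

Games–Howell does not assume equal variances or equal group sizes, which suits groups of unequal size such as the F2 hybrids and the two backcross classes.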
We thank Jennifer Krüger for extracting the DNA and Silvia Friedrichs for nursing the seedlings. The referee’s comments were of great value for improving the manuscript.
The study was funded by the German Research Foundation (DFG project Ho 5462 7–1) to E.H. The DFG had no role in the study design or any other operational part of this project.
Availability of data and materials
All demultiplexed read data were submitted to the NCBI Sequence Read Archive: accession number SRP133640, BioProject ID PRJNA429746. The dataset generated for the population genetic analysis (structure input file) is available in the Dryad Digital Repository, https://doi.org/10.5061/dryad.4k3v0kg.
EH and SG designed the research, SG and NDW analyzed the data, SG wrote the paper with assistance of NDW and EH. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 11. Grant V. Plant speciation. 2nd ed. New York: Columbia University Press; 1981.
- 29. Recknagel H, Elmer KR, Meyer A. A hybrid genetic linkage map of two ecologically and morphologically divergent Midas cichlid fishes (Amphilophus spp.) obtained by massively parallel DNA sequencing (ddRADSeq). G3: Genes, Genomes, Genetics. 2013;3:65–74.
- 43. Hörandl E, Florineth F, Hadacek F. Weiden in Österreich und angrenzenden Gebieten. 2nd ed. Wien: Arbeitsbereich Ingenieurbiologie und Landschaftsbau, Inst. für Landschaftsplanung und Ingenieurbiologie, Univ. für Bodenkultur; 2012.
- 44. Skvortsov AK. Willows of Russia and adjacent countries. Joensuu: University of Joensuu; 1999.
- 55. Chen JH, Sun H, Wen J, Yang YP. Molecular phylogeny of Salix L. (Salicaceae) inferred from three chloroplast datasets and its systematic implications. Taxon. 2010;59:29–37.
- 66. Schiechtl HM. Weiden in der Praxis: Die Weiden Mitteleuropas, ihre Verwendung und ihre Bestimmung. Berlin-Hannover: Patzer Verlag; 1992.
- 70. Rieseberg LH, Wendel JF. Gene flow and its consequences in plants. In: Harrison RG, editor. Hybrid zones and the evolutionary process. Oxford: Oxford University Press; 1993. p. 70–109.
- 71. Coyne JA, Orr HA. Speciation. Sunderland, MA: Sinauer; 2004.
- 73. Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH, De Koning D-J. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics. 2011;1:171–82.
- 75. Babraham Bioinformatics. FastQC 0.11.4. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
- 80. Wu Z. The whole chloroplast genome of shrub willows (Salix suchowensis). Mitochondrial DNA Part A. 2016;27:2153–4.
- 89. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2016.
- 90. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57:289–300.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.