Genome-wide scans identify known and novel regions associated with prolificacy and reproduction traits in a sub-Saharan African indigenous sheep (Ovis aries)
Maximizing the number of offspring born per female is a key functionality trait in commercial- and/or subsistence-oriented livestock enterprises. Although the number of offspring born is closely associated with female fertility and reproductive success, the genetic control of these traits remains poorly understood in sub-Saharan Africa livestock. Using selection signature analysis performed on Ovine HD BeadChip data from the prolific Bonga sheep in Ethiopia, 41 candidate regions under selection were identified. The analysis revealed one strong selection signature on a candidate region on chromosome X spanning BMP15, suggesting this to be the primary candidate prolificacy gene in the breed. The analysis also identified several candidate regions spanning genes not reported before in prolific sheep but underlying fertility and reproduction in other species. The genes associated with female reproduction traits included SPOCK1 (age at first oestrus), GPR173 (mediator of ovarian cyclicity), HB-EGF (signalling early pregnancy success) and SMARCAL1 and HMGN3a (regulate gene expression during embryogenesis). The genes involved in male reproduction were FOXJ1 (sperm function and successful fertilization) and NME5 (spermatogenesis). We also observed genes such as PKD2L2, MAGED1 and KDM3B, which have been associated with diverse fertility traits in both sexes of other species. The results confirm the complexity of the genetic mechanisms underlying reproduction while suggesting that prolificacy in the Bonga sheep, and possibly African indigenous sheep is partly under the control of BMP15 while other genes that enhance male and female fertility are essential for reproductive fitness.
The evolution of novel traits is underpinned by genetic changes encoding new phenotypes. The genetic basis of traits that are controlled by few genes has been well established. For example, mutations in MC1R influences coat colour in animals ranging from mice (Hoekstra et al. 2006) to camels (Almathen et al. 2018). However, little remains known on the genetic control of complex traits, which have proven challenging to study using traditional approaches. Recent developments in next-generation sequencing and associated techniques (single-nucleotide polymorphism (SNP) genotyping arrays and bioinformatics pipelines) have provided a unique opportunity to examine genes, and gene networks, encoding complex phenotypes in domestic animals (Andersson and Georges 2004; Gouivea et al. 2014).
Diverse geographic adaptation and selection pressure have resulted in shared and population-specific phenotypes in many livestock species (Xu et al. 2015). Prolificacy is one such phenotype that has been observed in several breeds of sheep in Europe, Africa, Middle East, and Asia and other mammalian species. Whether the trait evolved independently, within or across species and/or breeds of livestock in different geographic regions or, already existed in the genome of the wild ancestor at the time of domestication remains unknown. What is clear, however, is that the trait is under the control of a few genes with large effects (Davis 2004, 2005; Monestier et al. 2014; Abdoli et al. 2016). Several studies identified causative variants in three related oocyte-derived members of the transforming growth factor-beta (TGF-β) superfamily including bone morphogenic protein receptor 1B (BMPR1B), bone morphogenic protein 15 (BMP15) and growth differentiation factor 9 (GDF9) (Davis 2004, 2005; Abdoli et al. 2016) which have been shown to be essential for ovulation rate and follicular growth (Juengel and McNatty 2005; Knight and Glister 2006). The BMPR1B gene located on chromosome (Oar) 6 has been found in mostly Asian breeds. The sole mutation observed in this gene is present in the Small-tailed Han and Hu sheep in China, the Kendrapada and Garole sheep in India and the Javanese thin-tailed sheep in Indonesia but seems to be absent in European breeds (Davis et al. 2002, 2005; Jansson 2014; Abdoli et al. 2016). BMP15 and GDF9 located on OarX and 5, respectively, appear to be the main prolificacy genes in European and Middle East (specifically Iran) sheep breeds. In BMP15 eight mutations, which differ slightly in type and effect, have been discovered in different sheep breeds and populations (see reviews by Davis 2004, 2005; Abdoli et al. 2016 and references therein). Four mutations affecting ovulation rate have been discovered to date in GDF9 (see reviews by Davis 2004, 2005; Abdoli et al. 2016). Other genes that have also been reported in European sheep include B4GALNT2 on Oar11 and FecX2 on OarX (see Abdoli et al. 2016). New mutations are, however, continuously being discovered in these genes, the latest were reported in Tunisian Barbarine (Lassoued et al. 2017) and three Iranian breeds (Amini et al. 2018). These findings suggest the genetic control of prolificacy traits varies between breeds.
Sub-Saharan Africa (SSA) is home to several breeds of prolific indigenous sheep but with no known history, or information, on any form of either natural and/or artificial selection targeting the trait. In spite of the large body of knowledge generated in the last decade on prolificacy in domestic sheep, the genetic basis for the trait in SSA sheep remains poorly investigated. The various studies documenting variations in major genes and the inherent causative mutations associated with ovulation and litter size the species justify the investigation of genes of major effect in prolific SSA sheep. African indigenous sheep are known to share their genome ancestry with sheep from the Middle East and the Indian sub-continent (Muigai and Hanotte 2013; Mwacharo et al. 2017). The expectation therefore is that one of the three members of TGF-β superfamily of genes could be responsible for the trait in SSA sheep.
Bonga is a breed of indigenous sheep found in southwestern Ethiopia. It displays good maternal characteristics and is one of the naturally prolific breeds of sheep found in SSA. It shows an average litter size of 1.54 ± 0.006 (average range = 1.25 ± 0.433 to 2.12 ± 0.499) and above average reproductive efficiency under smallholder village production (Haile et al. Unpublished). Edea et al. (2012) reported an average litter size of 1.36 and average twinning rate of 36.3 ± 4.7% in the breed. Other naturally prolific breeds of sheep in SSA include Horro (litter size of 1.40 and twinning rate of 39.9%; Edea et al. 2012) and Doyogana breeds in Ethiopia, the West African Dwarf/Djallonke sheep found across southwest and Central Africa. Other prolific breeds of sheep in the continent outside SSA are the Barbarine and D’man sheep found in Tunisia and Morocco, respectively. The prolificacy in the Barbarine and D’man sheep has, however, been enhanced through artificial selection programmes.
In this study, we performed genome-wide scans of selection signatures using three approaches (FST, hapFLK, XP-EHH) and genotype data generated with the Illumina OvineHD SNP Chip array in Bonga sheep sampled from farmers’ flocks, as a proxy for other prolific sheep found in SSA, to identify candidate genomic regions and genes associated with the trait. We identified a strong selection signature on OarX spanning BMP15 and uniquely, a diverse range of genomic regions spanning several candidate genes, never reported before in prolific sheep, but known to be associated with fertility and reproduction in other species. Our results suggest that, prolificacy in SSA indigenous sheep is a function of the actions of BMP15 and several genes that are associated with male and female fertility.
The phenotypic dataset consisted of 98 litter size records of Bonga sheep, a non-seasonal breeder, that were collected between 2009 and 2018 from farmers flocks participating in a community-based breeding programme (CBBP). For this study, litter size was considered a prolificacy trait of the dam and one of the indicators of improved reproduction. It was defined as the number of lambs born alive per lambing. The most prolific ewes (n = 74) with twins (n = 38), triplets (n = 35) and quadruplets (n = 1) lambs born alive per lambing and non-prolific ones (one lamb born alive per lambing; n = 24), for at least three parities, were sampled from different farmers flocks. Genotyping was performed with the Illumina OvineHD BeadChip, which includes 606,006 genomic variants and 30,000 functional putative variants, at GeneSeek Inc (http://genomics.neogen.com/en/). The genotype data were assessed for quality with PLINK 1.9 (www.cog-genomics.org/plink2). Variants with no assigned genomic positions, call rates lower than 95%, large Hardy–Weinberg equilibrium (HWE) deviations (P value < 1 × 10−6) and minor allele frequency (MAF) < 0.01, and samples with call rates < 98% were excluded from the final dataset. Following quality filtering, 457,087 variants and 84 individuals (33 ewes with twins, 30 with triplets, 1 with quadruplet, and 20 with single births) were retained for analysis.
The retained dataset, following quality filtering, was used to investigate genome-wide signatures of selection using three cross-population selection tests; FST (Biswas and Akey 2006), XP-EHH (Sabeti et al. 2007) and hapFLK (Fariello et al. 2013). For the analysis, prolific ewes, defined as those with twin, triplet and quadruplet births, for at least three consecutive lambing seasons, were taken as one group and the non-prolific ones (ewes with single births) formed the contrasting group. The grouping was informed by the objective of detecting selection signatures that can be attributed, to a large extent, to differences in prolificacy. For this reason, we avoided using a different breed due to the high likelihood of detecting strong selection signatures arising from genetic differences between breeds which might have masked the ones attributing to prolificacy.
Candidate regions and associated genes revealed by FST, hapFLK and XP-EHH selection signature analysis in the Bonga breed of sheep
Genomic region (bp)
Number of significant SNPs
SPOCK1, KLHL3, HNRNPA0, ENSOART00000016824, MYOT, PKD2L2, FAM13B, WNT8A, NME5, BRD8, KIF20A, CDC23, GFRA3, CDC25C, ENSOART00000017740, ENSOART00000014602, ENSOART00000017768, KDM3B, REEP2, EGR1, ENSOART00000017967, HSPA9, CTNNA1, HB-EGF, SLC4A9
IQSEC2, KDM5C, TSPYL2, GPR173, MAGED1, GSPT2, BMP15, SHROOM4, ENSOART00000006108, ENSOART00000010008, ENSOART00000006154, ENSOART00000010178, ENSOART00000010185, ENSOART00000006171, ENSOART00000006186
SLC9A4, IL18RAP, IL18R1, IL1RL1
DECR1, NBN, OSGIN2
GLOD4, ENSOART00000004387, GEMIN4, ENSOART00000012524
RNF157, FOXJ1, EXOC7, ZACN, GALR2, SRP68, EVPL
ENSOART00000003233, TMEM240, ENSOART00000003483, VWA1, TMEM88B, ANKRD65, MRPL20, CCNL2
ENSOART00000012708, FBXO31, ENSOART00000012902
ENSOART00000010008, MAGED1, GSPT2, ENSOART00000006154, ENSOART00000010178, ENSOART00000010185, ENSOART00000006171, BMP15, ENSOART00000006186
MARCH4, SMARCAL1, ENSOART00000021
AMDHD1, HAL, LTA4H, ELK3
SGTA, SLC39A3, DIRAS1
MAGED1, GSPT2, ENSOART00000006154 (BMP15 is downstream to the region)
In this study, we used the prolific Bonga sheep found in Ethiopia, as a proxy to other prolific indigenous African sheep which, apart from the thin-tailed West African Dwarf, are all fat-tailed and of the same genetic background (Muigai and Hanotte 2013) to investigate the genetic basis of prolificacy. We applied three methods, FST, hapFLK and XP-EHH, to exploit their strengths and increase the reliability of our findings by minimizing biases associated with each method (Simianer 2014). We were also unsure about the evolutionary selection time span of prolificacy and the best method to detect candidate selection signatures associated with the trait. The FST, which is directly related to variance(s) in allele frequency, is the most commonly used method to detect selection signatures. The hapFLK detects selection signatures based on differences in haplotype frequencies while accounting for hierarchical population structure. XP-EHH detects long-range haplotypes or recent positive selection, where the selected loci are close to fixation in one population but remain polymorphic in another based on the relationship between an allele and its surrounding linkage disequilibrium (LD). Approaches such as FST and hapFLK detect better long-term selection because they are dependent on the accumulation of mutations around the causative variant. Their resolving power, however, declines if the selection advantage is small, as it takes longer for the frequency of the favoured allele to increase to the point of detection. LD-based methods, such as XP-EHH, retain superior detection power when a new mutation arises in a population due to an adaptive advantage or an existing variant being exposed to a new environment that provides favourable selection pressure, resulting in an increase in its frequency but without fixation (Fagny et al. 2014).
Functional enrichment analysis using the highest classification stringency in DAVID 6.8 revealed four enriched clusters with genes linked to immune functions being overrepresented, a result which was replicated by STRING. The importance of the immune system in reproduction in sheep has recently been reported (Pokharel et al. 2019). Following arguments advanced in other studies that found an over-representation of immune-related genes in the genomes of tropically adapted livestock (Xu et al. 2015; Bahbahani et al. 2017; Mwacharo et al. 2017), it can also be argued that selection has enhanced adaptive immune response to tropical infections in the genomes of Bonga sheep. This is possible, but unlikely to explain the result of our study given its analytical design which contrasted prolific and non-prolific ewes of the same breed from the same ecological environment and therefore exposed to similar pathogens, parasites and infections. An alternative explanation or hypothesis that we favour associates the immune function with enhancement of the process and success of fertilization in the female reproductive tract. As in other eutherians, in domestic sheep, fertilization also occurs internally and ejaculated spermatozoa are normally retained in functional pre-ovulatory sperm reservoirs in female genital tracts until ovulation. Rather than being eliminated, the immunologically foreign sperm is tolerated by the female immune defence system. It has been observed that semen can signal genomic shifts that modulate expression of genes linked to immune function in poultry (Das et al. 2009; Long et al. 2003; Huang et al. 2016) and mammals (Almiñana et al. 2014; López-úbeda et al. 2015), resulting in a state of immune tolerance during the lengthy storage of spermatozoa in oviductal sperm reservoirs (Holt and Fazeli 2016). A large subset of differentially expressed genes that suppress local immune defence in oviductal sperm reservoirs following mating and mating-induced changes in the expression of immune activating genes in utero-tubal junction has also been reported in chicken and pigs, respectively (Atikuzzaman et al. 2017). The upregulation of immune defence genes in the oviduct following mating was also observed in mice (Fazeli et al. 2004). The interval between sperm deposition and the change in gene expression seems to cover the time period when spermatozoa are in the sperm reservoir. The activation of immune response genes therefore serves to cleanse the genital tract of redundant spermatozoa, foreign proteins and/or pathogens, for the descending ova or embryos. It is therefore a possibility that high prolificacy has made the oviduct of prolific individuals less responsive to antigenic seminal fluid. This creates an appropriate immune-balanced physiological environment tailored for sperm survival and fertilization.
The FST and hapFLK identified one overlapping candidate region on OarX. As was expected based on our sampling and analytical design, the most significant window in this region spanned BMP15 that plays a role in various functions implicated in prolificacy (Persani et al. 2014; Monestier et al. 2014). The candidate region revealed by XP-EHH on OarX was 259,479-bp downstream of BMP15. The FST test also revealed a region on Oar5 which was 4.4 Mb downstream of GDF9, an important paralog of BMP15. As the extent of LD declines rapidly from 0 to 300 kb in the ovine genome (Kijas et al. 2014; Al-Mamun et al. 2015), LD between the SNPs found on the XP-EHH candidate region on OarX and BMP15 is thus expected. This result suggests that BMP15 may be the primary candidate gene responsible for prolificacy in Bonga sheep. Experimental disruption of BMP15 in mice result in mild defects in female fertility (Su et al. 2008) whereas natural missense mutations result in variable phenotypes in ewes, ranging from hyperprolificacy to complete sterility, depending on a fine gene dosage mechanism involving GDF9 (Belli and Shimasaki 2018). This is the first study, to the best of our knowledge, to identify a candidate genomic region and gene associated with prolificacy in an indigenous SSA sheep breed. It is therefore possible that BMP15 may also encode the trait in other prolific SSA sheep. It would be of interest to investigate whether the causative variant(s) are the same or novel across prolific SSA sheep breeds vis a vis the ones found in Europe and the Middle East. This will shed light on the evolution of the genetic basis of the trait and is of relevance since at least 8 mutations have been reported, so far, in BMP15 while a new variant was recently found in the Barbarine (Lassoued et al. 2017) and three Iranian (Amini et al. 2018) breeds.
The study design also revealed genes, not reported previously in prolific sheep, but have been associated with female and male reproduction traits in other species. The ones reported in female animals included SPOCK1, PKD2L2, HB-EGF, GPR173, MAGED1, SMARCAL1, HMGN3a, ELK3 and KDM3B. SPOCK1 was identified as a novel candidate gene for age at start of menstruation, which marks the beginning of a females reproductive life (Dvornyk and Waqar-ul-Haq 2012), in beef cattle (Fortes et al. 2010) and humans (Liu et al. 2009). With its specific expression in postnatal day-1 to postnatal day-14 ovaries, and low, albeit significantly in adult ovaries in mice, PKD2L2 possibly functions in early or late follicular growth or at multiple stages of follicular development to regulate the assembly, preservation and maturation of ovarian follicles (Gallardo et al. 2007). In vitro studies showed possible interactions of HB-EGF with blastocyst epidermal growth factor receptor (EGF-R) very early in the process of implantation in species with different hormonal requirements (Das et al. 1994; Martin et al. 1998; Mishra and Seshagiri, 2000; Seshagiri et al. 2002). It is possible therefore that HB-EGF could be a critical signalling protein for early pregnancy success (Das et al. 1994; Paria et al. 2001; Xie et al. 2007; Jessmon et al. 2009). GPR173 is a receptor for PNX, an important mediator of ovarian cyclicity and through its actions at multiple levels of the hypothalamo–pituitary–gonadal axis, it is involved in maintaining various reproductive functions in rats (Stein et al. 2016; Treen et al. 2016; Bauman et al. 2017) and possibly in prolific sheep. The expression of MAGED1 has been detected in mice embryos as early as E3 stage (Kendal et al. 2002). Similarly, the expression of SMARCAL1 and HMGN3a has been detected in bovine oocytes and in early embryos (Uzun et al. 2009), whereas ELK3 has been isolated from 16-day mouse embryos and one of its transcripts was expressed predominantly in 8- to 14-day embryos (Nozaki et al. 2009). These expression patterns highlight the importance of these genes in regulating gene expression, during embryonic development. The KDM3B gene is highly expressed in female reproductive organs including the ovary, oviduct and uterus. Knockout of Kdm3b in female mice resulted in irregular oestrus cycles and decreased ovulation capability, fertilization rate, uterine decidual response and circulating levels of 17β-estradiol (Liu et al. 2015a). Thus KDM3B could play a crucial role in regulating oestrus and menstrual reproduction cycles and in preparing the uterus for successful implantation of multiple ova.
The candidate regions identified in this study also spanned several genes implicated in male fertility and reproduction in other species, but not reported in prolific sheep. They included FOXJ1, NME5, PKD2L2, MAGED1 and KDM3B. As discussed, the latter three genes have been implicated in female reproduction. FOXJ1 is a key transcription factor for the formation of motile cilia in human (Lyons et al. 2006), mice and xenopus (Weidemann et al. 2016). Immotile-cilia syndrome has been associated with male and female infertility in humans as motile cilia are critical in propelling ova along the fallopian tube while motility in sperm flagellum is also critical for sperm function and successful fertilization (Afzelius and Eliasson 1983). In mice and xenopus, CFAP157 was identified as a novel target protein for FOXJ1 and is only essential during spermatogenesis but is expressed in motile ciliated tissues. The prominent expression of PKD2L2 and its encoded protein Polycystin-L2 in adult mouse testis, spermatocytes and spermatids (Guo et al. 2000) suggests a role in spermatogenesis (Chen et al. 2008) and testicular maturation (Kierszenbaum 2004). Polycystin-L2 also modulates intracellular calcium concentration during spermatogenesis. Calcium ions are critical in regulating sperm cell functions including capacitation, progressive motility, hyperactivated motility and acrosome reaction, which are important during fertilization (Bedford 1998; Darszon et al. 1999; Zhang and Gopalakrishman 2005). NME5 also exhibits a strict testis-specific expression in spermatogonia and early spermatocytes in several vertebrates (Muneir et al. 1998; Hwang et al. 2003; Desvignes et al. 2009). The protein encoded by KDM3B is also highly expressed in multiple cell types in mouse testes, such as Leydig and sertoli cells, spermatogonia and spermatocytes, at different stages of differentiation and has also been observed in epithelial cells of the caput epididymis, prostrate and seminal vesicle (Liu et al. 2015b). Knockout of Kdm3b in male mice resulted in reduction in the number of pups produced by breeding pairs due to a decrease in the number of litters, fewer number of mature sperms in cauda epididymis displaying significantly reduced sperm motility and a significant reduction in circulating levels of 17beta-estradiol, a modulator of sperm maturation and male sexual behaviour. Like Kdm3b knockout males, MAGED1-deficient male mice also displayed severely impaired male coital behaviour resulting in male infertility which is attributed to deficient production of mature oxytocin in hypothalamus indicating that MAGED1 is required for oxytocin processing and stability (Dombret at al. 2012). The occurrence of these genes in the candidate regions suggests that they are essential in the maintenance of the process of spermatogenesis and normal male sexual behaviour to ensure successful fertilization of a large number of ova generated in prolific ewes.
The overall objective of this study was to identify genes associated with prolificacy in a prolific SSA fat-tailed breed of sheep. Unexpectedly, we also identified several known and novel candidate genes implicated in male and female fertility and reproduction in other species, suggesting that such genes could be hotspots of selection in indigenous SSA prolific breeds of sheep. The findings suggest that enhanced reproduction in prolific ewes entails not only prolificacy genes but also epistatic effects with genes associated with other reproduction traits. Although we identify BMP15 as the main candidate gene for prolificacy in Bonga sheep, the exact causative variants need to be determined to further confirm whether they are novel or are part of what has been reported in prolific breeds of sheep from Europe and the Middle East. It is important to note that the sample size used here, 84 individuals, is rather low. This may have underpowered our analysis; thus, our findings should be interpreted with caution and need validation using a larger subset of animals and populations. Our findings are of significance given that reproductive traits have low to medium heritability and thus do not exhibit a noticeable response to phenotypic selection in traditional breeding methods based on phenotypic data only. The incorporation of the genetic information, such as revealed here, in such breeding programmes (e.g. CBBPs) via either genomic selection (GS), marker-assisted selection (MAS), genome-wide association studies (GWAS) or genomic best linear unbiased predictions of breeding values, could enhance response to selection towards the genetic improvement of reproductive performance.
Materials and methods
Samples, genotyping and quality control
A total of 95 ewes belonging to the Bonga breed of sheep were sampled from four locations (Shuta n = 33, Boqa n = 45, Buta n = 13, Medudha n = 4) in Southwestern Ethiopia. All the 95 animals had at least three lambing parities and came from farmers flocks that are participating in a community-based breeding programme (CBBP) where performance recording is undertaken. From the records of the 95 animals, 31 gave birth to single lambs, 33 to twins, 30 to triplets and one to a quadruplet. Whole blood was collected from each animal via jugular venipuncture with EDTA as the anticoagulant and later transferred to Whatman™ FTA™ Classic Cards (GE Healthcare) for storage. Genotyping was done using FTA™ preserved blood samples with the Ovine Infinium® HD SNP BeadChip (http://genomics.neogen.com/en/illumina-ovine-hd-beadchip) at GeneSeek Inc. (Lincoln NE, USA). The BeadChip includes 606,006 genomic variants designed by the International Sheep Genomics Consortium (ISGC), nearly all the contents from the original OvineSNP50 array and 30,000 putative functional variants. The raw genotypes were first analysed with GenomeStudio (Illumina; GenCall score for raw genotypes > 0.15) which was used to extract the genotypes in standard format for PLINK 1.09 (http://www.cog-genomics.org/plink/1.9/). Chromosomal coordinates for each SNP were obtained from the Ovine Oar 3.1 genome assembly (https://www.ensembl.org/Ovis_aries/Info/Index). The raw dataset was filtered to remove animals with more than 10% missing genotypes, SNPs with no known positions in the genome, SNPs with a call rate lower than 95%, a minor allele frequency lower than 1%, and large HWE deviations (P < 0.000001). All the SNPs mapping to the X chromosome were retained in the final dataset because genes associated with prolificacy have also been observed on this chromosome. Hence, 84 individuals and 457,086 SNPs were available for analysis following quality filtering. This dataset was first used to assess genetic relationship and structure by performing the principal component analysis (PCA) with ADEGENET (http://adegenet.r-forge.r-project.org/) executed in R (https://www.R-project.org). To further assess possible fine-scale population genetic structure, NetView (Neuditschko et al. 2012) was also used to analyse the genotype data, testing up to ten near-neighbour genetic clusters.
Identification of candidate genomic regions under selection
Three selection detection tests, FST, hapFLK and XP-EHH, were implemented. Prior to performing selection signature mapping, the genotyped individuals were classified, a priori, into two groups, prolific and non-prolific ewes. The prolific group included ewes with twins, triplets and quadruplet litter sizes. The non-prolific group included ewes with single litter sizes.
Smoothed FST test statistic
The FST test indicates genetic differentiation among groups of individuals/populations/breeds arising from different evolutionary pressures acting in a segment of the genome. To identify loci under selection, we calculated the allele frequencies for each of the 457,086 retained SNPs for the two contrasting groups of prolific and non-prolific ewes. The allele frequencies were used to calculate FST values for each locus as a measure of group differentiation following Porto-Neto et al. (2013a, b). For each SNP, FST was calculated as the squared deviation of the average frequency in a group from the average frequency across the groups divided by the allele frequency variance (p*q). To identify regions under selection, the non-prolific group was compared against the prolific one and the pairwise group values were averaged to obtain a single FST value per SNP for each group. To facilitate the identification of genomic regions containing more extreme FST values, the individual SNP values of FST were grouped within genomic windows, using a kernel regression smoothing algorithm (Gasser et al. 1991) implemented with LOKERN in R (Hermann 2016). This method uses local averaging of the observations (FST values) when estimating the regression function. By testing window sizes of two, five and ten SNPs, we chose a window of five SNPs as it gave sufficient smoothing and showed the best signals. Higher scores of smoothed FST for individual loci or genomic regions indicate stronger signal of genetic differentiation/selection. Smoothed FST values greater than the average plus/minus three standard deviations (mean FST ± 3 SD) were taken to be under selection.
hapFLK test statistic
As a complementary approach to mapping selection sweeps, we used hapFLK 1.3 (https://forge-dga.jouy.inra.fr/projects/hapflk), which implements the FLK (Bonhomme et al. 2010) and hapFLK (Fariello et al. 2013) algorithms. The FLK tests the neutrality of polymorphic markers by contrasting their allele frequencies in a set of populations against what is expected under neutrality. The hapFLK extends the FLK test to account for differences in haplotype frequencies between populations. This method has been shown to be robust with respect to bottlenecks and migration events (Fariello et al. 2013). To perform hapFLK analysis, Reynolds’ genetic distances between the prolific and non-prolific ewes were calculated and converted to a kinship matrix with an R script (available at https://forge-dga.jouy.inra.fr/projects/hapflk/documents). Subsequently, by assuming ten haplotype clusters in the linkage disequilibrium (LD) model (− K 10; number of haplotype clusters determined by running a fastPHASE cross-validation analysis), the hapFLK statistics were computed and averaged across 20 expectation–maximization runs to fit the LD model (− nfit = 20). The standardization of the statistics using the corresponding python script provided with the software allowed the estimation of the associated P values from a standard normal distribution. To correct for multiple testing, we considered the threshold of the nominal P value as < 0.001 to identify the significant haplotypes following previous studies using hapFLK analysis on the Sheep HapMap dataset (Fariello et al. 2014; Kijas 2014).
XP-EHH test statistic
Functional enrichment of the candidate regions under selection
For the three selection mapping approaches, positions that showed evidence of selection (mean FST ± 3 SD; or a P value < 0.001 for hapFLK and XP-EHH) were considered to be the result of selection sweeps. The genes that were either partially or fully covered by these regions were identified based on the ovine 3.1 reference genome assembly using Ensembl Comparative Genomics Resources Database Release 94 (https://www.ensembl.org/index.html). Functional enrichment analysis was performed with the functional enrichment clustering tool of DAVID Bioinformatics Resources 6.8 (Huang et al. 2009a, b). Each gene was analysed and enrichment analysis was performed using Ovis aries as the target species and the Bos taurus genome supplied with DAVID 6.8 as the background species. Corrections for multiple testing were performed by applying the Benjamini and Hochberg (1995) approach. For functional enrichment clustering, an enrichment score of 1.3 was taken as the threshold following the authors of DAVID 6.8. A search of the literature was also performed to identify phenotypes that are known to be affected by variation in the genes found in the candidate regions in other species.
Functional protein–protein interaction (PPI) networks and gene ontology (GO) terms encoded by the candidate genes were also investigated using STRING Genomics 11.0 (Szklarczyk et al. 2019) with the Bos taurus as the background species. STRING provides (i) known PPI from curated databases or experiments and (ii) PPI predicted on the basis of gene neighbourhoods, fusions and co-occurrences, text mining in literature, co-expression or protein homology. A global PPI network which retained interactions with a high level of confidence (PPI enrichment score > 0.4) was constructed.
The authors are gratefully indebted to sheep farmers participating in the Bonga community-based breeding programme. We also acknowledge the support of the research and technical staff of Bonga Agricultural Research Centre. Financial support from various donors to the CGIAR Research Programme on Livestock is recognized. The genotyping was supported by the generous donation through the Illumina Greater Good Initiative and additionally by discounted prices from GeneSeek Inc. Activities involving Iowa State University were supported by the Ensminger Endowment, the Department of Animal Science and the College of Agriculture and Life sciences and the State of Iowa.
MR, AH, BR and JMM contributed to the study conception and design. On-farm phenotypic recording and tissue sample collection were done by ATD and AM. Data analysis was done by NK, DW and JMM. Genotyping reagents were provided by MFR. The study was supervised by JMM who also wrote the manuscript. All authors read and approved the final manuscript.
Compliance with Ethical standards
Conflict of interest
The authors declare no conflict of interest.
Research involving human and animal participants
The animals used in this study are owned by farmers who are participating in a community-based indigenous sheep breeding programme. The farmers were aware of the objectives of the study and the animals were sampled with their permission in the presence of a qualified veterinarian following the guidelines for the care and use of animals of the Ministry of Livestock and Fisheries Resources, Peoples Democratic Republic of Ethiopia. No animal was injured during the sampling process. All the procedures described here were approved by the Institute Ethics Committee of the International Centre for Agricultural Research in the Dry Areas (ICARDA).
- Bahbahani H, Tijjani A, Mukasa C, Wragg D, Almathen F, Nash O, Akpa GN, Mbole-Kariuki M, Malla S, Woolhouse M, Sonstegard T, van Tassell C, Blythe M, Huson H, Hanotte O (2017) Signatures of selection for environmental adaptation and Zebu x Taurine hybrid fitness in East African shorthorn zebu. Front Genet 8:68PubMedPubMedCentralCrossRefGoogle Scholar
- Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc 57:289–300Google Scholar
- Das SK, Wang X-N, Paria BC, Damm D, Abraham JA, Klagsbrun M, Andrews GK, Dey SK (1994) Heparin-binding EGF-like growth factor gene is induced in the mouse uterus temporally by the blastocyst solely at the site of its apposition: a possible ligand for interaction with blastocyst EGF-receptor in implantation. Development 120:1071–1083PubMedGoogle Scholar
- Davis GH, Galloway SM, Ross IK, Gregan SM, Ward J, Nimbkar BV, Ghalsasi PM, Nimbkar C, Gray GD, Subandriyo, Inounu I, Tiesnamurti B, Martyniuk E, Eythorsdottir E, Mulsant P, Lecerf F, Hanrahan JP, Bradford GE, Wilson T (2002) DNA tests in prolific sheep from eight countries provide new evidence on origin of the Booroola (FecB) mutation. Biol Reprod 66:1869-1874PubMedCrossRefGoogle Scholar
- Dombret C, Nguyen T, Schakman O, Michaud JL, Hardin-Pouzet H, Bertrand MJM, De Backer O (2012) Loss of Maged1 results in obesity, deficits of social interactions, impaired sexual behavior and severe alteration of mature oxytocin production in the hypothalamus. Hum Mol Genet 21:4703–4717PubMedCrossRefGoogle Scholar
- Edea Z, Haile A, Tibbo M, Sharma AK, Sölkner J, Wurzinger M (2012) Sheep production systems and breeding practices of smallholders in western and south-western Ethiopia: implications for designing community-based breeding strategies. Livest Res Rural Dev 24:7Google Scholar
- Herrmann E (2016) Lokern version 1.1-8: An R package for kernel smoothing. https://cran.r-project.org/web/packages/lokern/index.html
- Jansson T (2014) Genes involved in ovulation rate and litter size in sheep. BSc These, Swedish University of Agricultural Sciences. Uppsala Sweden. p 19Google Scholar
- Kijas JW, Porto-Neto L, Dominik S, Reverter A, Bunch R, McCulloch R, Hayes BJ, Brauning R, McEwan J, The International Sheep Genomics Consortium (2014) Linkage disequilibrium over short physical distances measured in sheep using a high density SNP Chip. Anim Genet 45:754–757PubMedCrossRefGoogle Scholar
- Liu YZ, Guo YF, Wang L, Tan LJ, Liu XG, Pei YF, Yan H, Xiong DH, Deng FY, Yu N, Zhang YP, Zhang L, Lei SF, Chen XD, Liu HB, Zhu XZ, Levy S, Papasian CJ, Drees BM, Hamilton JJ, Recker RR, Deng HW (2009) Genome-wide association analyses identify SPOCK as a key novel gene underlying age at menarche. PLoS Genet 5:e1000420PubMedPubMedCentralCrossRefGoogle Scholar
- Monestier O, Servin B, Auclair S, Bourguard T, Poupon A, Pascal G, Fabre S (2014) Evolutionary origin of bone morphogenetic protein 15 and growth and differentiation factor 9 and differential selective pressure between mono- and polyovulating species. Biol Reprod 91(83):1–13Google Scholar
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, The International HapMap Consortium (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449:913–918PubMedPubMedCentralCrossRefGoogle Scholar
- Simianer H (2014) Statistical problems in livestock population genomics. In: Proceedings of the 10th World Congress on Genetics Applied to Livestock Production. ASAS, Vancouver, CanadaGoogle Scholar
- Stein LM, Tullock CW, Mathews SK, Garcia-Galiano D, Elias CF, Samson WK, Yosten GL (2016) Hypothalamic action of phoenixin to control reproductive hormone secretion in females: importance of the orphan G protein-coupled receptor Gpr173. Am J Physiol Regul Integr Comp Physiol 311:R489–R496PubMedPubMedCentralCrossRefGoogle Scholar
- Su YQ, Sugiura K, Wigglesworth K, O’Brien MJ, Affourtit JP, Pangas SA, Matzuk MM, Eppig JJ (2008) Oocyte regulation of metabolic cooperativity between mouse cumulus cells and oocytes: BMP15 and GDF9 control cholesterol biosynthesis in cumulus cells. Development 135:111–121PubMedCrossRefGoogle Scholar
- Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, von Mering C (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47(D1):D607–D613PubMedCrossRefGoogle Scholar
- Weidemann M, Schuster-Gossler K, Stauber M, Wrede C, Hegermann J, Ott T, Boldt K, Beyer T, Serth K, Kremmer E, Blum M, Ueffing M, Gossler A (2016) CFAP157 is a murine downstream effector of FOXJ1 that is specifically required for flagellum morphogenesis and sperm motility. Development 143:4736–4748PubMedCrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.