Background

The Hainan Partridge (Arborophila ardens; Phasianidae, Galliformes) is endemic to Hainan Island, China, where it is distributed mainly in tropical evergreen forests between 600 and 1600 m above sea level (Gao 1998; Yang et al. 2011). Over the past few decades, its population has reportedly declined rapidly: its poor flying ability and ground-dwelling habits make it vulnerable to human activities, non-indigenous predators and rapid habitat loss (Chang et al. 2012; Liang et al. 2013; Rao et al. 2017). It has therefore been listed as a first-class nationally protected species in China (Zhang et al. 2003) and as globally Vulnerable (IUCN 2018). Historical and ongoing population declines (Chang et al. 2012; Chen et al. 2015a) and a suite of persistent and novel threats (Liang et al. 2013) have led to governmental protection of the species across much of its range. However, no whole-genome sequence of the Hainan Partridge is currently available, and few studies have explored the genetic mechanisms underlying its adaptation to the environment of Hainan Island. To provide genome-scale insights into this vulnerable species, facilitate comparative studies of avian genomics and support the development of genetic tools for Hainan Partridge research and conservation, we sequenced its genome. The results will improve understanding of the factors that have shaped the evolutionary history of the Hainan Partridge and should ultimately aid its conservation.

Methods

Sampling and sequencing

A muscle sample was collected from a male Hainan Partridge (NCBI Taxonomy ID: 1206065) that was found dead in the wild and preserved in the Natural History Museum of Sichuan University. The genome was sequenced using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. Two paired-end libraries (insert sizes 230 bp and 500 bp) and three mate-pair libraries (insert sizes 2 kb, 5 kb and 10 kb) were constructed.

Genome size estimation, genome assembly and completeness evaluation

Before assembly, the genome size was estimated by 17-mer frequency analysis. The assembly was first performed with SOAPdenovo2 (Luo et al. 2012) using the parameters "all -d 5 -M 3 -k 25". Super-scaffolds were then built with SSPACE (Boetzer et al. 2011), and intra-scaffold gaps were filled with reads from the short-insert libraries using GapCloser (Tigano et al. 2018), which is distributed as part of the SOAP package. To verify the correctness of the assembly, we aligned it to the Red Junglefowl (Gallus gallus) reference genome. The Hainan Partridge/Red Junglefowl genome alignment was generated and visualized using LAST (Kiełbasa et al. 2011) with the settings recommended by the software developers for similarly distantly related taxa. Genome completeness was evaluated with CEGMA (Parra et al. 2007) and BUSCO (Simão et al. 2015).
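K-mer-based genome size estimation rests on the relation G ≈ N / D, where N is the total number of k-mer occurrences counted from the reads and D is the depth at the main peak of the k-mer frequency histogram. The following is a minimal Python sketch of that relation, assuming a histogram of (depth, number of distinct k-mers) pairs such as a k-mer counter produces; the function name, the low-depth error cutoff and the toy data are hypothetical, and this simple estimator ignores heterozygosity and repeat peaks.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 3) -> float:
    """Estimate genome size as total k-mer occurrences divided by peak depth.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and excluded
    (a hypothetical cutoff chosen only for illustration).
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    # Total occurrences = sum over depths of (depth * number of k-mers at it).
    total_kmers = sum(depth * n for depth, n in kept.items())
    # The main peak is the depth carrying the most distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


# Toy histogram: error k-mers at depth 1-2, main peak at depth 20.
toy = {1: 5000, 2: 800, 10: 100, 20: 1000, 21: 900, 40: 50}
print(estimate_genome_size(toy))
```

With the toy histogram, the retained k-mers sum to 41,900 occurrences and the peak is at depth 20, giving an estimate of 2095 bases.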

Gene prediction and annotation

We combined de novo and homology-based predictions to identify protein-coding genes (PCGs) in the genome. De novo prediction was performed on the assembled genome, with repetitive sequences masked as "N", using the hidden Markov model (HMM)-based programs AUGUSTUS (Stanke et al. 2006) and GENSCAN (Burge and Karlin 1997) with appropriate parameters. For homology-based prediction, proteins of the Red Junglefowl, Turkey (Meleagris gallopavo), Zebra Finch (Taeniopygia guttata) and human (Homo sapiens) were mapped onto the genome using TBLASTN (Altschul et al. 1997) with an E-value cutoff of 1E−5. The TBLASTN results were processed with SOLAR (Yu et al. 2006) to obtain the best match for each alignment, and the homologous genomic regions were then aligned against the matching proteins using GeneWise (Birney et al. 2004) to define gene models. Finally, we used EVidenceModeler (EVM) (Haas et al. 2008) to integrate all of the above evidence into a consensus gene set.
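Masking repeats as "N" before de novo prediction amounts to converting a soft-masked assembly into a hard-masked one. Below is a minimal Python sketch, assuming (the source does not say this) that the repeat annotation step marked repeats as lowercase bases. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with 'N'; other characters are kept."""
    return "".join("N" if base.islower() else base for base in seq)


def mask_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence lines hard-masked and headers unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield hard_mask(line)


print(hard_mask("ACGTacgtNNAC"))  # lowercase repeat bases become N
```

Hard masking prevents gene finders from calling genes inside transposable elements, at the cost of hiding any genuine exons that overlap annotated repeats.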

Functional annotation

Functional annotation of all genes was based on their best matches to proteins in the Swiss-Prot and TrEMBL databases (Boeckmann et al. 2003), obtained with BLASTP using the same E-value cutoff of 1E−5. Gene Ontology (GO) terms describing the gene products were retrieved from the Swiss-Prot results. We also annotated the proteins against the NCBI non-redundant (Nr) protein database. Motifs and domains were annotated using InterProScan (Hunter et al. 2008) against publicly available databases, including ProDom (Bru et al. 2005), PRINTS (Attwood et al. 2000), PIRSF (Wu et al. 2004), Pfam (Finn et al. 2013), ProSiteProfiles (Sigrist et al. 2002), PANTHER (Thomas et al. 2003), SUPERFAMILY (Gough and Chothia 2002) and SMART (Letunic et al. 2004). To find the best match and associated pathways for each gene, all genes were submitted to KAAS (Moriya et al. 2007), a web server that annotates genes by BLAST searches against the manually curated KEGG GENES database, using the bi-directional best hit (BBH) method.
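The "best match" selection with an E-value cutoff can be expressed compactly. A minimal Python sketch, assuming BLAST+ tabular output in the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and assuming that "best" means the highest bit score among hits passing the cutoff. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(rows: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Return query -> (subject, evalue, bitscore) for the top-scoring hit.

    Hits with E-value above max_evalue are discarded; among the rest, the
    hit with the highest bit score is kept for each query.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Genes without any hit passing the cutoff simply do not appear in the result, which is how the unannotated fraction of a gene set would be counted.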

Analyses of gene family, phylogeny, and divergence

We used OrthoMCL (Li et al. 2003) to define orthologous genes among 10 avian genomes (Hainan Partridge, Red Junglefowl, Turkey, Chinese Monal Lophophorus lhuysii, Japanese Quail Coturnix japonica, Rock Dove Columba livia, Mallard Anas platyrhynchos, Peregrine Falcon Falco peregrinus, Zebra Finch and Ostrich Struthio camelus). A phylogenetic tree of these 10 birds was constructed from the nucleotide sequences of 1:1 orthologous genes. Coding sequences from each 1:1 orthologous family were aligned with PRANK (Nick and Ari 2010), and the alignments were concatenated into a single sequence per species for tree building. Modeltest (Posada and Crandall 1998) was used to select the best-fitting substitution model for the concatenated alignment. RAxML (Stamatakis 2014) was then used to reconstruct the maximum likelihood (ML) tree with 1000 bootstrap replicates. Divergence times were estimated with MCMCTree in PAML (Yang 2007).
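Concatenating per-family alignments into one sequence per species produces a supermatrix. A minimal Python sketch, assuming each family's alignment has already been loaded into a dict mapping species name to aligned sequence. The function name is hypothetical, and the checks reflect what a 1:1 family should satisfy rather than any step stated in the source.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-gene alignments into one supermatrix row per species.

    Every alignment must contain exactly the same species, and all sequences
    within one alignment must have the same aligned length.
    """
    if not alignments:
        return {}
    species = set(alignments[0])
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for i, aln in enumerate(alignments):
        if set(aln) != species:
            raise ValueError(f"alignment {i} does not contain exactly the same species")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"alignment {i} has sequences of unequal length")
        for sp, seq in aln.items():
            parts[sp].append(seq)
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


print(concatenate_alignments([{"A": "AT-G", "B": "ATCG"}, {"A": "GG", "B": "G-"}]))
```

Keeping the gene order identical for every species, as the loop above does, is what keeps homologous columns aligned across the concatenated sequences.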

Positive selection analysis

The alignments of 1:1 orthologous genes and the phylogenetic tree described above were used to estimate the ratio of non-synonymous to synonymous substitution rates (ω) with the codeml program in PAML under the branch-site model, with the Hainan Partridge as the foreground branch. We then performed likelihood ratio tests and identified positively selected genes (PSGs) as those with false discovery rate (FDR)-adjusted q-values < 0.05.
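The test and the FDR step can be sketched together. For each gene, the statistic 2ΔlnL = 2(lnL₁ − lnL₀) compares the alternative branch-site model with a null model in which ω for the foreground class is fixed at 1. P-values are then adjusted with the Benjamini–Hochberg procedure. This is a minimal Python sketch, assuming the commonly used χ² reference distribution with 1 degree of freedom and Benjamini–Hochberg as the FDR method; the source specifies neither. The function names and log-likelihood inputs are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of 2*(lnL_alt - lnL_null) under a chi-square with 1 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def positively_selected(lnls: Dict[str, Tuple[float, float]], alpha: float = 0.05) -> List[str]:
    """Genes whose (lnL_null, lnL_alt) pair gives an FDR q-value below alpha."""
    genes = list(lnls)
    p = [lrt_pvalue(*lnls[g]) for g in genes]
    q = bh_qvalues(p)
    return [g for g, qv in zip(genes, q) if qv < alpha]
```

Using χ²₁ is the conservative choice. Some studies instead use a 50:50 mixture of a point mass at zero and χ²₁, which halves the p-values.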

SNP distribution and demography from genome data

We used SAMtools (Li et al. 2009) to detect heterozygous SNPs between the two sets of homologous chromosomes of the diploid genome, and the Pairwise Sequentially Markovian Coalescent (PSMC) model (Li and Durbin 2011) to infer the demographic history of the Hainan Partridge.
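PSMC reports its results in scaled units. Converting them to years and numbers of individuals needs a per-site, per-generation mutation rate μ and a generation time g. As we understand the scaling described in the PSMC documentation, N₀ = θ₀ / (4μs), where s is the bin size (100 bp by default). A time t_k, measured in units of 2N₀ generations, then converts to 2N₀·t_k·g years, and a relative size λ_k converts to λ_k·N₀ individuals. The Python sketch below applies this conversion. The μ and g values in the example are hypothetical placeholders; they are not the Red Junglefowl-derived rate used in this study, and the function name is also hypothetical.

```python
from typing import List, Tuple


def scale_psmc(
    theta0: float,
    intervals: List[Tuple[float, float]],
    mu: float,
    generation_years: float,
    bin_size: int = 100,
) -> List[Tuple[float, float]]:
    """Convert PSMC (t_k, lambda_k) pairs to (years ago, effective population size).

    theta0: scaled mutation rate per bin reported by PSMC.
    mu: mutation rate per site per generation (placeholder in the example).
    generation_years: generation time in years (placeholder in the example).
    """
    n0 = theta0 / (4.0 * mu * bin_size)  # reference effective size N0
    return [(2.0 * n0 * t_k * generation_years, lambda_k * n0) for t_k, lambda_k in intervals]


# Hypothetical inputs, for illustration only.
print(scale_psmc(0.004, [(0.0, 1.0), (0.5, 2.0)], mu=1e-8, generation_years=2.0))
```

Both axes of a PSMC plot scale with 1/μ, and the time axis also scales with g. Uncertainty in either value therefore shifts the inferred bottleneck in time and size without changing its shape.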

Results

Genome sequencing, assembly and quality assessment

After low-quality and duplicated reads were filtered out, a total of 221.54 Gb of high-quality sequence data (~205-fold coverage) was obtained (Table 1). K-mer analysis estimated the genome size of the Hainan Partridge at 1.08 Gb, similar to other reported avian genomes (Cai et al. 2013). The scaffolds totaled 1.05 Gb, with a scaffold N50 of 8.28 Mb (Table 2). CEGMA evaluation showed that 83.47% of the core eukaryotic genes were complete and 90.32% were at least partially present in the assembly (Additional file 1: Table S1), and BUSCO evaluation showed that 91.4% of the single-copy orthologs were captured (Additional file 1: Table S2). Visual inspection of the alignment of the Hainan Partridge scaffolds against the Red Junglefowl reference genome also indicated high synteny and assembly correctness. Hainan Partridge scaffolds generally aligned entirely to a single Red Junglefowl chromosome, although detectable inversions were common and some chromosomal rearrangements were evident, especially on Red Junglefowl chromosomes 2 and 4 (Fig. 1a, b). Some Red Junglefowl microchromosomes were covered almost entirely by a single Hainan Partridge scaffold (Fig. 1c, d).
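Two of the summary figures above follow from simple definitions. Fold coverage is total sequenced bases divided by genome size. N50 is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. Below is a minimal Python sketch of both; the function names and the toy scaffold lengths are illustrative.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty input")


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(round(fold_coverage(221.54e9, 1.08e9), 1))
```

Applied to the reported figures, `fold_coverage(221.54e9, 1.08e9)` gives about 205.1, consistent with the ~205-fold coverage stated above.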

Table 1 Genome sequencing information of Hainan Partridge
Table 2 Assembly information of Hainan Partridge genome
Fig. 1

Alignments of the Hainan Partridge scaffolds to Red Junglefowl chromosomes. a, b Several chromosomal rearrangements between the Red Junglefowl and the Hainan Partridge are evident on Red Junglefowl chromosomes 2 and 4. c, d Alignments of Hainan Partridge scaffolds to Red Junglefowl chromosomes 15 and 26 illustrate scaffolds approaching chromosome size, high synteny and high assembly correctness. Forward alignments are shown in blue and reverse alignments in red.

Repeat sequences and gene prediction

The GC content of the Hainan Partridge genome was 42.17%, similar to that of other bird species such as the Ground Tit, Red Junglefowl and Zebra Finch (Cai et al. 2013). Repeat sequences accounted for about 9.19% (96.04 Mb) of the genome, including long interspersed nuclear elements (LINEs, 6.70%), long terminal repeats (LTRs, 1.27%), short interspersed nuclear elements (SINEs, 0.06%) and DNA transposons (1.14%) (Additional file 1: Table S3). A total of 17,376 PCGs were predicted in the Hainan Partridge genome, and most of them (92.03%) were supported by annotations in public databases (TrEMBL, Swiss-Prot, Nr, InterPro, GO and KEGG) (Fig. 2b, Additional file 1: Table S4). The average gene and coding sequence lengths were 24,359 bp and 1689 bp, respectively, with an average of 10 exons per gene.
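GC content is the fraction of G and C among called bases. Below is a minimal Python sketch, assuming that gap characters (N) are excluded from the denominator; the source does not state how gaps were handled. The function name is hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G+C among A/C/G/T bases (case-insensitive; N and other symbols ignored)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / acgt


print(gc_content("ACGTNNgc"))
```

Upper-casing first means soft-masked repeat bases are counted too. Excluding N keeps scaffold gaps from diluting the estimate.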

Fig. 2

Comparative genomics of the avian species studied. a Phylogenetic tree constructed using 1:1 orthologous genes; the timeline indicates divergence times among the species. b Functional annotation of Hainan Partridge genes. c Orthologous gene clusters in five Phasianidae species

Bird phylogeny, divergence and evolution of gene families

We identified 14,668 gene families in the 10 available bird genomes (Hainan Partridge, Red Junglefowl, Turkey, Chinese Monal, Japanese Quail, Rock Dove, Mallard, Peregrine Falcon, Zebra Finch and Ostrich), of which 5491 were 1:1 orthologous gene families. Orthologous gene clusters among the five Phasianidae species (the first five listed) are compared in Fig. 2c. The maximum likelihood phylogeny based on the 1:1 orthologous genes placed the Hainan Partridge in a basal position within Phasianidae, diverging from the other Phasianidae lineages approximately 36.8 Mya (Fig. 2a).
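A gene family is 1:1 orthologous when it contains exactly one gene from every species. Below is a minimal Python sketch of that filter. It assumes an OrthoMCL-style groups file in which each line reads `GROUP: taxon|gene taxon|gene ...`; that layout is an assumption about the input, not something the source states. The function names and species codes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def parse_groups(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'GROUP: taxon|gene taxon|gene ...' lines into group -> member list."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, members = line.partition(":")
        groups[name.strip()] = members.split()
    return groups


def one_to_one(groups: Dict[str, List[str]], species: Set[str]) -> List[str]:
    """Names of groups with exactly one gene from each species in `species`."""
    result = []
    for name, members in groups.items():
        taxa = Counter(member.split("|", 1)[0] for member in members)
        if set(taxa) == species and all(count == 1 for count in taxa.values()):
            result.append(name)
    return result


groups = parse_groups([
    "G1: hp|a1 gg|b1 mg|c1",       # one gene per species -> 1:1
    "G2: hp|a2 hp|a3 gg|b2 mg|c2",  # duplicated in hp -> excluded
    "G3: hp|a4 gg|b3",              # missing mg -> excluded
])
print(one_to_one(groups, {"hp", "gg", "mg"}))
```

Requiring every species to be present with exactly one copy is strict. With 10 genomes, this is why only a subset of all families (here 5491 of 14,668) qualifies.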

Positive selection in Hainan Partridge

Of the 5491 1:1 orthologous genes, 504 were identified by the branch-site likelihood ratio test as under positive selection in the Hainan Partridge. KEGG annotation assigned these PSGs to 46 pathways, including signal transduction (28 genes), folding, sorting and degradation (21 genes), the immune system (20 genes), and transport and catabolism (17 genes) (Additional file 2: Fig. S1a). GO annotation classified the PSGs into three categories: molecular function, cellular component and biological process. In the molecular function category, genes were mainly involved in binding (291 genes; GO:0005488) and catalytic activity (141 genes; GO:0003824). In the cellular component category, the main terms were cell (420 genes; GO:0005623), cell part (418 genes; GO:0044464) and organelle (357 genes; GO:0043226). In the biological process category, genes were mainly involved in cellular process (366 genes; GO:0009987), metabolic process (250 genes; GO:0008152) and biological regulation (248 genes; GO:0065007) (Additional file 2: Fig. S1b).

We found several PSGs related to environmental adaptation in the Hainan Partridge. For example, three PSGs (CASP3, BRCA2 and DTL) are annotated with the GO term "response to UV" (GO:0009411), and they may be involved in responding to the high UV radiation on Hainan Island (Liao et al. 2007). In addition, some PSGs are annotated with the term endoplasmic reticulum (GO:0005783). The endoplasmic reticulum plays key roles in crucial processes such as protein transport and energy metabolism, and in mice the mRNA expression of genes annotated with GO:0005783 is related to temperature (Yu et al. 2011).

Demography reconstruction

We identified 7,015,181 heterozygous SNPs in the Hainan Partridge genome; their density distribution is shown in Fig. 3a. PSMC analysis based on local SNP densities indicated that the Hainan Partridge population experienced one bottleneck over the period from 20 Mya to 10,000 years ago (Fig. 3b). The effective population size decreased from approximately 1,040,000 individuals about 2.5 Mya to a minimum of about 200,000 individuals approximately 0.25 Mya, and then expanded to about 460,000 individuals.

Fig. 3

SNP density distribution and demographic reconstruction of the Hainan Partridge. a Distribution of SNP density across the Hainan Partridge genome. Heterozygous SNPs between homologous chromosomes were annotated, and heterozygosity density was calculated in non-overlapping 50-kb windows. b PSMC inference of Hainan Partridge population history based on autosomal data. The central bold line represents the inferred population size. The 100 thin curves surrounding it are PSMC estimates generated from 100 sequences randomly resampled from the original sequence. The autosomal mutation rate used for time scaling was estimated from Red Junglefowl autosome data.
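Counting heterozygous SNPs in non-overlapping 50-kb windows amounts to binning each SNP position by integer division. Below is a minimal Python sketch, assuming SNPs are given as (scaffold, 1-based position) pairs, for example as extracted from a variant call file. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def snp_density(snps: Iterable[Tuple[str, int]], window: int = 50_000) -> Dict[Tuple[str, int], int]:
    """Count SNPs per non-overlapping window.

    snps: (scaffold, 1-based position) pairs.
    Returns (scaffold, 1-based window start) -> SNP count. Windows without
    SNPs are omitted, so zero-density windows must be added separately
    (using scaffold lengths) if they are needed for plotting.
    """
    counts: Counter = Counter()
    for scaffold, pos in snps:
        start = ((pos - 1) // window) * window + 1
        counts[(scaffold, start)] += 1
    return dict(counts)


print(snp_density([("s1", 1), ("s1", 50_000), ("s1", 50_001), ("s2", 10)]))
```

Positions 1 to 50,000 fall in the first window of a scaffold and position 50,001 starts the second, so windows never overlap.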

Discussion

Galliformes comprises more than 250 species worldwide, 63 of which occur in China (Shen et al. 2010; Li et al. 2010; Zheng 2017). However, only a limited number of pheasant genomes have been sequenced so far. The genus Arborophila is species-rich, with at least 16 species (Chen et al. 2015a, b; Clements et al. 2018; del Hoyo et al. 2018), all of which are on the IUCN Red List (IUCN 2018), yet no genome has been available for this genus. Here we provide a high-quality genome sequence of the Hainan Partridge. It will be an important resource for investigating the molecular genetic mechanisms of the species' adaptation to Hainan Island, for comparative studies of avian genomics and for developing genetic tools for Hainan Partridge conservation.

The genome synteny analysis between the Hainan Partridge and the Red Junglefowl showed that their genome structures are relatively conserved. This observation is in line with previous reports of conserved overall synteny between the Ground Tit (Pseudopodoces humilis) and Zebra Finch (Cai et al. 2013), between the Zebra Finch and Red Junglefowl (Warren et al. 2010), and between the Turkey and Red Junglefowl (Yang 2002). However, this inference needs further confirmation with more sequenced avian genomes. The phylogenetic analysis showed that Phasianidae is monophyletic and placed the Hainan Partridge in a basal position, branching off earlier than the other genera within Phasianidae. This agrees with the view that Arborophila is basal to the phasianines (Crowe et al. 2006). The Hainan Partridge diverged from the other Phasianidae lineages around 36.8 Mya, much earlier than the divergences among the other genera (Crowe et al. 2006; Chen et al. 2015b).

The Hainan Partridge is restricted to Hainan Island, and at least three of its positively selected genes (CASP3, BRCA2 and DTL) are related to UV radiation. Changes in these genes may be important for the species' survival in the high-UV environment of low-latitude Hainan Island (Liao et al. 2007). Similar adaptations have been reported in birds living at high altitude (Cai et al. 2013). CASP3, a central effector of apoptosis, has been reported to facilitate rather than inhibit radiation-induced genetic instability and carcinogenesis (Liu et al. 2015). Liu et al. (2015) showed that a significant fraction of mammalian cells treated with ionizing radiation survived despite caspase-3 activation, and that this sublethal activation of CASP3 promoted persistent DNA damage and oncogenic transformation. Homologous recombination (HR) repair of DNA double-strand breaks (DSBs) is a primary, high-fidelity mechanism for repairing radiation damage in cells. An important step in HR is the recruitment of the repair protein RAD51 to damaged DNA sites by BRCA2, and alterations of these proteins rendered cells resistant to cytotoxic damage (Abaji et al. 2005; Luo et al. 2016). Previous studies have also identified genes that upregulate DNA repair proteins such as BRCA2 and thereby protect cells from radiation-induced DNA damage (Im et al. 2018). Several studies have demonstrated that DTL has an oncogenic function in some cancer types, such as hepatocellular carcinoma, breast cancer and Ewing sarcoma, and that it plays an important role in regulating the protein stability of p53 (Kobayashi et al. 2015; Banks et al. 2006; Li et al. 2009).

Our PSMC results contradict previous studies proposing that Hainan Partridge populations contracted during the last ice age and then expanded during the subsequent warming period (Hewitt 2004). Instead, our results support the expectation that the current demography is representative of the species' past during the Last Glacial Maximum (LGM) (Chang et al. 2012). Previous studies inferred postglacial expansion of the Hainan Partridge from potential refugia driven by climate warming, but no such expansion was detected in the present study. The pattern observed here closely resembles that reported for forest communities, in which Southeast Asian species survived ice ages with comparatively stable demographic histories during the LGM (Chang et al. 2012). Similarly stable demographic histories through postglacial periods have been reported in other studies (Xu et al. 2010). Overall, given the lack of evidence for a contraction in effective population size during the LGM, the Hainan Partridge may have undergone local adaptation that allowed it to cope with glacial climate changes.

Conclusions

We sequenced the Hainan Partridge genome and compared it with other avian genomes. Phylogenetic analysis confirmed that the Hainan Partridge occupies a basal position within Phasianidae. Positive selection analysis revealed signatures of adaptation of the Hainan Partridge to UV radiation on Hainan Island.