Background

The Pacific white shrimp (Litopenaeus vannamei) is a penaeid shrimp native to the eastern Pacific Ocean, from the Mexican state of Sonora as far south as northern Peru [1]. It has become the world’s most widely cultured crustacean species as a result of its fast growth, tolerance of a wide range of salinities and temperatures, strong disease resistance, and low demand for dietary protein [2]. By 2004, global production of L. vannamei approached 1,116,000 tons, exceeding that of Penaeus monodon [3]. By 2010, production had reached 2,721,000 tons [4]. However, the very limited supply of high-quality broodstock contrasts with the heavy demand for shrimp larvae from large-scale cultivation, especially in non-native countries such as China. The use of inferior-quality broodstock could lead to an eventual loss in gametic and larval quality, and production would then decline. Studies aimed at improving reproductive performance would therefore benefit the industrial application of L. vannamei. However, previous studies have focused primarily on disease resistance mechanisms and culturing techniques of L. vannamei [5–9], and baseline information on its reproductive activity and the molecular aspects of gonadal development remains scarce. Thus, it is important to understand the regulatory mechanisms of reproductive phenotypes in this species.

The first step toward understanding the molecular mechanisms of reproduction is to identify and characterize sex-related genes and regulatory pathways. Many efforts have been made to reveal sex-related genes, and many such genes have been cloned and characterized in shrimp; for example, M-phase phosphoprotein 6 (MPP6) [10], cell division cycle 2 (Cdc2) [11], cyclin A and cyclin B [12], gonad-inhibiting hormone (GIH) [13], mitogen-activated protein kinase 1 (MAPK1) [14], prostaglandin reductase 1 [15], and valosin-containing protein (VCP) [16] in Penaeus monodon; and sex-determiner transformer-2 (Tra-2) [17], receptor for activated protein kinase C1 (RACK1) [18], and cell apoptosis susceptibility (FcCAS) [19] in the Chinese shrimp Fenneropenaeus chinensis. In L. vannamei, vasa-like, vitellin, gonadotropin-releasing hormone-like and a sex-related marker have been identified [20–23]. Additionally, methods such as suppression subtractive hybridization (SSH) [24, 25], proteomic analysis [26, 27], EST sequencing [28–30], and microarray analysis [28, 31, 32] have been applied to shrimp in order to reveal potential sex-related genes. However, because genomic sequences are lacking, sex-related genes have not been comprehensively identified, and the regulatory networks associated with shrimp gonadal development have not been constructed.

Next-generation high-throughput sequencing has become a powerful tool for identifying genes involved in gonadal development, sex determination and sex differentiation [33–37], and for SNP/SSR marker discovery [38–40], in aquaculture species for which genomic sequences are not available. In the present study, we performed transcriptomic sequencing of the gonads of 13-month-old adult male and female L. vannamei to identify sex-related genes. The gonadal transcriptomic data of one-day post-eyestalk-ablation females (Day1O) and six-day post-eyestalk-ablation females (Day6O) were also used for de-novo assembly and annotation so as to identify more genes expressed during ovarian maturation. Results from the transcriptomic analysis should be particularly important for better understanding the regulation of gonadal development between the sexes in this economically important aquaculture species. In addition, real-time PCR verification of 104 sex-differentially expressed genes validates the reliability of the transcriptomic analysis strategy, and highlights candidate genes involved in sex determination and gonadal development for further functional studies.

Methods

Ethics statement

All procedures involving the handling and treatment of shrimp used in this study were conducted with the approval of the Animal Care and Use Committee of the Guangxi Academy of Fishery Sciences, Nanning, China.

Experimental shrimp and sample collection

L. vannamei used in this study were reared at Fangchenggang aquaculture base, Guangxi Academy of Fishery Sciences, Nanning, China. The experimental shrimp were 13 months of age with a weight of 40–50 g. First, shrimp were anesthetized on ice for three minutes, then testes from male shrimp (Testis), and ovaries from the pre-eyestalk ablation female shrimp (PreO) were removed. The ovaries of one-day post-eyestalk ablation female shrimp (Day1O) and the six-day post-eyestalk ablation female shrimp (Day6O) were also isolated. The gonadal tissues (testis for male and ovary for female) were cut into cubes of approximately 0.5 × 0.5 × 0.5 cm in size and immediately immersed in liquid nitrogen overnight, and then stored at −80 °C until RNA extraction. Some pieces of gonad from each shrimp were fixed in 4 % PFA and sectioned at 5 μm for hematoxylin and eosin (HE) staining and observation by light microscopy.

RNA isolation, library preparation, and Illumina Hiseq2500 sequencing

Total RNA was isolated using TRIzol® Reagent (Invitrogen, CA, U.S.) according to the manufacturer’s instructions, and genomic DNA was removed using DNase I (Takara, Tokyo, Japan). RNA quality was then determined with a 2100 Bioanalyser (Agilent, CA, U.S.) and quantified using an ND-2000 (NanoDrop Technologies, DE, U.S.). Three RNA samples from each group (Testis, PreO, Day1O, and Day6O) were pooled equally, and the mRNA-seq libraries were then prepared using 5 μg of pooled total RNA and the TruSeq™ RNA sample preparation Kit (Illumina, San Diego, CA). First, mRNA was isolated with oligo(dT) beads and then fragmented to 100–400 bp by fragmentation buffer. Second, double-stranded cDNA was synthesized using a SuperScript double-stranded cDNA synthesis kit (Invitrogen, CA) with random hexamer primers (Illumina). The synthesized cDNA was then subjected to end-repair, phosphorylation, and ‘A’ base addition according to Illumina’s library construction protocol. The libraries were amplified by PCR for 15 cycles using Phusion DNA polymerase (NEB), and target cDNA fragments of 200–300 bp were then selected using 2 % Low Range Ultra Agarose gel. After quantification by TBS380, paired-end sequencing of 101 bp reads was performed for the four cDNA libraries in one lane on an Illumina HiSeq2500 high-throughput sequencer.

De-novo assembly and annotation

The raw paired-end reads were trimmed and quality-controlled with filter_fq (BGI internal software) to remove reads containing adaptors, reads in which unknown nucleotides exceeded 5 %, and low-quality reads (reads in which more than 20 % of bases had a quality value ≤ 10). The clean data from the four samples (Testis, PreO, Day1O, Day6O) were then used for de-novo assembly with Trinity (https://github.com/trinityrnaseq/trinityrnaseq/wiki; version Trinityrnaseq_r2013-02-25), with min_kmer_cov set to 2 and all other parameters set to default [41]. The longest assembled sequences per gene model were called contigs. The reads were then mapped back to the contigs, because paired-end reads make it possible to detect contigs from the same transcript as well as the distances between these contigs. Finally, we retrieved sequences that contained no Ns and could not be extended at either end; such sequences were defined as unigenes. TGICL (http://sourceforge.net/projects/tgicl/files/tgicl%20v2.1/; v2.1) [42] was used to further assemble the unigenes from the different samples into a single set of non-redundant unigenes (called all-unigenes). All-unigenes displaying >70 % sequence identity were grouped into clusters, whose names carry the prefix CL followed by the cluster ID; the remaining sequences were singletons, whose names carry the prefix Unigene.
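The three read-filtering criteria above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the filter_fq implementation, which is BGI internal software. The function names `keep_read` and `keep_pair`, the representation of qualities as a list of integers, and the rule that a pair is dropped when either mate fails are all assumptions made for this sketch.

```python
from typing import Sequence, Tuple


def keep_read(seq: str,
              quals: Sequence[int],
              max_n_frac: float = 0.05,
              low_q: int = 10,
              max_low_q_frac: float = 0.20,
              adaptors: Tuple[str, ...] = ()) -> bool:
    """Return True if a single read passes the three filters."""
    if not seq or not quals:
        return False
    # 1. Discard reads that contain any of the supplied adaptor sequences.
    if any(a and a in seq for a in adaptors):
        return False
    # 2. Discard reads in which unknown bases (N) exceed 5 % of the read.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # 3. Discard reads in which more than 20 % of bases have quality <= 10.
    low = sum(1 for q in quals if q <= low_q)
    if low / len(quals) > max_low_q_frac:
        return False
    return True


def keep_pair(read1, read2, **kwargs) -> bool:
    """Assumption: a read pair is kept only if both mates pass."""
    return keep_read(*read1, **kwargs) and keep_read(*read2, **kwargs)
```

Each read is passed as a `(sequence, qualities)` tuple; adaptor sequences are supplied by the caller because the text does not list them.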

The completeness of the transcriptome assembly was assessed with the Core Eukaryotic Genes Mapping Approach (CEGMA) software (cegma_v2.4.010312, default parameters) by comparing the transcripts assembled by Trinity against 248 known core eukaryotic genes (CEGs). CEGMA was developed to identify a subset of 248 highly conserved CEGs, derived from six diverse model organisms, in eukaryotic genomes [43].

All non-redundant unigene sequences were searched against the protein databases Nr, SwissProt, KEGG and COG using blastx (e-value < 0.00001) and against the nucleotide database NT using blastn (e-value < 0.00001) (http://blast.ncbi.nlm.nih.gov/Blast.cgi; v2.2.26 + x64-linux). Protein function was predicted from the annotation of the most similar protein available in the databases.
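As a rough illustration of how a "most similar protein" can be picked for each unigene, the sketch below parses 12-column tabular BLAST output (query, subject, identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) and keeps the best-scoring hit below the e-value cutoff. The text does not say which output format was used or how ties were broken, so the tabular format, the use of the bit score as the ranking criterion, and the function name `best_hits` are assumptions.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Map each query ID to (subject ID, e-value, bit score) of its top hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # keep only hits with e-value < 0.00001
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

A unigene with no hit below the cutoff simply does not appear in the returned dictionary, which corresponds to an unannotated sequence.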

The BLAST2GO program (http://www.blast2go.com/b2ghome; v2.5.0, release 2012-08-01) [44] was used to retrieve GO annotations of unigenes describing biological processes, molecular functions and cellular components. Metabolic pathway analysis was performed using the online KEGG Automatic Annotation Server (http://www.genome.jp/; BGI internal version, Release 63.0) [45].

Identification of sex-specifically expressed and differentially expressed genes

SOAPaligner/soap2 (http://soap.genomics.org.cn/) was used to map the reads to the assembled transcriptome. Uniquely mapped reads, including paired-end reads for which only one mate matched, were used to calculate gene expression levels with the fragments per kb per million fragments (FPKM) method [46]. edgeR was used to identify differentially expressed genes (DEGs) between two samples [47]. The threshold for the P-value was determined by the false-discovery rate (FDR). Unigenes with FDR ≤ 0.001 and an FPKM ratio between the two samples larger than 2 (genes with FPKM < 1 were filtered out) were considered to be differentially expressed genes in this study.
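The FPKM normalisation and the DEG thresholds above can be written out as a minimal sketch. FPKM is taken here in its usual form, 10^9 × C / (N × L), where C is the number of fragments mapped to a unigene, N the total number of mapped fragments and L the unigene length in bp. The FDR itself would come from edgeR and is treated as an input. Two points are assumptions rather than statements from the text: a unigene is filtered only when its FPKM is below 1 in both samples, and a unigene with zero expression in one sample counts as exceeding the ratio threshold.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return 1e9 * fragments / (total_mapped * length_bp)


def is_deg(fpkm_a: float, fpkm_b: float, fdr: float,
           max_fdr: float = 0.001,
           min_ratio: float = 2.0,
           min_fpkm: float = 1.0) -> bool:
    """Apply the FDR, fold-ratio and minimum-expression thresholds."""
    # Assumption: drop the unigene only if it is below 1 FPKM in both samples.
    if max(fpkm_a, fpkm_b) < min_fpkm:
        return False
    if fdr > max_fdr:
        return False
    low, high = sorted((fpkm_a, fpkm_b))
    if low == 0:
        return True  # assumption: expressed in one sample only
    return high / low > min_ratio
```

For example, 500 fragments on a 1,000 bp unigene in a library of 10 million mapped fragments corresponds to 50 FPKM.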

In addition, functional-enrichment analysis was performed to identify which GO terms and metabolic pathways were significantly enriched in DEGs. A hypergeometric test with an adjusted P-value cutoff of 0.05 was used to analyze the enrichment of functional terms. GO functional enrichment and KEGG pathway analysis were carried out using Goatools (https://github.com/tanghaibao/Goatools) and KOBAS (http://kobas.cbi.pku.edu.cn/home.do) [48].
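For readers unfamiliar with the test, the hypergeometric enrichment P-value for one term is the probability of drawing at least k annotated genes when n DEGs are sampled from a background of N genes, K of which carry the term. The sketch below computes that tail probability with the standard library and applies a Bonferroni adjustment, which is the correction named in the caption of Table 4. It is a plain restatement of the statistic, not the Goatools or KOBAS code, and the function names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

A term would then be reported as enriched when its adjusted P-value is at most 0.05.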

Simple sequence repeat (SSR) detection

SSRs were detected among the unigenes using MISA software (MIcroSAtellite; http://pgrc.ipk-gatersleben.de/misa/misa.html; version 1.0). Six types of SSRs were investigated: mono-, di-, tri-, quad-, penta-, and hexa-nucleotide repeats. SSRs flanked by more than 150 bp of unigene sequence on both sides were retained for primer design with Primer 3 (http://www.onlinedown.net/soft/51549.htm; Release 2.3.4).
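The kind of scan performed in this step can be illustrated with a small pure-Python SSR finder. It is not MISA: the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, since the text does not report the thresholds used, and the names `SSR`, `is_primitive` and `find_ssrs` are hypothetical. The flank check follows the text, flagging an SSR for primer design only when more than 150 bp of unigene sequence lies on each side.

```python
from typing import Dict, List, NamedTuple

# Assumed minimum number of repeats for unit sizes 1 to 6 (not from the text).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int       # 0-based start of the repeat tract
    end: int         # 0-based end, exclusive
    primer_ok: bool  # True if both flanks are longer than `flank` bp


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d)
                   for d in range(1, n))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS,
              flank: int = 150) -> List[SSR]:
    seq = seq.upper()
    hits: List[SSR] = []
    i, n = 0, len(seq)
    while i < n:
        found_end = None
        for unit in sorted(min_repeats):
            motif = seq[i:i + unit]
            if len(motif) < unit or set(motif) - set("ACGT"):
                continue
            if not is_primitive(motif):
                continue
            reps = 1
            while seq[i + reps * unit:i + (reps + 1) * unit] == motif:
                reps += 1
            if reps >= min_repeats[unit]:
                found_end = i + reps * unit
                ok = i > flank and n - found_end > flank
                hits.append(SSR(motif, reps, i, found_end, ok))
                break
        # Jump past a detected tract, otherwise advance one base.
        i = found_end if found_end is not None else i + 1
    return hits
```

Scanning unit sizes from shortest to longest and rejecting non-primitive motifs keeps a run such as AAAAAAAAAAAA from being reported as a di- or tri-nucleotide repeat.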

Semi-quantitative and quantitative real-time PCR validation

Semi-quantitative PCR and quantitative real-time PCR were used to verify the sex-differentially expressed genes identified from the gonadal transcriptome.

The sequences of 47 up-regulated testicular and 57 up-regulated ovarian unigenes were chosen for primer design using Array Designer 4 (http://premierbiosoft.com/dnamicroarray/index.html). The β-actin (BQF: 5′-GTGTGACGACGAAGTAGC-3′, BQR: 5′-GATACCTCGCTTGCTCTG-3′) was used as a reference gene. Total RNAs were reverse-transcribed with Goscript™ reverse transcription system (Promega, U.S.) according to the manufacturer’s instructions.

Primers for Testis-specific (FPKM of PreO = 0 and FDR ≤ 0.001) and PreO-specific (FPKM of Testis = 0 and FDR ≤ 0.001) unigenes were first tested by semi-quantitative PCR using equivalent amounts of cDNA from the testis and ovary of pre-eyestalk ablation shrimp as templates. PCR products were analyzed by electrophoresis on a 1.2 % agarose gel. A gene found to be expressed only in testis was considered to be a testis-specific gene, while a gene expressed only in PreO was considered to be an ovary-specific gene. For the genes that were expressed in both ovary and testis, quantitative real-time PCR was further conducted to verify whether they were differentially expressed between ovaries and testes. The primers for other DEGs were also verified by quantitative real-time PCR.

Quantitative real-time PCR was performed on the Applied Biosystems 7500 fast real-time system using THUNDERBIRD qPCR Mix (TOYOBO, Japan) as recommended by the manufacturer. The original cDNAs were diluted 100-fold for amplification of the target genes and β-actin, and the PCR program for both was as follows: 3 min at 94 °C; then 36 cycles of 15 s at 94 °C, 30 s at 56 °C and 30 s at 72 °C; followed by 10 min at 72 °C. The PCRs used to detect all the target genes and the β-actin reference gene were performed with three biological replicates. The specificity of the amplification was assessed by melting curve analysis to exclude primers with nonspecific amplification peaks. The relative expression level of target genes was calculated with the 2^(−ΔΔCT) method [49]. Student’s t-test was conducted using SPSS 23.0 (http://www-01.ibm.com/software/analytics/spss/), and significant differences were determined at a P-value < 0.05 (two-tailed test).
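As a worked illustration of the 2^(−ΔΔCT) calculation, the sketch below takes replicate CT values for a target gene and for β-actin in a test tissue and in a calibrator tissue, and returns the relative expression. Which tissue served as the calibrator is not stated in the text, so that choice, the function name and any CT values passed in are assumptions for illustration only.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_test: Sequence[float],
                        ct_ref_test: Sequence[float],
                        ct_target_calib: Sequence[float],
                        ct_ref_calib: Sequence[float]) -> float:
    """Relative expression of a target gene by the 2^(-ddCT) method."""
    # dCT: target CT normalised to the reference gene within each tissue.
    dct_test = mean(ct_target_test) - mean(ct_ref_test)
    dct_calib = mean(ct_target_calib) - mean(ct_ref_calib)
    # ddCT: test tissue relative to the calibrator tissue.
    ddct = dct_test - dct_calib
    return 2.0 ** (-ddct)
```

With mean CT values of 22 (target) and 18 (β-actin) in the test tissue and 25 and 18 in the calibrator, ΔΔCT is −3 and the target is 8-fold higher in the test tissue.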

Results

Sequence analysis and assembly

A total of 25.16 gigabases (Gb) of sequence were generated from the four L. vannamei gonadal tissue libraries (the raw reads can be obtained from the NCBI Short Read Archive [SRA] under accession number SRP059164). After quality control with filter_fq, 63,782,344, 60,300,542, 58,195,380, and 61,048,948 clean reads were retrieved from the testis, PreO, Day1O and Day6O libraries, respectively. De-novo assembly of all the clean reads from the four libraries generated a total of 65,218 unigenes with a mean size of 1,021 bp and an N50 of 2,000 bp (Table 1) (the assembled transcriptome can be obtained from the NCBI Transcriptome Shotgun Assembly [TSA] Database under accession number GDUV00000000). The length distribution of the unigenes is illustrated in Fig. 1. The completeness of the transcriptome assembly was tested with CEGMA by comparing the transcripts assembled by Trinity against 248 known Core Eukaryotic Genes (CEGs). As a result, 234 of 248 (94.35 %) CEGs were recovered as complete proteins in the transcriptome, and 245 of 248 (98.79 %) CEGs were found when partial proteins were included (Table 2), indicating a high level of completeness of the transcriptome.
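The N50 reported above is the length L such that unigenes of length L or longer contain at least half of the total assembled bases. A minimal sketch of that calculation, together with the completeness percentages quoted from the CEGMA counts, is given below; the function name `n50` is hypothetical and the input would be the list of unigene lengths.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Smallest length L such that sequences >= L cover half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return min(lengths)  # not reached for positive lengths


# Completeness percentages from the CEG counts given in the text.
complete_pct = round(100 * 234 / 248, 2)
with_partial_pct = round(100 * 245 / 248, 2)
```

Because a few long sequences carry a large share of the bases, the N50 can be well above the mean length, as it is here.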

Table 1 Summary statistics of the gonadal transcriptome of L. vannamei

Fig. 1 Length distribution of all-unigenes. X-axis, size distribution of unigenes; Y-axis, number of unigenes in different length ranges

Table 2 Statistics regarding completeness of the Unigene assembly based on 248 CEGs

Sequence annotation

All-unigene sequences were searched against the Nr, SwissProt, KEGG, COG and NT databases, which returned 26,482 (40.61 %), 23,062 (35.36 %), 20,659 (31.68 %), 11,935 (18.30 %), and 14,626 (22.43 %) matches, respectively, providing a final total of 30,304 annotated unigenes (46.47 %) (Table 3).

Table 3 Statistics of annotation results

The 20,659 unigenes with a KO annotation were mapped to 258 pathways (Additional file 1: Table S1). The top three pathways were metabolic pathways (1,517 unigenes), RNA transport (562 unigenes), and regulation of actin cytoskeleton (501 unigenes). Importantly, the main biological pathways involved in germ cell meiosis during gonadal development were cell cycle (ko04110, 233 unigenes), DNA replication (ko03030, 61 unigenes), mismatch repair (ko03430, 41 unigenes), base excision repair (ko03410, 73 unigenes), oocyte meiosis (ko04114, 141 unigenes) and homologous recombination (ko03440, 41 unigenes). The primary pathways involved in oogenesis, spermatogenesis and gonadal maturation were the MAPK signaling pathway (ko04010, 309 unigenes), GnRH signaling pathway (ko04912, 118 unigenes), progesterone-mediated oocyte maturation (ko04914, 153 unigenes), focal adhesion (ko04510, 412 unigenes), calcium signaling pathway (ko04020, 229 unigenes), ubiquitin-mediated proteolysis (ko04120, 344 unigenes) and Wnt signaling pathway (ko04310, 228 unigenes). Further investigation of these pathways is expected to reveal the regulatory mechanisms governing reproductive processes in L. vannamei.

Sex-biased gene identification and enrichment analysis

Comparison of gene expression levels between the testis and the different ovarian stages revealed similar numbers of differentially expressed genes (testis vs. PreO, 22,808 DEGs; testis vs. Day1O, 22,168 DEGs; testis vs. Day6O, 22,187 DEGs), and over 84 % of the DEGs were shared among the three comparisons (Fig. 2); the unigenes of the testis and PreO libraries were therefore selected for further analysis. Among 45,998 unigenes expressed in testis and ovary with lengths > 200 bp and FPKM > 1, 22,808 (45.6 %) showed sex-biased expression (Additional file 1: Table S2); 3,529 unigenes were up-regulated in the ovary and 19,279 were up-regulated in the testis (Fig. 2). Interestingly, the number of male-biased genes was much greater than the number of female-biased genes, and the testicular up-regulated genes showed a greater average fold-change than did the female-biased transcripts (calculated from the data in Additional file 1: Table S2, the average fold-change for male-biased and female-biased genes was 106.9 and 21.1, respectively).

Fig. 2 Statistical evaluation of differentially expressed genes between testis and ovaries that were obtained at different points in time. a: Venn diagram of DEGs that showed more expression in the testis than in the ovaries; b: Venn diagram of DEGs that showed more expression in the ovaries than in the testis

Gene ontology (GO) annotation was performed to classify sex-biased genes (Fig. 3). The results again showed a much higher number of male-biased genes than female-biased genes, and most of the GO terms showed significantly higher counts for male-biased genes. At the molecular function level, there were significantly more DEGs in the catalytic activity and binding GO terms than in other terms. The cell part and cell GO terms had the greatest DEG counts at the cellular component level. At the biological process level, cellular processes, metabolic processes, and single-organism processes had the top counts. By enrichment analysis, there were 49, 53, and 312 enriched GO terms for male-biased DEGs, and 111, 155, and 380 enriched GO terms for female-biased DEGs, in the cellular component, molecular function and biological process categories, respectively (Additional file 1: Table S3). The most enriched GO terms for male-biased DEGs included DNA-directed RNA polymerase in the cellular component category, RNA polymerase activity in the molecular function category, and isoprenoid biosynthetic and metabolic process in the biological process category (Table 4). As for female-biased DEGs, the most enriched GO terms at the cellular component level were mitochondrion and ribosome, the most enriched GO term associated with molecular function was aminoacyl-tRNA ligase activity, and the most enriched biological processes were translation, cellular biosynthetic process and organic substance biosynthetic process (Table 4).

Fig. 3 Distribution of DEGs among GO terms in biological processes, molecular functions, and cellular components

Table 4 Representative enriched GO terms (Bonferroni-corrected P-value ≤ 0.05) and pathways for sex-biased (Q value ≤0.05) genes

Enriched pathways (Table 4) associated with male-biased DEGs included the mRNA surveillance pathway, RNA transport, ABC transporters and terpenoid backbone biosynthesis. Enriched pathways for female-biased DEGs included oxidative phosphorylation, metabolic pathways, aminoacyl-tRNA biosynthesis, steroid biosynthesis, ribosome biogenesis in eukaryotes, DNA replication, vasopressin-regulated water re-absorption, neuroactive ligand-receptor interaction, cell cycle and nucleotide excision repair.

According to the enrichment analysis results, transcription was the most enriched activity in testis and translation and mitochondrion synthesis were the most enriched activities in ovary. Further studies on DEGs associated with the GO terms and pathways are needed to reveal the different molecular mechanisms that apparently exist between ovarian and testicular developmental processes.

Real-time PCR confirmation of sex-biased genes

One hundred and four DEGs with different levels of FPKM and FDR values were selected for RT-PCR validation to assess the reliability of the transcriptomic analysis. Among the male-biased genes, 6 of 14 testis-specific genes showed a testis-specific expression pattern by semi-quantitative PCR (Fig. 4), and the other 8 genes were significantly up-regulated in the testis as assessed by quantitative real-time PCR (Table 5). Of the 33 male-biased DEGs, 32 showed significantly up-regulated expression in the testis, and only one gene was up-regulated in the ovary (Table 5). Among the female-biased genes selected, 9 of 14 PreO-specific genes showed an ovary-specific expression pattern by semi-quantitative PCR (Fig. 5), and the other 5 genes were significantly up-regulated in the ovary (Table 6). Of the 43 female-biased DEGs, 33 showed up-regulated ovarian expression, 5 showed up-regulated testicular expression and 5 showed no expression difference between ovary and testis (Table 6). In total, 9 ovary-specific, 6 testis-specific, 45 testicular up-regulated and 39 ovarian up-regulated unigenes were obtained by real-time PCR confirmation.
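The totals in the last sentence follow from the per-category counts, and the bookkeeping can be checked with a short tally. The dictionary keys below are hypothetical labels; every count is transcribed from the paragraph above. The `concordant` figure, the number of candidates whose PCR result agreed in direction with the RNA-Seq prediction, is derived here by simple addition and is not a value reported in the text.

```python
# 47 male-biased candidates: 14 predicted testis-specific + 33 other DEGs.
male_biased = {
    "testis_specific": 6,
    "testis_up": 8 + 32,
    "ovary_up": 1,
    "no_difference": 0,
}

# 57 female-biased candidates: 14 predicted PreO-specific + 43 other DEGs.
female_biased = {
    "ovary_specific": 9,
    "ovary_up": 5 + 33,
    "testis_up": 5,
    "no_difference": 5,
}

totals = {
    "ovary_specific": female_biased["ovary_specific"],
    "testis_specific": male_biased["testis_specific"],
    "testis_up": male_biased["testis_up"] + female_biased["testis_up"],
    "ovary_up": male_biased["ovary_up"] + female_biased["ovary_up"],
    "no_difference": female_biased["no_difference"],
}

# Candidates whose PCR result matched the direction predicted by RNA-Seq.
concordant = (male_biased["testis_specific"] + male_biased["testis_up"]
              + female_biased["ovary_specific"] + female_biased["ovary_up"])
```

Summing `totals` gives back the 104 candidates that were tested.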

Fig. 4 Testis-specific genes. M, DNA marker DL2000; T, testis; O, ovary. TS are the series numbers for testis-specific genes from bioinformatics analysis

Table 5 Summary of RT-PCR results for male-biased DEGs

Fig. 5 Ovary-specific genes. M, DNA marker DL2000; T, testis; O, ovary. OS are the series numbers for ovary-specific genes from bioinformatics analysis

Table 6 Summary of RT-PCR results for female-biased DEGs

SSRs in the gonadal transcriptome of L. vannamei

SSR detection was performed with the MicroSAtellite (MISA) software using all-unigenes as references. A total of 13,233 SSRs were identified in 10,411 unigene sequences, with 2,192 (21.05 %) unigenes containing more than one SSR. Among the different types of SSRs, the di-nucleotide repeats were the most abundant, accounting for 41.83 %, followed by the tri-nucleotide repeats (28.25 %), mono-nucleotide repeats (24.79 %), quad-nucleotide repeats (2.86 %), penta-nucleotide repeats (1.23 %) and hexa-nucleotide repeats (1.04 %) (Fig. 6). The distributions of SSRs in unigenes and primers for SSRs are shown in Additional file 1: Table S6.

Fig. 6 Distribution of putative SSRs in the transcriptome of the L. vannamei gonad. X-axis, distribution of SSR types; Y-axis, number of different SSR types

Discussion

L. vannamei is one of the most important aquaculture species and has attracted the attention of many researchers. High-throughput RNA sequencing has already been used to study the mechanisms of viral resistance [50–53] and nitrite adaptation [54], as well as to reveal gene expression patterns in the muscle, hepatopancreas, gills and pleopods of normal shrimp [55, 56]. However, data regarding its reproductive or sex phenotypes are lacking. Herein, we performed RNA-Seq on the gonads of adult males and females in an attempt to identify sex-related genes. We chose adult individuals for gonadal transcriptomic analysis because we focused primarily on genes involved in gonadal development and gametogenesis. Moreover, germ cell types are more diverse in adult gonads; for example, oogonia/spermatogonia, and primary and secondary oocytes/spermatocytes are all present in adult gonads (Additional file 1: Figure S1). This choice of stage was appropriate, since we identified in our assembled sequences a great majority of the genes involved in gonadal development, oogenesis, and spermatogenesis, as well as some previously reported sex-determining genes.

Reference gonadal transcriptome of L. vannamei

De-novo assembly of the sequencing data from four L. vannamei gonadal cDNA libraries resulted in 65,218 unigenes with an average length of 1,021 bp. Approximately half of the unigenes (46.47 %) had significant matches against existing sequences, and 40.65 % were annotated with Gene Ontology terms. The L. vannamei sequences had their top matches with Daphnia pulex sequences, since this latter species is also a crustacean and its entire genome has been sequenced [57]. KEGG pathway analysis recovered the main biological pathways involved in gonadal development and gametogenesis, which will facilitate further in-depth analysis of the relationships between different genes in the gonadal transcriptome. However, there were far fewer pathways involved in spermatogenesis than in oogenesis, and more female-biased genes (75.6 %) than male-biased genes (37.9 %) showed NR hits. This could be because researchers have mainly focused on the reproductive phenotypes of female crustaceans owing to their economic importance to aquaculture [15, 26]; thus, genetic information for female crustaceans has been much richer than for males.

Sex-determining genes

The transcription factor sex-determining region of the Y-chromosome (Sry) is the sex-determining gene in mammals. The correct expression of Sry triggers testicular development, while decreased or delayed Sry expression leads to testicular defects [58]. Sry acts by activating the Sox9 gene [59], and downstream genes of Sox9 then promote testis development. Once expression of Sox9 or its downstream genes is impeded, the gonad can switch to ovarian development [60]. Doublesex (Dsx) is a male sex-determining gene that plays an important role in controlling sexual dimorphism in invertebrate organisms such as nematodes, insects and Daphnia magna, as well as in vertebrates [61–63]. Fem-1 is also a male sex-determining gene required for somatic and germline development in the testis of C. elegans [64, 65]. Herein, we identified all of the male sex-determining genes mentioned above in L. vannamei, including a testicular up-regulated Sry (Unigene7231_All), Sox9 (CL86.Contig2_All, Unigene2458_All), Dsx (Unigene35364_All) and seven fem-1 or fem-1-like genes (Unigene24042_All, Unigene9434_All, Unigene9463_All, Unigene24047_All, Unigene11407_All, Unigene2036_All, Unigene15025_All).

Foxl2 (Forkhead box l2) encodes a conserved transcription factor and is a sex determiner in female vertebrates. It is preferentially expressed in the ovary and involved in ovarian differentiation and maintenance by repression of testis-specific genes [66, 67]. Although foxl2 orthologs have been cloned in some invertebrates, its role in sex determination or differentiation remains unclear [68]. We found foxl2 (Unigene24037_All) in our gonadal transcriptomic data, and intriguingly, it shows up-regulated expression in testis.

The identification of known sex-determining genes in the present study proved that transcriptomic sequencing is a powerful tool in mining sex-determining genes. The exact role of these genes in sex determination of L. vannamei remains to be examined.

Identification of genes involved in germline determination and development

Gonadal tissue is composed of germ cells and somatic cells. Germ cells are the only cells that can undergo both meiosis and mitosis, and finally differentiate into gametes (spermatozoa and oocytes); and sperm-oocyte binding in sexually reproducing organisms constitutes the beginning of the next generation. To maintain germ cell fate and regulate germ cell development and differentiation, gene expression patterns in germ cells need to be regulated at the transcriptional and post-transcriptional levels. Moreover, it is also important to maintain the niche formed by somatic cells so as to preserve the characteristics of germ cells. Transcriptional regulation of gene expression is widely used in the early differentiation stages of germ cells. In the mouse, interaction of blimp1 (B-lymphocyte-induced maturation protein 1) with Prmt (histone methyltransferase) represses chromatin transcription and prevents trans-differentiation from germ cells to somatic cells. In the blimp1-knockout mouse, germ cells cannot form [69–72]. In the present study, a blimp1 gene (Unigene15201_All), three Prmt genes (PRMT2: Unigene7507_All; PRMT3: CL2301.Contig2_All, CL2301.Contig1_All; PRMT6: Unigene14757_All), and a chromatin target of Prmt1 protein gene (Unigene4147_All) were identified, suggesting a similar regulation of germ cell fate at the transcriptional level in shrimp.

The regulation of RNA translation, however, is more common and important in germ cells. A battery of evolutionarily conserved RNAs is essential for germ cell proliferation, survival and differentiation; some of these genes are called germ cell markers, because they are expressed throughout the germ cell life cycle. Vasa, first isolated in Drosophila, is expressed throughout germ cell developmental stages in many invertebrate and vertebrate species [73, 74]. As a molecular marker, vasa has been used to track germ cell specification, migration and differentiation [75–77]; and loss of function of vasa led to germ cell specification failure and defects in germ cell development in Drosophila, mouse and nematode [78–81]. Vasa was hypothesized to function by inhibiting the expression in germ cells of genes that are responsible for somatic differentiation [82]. This hypothesis was further supported by studies of nanos, a target of vasa, which can act together with pumilio to repress mitosis, transcription and translation during the development of the germline [83–85]. The DAZ (Deleted in Azoospermia) family is another well-known gene family that consists of three genes specifically expressed in germ cells: BOULE, DAZ-Like (DAZL) and DAZ. DAZ and DAZL are expressed throughout almost the entire life cycle of germ cells, and are essential for germ cell determination, differentiation and maturation, while BOULE is mainly involved in germ cell meiosis [86–89]. In the present study, vasa (Unigene1169_All), nanos (Unigene22266_All), pumilio (CL285.Contig16_All), boule (Unigene1947_All) and three DAZ-associated proteins (DAZ interacting protein 1, Unigene13632_All; DAZ-associated protein 1-like, Unigene20_All; DAZ-associated protein 2, Unigene4428_All) were identified, offering abundant genetic information for further study of germline determination and development.

miRNAs are also involved in RNA regulation in germ cells. Knockout of the Dicer gene prevented the processing of miRNAs, which regulate germ cell-specific genes, leading to the termination of germ cell development [90]. Additionally, the PIWI-mediated piRNA pathway regulates the expression of germ cell determination genes, such as vasa and oskar. PIWI-null mutant flies (Drosophila) and mice exhibited deficiencies in germ cell formation and differentiation [91, 92]. The presence of Dicer (Unigene17990_All) and two piwi genes (piwi 1, Unigene399_All; piwi 2, Unigene21063_All) in our transcriptome suggests that miRNA- and piRNA-mediated regulation is possible in L. vannamei.

The gonadal cell niche surrounding germ cells is also important for the preservation of germ cell fate [93]. For example, Notch signaling and its downstream genes in nematodes [94], BMPs (bone morphogenetic proteins) in Drosophila [95] and GDNF (glial cell line-derived neurotrophic factor) in mammals are produced by gonadal somatic cells [96] such as Sertoli cells, and are known pathways or factors that might control meiotic timing and germ cell numbers. We retrieved two notch genes (Notch, Unigene14535_All; Notch2, Unigene14705_All); two BMP genes (BMP1, Unigene14770_All; BMP7, Unigene12011_All); and two GDNF-associated genes (GDNF-inducible zinc finger protein 1, Unigene4392_All; GDNF family receptor alpha-3, Unigene7144_All) in the gonadal transcriptome of L. vannamei. Moreover, 101 sequences for genes in the Wnt signaling pathway were also obtained. This information will further benefit the study of the mechanisms of interaction between germ cells and somatic cells in L. vannamei.

Sex-biased genes involved in spermatogenesis and oogenesis are identified

Apart from the similar cell lineages found in both male and female gonads, the ovary and testis differ markedly in morphology, cell types and biological processes, as well as in gene expression patterns and molecular regulatory mechanisms. A large number of differentially expressed gonadal genes were identified in the present transcriptome, and many more genes were over-expressed in the testis (19,279) than in the ovary (3,529). This male-biased gene expression pattern in the gonads has been observed in Drosophila [97, 98], Caenorhabditis elegans [99], fishes [100, 101] and mammals [102], as well as in the green mud crab (Scylla paramamosain) [36], a crustacean species. This phenomenon may be explained by the earlier hypothesis that male development is regulated by activating a series of testis-specific genes and/or by repressing genes vital for ovarian development [103, 104].

The most important biological processes in the testis and ovary are spermatogenesis and oogenesis, respectively. As expected, testis-specific and differentially expressed genes include members with functions pertinent to spermatogenic stages. For example, the male-biased genes spermatogonial stem-cell renewal factor (CL2114.Contig2_All), MLH1 (CL2662.Contig1_All, CL2662.Contig2_All) and RHAU (Unigene22544_All) are essential for early stages of spermatogenesis, including spermatogonial maintenance and differentiation; and knockout of RHAU or MLH1 resulted in deficiencies in spermatogonial or spermatocytic differentiation in the mouse [105-107]. Genes involved in later stages of spermatogenesis include spermatogenesis-associated proteins (Spata2, Unigene8309_All; Spata5, CL1808.Contig1_All and CL1808.Contig2_All; Spata20, Unigene9757_All); spermatogenesis regulator (Unigene14897_All); ciliary and flagellar proteins such as sperm-associated antigens (Spag2, Unigene7744_All; Spag7, Unigene482_All); sperm-specific protein PHI-2B/PHI-3 (Unigene12515_All); sperm protamine P2 (Unigene20056_All); spermatid-specific protein T1 (Unigene12459_All, Unigene17251_All); round spermatid basic protein 1 (Unigene6458_All); major sperm protein (Unigene9914_All); and motile sperm domain-containing protein 2 (Unigene9914_All). In addition, the testis-biased genes encoding testis-specific serine proteases (TESSP) (Unigene17312_All, Unigene2408_All, Unigene25019_All) are involved in germ cell survival during meiosis [108, 109], and T-complex testis-specific protein 1 (t-complex 11) (Unigene4999_All) might function in insemination processes so as to stimulate sperm capacitation and inhibit the acrosome reaction [110].

Among the female-biased sequences, we focused on genes involved in oogenic processes. Vitellogenesis is the central process in oogenesis. In crustaceans, vitellogenin, the precursor of yolk protein, is synthesized in the hepatopancreas and ovary [111, 112]. The synthesis of vitellogenin in oviparous vertebrates is regulated by the E2-ER-Hsp90-Vtg pathway, in which estrogen receptor (ER) and heat shock protein 90 (Hsp90) mediate the enhancement of vitellogenin transcription by estrogen or estrogen-like hormones [113, 114]. In our transcriptomic data, we identified vitellogenin (CL1767.Contig3_All), vitellogenin receptor (VgR) (Unigene2102_All), estrogen receptor (CL2390.Contig2_All, Unigene9144_All), estrogen receptor-binding protein (Unigene2782_All) and HSP90 (Unigene15703_All) genes, suggesting that an E2-ER-Hsp90-Vtg pathway may exist in crustaceans. Further experiments should focus on whether and how E2, ER and Hsp90 regulate the synthesis of vitellogenin in shrimp.

The second meiotic division and oocyte maturation constitute another vital process in oogenesis. MPF (maturation-promoting factor), mainly composed of CDK1 (the product of the Cdc2 gene) and Cyclin B proteins, is a primary regulator of this process. Activation of MPF drives the transition from G2 to M phase, and the active MPF promotes rapid maturation of oocytes [115]. In the present study, we identified G2/mitotic-specific cyclin-A (Unigene10629_All), cyclin-B (Unigene4127_All, Unigene1805_All), cyclin-B3 (CL352.Contig2_All) and cyclin-F (Unigene11044_All), as well as the components of MPF, Cdc2 (Unigene3813_All) and Cyclin B (CL2390.Contig3_All). The identification of these genes will facilitate further study on the artificial induction of oocyte maturation in L. vannamei. In addition, oocyte quality is hypothesized to be improved by high expression of superoxide dismutase genes, whose products could neutralize reactive oxygen species and protect the embryo during its development [116]. Herein, we found four superoxide dismutase homologs (Unigene14142_All, Unigene419_All, Unigene6388_All, Unigene8984_All) that are predominantly expressed in the ovary, suggesting a similar maternal protection of offspring in shrimp; antioxidant defense techniques may therefore be useful in improving the overall quality of shrimp larvae.

Verification of sex-biased genes

Among 104 differentially expressed gonadal unigenes, 9 genes specifically expressed in the ovary and 6 specifically expressed in the testis were verified by semi-quantitative RT-PCR, while of the remaining DEGs tested, 45 testis-predominant unigenes and 39 ovary-predominant unigenes were verified by quantitative real-time PCR analysis. These gonad-specific and differentially expressed genes are likely to play important roles in certain parts of gonadal development in L. vannamei. However, some differences were observed between the transcriptomic analysis and the qRT-PCR data concerning fold-changes in gene expression in the testis and ovary: one male-biased DEG showed ovary-predominant expression and 5 female-biased DEGs showed testis-predominant expression. This is most likely due to the differences that often exist between the bioinformatic analysis of next-generation sequencing data and the actual experimental analysis [117]; in addition, biological replicates, which are more valuable and accurate for detecting differentially expressed genes, were not available during the course of the present study [118].
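The comparison between sequencing-based and PCR-based expression calls can be illustrated with a minimal sketch. It assumes that each gene has a log2(testis/ovary) fold change from RNA-seq and another from qRT-PCR, and it only checks whether the two agree in sign. The function name, gene labels and values below are hypothetical and are not data from this study.

```python
def direction_concordance(rnaseq_log2fc, qpcr_log2fc):
    """Split genes by whether RNA-seq and qRT-PCR agree on the sign of
    log2(testis/ovary). Positive = testis-biased, negative = ovary-biased."""
    agree, disagree = [], []
    for gene, seq_fc in rnaseq_log2fc.items():
        if gene not in qpcr_log2fc:
            continue  # gene was not tested by qRT-PCR
        pcr_fc = qpcr_log2fc[gene]
        # Same sign -> both methods report the same sex bias.
        (agree if seq_fc * pcr_fc > 0 else disagree).append(gene)
    return agree, disagree


# Hypothetical values for illustration only.
rnaseq = {"gene_A": 3.2, "gene_B": -2.1, "gene_C": 1.5}
qpcr = {"gene_A": 1.1, "gene_B": -4.0, "gene_C": -0.7}
agree, disagree = direction_concordance(rnaseq, qpcr)
print(len(agree), "concordant;", len(disagree), "discordant")
```

A sign-only check of this kind ignores the magnitude of the fold change, which is where the two methods are expected to differ most in the absence of biological replicates.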

Discovery of SSR markers

Simple sequence repeats (SSRs) are single-locus markers with high allelic variation and are widely applicable to molecular genetics studies, including research involving genetic diversity assessment, comparative genomics, gene flow characterization, and genetic linkage mapping [119]. In L. vannamei, a number of microsatellite sequences have been reported for genetic map construction and quantitative trait locus (QTL) detection [120-122], and next-generation sequencing has facilitated the discovery of a relatively large set of SSRs in the hepatopancreas of the shrimp by the present team [53, 123]. The SSRs identified in this study can serve as genetic markers for quantifying the genetic diversity of germplasm within breeding and imported populations, identifying the paternity of breeding families and managing broodstocks, which are the major issues for the selective breeding and farming of introduced species. They can also be used in QTL mapping and marker-assisted selection (MAS) in L. vannamei and other phylogenetically similar shrimp species. This may promote genetic gain in traits of interest for aquaculture, such as reproduction, sex determination, growth, and tolerance of environmental stress.
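As an illustration of what an SSR is at the sequence level, the following minimal sketch scans a nucleotide sequence for perfect tandem repeats of di- to hexanucleotide motifs using a regular expression. It is not the pipeline used in this study; the function name and the minimum repeat counts are assumptions chosen only for the example, and mononucleotide and compound repeats are ignored.

```python
import re

# Minimum number of tandem copies per motif length.
# Illustrative assumptions, not the thresholds used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    start is the 0-based position of the first base of the repeat tract.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since the shorter motif length already reports that tract.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence containing (AT)7 and (AGC)5.
example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Loci found in this way would still need flanking sequence suitable for primer design and testing for polymorphism before they could serve as markers.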

Conclusions

The present study encompasses the first large-scale RNA sequencing of shrimp gonads. We have identified many important sex-related functional genes, GO terms and pathways, all of which will facilitate future research into the reproductive biology of shrimp. The SSRs detected in this study can be used as genetic markers for germplasm evaluation of breeding and imported populations.