Background

Numerous studies have demonstrated that non-coding RNAs (ncRNAs) are widely expressed in both prokaryotes and eukaryotes [1–4]. Furthermore, the number of ncRNAs increases substantially with the complexity of the organism, whereas the number of protein-coding genes remains relatively static. In bacteria, unicellular eukaryotes, and invertebrates, coding sequences constitute approximately 95%, 30%, and 20% of the genomic DNA, respectively. In mammals, open reading frames account for only approximately 1–2% of the genome [5–9].

NcRNAs include highly abundant and functionally important RNAs, such as transfer RNA (tRNA) and ribosomal RNA (rRNA), as well as other small, stable RNAs, such as small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), RNase P and mitochondrial RNA processing (MRP) RNA, signal recognition particle (SRP) RNA, and telomerase RNA. These RNAs have been characterised and are involved in splicing, ribosome biogenesis, translation, and chromosome replication [10, 11]. Recent transcriptomic and bioinformatic studies have also identified an increasing number of new ncRNAs whose function has not been validated [12–16]. Hence, the discovery and analysis of ncRNAs have become an important step in understanding genomic structure and will expand our knowledge of the functions and regulatory roles of ncRNAs in the cell cycle and development.

In recent years, ncRNAs have been identified using experimental methods and computational predictions in several fungi [3, 4, 17–22]. A large number of non-coding RNA genes, including 33 box C/D snoRNA genes, have been predicted in the genome of Schizosaccharomyces pombe. Functional analyses of 20 box H/ACA snoRNAs indicated that the snoRNAs evolved in coordination with rRNAs to preserve post-transcriptional modification sites among distant eukaryotes [3, 4, 20]. A comparative genomics analysis of seven different yeast species identified a substantial number of evolutionarily conserved, structured ncRNAs, suggesting roles in post-transcriptional regulation [20]. NcRNAs that participate in the cleavage and processing of tRNAs were observed in Aspergillus fumigatus [21]. An extensive analysis of snoRNA genes from Neurospora crassa indicated a high diversity of snoRNA-guided post-transcriptional modification in the fungal kingdom [22]. Thus far, the ncRNAs of dermatophytes have not been studied.

Trichophyton rubrum is the most common dermatophyte infecting human keratinised tissue (skin, nails, and, rarely, hair) [23–25]. T. rubrum has a 22.5-Mbp haploid nuclear genome consisting of five chromosomes that range in size from 3.0–5.8 Mbp and a 27-kbp circular mitochondrial genome [26]. The Broad Institute has sequenced the T. rubrum genome and predicted more than 8,700 protein-coding genes. However, apart from rRNAs and tRNAs, no other ncRNAs have been annotated and characterised within the T. rubrum genome [26]. In the present study, we constructed an ncRNA library (ranging from 70–500 nt) and identified ncRNAs in T. rubrum using an RNA-Seq method. A total of 352 ncRNA candidates were characterised, including 198 entirely novel ncRNAs and 154 known ncRNAs. We also analysed the sequence conservation and genomic location of these ncRNAs in six other dermatophytes. Our results may guide further studies of the important roles of ncRNAs in T. rubrum and provide important complementary information for the annotation of the T. rubrum genome.

Results

Identification of ncRNA candidates in T. rubrum

To obtain a global view of ncRNAs in T. rubrum, we extracted total RNA from the conidia and mycelia phases and generated a small RNA cDNA library with size-fractionated total RNA ranging in size from 70–500 bp. After sequencing on the 454/Roche sequencing platform, a total of 87,601 reads were obtained and mapped to the T. rubrum genome. Next, the reads that mapped to the same genomic loci were clustered, resulting in 4,432 unique contigs. After removing the coding RNA and matches to tRNAs and rRNAs, the remaining 352 clusters (corresponding to 56,550 reads) were considered ncRNA candidates. Of these candidates, 154 were predicted to align with Rfam sequences and the remaining 196 were novel ncRNA candidates (Figure 1; for detailed information, see Additional file 1: Table S1).
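The clustering step described above can be illustrated with a minimal sketch, assuming that each mapped read is reduced to a scaffold, a strand, and a pair of coordinates, and that reads are merged whenever their alignments overlap. The function name `cluster_reads`, the strand-aware grouping, and the overlap rule are illustrative assumptions; the exact merging rule used in the study is not stated.

```python
from collections import defaultdict


def cluster_reads(hits):
    """Merge overlapping read alignments into clusters.

    hits: iterable of (scaffold, strand, start, end), 1-based inclusive.
    Returns a list of (scaffold, strand, start, end, read_count).
    Grouping by strand and merging on any overlap are assumptions.
    """
    by_locus = defaultdict(list)
    for scaffold, strand, start, end in hits:
        # Normalise coordinates so that start <= end.
        by_locus[(scaffold, strand)].append((min(start, end), max(start, end)))

    clusters = []
    for (scaffold, strand), intervals in sorted(by_locus.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start <= cur_end:
                # The read overlaps the current cluster: extend it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap found: close the current cluster and open a new one.
                clusters.append((scaffold, strand, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        clusters.append((scaffold, strand, cur_start, cur_end, count))
    return clusters
```

Each returned tuple carries the number of reads supporting the cluster, which corresponds to the per-cluster read counts reported in the text.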

Figure 1

Detection of ncRNA candidates in T. rubrum by sequencing a size-fractionated cDNA library. (A) The distribution of 87,601 reads from the constructed small cDNA library of T. rubrum in different RNA classes. (B) The numbers of ncRNAs from different regions in the T. rubrum genome. (C) The different classes of ncRNAs; the number of ncRNAs in each class is displayed in brackets.

Characteristics of ncRNA candidates

Of the 352 identified ncRNA candidates, 234 mapped to loci within 1 kb of the closest coding gene, implying a possible functional relationship. Some of the ncRNA clusters located in the immediate vicinity of a protein-coding region might be processed from the 5′- or 3′-UTR of the corresponding mRNA. Among the 352 ncRNA clusters, 82 were intronic and 29 corresponded to non-annotated intergenic regions of the T. rubrum genome (Figure 1). To verify the expression and sizes of candidate ncRNAs, we selected the spliceosomal snRNAs U1, U2, U4, U5, and U6 and 15 randomly selected novel ncRNA candidates to use in northern hybridisation. The results are shown in Figure 2.
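The genomic-context categories used above can be sketched as follows, assuming that gene and intron coordinates are available as inclusive intervals on the same scaffold. The function names, the category labels, and the precedence of the intronic check over the 1-kb proximity check are assumptions made for illustration only.

```python
def distance_to_interval(a_start, a_end, b_start, b_end):
    """Gap in bp between two inclusive intervals; 0 if they overlap."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def classify_locus(start, end, genes, introns, window=1000):
    """Assign an ncRNA locus to a genomic-context category.

    genes, introns: lists of (start, end) on the same scaffold.
    window: maximum distance in bp to count as gene-proximal (1 kb in the text).
    """
    # A locus fully contained in an annotated intron is called intronic.
    for i_start, i_end in introns:
        if i_start <= start and end <= i_end:
            return "intronic"
    # Otherwise check whether any coding gene lies within the window.
    for g_start, g_end in genes:
        if distance_to_interval(start, end, g_start, g_end) <= window:
            return "gene-proximal"
    return "intergenic"
```

For example, with a gene at 1,000–2,000, a candidate at 2,500–2,600 lies 500 bp from the gene and would be labelled gene-proximal under these assumptions.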

Figure 2

Northern blotting analysis of T. rubrum ncRNA candidates. M. RiboRuler Low Range RNA Ladder (Fermentas), 1. snRNA U1, 2. snRNA U2, 3. snRNA U4, 4. snRNA U5, 5. snRNA U6, 6. Trnc_2843, 7. Trnc_3589, 8. Trnc_369, 9. Trnc_1414, 10. Trnc_293, 11. Trnc_305, 12. Trnc_1472, 13. Trnc_961, 14. Trnc_608, 15. Trnc_4262, 16. Trnc_1437, 17. Trnc_2618, 18. Trnc_3096, 19. Trnc_1686, 20. TRnc2844, and 21. 5.8S rRNA. The lengths and other information describing the ncRNAs from the northern blotting analysis are shown in Additional file 1: Table S1.

snRNA candidates

The spliceosome contains five small nuclear RNAs (snRNAs), U1, U2, U4, U5, and U6, which are essential for assembling the spliceosome and for removing introns from newly synthesised eukaryotic RNAs [17, 18, 27]. Here, we identified the genomic loci of snRNAs U1, U2, U5, and U6, each of which exhibited a unique genomic location. U5 and U6 were the most abundant snRNAs in our data, found in 15,583 and 9,034 reads, respectively. U2 and U4 were expressed at lower levels than the other snRNA candidates; we found only 163 reads of U2 and 146 reads of U4. These results agree with those of the small ncRNA transcriptome analysis of another filamentous fungus, A. fumigatus [21, 28]. U4 was not initially identified in our data. To find the U4 genomic locus in T. rubrum, we downloaded the U4 sequences of A. fumigatus, A. oryzae, and A. niger from Rfam and used them as BLASTn queries to search for homologues in the T. rubrum genome. One genomic locus was identified. Reads assigned to this locus had been sequenced and clustered in our data, but the cluster had been eliminated because its percentage of ORF was greater than 80%.
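The U4 case shows how an ORF-percentage filter can discard a genuine ncRNA. A minimal sketch of such a measure is given below, assuming that the "percentage of ORF" is the longest stop-codon-free stretch over all six reading frames divided by the cluster length. This is a simplified stand-in written for illustration and is not the EMBOSS getorf implementation; the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_open_stretch(seq):
    """Length in nt of the longest stop-free codon run over six frames."""
    seq = seq.upper().replace("U", "T")
    reverse = seq.translate(COMPLEMENT)[::-1]
    best = 0
    for strand in (seq, reverse):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOPS:
                    run = 0  # a stop codon ends the open stretch
                else:
                    run += 3
                    best = max(best, run)
    return best


def orf_fraction(seq):
    """Fraction of the sequence covered by its longest open stretch."""
    return longest_open_stretch(seq) / len(seq) if seq else 0.0
```

Short, stop-poor sequences can easily exceed an 80% cut-off by chance under a measure of this kind, which is consistent with the need to recover U4 by homology search.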

We aligned the T. rubrum snRNA U1, U2, U4, U5, and U6 candidates to the genomes of six T. rubrum-related dermatophytes to predict the homologues in these genomes by BLASTn. The homologues were compared using the multiple sequence alignment software ClustalW2, revealing that all snRNAs were highly conserved in these seven dermatophytes (Table 1). High variance was observed among the sequences and lengths of these snRNAs in T. rubrum and their homologues in other fungi; however, these snRNAs were conserved at the secondary structure level, with conserved regions in the hairpin loops (Additional file 2: Figure S2). These results correspond with previous reports on A. fumigatus [21].

Table 1 Conservation level of snRNAs in T. rubrum and related dermatophytes

snoRNAs

In eukaryotic cells, two major classes of small nucleolar ncRNA (snoRNA) have been identified: C/D box snoRNAs, which are involved in the 2′-O-methylation of ribosomal, spliceosomal, and transfer RNAs (the latter in Archaea only), and H/ACA snoRNAs, which guide pseudouridylation in these RNA species [29, 30].

To predict the two classes of snoRNAs and their putative targets in our data, we used the Snoscan and SnoGPS programs, defining the potential target sequences as the 5.8S, 18S, and 25S rRNAs of T. rubrum and all snRNAs identified in our data [17, 18]. We identified 96 snoRNAs, including 58 C/D box snoRNAs (46 had homologues in other organisms) and 38 H/ACA snoRNAs (nine had homologues in other organisms). Putative targets were identified for 37 C/D box snoRNAs, most of which were predicted to guide methylation of 18S and 25S rRNAs. We also identified five C/D box snoRNAs (TRnc_801, TRnc_3573, TRnc_4113, TRnc_1272, and TRnc_1271) that were predicted to guide the methylation of snRNAs U1, U2, and U5. Of the 37 C/D box snoRNAs, 22 had different modification sites in target rRNA or snRNA sequences. No rRNA or snRNA targets were identified for the remaining 21 C/D box snoRNAs (Table 2). Additionally, 30 H/ACA box snoRNAs were predicted to guide the pseudouridylation of 45 sites in rRNAs (Table 3; detailed information about the potential base-pairing between H/ACA box snoRNAs and rRNAs is shown in Additional file 3: Figure S3), whereas no pseudouridine sites were predicted on any snRNAs.

Table 2 C/D box snoRNA candidates identified in T. rubrum
Table 3 H/ACA box snoRNA candidates identified in T. rubrum

Other types of ncRNA in T. rubrum

We also identified 51 other ncRNA genomic loci, such as pri-miRNAs or pre-miRNAs, RNase MRP, and telomerase RNA. miRNA-related transcriptional loci were the most widely distributed ncRNAs in the T. rubrum genome; for example, the mir-598 miRNA family had 13 transcriptional regions and mir-533 had eight. In our data, these miRNA-homologous ncRNAs, which varied from 70–270 bp in length, were much longer than mature miRNAs (18–25 bp); they may therefore be pri-miRNA or pre-miRNA candidates.

Evolutionary conservation of the ncRNAs in T. rubrum

To analyse the evolutionary conservation of ncRNAs in T. rubrum, we used BLASTn to align the sequences of all 352 ncRNAs to the genomes of six related dermatophytes: T. equinum, T. tonsurans, T. verrucosum, A. benhamiae, M. gypseum, and M. canis. The loci of 102 of these sncRNAs were also identified in all six genomes (Additional file 4: Table S4). We found that the sequences of these sncRNAs were highly conserved, with sequence identities above 85%. Of the 352 ncRNAs, ten had no hits in other genomes and might be specifically expressed in T. rubrum (Table 4). To further analyse the conserved ncRNAs in dermatophytes, we employed BLASTn to align all of the sncRNAs with the NCBI non-redundant nucleotide database (NT) after excluding Arthrodermataceae. These BLASTn results were processed by MEGAN4, which placed each ncRNA sequence in a node in the NCBI taxonomy [31].
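The cross-genome comparison above can be summarised with a short sketch, assuming that the BLASTn results have already been reduced to (query, genome, percent identity) triples. The function name `summarise_conservation` and the input layout are illustrative assumptions; no identity cut-off is applied here, because the text reports the 85% figure as an observation and not as a filter.

```python
GENOMES = (
    "T. equinum", "T. tonsurans", "T. verrucosum",
    "A. benhamiae", "M. gypseum", "M. canis",
)


def summarise_conservation(queries, hits, genomes=GENOMES):
    """Group ncRNAs by how many target genomes contain a hit.

    queries: iterable of ncRNA identifiers.
    hits: iterable of (query_id, genome, percent_identity).
    Returns (found_in_all, no_hits, best), where best maps each query
    to its highest percent identity per genome.
    """
    best = {q: {} for q in queries}
    for query, genome, pident in hits:
        if query in best and genome in genomes:
            previous = best[query].get(genome, 0.0)
            best[query][genome] = max(previous, pident)
    found_in_all = sorted(q for q, g in best.items() if len(g) == len(genomes))
    no_hits = sorted(q for q, g in best.items() if not g)
    return found_in_all, no_hits, best
```

Under these assumptions, the first returned list corresponds to candidates detected in all six genomes and the second to candidates that may be specific to T. rubrum.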

Table 4 The ncRNA candidates specifically expressed in T. rubrum

As shown in Figure 3, a total of 179 ncRNA sequences were classified under cellular organisms, with 166 clustered to the Eukaryota node (approximately 47.2% of the total 352 ncRNAs). Of these ncRNAs, 97 were assigned to Fungi, indicating that these ncRNAs were conserved in fungi; all snRNAs were assigned to this node. Of the ncRNAs under the Fungi taxonomic level, 16 and 44 were assigned to Onygenales and Trichocomaceae, respectively, supporting the close relationship between the dermatophytes and the fungi in these families. Seventy-three ncRNAs were assigned to phyla distantly related to fungi, including three assigned to the root, seven to cellular organisms, 27 to the Eukaryota node, 30 under Bilateria, and six under Bacteria. These results suggest that some ancient ncRNAs are preserved in T. rubrum.

Figure 3

MEGAN phylogenetic analysis of T. rubrum ncRNA candidates. A MEGAN tree showing the taxonomic affiliation of the 352 ncRNAs, identified by a BLASTn search of all sequences against NT after excluding Arthrodermataceae, according to the NCBI taxonomy. Each circle of the MEGAN tree represents a taxon in the NCBI taxonomy database and is labelled with its name and the number of ncRNAs that were assigned to the taxon and not to a subtaxon. The size of the circles represents the number of ncRNAs.

Apart from the classified ncRNAs, the remaining 170 ncRNA candidates had no significant similarity to any nucleotide sequence in NT, including 154 unassigned ncRNAs and 16 ncRNAs with no hits. Of these unclassified ncRNAs, 27 existed in and were conserved in all six dermatophytes, indicating that these 27 ncRNAs were dermatophyte-specific ncRNAs (Table 5).

Table 5 The ncRNA candidates specifically expressed in dermatophytes

Discussion

RNA is emerging as a central player in cellular regulation, with active roles in multiple regulatory layers, including transcription, RNA maturation, RNA modification, and translational regulation [32]. Recent studies have revealed an unexpected complexity of regulatory RNAs, even in bacteria [2, 33]. In the present study, we first used an RNA-Seq method to analyse the ncRNAs in the genome of the dermatophyte fungus T. rubrum. We identified 352 sncRNA candidates, including snRNAs, snoRNAs, miRNAs, and other types of ncRNAs; 196 novel ncRNAs were predicted. We further confirmed the genomic loci of these ncRNAs in T. rubrum. This work provides an important complement to the current annotation of the T. rubrum genome, which consists primarily of protein-coding genes.

Five types of snRNAs (U1, U2, U4, U5, and U6) were identified, and their secondary structures were predicted by RNAfold [27]. We found these snRNAs to be highly conserved among dermatophytes. We also detected 96 snoRNAs, including 55 that were annotated in other organisms and 41 that were novel snoRNAs. Using the Snoscan and SnoGPS programs, we predicted their potential target sites on rRNAs and snRNAs. miRNAs have been previously reported in some fungi, such as S. pombe, but have not been found in A. fumigatus [21, 34]. In our data, we detected 68 genomic loci corresponding to 12 miRNA families; the lengths of these ncRNAs varied from 80–270 bp, suggesting that they were pri-miRNAs or pre-miRNAs [35]. To analyse the evolutionary conservation of ncRNAs, we aligned the 352 ncRNAs to six other dermatophyte genomes and the NT database; we found 27 dermatophyte-specific ncRNAs and 11 T. rubrum-specific ncRNAs.

Conclusions

In this study, ncRNA sequences were obtained from T. rubrum and characterised by sequence comparison with known ncRNAs in other organisms, some of which have presumably been functionally characterised in other work. These sequences should prove to be a valuable resource, but a real understanding of the regulatory mechanisms will require follow-on work that builds on this beginning.

Methods

Strain and culture conditions

The T. rubrum strain BMU01672 was grown on potato glucose agar (Difco) at 28°C for ten days to produce conidia. The conidia were isolated as previously reported, introduced into YPD medium (2% dextrose, 2% Bacto-Peptone, and 1% yeast extract), and incubated at 28°C with constant shaking at 200 rpm (Innova 4230 Refrigerated Incubator Shaker; New Brunswick Scientific, Edison NJ) [36]. After culture, the mycelia were harvested and ground to a powder in liquid nitrogen for RNA extraction.

RNA extraction and cDNA library construction

Total RNA was extracted from conidia and mycelia using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Equal amounts of total RNA from conidia and mycelia were pooled and size-fractionated on a denaturing 8% polyacrylamide gel [7 M urea and 1× TBE buffer (90 mM Tris, 64.6 mM boric acid, 2.5 mM EDTA, pH 8.3)]. We collected gel bands containing RNAs of 70–500 bp, excluding the 5.8S rRNA band. RNAs were passively eluted and then ethanol-precipitated. RNA size and concentration were quantified with the Agilent 2100 Bioanalyser and the Agilent RNA 6000 Pico Kit according to the manufacturer’s protocols. The fractionated RNA was dephosphorylated with FastAP (Fermentas) and ligated to the 3′-adaptor oligonucleotide (UUUUGACCACGGTACCCAG, RNA is underlined) by T4 RNA ligase (Promega). Subsequently, the RNA was reverse transcribed using oligo 3RT (CTGGGTACCGTGGTCAAA) and converted into double-stranded cDNA with a SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). The ds-cDNA was purified using the MinElute Reaction Cleanup Kit (Qiagen) according to the manufacturer’s protocol.

454/Roche sequencing and data bioinformatic analysis

For 454/Roche sequencing, approximately 5 μg of the size-fractionated cDNA sample (70–500 bp) was blunted. The fragments were then ligated with short adaptors prior to amplification and sequencing. The sequencing run was performed using the method of Margulies et al. [37].

After 454 sequencing, the 5′ and 3′ adaptors were removed from the reads. Genome data for T. rubrum and six related dermatophytes (Trichophyton equinum, Trichophyton tonsurans, Trichophyton verrucosum, Arthroderma benhamiae, Microsporum gypseum, and Microsporum canis) were downloaded from the Broad Institute web site (http://www.broadinstitute.org/annotation/genome/dermatophyte_comparative/MultiDownloads.html).

The high-quality reads were mapped to the genome using BLAST (version 2.2.22) (E-value < 1e-5). Then, reads that were 80% mapped to the genome were clustered according to their genomic position and assembled into contigs according to the genomic sequence at the corresponding loci. The ORFs in the contigs were predicted using getorf in the EMBOSS program (version 6.3.1). Contigs with less than 80% ORF were aligned to TrED EST sequences and the NCBI non-redundant protein sequence database (NR) [38, 39]. The clusters with no hits in the TrED EST sequences and NR were used for the following steps: (1) alignment to non-coding RNA sequences with rRNA sequences downloaded from Rfam and GenBank [40], (2) identification of tRNAs with tRNAscan-SE (version 1.1) [41], and (3) alignment of clusters to Rfam sequences using HMMER (version 3.0) [42] and INFERNAL (version 1.0.2). The criteria for identification of known ncRNAs were as follows: (1) percentage of ORF less than 80%, (2) no hits in NR, (3) not mRNA, and (4) with homologues in Rfam [E-value (HMMER and INFERNAL) < 0.01]. For new ncRNA identification, the criteria were as follows: (1) percentage of ORF less than 80%, (2) no hits in NR, (3) not mRNA, (4) not rRNA, (5) not tRNA, and (6) no hits in Rfam (E-value > 0.01).
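The identification criteria listed above can be expressed as a small decision function. The sketch below assumes that the results of each upstream tool have already been reduced to simple per-contig flags; the `Contig` fields, the category labels, and the order of the checks are hypothetical and serve only to make the logic explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    orf_fraction: float           # fraction of the contig covered by an ORF (0-1)
    has_nr_hit: bool              # significant hit in NR
    is_mrna: bool                 # matches a TrED EST sequence
    is_rrna: bool                 # matches an rRNA sequence
    is_trna: bool                 # called by a tRNA predictor
    rfam_evalue: Optional[float]  # best Rfam E-value, None if no hit


def classify_contig(c, orf_cutoff=0.80, rfam_cutoff=0.01):
    """Apply the ncRNA criteria to one contig (illustrative ordering)."""
    # Criteria 1-3: remove likely protein-coding transcripts.
    if c.orf_fraction >= orf_cutoff or c.has_nr_hit or c.is_mrna:
        return "coding"
    # rRNAs and tRNAs are set aside before Rfam classification (assumed order).
    if c.is_rrna or c.is_trna:
        return "rRNA/tRNA"
    # Known ncRNA: an Rfam homologue below the E-value cut-off.
    if c.rfam_evalue is not None and c.rfam_evalue < rfam_cutoff:
        return "known ncRNA"
    # Everything else, including an E-value exactly at the cut-off
    # (a case the criteria leave undefined), is treated as novel here.
    return "novel ncRNA"
```

A contig with an ORF fraction of 0.3, no NR or EST match, and an Rfam E-value of 1e-5 would be returned as a known ncRNA under these assumptions.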

Analysis of snRNA folding and prediction of putative snoRNA targets

T. rubrum snRNAs were compared with their homologues in other fungi using the multiple sequence alignment software ClustalW2. The secondary structures of the aligned sequences were predicted by RNAalifold [28]. The putative targets of snoRNAs were predicted by the Snoscan and SnoGPS programs [17, 18]. The potential target sequences, the 5.8S, 18S, and 25S rRNAs of T. rubrum, were downloaded from GenBank under the accession number JX431933.

Northern blot analysis

For the northern blot analysis, 10 μg of total RNA was separated by electrophoresis on an 8% polyacrylamide gel containing 7 M urea and then electrotransferred onto a nylon membrane (Hybond-N+; Amersham) using a semi-dry blotting apparatus (BioRad). DNA oligonucleotides (24–30-mer) antisense to the snRNAs and to 15 randomly selected ncRNA candidates were end-labelled with [γ-32P]ATP and hybridised at 45°C for 16 h. After stringency washes, the blots were exposed to phosphor storage screens, which were then scanned with a Typhoon 9200 imager (GE Healthcare).

Nucleotide sequence accession numbers

The 352 ncRNA sequences of T. rubrum were submitted to GenBank under the following accession numbers: KC352999–KC353350.
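For readers who wish to retrieve the records individually, the accession range can be expanded into a list with a short helper. This is a minimal sketch; the function name `expand_accessions` is hypothetical, and it assumes only that both accessions share the same letter prefix and number of digits.

```python
def expand_accessions(first, last):
    """Expand an accession range such as KC352999-KC353350 into a list."""
    prefix = first.rstrip("0123456789")
    width = len(first) - len(prefix)
    if not last.startswith(prefix) or len(last) != len(first):
        raise ValueError("accessions must share a prefix and a length")
    low = int(first[len(prefix):])
    high = int(last[len(prefix):])
    # Zero-pad so that every accession keeps the original digit count.
    return [f"{prefix}{n:0{width}d}" for n in range(low, high + 1)]


accessions = expand_accessions("KC352999", "KC353350")
```

The resulting list contains 352 accessions, one per submitted ncRNA sequence.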