RNA processing in the minimal organism Nanoarchaeum equitans
The minimal genome of the tiny, hyperthermophilic archaeon Nanoarchaeum equitans contains several fragmented genes and reveals unusual RNA processing pathways. These include the maturation of tRNA molecules via the trans-splicing of tRNA halves and genomic rearrangements to compensate for the absence of RNase P.
Here, the RNA processing events in the N. equitans cell are analyzed using RNA-Seq deep sequencing methodology. All tRNA half precursors and tRNA termini were determined and support the tRNA trans-splicing model. The processing of CRISPR RNAs from two CRISPR clusters was verified. Twenty-seven C/D box small RNAs (sRNAs) and an H/ACA box sRNA were identified. The C/D box sRNAs were found to flank split genes, to form dicistronic tRNA-sRNA precursors and to be encoded within the tRNAMet intron.
The presented data provide an overview of the production and usage of small RNAs in a cell that has to survive with a highly reduced genome. N. equitans lost many essential metabolic pathways but maintains highly active CRISPR/Cas and rRNA modification systems that appear to play an important role in genome fragmentation.
Keywords: tRNA Gene; Cluster Regularly Interspaced Short Palindromic Repeat; Pseudouridine; tRNA Molecule; Reverse Gyrase
Cas: CRISPR associated protein
CRISPR: clustered regularly interspaced short palindromic repeats
PCR: polymerase chain reaction
snoRNA: small nucleolar RNA
Nanoarchaeum equitans is a small (400 nm) archaeon that was isolated from hot submarine vent microbial communities and whose growth relies on its attachment to the cell surface of the archaeon Ignicoccus hospitalis. Phylogenetic analyses based on its unusual ribosomal RNA sequences placed N. equitans into a novel phylum termed 'Nanoarchaeota'. However, other phylogenetic studies that focused on ribosomal proteins concluded that N. equitans represents a member of a fast-evolving euryarchaeal lineage related to the Thermococcales. The genome sequence of N. equitans Kin4-M revealed a minimal, compact genome of only 490 kilobases and an extremely high gene density with little noncoding DNA and few pseudogenes. This highly reduced genome lacks almost all known genes for the synthesis of amino acids, nucleotides, cofactors, and lipids. Conserved operonic structures are absent and an unusually high number of genes occur as split variants [3, 4]. Examples of such splits are the two open reading frames that encode the domains of the alanyl-tRNA synthetase or of the reverse gyrase. Other unusual features concern the processing of RNA molecules. N. equitans was the first organism shown to require the assembly of tRNA halves to generate six essential functional tRNA isoacceptors. A heteromeric splicing endonuclease generates these mature tRNAs via an unusual trans-splicing reaction [6, 7, 8]. N. equitans is also the only currently identified organism that can survive without an RNase P molecule [9, 10, 11]. RNase P is an otherwise universal ribonucleoprotein complex that mediates the removal of 5' leaders in pre-tRNAs. The absence of both the RNA and the protein components of RNase P is compensated by genomic rearrangements that resulted in the removal of 5' leader sequences from all N. equitans tRNA genes, ensuring proper transcription initiation conditions.
The loss of many essential pathways has to be compensated by the transfer of metabolites between N. equitans and I. hospitalis. It is assumed that direct cell-cell surface contacts as well as interconnections via thin fibers fulfill this purpose. The N. equitans genome encodes a fairly complete set of proteins for replication, transcription and translation. In addition, surprisingly extensive sets of genes with proposed roles in DNA repair and RNA modification are annotated. Finally, two clustered regularly interspaced short palindromic repeats (CRISPR) arrays and a complete set of CRISPR associated (Cas) proteins are present. These systems are mainly characterized as adaptive antiviral defense systems, even though the viral threat towards N. equitans is not known [14, 15]. In this study, RNA-Seq deep sequencing methodology was used to analyze the RNA components involved in the processing and maturation of tRNAs, rRNAs and CRISPR RNAs (crRNAs) to obtain insights into the usage of small RNA molecules in an organism that has to survive with a minimal and condensed genome.
Results and discussion
Abundance of RNA species
Maturation of tRNA molecules
Absence of RNase P
Intergenic regions were searched for potential low-abundance structural RNA molecules, but the otherwise universal RNase P RNA molecule could not be detected, which is in agreement with previous studies. Genomic rearrangements compensate for the loss of RNase P and ensure that all tRNAs start with a purine residue directly at the transcription initiation site. The RNA-Seq data allowed us to analyze the 5' tRNA termini and verified the absence of leader sequences. One interesting example is tRNATyr, which requires a C1 base for its recognition by the tyrosyl-tRNA synthetase. However, without RNase P, such tRNAs could not start with a pyrimidine residue and it was reported that an unusual G-1 extension solves the need for both a C1 base and a purine residue at the transcription start. As this unique acceptor stem is direct evidence for the absence of RNase P, the RNA-Seq reads were mapped to (i) the tRNA gene containing an intron and (ii) the mature intron-less tRNA. While significantly fewer reads were detected for the mature tRNA due to problems with the reverse transcription of a fully modified tRNA, the vast majority of all sequences that mapped to the tRNATyr locus did contain the G-1 extension.
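The 5' terminus check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the placeholder sequence `CCGG` for the first bases of the mature tRNA and the toy reads are all hypothetical and are not taken from the N. equitans data.

```python
PURINES = {"A", "G"}


def starts_with_purine(seq):
    """Return True if the first base of a transcript is a purine (A or G)."""
    return seq[:1].upper() in PURINES


def classify_five_prime_ends(reads, mature_start):
    """Count reads that carry a G-1 extension in front of the mature 5' end.

    reads        -- read sequences oriented 5' to 3' (hypothetical input)
    mature_start -- first bases of the mature tRNA, beginning with C1
                    (placeholder sequence, an assumption of this sketch)
    """
    mature_start = mature_start.upper()
    counts = {"G-1": 0, "no extension": 0, "other": 0}
    for read in reads:
        read = read.upper()
        if read.startswith("G" + mature_start):
            counts["G-1"] += 1
        elif read.startswith(mature_start):
            counts["no extension"] += 1
        else:
            counts["other"] += 1
    return counts


# Toy reads, invented for illustration only.
toy_reads = ["GCCGGTA", "GCCGGAT", "CCGGTAG", "ATTGC"]
toy_counts = classify_five_prime_ends(toy_reads, "CCGG")
```

With real data, the reads would first have to be restricted to those mapping to the tRNATyr locus; that step is not shown here.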
Processing of CRISPR RNAs
Identification of C/D box and H/ACA box sRNAs
The C/D box sRNA26 contains an inverted order of the conserved boxes with the box D upstream of box C. This observation could be an effect of circular box C/D evolution that has been recognized in, for example, Pyrococcus furiosus.
Mobile C/D box sRNAs
Alignment of the DNA stretches upstream and downstream of the identified small RNA termini enables the analysis of potential promoter and terminator elements. Previously, the conserved elements of nanoarchaeal tRNA and tRNA half gene promoters were identified. The promoters contain a clearly identifiable box A motif (5'-TTTAAA-3') 26 nucleotides upstream of the transcription start and the terminator contains a stretch of polypyrimidines (T-stretch) downstream of the tRNA gene. Both elements are described to be commonly employed by the archaeal RNA transcription machinery [22, 33] and can also be found for the H/ACA box sRNA. Transcription and processing of C/D box sRNAs is more diverse. Some C/D box sRNAs (sRNA12, 13, 15, 16, 21) contain their own promoter and termination signals and their transcription starts with a purine residue. However, most C/D box sRNAs either lack easily identifiable promoters or start with a pyrimidine residue, which indicates that they are processed during maturation. Interestingly, potential dicistronic tRNA-sRNA precursors were identified. The gene for the most abundant C/D box sRNA, sRNA8, lies immediately downstream of the gene for tRNAVal. Therefore, 3' processing of the sRNA8-tRNAVal precursor by RNase Z (NEQ064) automatically generates the 5' terminus of the C/D box sRNA (Figure 5). This is, to the best of my knowledge, the first time that this processing activity has been observed in prokaryotes, as tRNA-snoRNAs were previously thought to be unique to plants. Furthermore, two tRNAGly isoacceptors (tRNAGly(CCC), tRNAGly(TCC)) are located adjacent to the C/D box sRNA3 and sRNA14.
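A search for the two signals named above (a box A motif about 26 nucleotides upstream of the transcription start and a T-stretch downstream of the gene) could look roughly like the following Python sketch. The function names, the tolerance window, the minimum T-stretch length and the synthetic test sequence are assumptions made for illustration. The sketch also assumes that the distance is counted from the first base of the motif to the transcription start, which the text does not specify.

```python
import re

BOX_A = "TTTAAA"  # box A motif as given in the text


def find_box_a(upstream, expected_distance=26, tolerance=3, motif=BOX_A):
    """Return distances (motif start to transcription start) near the expected one.

    upstream -- DNA sequence that ends immediately before the transcription
                start site, so the start site sits at index len(upstream).
    The tolerance window is an assumption of this sketch.
    """
    seq = upstream.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        distance = len(seq) - start
        if abs(distance - expected_distance) <= tolerance:
            hits.append(distance)
        start = seq.find(motif, start + 1)
    return hits


def has_t_stretch(downstream, min_length=5):
    """Report whether a run of at least min_length T residues is present.

    The minimum length of 5 is an illustrative choice, not a published value.
    """
    return re.search("T{%d,}" % min_length, downstream.upper()) is not None


# Synthetic upstream region: 10 nt filler, the motif, then 20 nt filler.
synthetic_upstream = "GC" * 5 + BOX_A + "GC" * 10
box_a_hits = find_box_a(synthetic_upstream)
```

In the synthetic example the motif starts 26 nucleotides before the end of the upstream region, so it is reported as a hit; a region without the motif returns an empty list.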
In the presented study over 12 million RNA sequence reads were mapped to the minimal 0.49 million bp genome of N. equitans. The resulting sequencing depth allowed the detection of all predicted tRNA half precursors of the tRNA trans-splicing pathway. In addition, further evidence for the currently unique absence of RNase P in this organism was obtained. The analysis of the abundant small RNA population identified a considerable fraction of crRNAs as well as C/D box and H/ACA box sRNAs. These findings underline the importance of these two RNA fractions for an organism that lost most essential pathways for the synthesis of amino acids, nucleotides, cofactors, and lipids. It seems plausible that an organism that relies on the import of nucleotides would require their usage to be constrained. Nevertheless, crRNAs are abundant in the cell, which is in contrast to the silenced expression of CRISPR clusters found, for example, in some bacteria [37, 38]. N. equitans appears to require the constant expression of this interference system against the attack of mobile genetic elements even though viruses of N. equitans are yet to be discovered. The abundant C/D box and H/ACA box sRNA fraction showcases the importance of rRNA processing events for N. equitans that is mirrored in its large set of RNA processing enzymes. These ribonucleoprotein complexes are suggested to ensure proper modification (and processing) of rRNAs in hyperthermophilic growth conditions. In conclusion, analysis of the RNA fractions in the minimal N. equitans cell revealed the loss or degeneration of universal RNA molecules (tRNAs, RNase P) while other, seemingly less essential RNA species (crRNAs, C/D box and H/ACA box sRNAs) are found to be highly abundant. The identification of C/D box sRNAs adjacent to split protein encoding genes and tRNAs as well as within a tRNA intron suggests their involvement in genome fragmentation.
Materials and methods
Cell cultivation and RNA isolation
N. equitans Kin4-M cells were a kind gift of D Söll. The organism was grown in the Archaeenzentrum Regensburg (H Huber, M Thomm, K Stetter) in a 300 liter fermenter in simultaneous culture with I. hospitalis KIN4-I and purified by gradient centrifugation as described. Total RNA was isolated by SDS-lysis of the cell pellet and phenol/chloroform extraction as described, and small RNAs were purified from total RNA using the MirVana RNA extraction kit (Ambion, Paisley, UK).
N. equitans/I. hospitalis RNA (3 μg) was treated with T4 polynucleotide kinase to ensure proper termini for ligation. A protocol for the dephosphorylation of 2',3'-cyclic phosphate termini was modified from : 1 μg of RNA was incubated at 37°C for 6 hours with 10 units T4 polynucleotide kinase and 10 μl 5 × T4 polynucleotide kinase buffer (NEB, Ipswich, MA, USA) in a total volume of 50 μl. Subsequently, 1 mM ATP was added and the reaction mixture was incubated for 1 hour at 37°C to generate monophosphorylated 5' termini. RNA libraries were prepared with an Illumina TruSeq RNA Sample Prep Kit and sequencing on an Illumina HiSeq2000 sequencer was performed at the Max-Planck Genomecentre, Cologne (Max Planck Institute for Plant Breeding Research, Köln, Germany).
Identification of small RNA species
Sequencing reads were trimmed by (i) removal of Illumina TruSeq linkers and poly-A tails, and (ii) removal of sequences using a quality score limit of 0.05. A total of 16,614,433 reads with an average length of 62.3 nucleotides were obtained after trimming. Of these, 626,555 reads below 15 nucleotides were removed, and 12,178,737 reads were mapped to the N. equitans reference genome (GenBank: NC_005213) with CLC Genomics Workbench 5.0 (CLC Bio, Aarhus, Denmark). The following mapping parameters were employed: mismatch cost, 2; insertion cost, 3; deletion cost, 3; length fraction, 0.5; similarity, 0.8. This program was also utilized to determine the coverage of individual RNA molecules. All predicted RNA molecules and their termini were manually verified and all intergenic regions were checked for the presence of RNA molecules with coverage of less than 1,000 reads. The following algorithms were used for the computational analysis of the data: RNA folding (Mfold), tRNA gene prediction (tRNAScan-SE), snoRNA gene prediction (snoscan), C/D box sRNA target prediction (plexy), H/ACA box sRNA target prediction (RNAsnoop), crRNA identification (crisprdb), RNA alignments (ClustalW2), RNA visualization (VARNA). Gene annotations were obtained from GenBank and tRNA annotations were taken from .
Circular C/D box sRNAs 23 and 24 were amplified from the small RNA purification sample via the Thermoscript RT-PCR system (Invitrogen, Paisley, UK) with Thermoscript reverse transcriptase and Platinum Taq DNA polymerase according to the manufacturer's instructions. The RNA was denatured at 100°C for 5 minutes and snap-cooled on ice for 5 minutes to facilitate reverse transcription at 70°C through potential secondary structures of the RNA. The following oligonucleotides were employed: sRNA23For, 5'-CTGAATTTATGATGAAGAGCCTGGATGCAG-3'; sRNA23Rev, 5'-CATCATAAATTCAGAGTAGCGGCTTTCTTC-3'; sRNA24For, 5'-GCTGAACATCGGGTATACTGAATAGTGATG-3'; sRNA24Rev, 5'-CCGATGTTCAGCATTTTTAATATTGCTCTCAG-3'. The oligonucleotides partly overlap to ensure proper annealing to the sRNA template. PCR products were cloned into a pCR2.1 TOPO vector (Invitrogen) and subjected to DNA sequencing (Eurofins MWG Operon, Ebersberg, Germany).
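The read counts given above allow a quick back-of-the-envelope check of the mapping rate and the average sequencing depth. The Python sketch below only redoes this arithmetic; the variable names are hypothetical. The depth estimate rests on two simplifying assumptions: the average length of 62.3 nucleotides, which was reported for all trimmed reads, is applied to the mapped reads, and the genome size is rounded to 490,000 bp.

```python
# Values taken from the text above.
TRIMMED_READS = 16_614_433
SHORT_READS_REMOVED = 626_555      # reads below 15 nucleotides
MAPPED_READS = 12_178_737
AVERAGE_READ_LENGTH = 62.3         # nucleotides, all trimmed reads
GENOME_SIZE = 490_000              # bp, rounded (assumption of this sketch)

# Reads that remained after the length filter.
reads_after_filter = TRIMMED_READS - SHORT_READS_REMOVED

# Fraction of the length-filtered reads that mapped to the reference genome.
mapped_fraction = MAPPED_READS / reads_after_filter

# Rough genome-wide average depth: mapped bases divided by genome size.
approx_mean_depth = MAPPED_READS * AVERAGE_READ_LENGTH / GENOME_SIZE

print(f"reads after length filter: {reads_after_filter:,}")
print(f"mapped fraction: {mapped_fraction:.1%}")
print(f"approximate mean depth: {approx_mean_depth:,.0f}x")
```

The resulting mean is only a coarse summary, because coverage in an RNA-Seq experiment is distributed very unevenly between abundant and rare transcripts.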
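The partial overlap of the primer pairs can be checked directly from the sequences listed above. The following Python sketch interprets "overlap" as complementarity between the 5' ends of the forward and reverse primer, which is an assumption of this example; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def five_prime_overlap(forward, reverse):
    """Length of the longest 5' stretch of `forward` that is complementary
    to the 5' end of `reverse` (0 if there is none)."""
    forward = forward.upper()
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), 0, -1):
        if rc.endswith(forward[:k]):
            return k
    return 0


# Primer sequences as listed in the text.
PRIMERS = {
    "sRNA23": ("CTGAATTTATGATGAAGAGCCTGGATGCAG",
               "CATCATAAATTCAGAGTAGCGGCTTTCTTC"),
    "sRNA24": ("GCTGAACATCGGGTATACTGAATAGTGATG",
               "CCGATGTTCAGCATTTTTAATATTGCTCTCAG"),
}

overlaps = {name: five_prime_overlap(fwd, rev)
            for name, (fwd, rev) in PRIMERS.items()}
print(overlaps)
```

For the listed sequences this yields an overlap of 14 nucleotides for the sRNA23 pair and 12 nucleotides for the sRNA24 pair. The primers of each pair therefore point away from each other from a shared site, an arrangement that can only yield a product spanning the remaining sequence if the template is circular or concatenated.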
The RNA-Seq data are available at NCBI's Gene Expression Omnibus (GEO) website as series GSE38821.
I thank Michael J Hohn for the cultivation of N. equitans cells, Jeanette Schermuly and Andreas Su for technical help and Dieter Söll and Jing Yuan for advice and discussions. This work was supported by grants from the Deutsche Forschungsgemeinschaft (DFG, FOR1680) and the Max-Planck Society.
- 3. Waters E, Hohn MJ, Ahel I, Graham DE, Adams MD, Barnstead M, Beeson KY, Bibbs L, Bolanos R, Keller M, Kretz K, Lin X, Mathur E, Ni J, Podar M, Richardson T, Sutton GG, Simon M, Soll D, Stetter KO, Short JM, Noordewier M: The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc Natl Acad Sci USA. 2003, 100: 12984-12988. 10.1073/pnas.1735403100.
- 6. Randau L, Calvin K, Hall M, Yuan J, Podar M, Li H, Söll D: The heteromeric Nanoarchaeum equitans splicing endonuclease cleaves noncanonical bulge-helix-bulge motifs of joined tRNA halves. Proc Natl Acad Sci USA. 2005, 102: 17934-17939. 10.1073/pnas.0509197102.
- 12. Giannone RJ, Huber H, Karpinets T, Heimerl T, Kuper U, Rachel R, Keller M, Hettich RL, Podar M: Proteomic characterization of cellular and molecular processes that enable the Nanoarchaeum equitans-Ignicoccus hospitalis relationship. PLoS One. 2011, 6: e22942. 10.1371/journal.pone.0022942.
- 21. Zhang J, Rouillon C, Kerou M, Reeks J, Brugger K, Graham S, Reimann J, Cannone G, Liu H, Albers SV, Naismith JH, Spagnolo L, White MF: Structure and mechanism of the CMR complex for CRISPR-mediated antiviral immunity. Mol Cell. 2012, 45: 303-313. 10.1016/j.molcel.2011.12.013.
- 36. Clouet d'Orval B, Bortolin ML, Gaspin C, Bachellerie JP: Box C/D RNA guides for the ribose methylation of archaeal tRNAs. The tRNATrp intron guides the formation of two ribose-methylated nucleosides in the mature tRNATrp. Nucleic Acids Res. 2001, 29: 4518-4529. 10.1093/nar/29.22.4518.
- 37. Medina-Aparicio L, Rebollar-Flores JE, Gallego-Hernandez AL, Vazquez A, Olvera L, Gutierrez-Rios RM, Calva E, Hernandez-Lucas I: The CRISPR/Cas immune system is an operon regulated by LeuO, H-NS, and leucine-responsive regulatory protein in Salmonella enterica serovar Typhi. J Bacteriol. 2011, 193: 2396-2407. 10.1128/JB.01480-10.
- 38. Westra ER, Pul U, Heidrich N, Jore MM, Lundgren M, Stratmann T, Wurm R, Raine A, Mescher M, Van Heereveld L, Mastop M, Wagner EG, Schnetz K, Van Der Oost J, Wagner R, Brouns SJ: H-NS-mediated repression of CRISPR-based immunity in Escherichia coli K12 can be relieved by the transcription activator LeuO. Mol Microbiol. 2010, 77: 1380-1393. 10.1111/j.1365-2958.2010.07315.x.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.