A method for the construction of equalized directional cDNA libraries from hydrolyzed total RNA
The transcribed sequences of a cell, the transcriptome, represent the trans-acting fraction of the genetic information, yet eukaryotic cDNA libraries are typically made from only the poly-adenylated fraction. The non-coding or translated but non-polyadenylated RNAs are therefore not represented. The goal of this study was to develop a method that would more completely represent the transcriptome in a useful format, avoiding over-representation of some of the abundant, but low-complexity non-translated transcripts.
We developed a combination of self-subtraction and directional cloning procedures for this purpose. Libraries were prepared from partially degraded (hydrolyzed) total RNA from three different species. A restriction endonuclease site was added to the 3' end during first-strand synthesis using a directional random-priming technique. The abundant non-polyadenylated rRNA and tRNA sequences were largely removed by using self-subtraction to equalize the representation of the various RNA species. Sequencing random clones from the libraries showed that 87% of clones were in the forward orientation with respect to known or predicted transcripts. 70% matched identified or predicted translated RNAs in the sequence databases. Abundant mRNAs were less frequent in the self-subtracted libraries compared to a non-subtracted mRNA library. 3% of the sequences were from known or hypothesized ncRNA loci, including five matches to miRNA loci.
We describe a simple method for making high-quality, directional, random-primed, cDNA libraries from small amounts of degraded total RNA. This technique is advantageous in situations where a cDNA library with complete but equalized representation of transcribed sequences, whether polyadenylated or not, is desired.
Keywords: cDNA Library, Hydroxylapatite, cDNA Population, Buffer Sequence, Reassociation Kinetic
Almost the entire trans-acting fraction of genetic information is represented by the transcriptome, the population of transcribed sequences in a cell. In terms of complexity, much of the functional transcriptome of eukaryotic cells has traditionally been considered poly-adenylated and translated. In terms of quantity, this poly-adenylated fraction constitutes only 3–6% of the total RNA population. For these reasons, eukaryotic transcriptomes have usually been represented experimentally by constructing cDNA libraries from the poly A+ fraction of the RNA population. By design, no such library represents the entire trans-acting genetic information. They lack representation of non-coding but functional RNAs (ncRNA), including the abundant but low-complexity tRNAs and rRNAs and the increasingly studied populations of various snRNAs, scRNAs, snoRNAs, telomeric RNAs, vRNAs, and microRNAs [2, 3, 4, 5, 6, 7]. They lack representation of the mRNAs of organelles – mitochondria and chloroplasts – for which polyadenylation may be a signal for degradation [8, 9]. They lack representation of mRNAs that are not poly-adenylated or lose their polyA tails, but are nevertheless translated [10, 11, 12, 13]. The recent call for a more systematic examination of the entire transcriptome – RNomics – led to much greater interest in ncRNAs and a variety of wet and computational approaches to their identification (reviewed in [15, 16]).
Our purpose here was to develop a library construction method that would result in a more complete representation, in useable form, of the transcriptome. We reasoned that self-subtraction [17, 18], which equalizes the representation of different sequences through reassociation kinetics, should work as well as poly A+ selection for reducing the frequency of the abundant but low-complexity rRNAs and tRNAs, without eliminating them, or any other polyA- RNAs, from cDNA libraries. We describe here the method and show that informative, random-primed, directional, and more completely representative cDNA libraries can be made from partially degraded total RNA.
Results and discussion
Since a method for the production of high-quality, more fully representative cDNA libraries from even difficult samples was sought, three very different RNA sources were chosen: 48 hour zebrafish (Danio rerio) embryos, field-collected 36 hour embryonic amphioxus (Branchiostoma floridae), and isolated 3rd instar fruitfly (Drosophila melanogaster) larval brain and eye discs. Total RNA was extracted from these samples with TRIzol (Invitrogen, manufacturer's protocol). Contaminating genomic DNA was completely removed from aliquots of the RNAs by digestion with 1 U of RNase-free DNase (New England Biolabs) per μg of RNA in the manufacturer's buffer for 30 minutes at room temperature. RNase-free glycogen (1 μg; Roche) was then added and the sample re-extracted with TRIzol (Methods-1). Total RNA was partially hydrolyzed in 100 mM (Na)CO3, pH 10.0, for 20 min at 60°C. This resulted in a population of 100–1300 nt RNA fragments (Methods-2).
Amplified cDNA (5 μg) in 10 μl of annealing buffer (0.34 M NaCl, 0.1 M Na(PO4) pH 6.8, 1 mM EDTA) containing 300 ng of LL1F was overlaid with mineral oil, denatured by boiling for 5 min, and then annealed at 60°C for one hour (Cot ~ 3 × 10⁻⁴ M·min). 100 μl of binding buffer (0.12 M Na(PO4), pH 6.8) was added to the bottom aqueous phase, which was then transferred to another tube. 100 μl of hydrated hydroxylapatite (Methods-5) suspended in 1.2 ml of binding buffer at 60°C was then added and the suspension incubated at 60°C for 10 min with frequent mixing. Bound dsDNA and hydroxylapatite were removed completely by discarding the pellets after two consecutive centrifugations (Methods-6). The 32P counts of sample aliquots taken before and after subtraction indicated that between 90 and 97% of the cDNA was bound to the hydroxylapatite and removed. The hydroxylapatite phosphate buffer was replaced with 10 mM Tris, 1 mM EDTA, pH 8 (TE), by four consecutive exchanges in Centricon-100 filters (Amicon; Methods-7). The remaining single-stranded (ss) cDNAs were then subjected to a second round of amplification and self-subtraction as described above, with the exception that the second reannealing time was 24 hours instead of one hour (Cot of ~ 7 × 10⁻³ M·min). Approximately 80% of the cDNA amplified after the first subtraction was removed in the second self-subtraction. The phosphate buffer of the second self-subtraction reaction was exchanged for TE as described. The final double-stranded cDNA population for cloning was generated using the same PCR protocol used for the previous amplifications.
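The two Cot values quoted above can be checked with a short calculation. The sketch below is an illustration only and is not taken from the source: it assumes that C0 is the molar concentration of single-stranded fragments, that the mean fragment length is about 300 nt, that the average nucleotide mass is 330 g/mol, and that the second round again used 5 μg of cDNA in 10 μl. The function name and its default parameters are hypothetical.

```python
def cot_molar_minutes(mass_ug, volume_ul, minutes,
                      mean_length_nt=300, nt_mass_g_per_mol=330.0):
    """Estimate Cot (M·min) for a reannealing reaction.

    Assumptions (not stated in the source): C0 is the molarity of
    single-stranded fragments of mean_length_nt nucleotides, each
    nucleotide weighing nt_mass_g_per_mol on average.
    """
    grams_per_litre = (mass_ug * 1e-6) / (volume_ul * 1e-6)
    fragment_molar_mass = mean_length_nt * nt_mass_g_per_mol  # g/mol
    molarity = grams_per_litre / fragment_molar_mass          # mol/L
    return molarity * minutes


# First round: 5 ug in 10 ul, annealed for one hour.
first_round = cot_molar_minutes(5, 10, 60)
# Second round: same amounts assumed, annealed for 24 hours.
second_round = cot_molar_minutes(5, 10, 24 * 60)

print(f"first round:  {first_round:.1e} M·min")
print(f"second round: {second_round:.1e} M·min")
```

Under these assumptions the estimates come out at roughly 3.0 × 10⁻⁴ and 7.3 × 10⁻³ M·min, close to the reported values. A different assumed fragment length would shift both numbers proportionally.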
Regenerated ds cDNAs were then digested with Bam HI and Asc I, and size-selected by agarose gel electrophoresis to obtain 200–300 bp cDNA fractions. These were then ligated into pKE-1 or pKE-2 vectors and used to transform E. coli DH10B (Invitrogen) by electroporation. The three cDNA libraries – amphioxus, Drosophila and zebrafish – contained 6 × 10⁶, 4 × 10⁵, and 3 × 10⁶ independent clones, respectively.
The cloning strategy was designed to preserve the orientation of the cDNAs. The first strand directional random primer, DRP1, contained, from 5' to 3', 12 nt of defined buffer sequence, the 8 nt Asc I restriction endonuclease sequence, 5 nt of random sequence and a 3' T nucleotide (Fig. 1A). The 5' buffer sequence was included to prevent destruction of the Asc I site by the 5'->3' exonuclease activity of E. coli Pol I during second-strand synthesis. Using a primer which did not contain the buffer sequence resulted in the frequent appearance of cDNA clones that had lost the Asc I site and therefore orientation (data not shown). T was added to the 3' end and As were excluded from the defined primer sequence at the cost of initiating cDNA synthesis at A (Fig. 1A-1, 2, 4) to eliminate an observed DRP1 self-priming artifact during first-strand synthesis.
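The primer layout described above can be checked against the DRP1 sequence listed in the Methods notes. The following is a minimal sketch: the DRP1 string and the Asc I recognition sequence (GGCGCGCC) are real, while the helper name `split_drp1` and the `checks` dictionary are hypothetical and introduced only for illustration.

```python
# DRP1 as listed in the Methods notes (5' -> 3', 5' phosphate omitted).
DRP1 = "GCTCGCCCTCGCGGCGCGCCNNNNNT"
ASC_I_SITE = "GGCGCGCC"  # Asc I recognition sequence


def split_drp1(primer):
    """Split the primer into the four parts described in the text."""
    buffer_seq = primer[:12]     # 12 nt defined buffer sequence
    site = primer[12:20]         # 8 nt Asc I site
    random_part = primer[20:25]  # 5 nt random sequence
    anchor = primer[25:]         # 3' terminal nucleotide
    return buffer_seq, site, random_part, anchor


buffer_seq, site, random_part, anchor = split_drp1(DRP1)
defined = buffer_seq + site

checks = {
    "total length is 26 nt": len(DRP1) == 26,
    "Asc I site follows the buffer": site == ASC_I_SITE,
    "five random positions": random_part == "NNNNN",
    "3' end is T": anchor == "T",
    "no A in the defined sequence": "A" not in defined,
}
gc_fraction_defined = sum(base in "GC" for base in defined) / len(defined)

for label, ok in checks.items():
    print(f"{label}: {ok}")
print(f"G+C fraction of defined portion: {gc_fraction_defined:.2f}")
```

All five checks pass for the listed sequence, and the defined 20 nt portion is 90% G+C, which is the GC-rich 5' region whose effect on priming is examined in the sequencing results.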
We tested the procedure by sequencing random clones picked from the different cDNA libraries. In an initial test, 43 clones from the Amphioxus library, 54 from the zebrafish library, and 15 from the Drosophila library were sequenced. Of the 112 clones, 9 clones contained no cDNAs and 4 clones yielded unreadable sequence data. The remaining sequences were compared to the NCBI non-redundant combined protein, combined DNA, and combined EST sequence databases, using the TBlastX search procedure. In the self-subtracted libraries, 59 sequences matched translated RNAs, ESTs, or gene exons (E < 10⁻⁵) and 87% of these were in the forward reading frame. This fraction is within the 80–95% observed in directional cDNA libraries made by standard approaches using oligo dT primed cDNAs, indicating that cloning was, as designed, directional. Sixteen matches to genomic DNA that were not part of any known functional or encoding RNA were identified. Since the first strand priming technique was not completely random and the directional primer included additional 5' GC-rich sequence (Fig. 1, parts 1,2), we were concerned that priming might be biased towards GC-rich regions. We examined this using clones which identified perfect matches in the sequence databases. From the 55 such clones we collected the 1100 nucleotides of sequence data lying within 20 nt of the 3' end of the random primer, i.e. under the non-random 5' portion of DRP1. These sequences were 49% G+C, indicating that the additional GC-rich sequence in the 5' portion of the first strand cDNA primer did not seriously bias the priming process. 74% of the sequenced clones in the self-subtracted libraries contained an A at the sixth nucleotide following the Asc I restriction site in the primer, indicating that cDNA synthesis mostly initiated, as designed, at an A (Fig. 1A).
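The GC-bias check described above amounts to pooling one 20 nt window per clone and computing the G+C fraction of the pool. A minimal sketch follows. The two example windows are invented placeholders, not data from the study, and the function name is hypothetical.

```python
def pooled_gc_fraction(windows):
    """Return the G+C fraction of all windows pooled together."""
    pooled = "".join(windows).upper()
    if not pooled:
        raise ValueError("no sequence supplied")
    gc_count = pooled.count("G") + pooled.count("C")
    return gc_count / len(pooled)


# Pool size reported in the text: 55 clones x 20 nt per clone.
clones, window_nt = 55, 20
pool_size = clones * window_nt

# Hypothetical 20 nt windows, for illustration only.
example_windows = [
    "ATGCATGCATGCATGCATGC",
    "GGGGCCCCAAAATTTTACGT",
]
example_gc = pooled_gc_fraction(example_windows)

print(f"pooled nucleotides in the study: {pool_size}")
print(f"G+C fraction of the example windows: {example_gc:.2f}")
```

Pooling before dividing weights every nucleotide equally, which matches the way the 49% figure is reported for the 1100 nt pool as a whole rather than as a mean of per-clone values.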
In 227 of the matches, orientation of the cDNAs could be assigned relative to known or hypothetical loci. Of these, 80% were in the sense orientation. Of the antisense matches, only four could clearly be considered artifacts of cloning. 13 antisense clones were structurally normal, exhibiting the appropriate transitions from cDNA to vector sequences, but were spliced according to their sense strand match. Although this suggests that the antisense orientation is an artifact of cloning, such precisely matching non-canonical splicing has been reported for at least one genuine antisense transcript.
We turn finally to the question of the extent of subtraction. The library data clearly show that the cDNA population was subtracted enough to remove most representation of the abundant rRNAs and greatly increase the representation of the different mRNA species. However, equalization was not complete. Since the transcriptome contains all intron sequences, which greatly exceed the mRNA population in complexity (at least in the zebrafish), a completely subtracted library would be expected to contain far more intron than mRNA sequence. This is not the case in the libraries examined, where mRNA sequences outnumber intron sequences by a ratio of almost 7:1. We addressed the issue of equalization within the mRNA population in two different ways. First, we determined the frequency of β-actin sequence, a particularly abundant mRNA, by probing the 200–300 bp zebrafish library with an antisense oligonucleotide. Only one β-actin clone was identified among 5 × 10⁴ colonies. Since β-actin represents approximately 1% of the mRNAs of embryonic zebrafish heart and is relatively constant across tissues under physiological conditions, the representation of this abundant mRNA has been reduced in the equalized libraries. In the second approach, we compared mRNA representation among our sequences with mRNA representation in an unsubtracted cDNA library constructed from 72 hour embryonic zebrafish heart. In the heart library, there are 11 mRNAs that are represented at frequencies between 0.4% (the ADP/ATP carrier protein) and 4.3% (the ribosomal proteins). Of these abundant mRNAs, two are represented once in our collection of 240 zebrafish cDNA sequences and the rest were not represented at all. There were two other mRNA loci that were represented twice among the zebrafish self-subtracted sequences, one encoding Ankyrin 3 and the other encoding Ran-binding protein 2. Neither of these two loci was represented in the 5000 sequences from the heart library.
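The size of the reduction can be made concrete with some simple arithmetic on the figures above. The sketch below is a rough illustration, not an analysis from the source. It assumes that the frequencies measured in the heart library can be carried over to the whole-embryo library as the unsubtracted expectation, which is an approximation. All variable names are hypothetical.

```python
# beta-actin: expected frequency without subtraction vs. observed frequency.
beta_actin_expected = 0.01      # ~1% of mRNAs (heart library figure)
beta_actin_observed = 1 / 5e4   # one clone among 5 x 10^4 colonies
fold_reduction = beta_actin_expected / beta_actin_observed

# Abundant heart-library mRNAs: counts expected among 240 sequences
# if the library had NOT been equalized.
n_sequences = 240
expected_low = 0.004 * n_sequences    # mRNA present at 0.4%
expected_high = 0.043 * n_sequences   # mRNA class present at 4.3%

print(f"beta-actin fold reduction: ~{fold_reduction:.0f}x")
print(f"expected hits at 0.4%: {expected_low:.1f}")
print(f"expected hits at 4.3%: {expected_high:.1f}")
```

On these assumptions β-actin is depleted about 500-fold, and each of the 11 abundant heart mRNAs would be expected roughly 1 to 10 times among 240 unsubtracted sequences, compared with the zero or one occurrence actually observed.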
There are several points of caution in using the procedure. In self-subtraction, it is the complexity of contaminating sequence, rather than its abundance, that may determine the extent of contamination in the final library. Even a few picograms of genomic DNA has a greater complexity than the entire functional RNA population. Although the importance and extent of non-coding transcripts is under revision, we did not observe any advantage to increasing the extent of self-subtraction, either by including additional rounds of self-subtraction or by increasing the Cot value of the subtractions. Clones chosen from a fourth library that had gone through an additional round of self-subtraction and amplification did not match any mRNAs (data not shown).
The goal of the described procedure was to have as much as possible of the entire RNA population represented in a library, not to have it represented in full length copies. Some functional non-coding RNAs are almost certainly excluded from the libraries. We have used the procedure successfully on cDNA fragments as small as 100–200 bp. Self subtracted libraries of smaller sequences may be possible if the phosphate concentration in the hydroxylapatite binding buffer is decreased and only small sequences are included in the reaction. The smallest functional processed RNAs, e.g. the 22 nt microRNAs, will not be represented, although their primary transcripts certainly are. Since the procedure functions via inter-molecular reassociation kinetics, sequences with extensive self-homology (snap-back) will be removed independent of their abundance in the RNA population. The procedure will not work well for most full length transcripts since amplifying sequences larger than even 500 base-pairs in size is more difficult and more sensitive to reaction conditions. Even in our target size range of 200–300 bp there are undoubtedly some sequences that do not amplify well by PCR and are therefore underrepresented. We believe that the more complete representation possible with short sequences offsets the disadvantage of not cloning full length copies, especially as the sequences are more than long enough to identify significant matches in the databases, encode many complete protein domains, and to serve as probes for cloning or assembling full-length cDNAs by the many other methods available.
The simple procedures described here permit the construction of high-quality, directional cDNA libraries using small amounts of degraded total RNA. Since the method does not distinguish between polyA+ and polyA- species, all RNAs above 100 nt may be represented, including polyA- mRNAs and many functional but non-translated RNAs. The procedure should prove valuable in situations where more complete representation of the transcriptome is desired.
The following procedural details are relevant to the successful use of the method. They are cited in the Results and Discussion text as (Methods-#).
(1) The removal of genomic DNA contamination and purity of sample at the RNA extraction step is crucial. The self-subtraction procedure is more sensitive to the complexity of a contaminant than its abundance. Tissues should be processed in either disposable plasticware or baked glassware.
(2) Partial hydrolysis of the RNA is an important step. The method is PCR-based and PCR efficiency becomes increasingly sequence and reaction-condition dependent as the size of the template increases. To avoid this source of bias, short cDNAs are used. Producing these by shortening the RNA template through random hydrolysis has the added advantage of reducing the formation of intramolecular secondary structure that can interfere with priming and reducing the expected bias for the 5' ends of full-length molecules.
(3) The quality of the cDNA syntheses and subsequent PCR amplification steps was evaluated by tracing the reactions with 32P-dCTP. The incorporated fraction was used to estimate the yield, and agarose gel electrophoresis was used to gauge the quality of the reaction products.
Oligonucleotide quality was found to be important. We used HPLC grade oligonucleotides for the library construction and self-subtraction. DRP1 is 5'P-GCTCGCCCTCGCGGCGCGCCNNNNNT. The lone-linker LL1 is the annealed product of LL1F and LL1R
AGACCGAGCGGGAGCGCCTAGGC 5' (LL1R)
(4) LL1P is 5'CTGGCTCGCCCTCGCGG. The correct amount of template and the number of cycles were determined empirically. Using the reaction conditions described, a yield of between 1 and 5 μg was typical. A yield greater than this may be stressing the reaction, resulting in partial reaction products and other artifacts.
(5) The hydroxylapatite was de-fined prior to use by resuspending the powder in a large volume of annealing buffer, allowing the matrix to settle by gravity, and removing the still slightly cloudy upper phase. This was repeated 3–5 times. 1 ml of hydrated hydroxylapatite binds approximately 100 μg of DNA.
(6) Failure to completely remove all hydroxylapatite results in binding of ssDNA as the phosphate concentration drops during the next buffer exchange step.
(7) Depending upon the extent of annealing, the concentration of remaining ssDNA may be low enough to result in a significant fraction binding to the filter membrane. To prevent this, the Centricon filters should be passivated in 5% Tween-20 for one hour followed by four rinses with ddH2O.
This work was supported by an Israel Science Foundation grant to CD and a Binational USA/Israel Science Foundation grant to IG. We thank Dr. Linda Holland for the embryonic Amphioxus sample.
- 9. Gagliardi D, Perrin R, Marechal-Drouard L, Grienenberger JM, Leaver CJ: Plant mitochondrial polyadenylated mRNAs are degraded by a 3'- to 5'-exoribonuclease activity, which proceeds unimpeded by stable secondary structures. J Biol Chem. 2001, 276 (47): 43541-43547. 10.1074/jbc.M106601200.
- 13. Maciejewski-Lenoir D, Jirikowski GF, Sanna PP, Bloom FE: Reduction of exogenous vasopressin RNA poly(A) tail length increases its effectiveness in transiently correcting diabetes insipidus in the Brattleboro rat. Proc Natl Acad Sci USA. 1993, 90 (4): 1435-1439. 10.1073/pnas.90.4.1435.
- 22. Haeger P, Cuevas R, Forray MI, Rojas R, Daza C, Rivadeneira J, Gysling K: Natural expression of immature Ucn antisense RNA in the rat brain. Evidence favoring bidirectional transcription of the Ucn gene locus. Brain Res Mol Brain Res. 2005, 139 (1): 115-128. 10.1016/j.molbrainres.2005.05.024.
- 23. Ton C, Hwang DM, Dempsey AA, Tang HC, Yoon J, Lim M, Mably JD, Fishman MC, Liew CC: Identification, characterization, and mapping of expressed sequence tags from an embryonic zebrafish heart cDNA library. Genome Res. 2000, 10 (12): 1915-1927. 10.1101/gr.10.12.1915.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.