Analysis of the human Alu Ye lineage
Alu elements are short (~300 bp) interspersed elements that amplify in primate genomes through a process termed retroposition. The expansion of these elements has had a significant impact on the structure and function of primate genomes. Approximately 10% of the mass of the human genome is composed of Alu elements, making them the most abundant short interspersed element (SINE) in our genome. The majority of Alu amplification occurred early in primate evolution, and the current rate of Alu retroposition is at least 100-fold slower than the peak of amplification that occurred 30–50 million years ago. Alu elements are therefore a rich source of inter- and intra-species primate genomic variation.
A total of 153 Alu elements from the Ye subfamily were extracted from the draft sequence of the human genome. Analysis of these elements resulted in the discovery of two new Alu subfamilies, Ye4 and Ye6, complementing the previously described Ye5 subfamily. DNA sequence analysis of each of the Alu Ye subfamilies yielded average age estimates of ~14, ~13 and ~9.5 million years old for the Alu Ye4, Ye5 and Ye6 subfamilies, respectively. In addition, 120 Alu Ye4, Ye5 and Ye6 loci were screened using polymerase chain reaction (PCR) assays to determine their phylogenetic origin and levels of human genomic diversity.
The Alu Ye lineage appears to have started amplifying relatively early in primate evolution and continued propagating at a low level, as many of its members are found in a variety of hominoid (human, greater and lesser ape) genomes. Detailed sequence analysis of several Alu pre-integration sites indicated that multiple types of events had occurred, including gene conversions, near-parallel independent insertions of different Alu elements and Alu-mediated genomic deletions. A potential hotspot for Alu insertion in the Fer1L3 gene on chromosome 10 was also identified.
Keywords: Gene Conversion, World Monkey, Gene Conversion Event, Primate Genome, Mutation Density
The proliferation of Alu elements has had a significant impact on the architecture of primate genomes. They comprise over 10% of the human genome by mass and are the most abundant short interspersed element (SINE) in primate genomes. Alu elements have achieved this copy number by duplicating via an RNA intermediate in a process termed retroposition. During retroposition the RNA copy is reverse transcribed by target primed reverse transcription (TPRT) and subsequently integrated into the genome [4, 5, 6]. While unable to retropose autonomously, Alu elements are thought to borrow the factors that are required for their amplification from long interspersed elements (LINEs) [6, 7, 8, 9], which encode a protein with endonuclease and reverse transcriptase activity [10, 11]. Because of their high copy number, Alu repeats have been a significant source of new mutations as a result of insertion and post-integration recombination between elements [12, 13].
The majority of Alu amplification occurred early in primate evolution, and the current rate of Alu retroposition is at least 100-fold slower than the peak of amplification that appears to have occurred 30–50 million years ago [2, 14, 15, 16]. Even though there are over one million Alu elements within the human genome, only a small number of these elements are capable of movement. As a result of the limited amplification capacity of Alu elements, a series of discrete subfamilies of Alu elements that share common diagnostic mutations have been identified in the human genome [18, 19, 20, 21]. A small subset of "young" Alu repeats are so recent in origin that they are present in the human genome and absent from the genomes of non-human primates, with some of the elements being polymorphic with respect to insertion presence/absence in diverse human genomes [16, 22, 23, 24, 25]. Individual SINE elements have proven to be essentially homoplasy-free characters, which are therefore quite useful for resolving phylogenetic and population genetic questions [2, 26, 27, 28, 29, 30, 31, 32, 33, 34]. For example, young Alu subfamilies that arose around the radiation of Subtribe Hominina (gorillas, chimpanzees, and humans) four to six million years ago were used as homoplasy-free phylogenetic markers to resolve the branching order in hominids. Relationships among other primates have also been resolved using relatively large numbers of Alu elements as phylogenetic markers [28, 37, 38, 39, 40].
We have previously characterized a large number of recently integrated Alu elements found in the human genome that fall in six distinct lineages, termed Ya, Yb, Yc, Yd, Yg and Yi based upon their diagnostic mutations [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. Here, we describe the distribution in the human genome of three Alu subfamilies that are members of the Alu Ye lineage and are characterized by four (Ye4), five (Ye5) and six (Ye6) diagnostic mutations, respectively.
Subfamily size and age
To estimate the copy number of the Ye4, Ye5 and Ye6 Alu subfamilies, we performed BLAST searches of the draft sequence of the human genome using an Alu Ye lineage-specific oligonucleotide to query the database (as outlined in the methods). Seventeen of the 25 Alu Ye4 elements were unique (non-paralogous). There were also 76 unique Ye5 Alu elements and 23 unique Ye6 Alu subfamily members. Multiple alignments of the Alu elements from each subfamily were constructed and the number of mutations from the consensus sequence for each Alu subfamily was determined. In each case the mutations were divided into those that occur at CpG dinucleotides and those that occur at non-CpG positions, without including small insertions or deletions, as described previously [47, 48, 49]. The mutations are divided into these two classes to estimate the average age of each subfamily because the CpG base positions in repeated sequences mutate at a rate that is about six times higher than non-CpG positions as a result of the spontaneous deamination of 5-methylcytosine residues.
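The split into CpG and non-CpG mutations described above can be illustrated with a minimal sketch. It assumes a consensus sequence without gaps and an element already aligned to it position by position, with `-` marking deletions in the element; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def cpg_positions(consensus):
    """Return the indices of consensus bases that lie in a CpG dinucleotide."""
    positions = set()
    for i in range(len(consensus) - 1):
        if consensus[i] == "C" and consensus[i + 1] == "G":
            positions.update((i, i + 1))
    return positions


def count_mutations(consensus, element):
    """Count substitutions relative to the consensus, split by CpG context.

    Alignment columns containing a gap ('-') are skipped, mirroring the
    exclusion of small insertions and deletions described in the text.
    """
    if len(consensus) != len(element):
        raise ValueError("sequences must be aligned to the same length")
    cpg = cpg_positions(consensus)
    counts = {"cpg": 0, "non_cpg": 0}
    for i, (ref, obs) in enumerate(zip(consensus, element)):
        if ref == "-" or obs == "-":
            continue  # indel column, not counted
        if ref != obs:
            counts["cpg" if i in cpg else "non_cpg"] += 1
    return counts


# Toy example (hypothetical sequences): one CpG and one non-CpG substitution.
EXAMPLE = count_mutations("ACGTTCGA", "ATGTTCGC")

if __name__ == "__main__":
    print(EXAMPLE)
```

Summing these counts over all aligned members of a subfamily gives the mutation totals from which the densities are calculated.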
Mutation densities were calculated for each Alu Ye subfamily. For 17 elements from the Alu Ye4 subfamily, the non-CpG and CpG mutation densities were 2.1% (83/3944) and 12.5% (106/850). Using a neutral rate of evolution of 0.15% per million years for non-CpG positions and 0.9% per million years for the CpG base positions along with the average mutation density yields age estimates of 14.03 and 13.86 million years old for the Ye4 subfamily. For the Alu Ye5 subfamily, 76 elements were analyzed that contained a total of 17632 non-CpG nucleotides and 3800 CpG nucleotides, with 351 non-CpG and 431 CpG mutations. The mutation densities of the Ye5 subfamily were 1.99% and 11.34% for the non-CpG and CpG nucleotides, yielding age estimates based on the average mutation density of 13.27 and 12.60 million years old. For the Alu Ye6 subfamily, 23 elements were analyzed that contained a total of 5336 non-CpG nucleotides and 1150 CpG nucleotides, with 86 non-CpG and 92 CpG mutations. The mutation densities of the Ye6 subfamily were 1.61% and 8.00% for the non-CpG and CpG nucleotides, yielding age estimates based on the average mutation density of 10.75 and 8.89 million years old.
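The age calculation above is a division of each mutation density by the corresponding neutral rate. The following sketch reruns that arithmetic from the counts and rates quoted in the text; the variable and function names are illustrative only, and the results agree with the reported estimates to within rounding.

```python
# Neutral substitution rates quoted in the text, in percent per million years.
NON_CPG_RATE = 0.15
CPG_RATE = 0.9

# (mutations, nucleotides) for each class, as reported for each subfamily.
COUNTS = {
    "Ye4": {"non_cpg": (83, 3944), "cpg": (106, 850)},
    "Ye5": {"non_cpg": (351, 17632), "cpg": (431, 3800)},
    "Ye6": {"non_cpg": (86, 5336), "cpg": (92, 1150)},
}


def density_percent(mutations, nucleotides):
    """Mutation density as a percentage of the nucleotides examined."""
    return 100.0 * mutations / nucleotides


def age_myr(mutations, nucleotides, rate_percent_per_myr):
    """Age in million years: mutation density divided by the neutral rate."""
    return density_percent(mutations, nucleotides) / rate_percent_per_myr


def subfamily_ages(counts):
    """Return {subfamily: (non-CpG age, CpG age)} in million years."""
    return {
        name: (
            age_myr(*classes["non_cpg"], NON_CPG_RATE),
            age_myr(*classes["cpg"], CPG_RATE),
        )
        for name, classes in counts.items()
    }


AGES = subfamily_ages(COUNTS)

if __name__ == "__main__":
    for name, (non_cpg_age, cpg_age) in AGES.items():
        print(f"{name}: non-CpG {non_cpg_age:.2f} Myr, CpG {cpg_age:.2f} Myr")
```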
In order to determine the approximate time of insertion for each Alu Ye4, Ye5 and Ye6 subfamily member, we performed a series of PCR reactions using human and non-human primate DNA samples as templates. Unfortunately, not all of the loci identified in the draft sequence were amenable to PCR analysis, as some of them had inserted into other repetitive regions of the genome making the design of flanking unique sequence PCR primers difficult.
For the Ye subfamilies, 120 of the 153 elements identified in the draft human genomic sequence were amplified by PCR. Examination of the orthologous regions of the various species genomes displayed a series of different PCR patterns indicative of the time of retroposition of each of the elements into the primate genomes. Results from a series of these experiments showed a gradient of Ye Alu repeats beginning with some elements that are recent in origin and unique to the human genome (e.g. Ye5AH110) and ending with elements that are found within all ape genomes (e.g. Ye5AH148). The distribution of all the Ye elements in various primate genomes is summarized in Additional File 2.
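The way a PCR presence/absence pattern brackets the time of insertion can be sketched as follows. This is an illustration only: the taxon ordering, the three-state coding (`True` for a filled site, `False` for an empty site, `None` for no amplification) and the toy patterns are assumptions made here, not data or software from the study.

```python
# Taxa ordered from the closest to the most distant relative of humans
# (an assumed ordering, used only for this illustration).
TAXA = [
    "human", "chimpanzee", "gorilla", "orangutan",
    "gibbon", "old_world_monkey", "new_world_monkey",
]


def insertion_interval(filled):
    """Bracket the time of insertion from a presence/absence pattern.

    Returns (deepest taxon sharing the element, first taxon with an empty
    site). The second value is None if no empty site was observed.
    """
    if filled.get("human") is not True:
        raise ValueError("the element is expected to be present in human")
    deepest = "human"
    for taxon in TAXA[1:]:
        state = filled.get(taxon)
        if state is True:
            deepest = taxon
        elif state is False:
            return deepest, taxon
        # None (no amplification) is uninformative, so keep scanning.
    return deepest, None


# Hypothetical patterns: a human-specific element and one shared by all apes.
HUMAN_SPECIFIC = {"human": True, "chimpanzee": False, "gorilla": False}
ALL_APES = {
    "human": True, "chimpanzee": True, "gorilla": True,
    "orangutan": True, "gibbon": True, "old_world_monkey": False,
}

if __name__ == "__main__":
    print(insertion_interval(HUMAN_SPECIFIC))
    print(insertion_interval(ALL_APES))
```

The sketch stops at the first empty site and therefore ignores complications such as lineage-specific loss of an element or the gene conversion events discussed later.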
Gene conversion between Alu elements and in other regions of the human genome exerts a significant influence on the accumulation of single nucleotide diversity within the human genome [2, 50]. To estimate the frequency of gene conversion in the Alu Ye subfamily members, we compared the sequences of the elements found in the human genome to the consensus sequences of other Alu subfamilies. Using this approach, we identified two Alu Ye5 subfamily members that appeared to have been subjected to partial gene conversion at their 3' ends. Alu Ye5AH70 contains three mutations that are diagnostic for the Yb8/9 subfamily. Similarly, Alu Ye5AH173 contains three Alu Sc mutations. Each of the sequence exchanges occurred in a short contiguous sequence suggesting that they were products of gene conversion rather than homoplasic point mutations.
We identified one Alu-containing locus (Ye5AH181) that was involved in a full gene conversion/replacement event. In this case, the orthologous Alu elements have similar flanking sequences and direct repeats, although they are not precisely identical due to the random mutations that accumulated over time. DNA sequence analysis of this locus showed that the Alu element of selected new world monkey genomes (spider monkey, woolly monkey and tamarin) belonged to the Alu Sg subfamily. This suggests that a gene conversion of an older, pre-existing Alu Sg may have introduced the Ye5 sequence in the common ancestor of humans, chimpanzees, gorillas and orangutans. Amplification of this locus was unsuccessful in the old world monkey taxa tested.
Alu-mediated genomic deletions
Two deletions of part of the human genome appeared to be associated with newly inserted Alu Ye elements. These deletions were identified at loci Ye5AH24 and Ye5AH27. In the case of Ye5AH24, the deletion was associated with a gene conversion of an Alu Y in both orangutan and siamang to Alu Ye5 in human, bonobo, common chimpanzee and gorilla and involved the removal of about 500 bp from the 3' flanking region. For Alu Ye5AH27, the deletion was associated with a gene conversion of an Alu Sx element (orangutan and siamang) to Alu Ye5 (human, bonobo, common chimpanzee and gorilla) and involved the removal of 142 bp from the 3' flanking region. Based on these data, we estimate the frequency of Alu retroposition-mediated deletions to be approximately 1.67% (2/120).
The pre-integration sites for three elements (Ye5AH11, Ye5AH40 and Ye5AH173) did not amplify in any non-human primate species. Previously, the insertion of L1 elements has been shown to be associated with large genomic deletions. Thus, one possible explanation for the absence of pre-integration PCR products would be that a large deletion (>1 kb) occurred at each of these loci during Alu integration. If a deletion occurred during the integration of an Alu element in the human genome, then the pre-integration product size calculated computationally would be an underestimate of the true size of the locus. To investigate this possibility, we utilized long template PCR reactions of these loci that would facilitate the amplification of larger (up to 25 kb) products. Unfortunately, PCR amplicons were not generated from any of these loci, suggesting that the retrotransposition of these Alu elements in humans may have generated deletions greater than 25 kb in size. Alternatively, the orthologous loci in non-human primate genomes may have undergone additional mutations at the oligonucleotide primer sites, preventing PCR amplification.
Independent Alu insertions
We also identified another near-parallel independent Alu insertion event at the human Ye5AH16 locus in all the old world monkey genomes tested (green monkey, macaque and rhesus), within the same locus where an Alu Ye5 element was located in the human, chimpanzee, gorilla and orangutan genomes. Thus, the near-parallel insertion most likely occurred after the divergence of humans and apes from old world monkeys, but before the radiation of the old world monkeys. The element present in the old world monkey genomes is an Alu Y and is 80 bp from the human insertion site.
Human genomic diversity
[Table: Human genetic diversity of Ye5AD167. Surviving column header: Average Heterozygosity. Table body not recovered.]
Our detailed analysis of the Alu Ye5 subfamily resulted in the recovery of two new Alu subfamilies, Ye4 and Ye6. Each of these Alu subfamilies has a relatively small copy number in the human genome. The proportion of polymorphic elements within the subfamilies is quite low, with only 0.83% of the Alu Ye elements being polymorphic: only one member of the Ye subfamilies (Ye5AD167) is polymorphic with respect to insertion presence/absence in the human genome. In contrast, many other young Alu subfamilies have levels of insertion polymorphism in excess of 20%. Therefore, the amplification of these Alu subfamilies within the human genome has occurred at a very low rate, and may have recently ceased entirely. The estimated average ages of ~14, ~13 and ~9.5 million years old for the Alu Ye4, Ye5 and Ye6 subfamilies, respectively, are consistent with their relatively recent origin in primate genomes. They are also consistent with the master gene model of SINE retroposition, which suggests that as a master element accumulates mutations over time, the resulting elements will share those mutations.
Members of the Alu Ye lineages are dispersed throughout the genomes of all hominoids (humans, greater and lesser apes) suggesting that this subfamily of Alu elements began to amplify about 15–20 million years ago. Therefore, the Ye subfamily appears to have been retroposition competent during hominoid evolution, but must have been relatively inefficient at producing copies. Although the rate of Ye amplification has not been dramatic within the human lineage, it may be quite interesting to recover Alu Ye subfamily members from other ape genomes and to determine the rate of Ye subfamily amplification in these genomes to see if there has been any differential amplification of these elements in non-human primate genomes. The differential amplification of ID SINEs within various members of the rodent lineage has been reported previously suggesting that the amplification of SINEs within various genomes is subject to changes [61, 62].
Gene conversion between Alu repeats has been reported previously [26, 63, 64]. The gene conversion events involving three Alu Ye subfamily members were quite interesting. In one case (Ye5AH181), the Alu-containing locus was involved in a full gene conversion event in which an Alu Sg in new world monkeys was replaced by an Alu Ye5 in humans, chimpanzees, gorillas and orangutans. In the other two cases (Ye5AH70 and Ye5AH173), only a small portion of the 3' end of the Ye elements was involved in the gene conversion. This is in good agreement with the molecular nature of gene conversion events recently reported for the Ya5 and Yb8/9 Alu subfamilies [47, 48, 64, 65]. The detection of three gene conversion events among 153 Alu Ye elements suggests that gene conversion of these elements has been relatively rare, with a rate of 1.96%. However, this rate is comparable to that reported previously for the Alu Ya5 and Yb8 subfamilies within the human genome, as well as that for the Ta subfamily of human LINE elements [64, 65, 66].
In all cases, the Ye Alu family members that were involved in the gene conversion were monomorphic for insertion presence within the human genome. In the partial gene conversion events, the Ye Alu repeats were gene converted by Yb8/9 and Sx Alu elements. The Yb8/9 Alu subfamily was one of the first groups of Alu repeats reported to be involved in gene conversion, and may be more prone to these types of events as a result of a retroposition rate that is slightly higher than that of other recently integrated Alu subfamilies in the human genome [48, 64, 65]. The gene conversion between Alu elements may in part be a function of the length of time that the individual Alu elements have resided in the human genome [26, 50]. Based on an examination of low copy number transgenes in the mouse, it has been suggested that the germline recombination machinery in mammals has evolved to prevent high levels of ectopic recombination between repetitive sequences. It is quite possible that the high copy number of Alu elements allows for pairing between regions of sequence identity of different Alu elements, initiating gene conversion before cellular control systems can terminate the process and resulting in the production of small gene conversion tracts.
The identification of multiple paralogous Alu insertions involving an Alu Ye element (Ye5AH161) in the human, bonobo, common chimpanzee and gorilla lineage, an Alu Sp in the old world monkey lineage and an Alu Sx in the new world monkey lineage is also interesting. The paralogous insertion of an Alu repeat into the orthologous regions of human and non-human primate genomes is an independent evolutionary event. To date there are no known cases of the independent insertion of paralogous Alu elements into identical sites within different genomes. The detection of parallel insertions is a function of the rate of retroposition of Alu elements within various primate lineages and the time since the most recent common ancestor. However, this locus (Ye5AH161) supports the idea of hotspots for the integration of Alu repeats within primate genomes. Future studies on the integration of different SINE elements in syntenic regions of human and rodent genomes may yield new insight into the molecular nature of hotspots for SINE element integration.
Genomic deletions created upon LINE-1 retrotransposition have recently been identified using cell culture assays. The rate of LINE element-mediated deletion in the human genome was estimated indirectly to be about 3%, or 8–13% through sequencing variable sizes of the pre-integration sites of L1HS in primates. The precise molecular mechanism of the LINE-mediated genomic deletions is still unclear. Recently, an Alu-mediated deletion that resulted in the inactivation of the human CMP-N-acetylneuraminic acid hydroxylase gene and Alu-mediated deletions of noncoding genomic sequences have been identified. Here we report two new examples of Alu retroposition-mediated deletions that may have happened by a mechanism similar to that of the LINE element-mediated genomic deletions, since Alu and L1 elements utilize a common mobilization pathway [6, 8, 72]. In both cases, Alu Ye5AH24 and Alu Ye5AH27, the deletion appears to have occurred after the separation of humans, chimpanzees and gorillas from orangutan and siamang, during the process of gene conversion, similar to the lineage-specific Alu deletion reported previously [70, 71].
Here, we have estimated the frequency of Alu retroposition-associated genomic deletions as approximately 1.67%. The size of the deleted sequences was over 300 bp on average. New Alu integrations have been estimated to occur in vivo at a frequency of one new event in every 10 to 200 births. If sizable deletions accompany one in every 100 new Alu retroposition events in vivo, the genomic impact of these events could be substantial. This is not a trivial number of deletions when extrapolated to the copy number of Alu elements in the human genome, which is over one million. Approximately 16,700 Alu elements may have been involved in retroposition-mediated deletion events within primate genomes. If each of these deletion events removes an average of 300 bp of genomic sequence, this would mean that Alu retroposition mediates the deletion of about 5 Mb of primate genomic sequence. However, if the Alu-associated deletions have involved larger sequences similar to those recently reported for LINE elements, then the impact of these events may be 50–500 Mb of lineage-specific deletions. In either case, these types of events represent a novel mechanism of lineage-specific deletion within the primate order. Detailed studies of the orthologous regions of primate genomes deleted in this manner may prove instructive for understanding the genetic basis of the difference between humans and non-human primates.
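The extrapolation in this paragraph is a short chain of multiplications, sketched below using only the figures quoted in the text (2 deletions in 120 loci, about one million Alu copies, 300 bp per deletion). The names are illustrative, and the mean deletion sizes printed for the 50–500 Mb range are a back-calculation made here, not values reported in the study.

```python
DELETION_FREQUENCY = 2 / 120   # deletions observed per locus screened (~1.67%)
ALU_COPY_NUMBER = 1_000_000    # approximate Alu copy number in the human genome
MEAN_DELETION_BP = 300         # average deletion size quoted in the text


def expected_deletion_events(frequency, copy_number):
    """Number of insertions expected to have been accompanied by a deletion."""
    return frequency * copy_number


def total_deleted_mb(events, mean_deletion_bp):
    """Total sequence removed, in megabases."""
    return events * mean_deletion_bp / 1_000_000


def implied_mean_deletion_bp(total_mb, events):
    """Mean deletion size that would be needed to reach a given total."""
    return total_mb * 1_000_000 / events


EVENTS = expected_deletion_events(DELETION_FREQUENCY, ALU_COPY_NUMBER)
TOTAL_MB = total_deleted_mb(EVENTS, MEAN_DELETION_BP)

if __name__ == "__main__":
    print(f"expected deletion events: {EVENTS:,.0f}")
    print(f"sequence removed at 300 bp each: {TOTAL_MB:.1f} Mb")
    for total in (50, 500):
        size = implied_mean_deletion_bp(total, EVENTS)
        print(f"{total} Mb would imply a mean deletion of {size:,.0f} bp")
```

Using the unrounded frequency 2/120 gives about 16,667 events, which the text rounds to 16,700 by starting from 1.67%.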
The Alu Ye lineage has had an extended history of expansion in the human lineage. Its expansion appears to have begun soon after the divergence of the hominoids from the remainder of the catarrhine primates and proceeded at a relatively low level since then. Extended periods of relatively low levels of retrotransposition may allow some mobile elements to retain duplication capability for long periods of time. Despite a relatively low level of retrotransposition, the Alu Ye lineage has contributed to the architecture of the human genome through insertion mutations, retrotransposition associated genomic deletions, and gene conversion.
To identify Alu Ye elements in the draft sequence of the human genome (August 6, 2001, UCSC GoldenPath assembly), we used Basic Local Alignment Search Tool (BLAST) queries of the draft sequence to identify exact complements to the oligonucleotide 5'-GAACCCCGGGGGGCGGAGCCTGCAG-3' that is diagnostic for the Ye lineage as shown in Fig. 1. All of the exact complements to the oligonucleotide queries along with 1000 bp of adjacent flanking unique DNA sequence were excised and stored as unique files and subjected to additional analysis as outlined previously [47, 48, 49]. A complete list of all the Alu elements identified in the searches is located in Additional file 2 and is available at http://batzerlab.lsu.edu/Additional_File_2_-_Supplemental_Table.doc.
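The search for exact matches to the diagnostic oligonucleotide can be illustrated with a plain string search on both strands. This is a simplified stand-in for the BLAST queries used in the study, not a reproduction of them; the function names, the returned fields and the toy sequence are hypothetical, and only the oligonucleotide sequence and the 1000 bp flank come from the text.

```python
# Diagnostic Ye lineage oligonucleotide quoted in the text (5' to 3').
YE_OLIGO = "GAACCCCGGGGGGCGGAGCCTGCAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_matches(sequence, oligo=YE_OLIGO, flank=1000):
    """Find exact matches to the oligo on either strand.

    Each hit records the strand, the match coordinates (0-based, end
    exclusive) and a window extended by `flank` bases on each side,
    clipped to the ends of the sequence.
    """
    sequence = sequence.upper()
    hits = []
    for strand, probe in (("+", oligo), ("-", reverse_complement(oligo))):
        start = sequence.find(probe)
        while start != -1:
            end = start + len(probe)
            window = (max(0, start - flank), min(len(sequence), end + flank))
            hits.append({"strand": strand, "start": start,
                         "end": end, "window": window})
            start = sequence.find(probe, start + 1)
    return sorted(hits, key=lambda hit: hit["start"])


# Toy sequence (hypothetical) carrying one match on each strand.
TOY = "T" * 10 + YE_OLIGO + "A" * 5 + reverse_complement(YE_OLIGO) + "C" * 3
TOY_HITS = find_exact_matches(TOY, flank=5)

if __name__ == "__main__":
    for hit in TOY_HITS:
        print(hit)
```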
DNA samples and PCR amplification
Oligonucleotide primer design and PCR amplification for each of the Alu Ye lineage loci analyzed were performed as previously described [47, 48, 49] using the primers and annealing temperatures shown in Additional file 2 for Alu Ye lineage members. Diverse human DNA samples were available from previous studies [47, 48, 49]. The cell lines used to isolate DNA samples were as follows: chimpanzee (Pan troglodytes), WES (ATCC CRL1609); gorilla (Gorilla gorilla), lowland gorilla Coriell AG05251B and Ggo-1 (primary gorilla fibroblasts) provided by Dr. Stephen J. O'Brien, National Cancer Institute, Frederick, MD, USA; bonobo (Pan paniscus), Coriell AG05253A; orangutan (Pongo pygmaeus), ATCC CRL6301; green monkey (Chlorocebus aethiops), ATCC CCL70 (old world monkey); and owl monkey (Aotus trivirgatus), OMK (OMKidney) ATCC CRL 1556 (new world monkey). Cell lines were maintained as directed by the source and DNA isolations were performed using Wizard genomic DNA purification (Promega). DNA samples from peripheral lymphocytes or tissue were prepared from the gibbon (Hylobates lar) and siamang (Hylobates syndactylus). Additional non-human primate DNA samples (Pan troglodytes, Pan paniscus, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta (old world monkey), Macaca nemestrina (old world monkey), Saguinus labiatus (new world monkey), Lagothrix lagotricha (new world monkey), Ateles geoffroyi (new world monkey) and Lemur catta (prosimian)), available as a primate phylogenetic panel (PRP00001), were purchased from the Coriell Institute for Medical Research.
DNA sequencing was performed on gel-purified PCR products that had been cloned using the TOPO TA cloning vector (Invitrogen), using chain termination sequencing on an Applied Biosystems 3100 automated DNA sequencer. The sequences of the orthologous loci (that contained a paralogous Alu element) have been assigned accession numbers AY849282-AY849301. Sequence alignments of the Ye lineage subfamily members were performed using MegAlign software (DNAStar version 3.1.7 for Windows 3.2). The ages for each of the Alu Ye subfamilies were calculated using mutation densities as previously described [43, 47, 48, 49, 65] with rates suggested by Xing et al.
This research was supported by Louisiana Board of Regents Millennium Trust Health Excellence Fund HEF (2000-05)-05, (2000-05)-01, and (2001-06)-02 (MAB), National Science Foundation BCS-0218338 (MAB) and EPS-0346411 (MAB) and the State of Louisiana Board of Regents Support Fund (MAB).
- 23. Batzer MA, Rubin CM, Hellmann-Blumberg U, Alegria-Hartman M, Leeflang EP, Stern JD, Bazan HA, Shaikh TH, Deininger PL, Schmid CW: Dispersion and insertion polymorphism in two small subfamilies of recently amplified human Alu repeats. J Mol Biol. 1995, 247: 418-427. 10.1006/jmbi.1994.0150.
- 37. Ray DA, Hedges DJ, Hall MA, Laborde ME, Anders BA, White BR, Stoilova N, Fowlkes JD, Landry KE, Chemnick LG, Ryder O, Batzer M: Alu Insertion Polymorphisms and Platyrrhine Primate Phylogenetic Relationships. Mol Phylogenet Evol.
- 47. Carroll ML, Roy-Engel AM, Nguyen SV, Salem AH, Vogel E, Vincent B, Myers J, Ahmad Z, Nguyen L, Sammarco M, Watkins WS, Henke J, Makalowski W, Jorde LB, Deininger PL, Batzer MA: Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J Mol Biol. 2001, 311: 17-40. 10.1006/jmbi.2001.4847.
- 65. Batzer MA, Rubin CM, Hellmann-Blumberg U, Alegria-Hartman M, Leeflang EP, Stern JD, Bazan HA, Shaikh TH, Deininger PL, Schmid CW: Dispersion and insertion polymorphism in two small subfamilies of recently amplified human Alu repeats. J Mol Biol. 1995, 247: 418-427. 10.1006/jmbi.1994.0150.
- 72. Battilana J, Bonatto SL, Freitas LB, Hutz MH, Weimer TA, Callegari-Jacques SM, Batzer MA, Hill K, Hurtado AM, Tsuneto LT, Petzl-Erler ML, Salzano FM: Alu insertions versus blood group plus protein genetic variability in four Amerindian populations. Ann Hum Biol. 2002, 29: 334-347. 10.1080/03014460110086835.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.