Predicting preferential DNA vector insertion sites: implications for functional genomics and gene therapy
Viral and transposon vectors have been employed in gene therapy as well as in functional genomics studies. However, the goals of gene therapy and functional genomics are entirely different: gene therapists hope to avoid altering endogenous gene expression (especially the activation of oncogenes), whereas geneticists want to alter the expression of chromosomal genes. The odds of either outcome depend on a vector's preference to integrate into genes or control regions, and these preferences vary between vectors. Here we discuss the relative strengths of DNA vectors compared with viral vectors, and review methods to overcome the barriers to delivery inherent to DNA vectors. We also review the tendencies of several classes of retroviral and transposon vectors to target DNA sequences, genes, and genetic elements, with respect to the balance between insertion preferences and oncogenic selection. Theoretically, knowing the variables that affect integration for various vectors will allow researchers to choose the vector with the most utility for their specific purposes. Elucidating the factors that affect integration preferences has three principal benefits: in gene therapy, it allows assessment of the overall risk of activating an oncogene or inactivating a tumor suppressor gene, either of which could lead to severe adverse effects years after treatment; in genomic studies, it allows one to discern random from selected integration events; and in both gene therapy and functional genomics, it facilitates the design of vectors that are better targeted to specific sequences, which would be a significant advance in the art of transgenesis.
Keywords: Gene Therapy; Sleeping Beauty; Gene Therapy Vector; Sleeping Beauty Transposase; Common Integration Site
Risk of oncogene activation in gene therapy
Activation of oncogenes in mice by insertionally mutagenic retroviruses suggested that inadvertent oncogene activation by relatively benign therapeutic vectors is a potential risk of gene therapy. Gene therapy vectors are extensively minimized to eliminate their replicative potential and reduce their collateral effects on the target genome. However, extensive testing in animals demonstrated that the risk of oncogenic activation was real, although variable and dependent on the viral vector used, the genetic cargo, and the background genetics of the model system [16, 17, 18, 19, 20, 21, 22]. On the assumption that this risk was acceptable, retroviral gene therapy trials have been conducted in human patients. Nearly 1,000 clinical gene therapy trials have been initiated, more than half with retroviral vectors, but as yet no vector has been approved in the USA for clinical gene therapy outside the clinical trial setting. (Gendicine, an adenovirus designed to restore p53 function in cancerous cells, has been approved for commercial human gene therapy in China, although this vector is essentially nonintegrating and thus carries a lower risk of oncogene activation via vector insertion.)
The worst fear of the gene therapy field, oncogene activation, was realized when three of more than 20 patients treated for X-linked severe combined immunodeficiency disease (X-SCID) developed leukemia. These adverse events, including one death, occurred 3 years or more after administration of therapeutic murine leukemia virus (MLV)-derived retrovirus vectors [25, 26]. The leukemias could be linked to treatment because the expanded transformed cell populations harbored clonal integrations of the therapeutic vector, which suggested biologic selection for the retrovirus-induced mutation [27, 28, 29, 30]. However, these studies also indicated that clonal expansions in some cases appeared to be temporary and did not always lead to adverse effects, features that could actually improve the likelihood of successful gene therapy. The cause of at least two of the leukemias appears to be insertion of the MLV vector close to the LMO2 oncogene, which led to LMO2 activation by enhancers in the long terminal repeat (LTR) sequences of the vector [31, 32, 33]. Retrospective examination of the role of LMO2 during development supported this conclusion [34, 35]. Subsequent studies in which the cargo gene IL2γc was overexpressed in mice (albeit at higher levels than in the X-SCID leukemia patients) suggested that this gene could itself act as an oncogene in T cells. Also, simultaneous activation of IL2γc and LMO2 by oncogenic retroviruses had been observed in one mouse, suggesting a possible genetic interaction between the cargo IL2γc gene and LMO2. The relevance of these observations to the clinical cases, however, is highly debatable [37, 38].
In contrast, other gene therapy trials that employed retroviral vectors to treat adenosine deaminase deficiency [39, 40, 41] and chronic granulomatous disease (CGD) have not yet reported any equivalent adverse events. In the CGD study, there appeared to be powerful selection for integrations of the spleen focus-forming virus vector (which also was used as a vector for X-SCID) into the neighborhoods of three previously identified genes, namely MDS-EVI1, PRDM16, and SETBP1, which have been associated with enhanced proliferation following integration of retroviruses with activating LTRs [44, 45, 46]. As noted previously, a finding of preferential integration around certain genes is not necessarily due to a vector preference for these genes, but may instead reflect clonal expansion that can be transient and thereby beneficial, because it increases the number of therapeutic cells. A similar effect has also been observed in nonhuman primate studies, indicating that this result may not be unique. Despite the striking incidence of common integration sites that are often associated with tumor or leukemia formation [8, 47, 48], there has been no report of adverse events in the CGD patients and no indication that the corrective gene, gp91phox, synergizes with any of the three common integration site genes to promote growth. Likewise, a murine stem cell retrovirus has been used to deliver the α and β chains of the anti-MART-1 T-cell receptor ex vivo into peripheral blood lymphocytes to treat melanoma without any apparent adverse effects, although integration sites were not examined and the patients had poor odds of survival even with treatment (two of 15 survived for more than 1 year).
Taken together, the results of the CGD, X-SCID, and adenosine deaminase-deficient SCID trials demonstrate that oncogenesis is not necessarily an inherent, inevitable side effect of gene therapy. Of more than 20 patients treated, more than 80% have had their genetic deficiencies fully corrected, allowing them to lead normal lives. However, tumors and leukemias can take years to manifest, and these trials are in their early years. A clearer understanding of the variables that underlie oncogenesis is needed to increase the safety of these trials. These variables include the insertion site preferences of therapeutic vectors, their ability to activate nearby genes, and interactions between specific genetic cargos and activated host genes. Although cargo-host interactions will be specific to each gene therapy approach, the vectors themselves govern the other parameters of insertion preference and neighboring gene activation. Insertion preferences, in particular, have received much recent attention, and their analysis has sparked interest in transposons as alternatives to viruses as gene therapy vectors.
Nonviral vectors for introduction of genetic cassettes into mammalian genomes
Transposable elements also have been used for insertional mutagenesis and genetic studies in model organisms, and are being developed as gene therapy agents in humans [50, 51, 52, 53]. The best-characterized DNA transposon vector used in mammals is the synthetic Sleeping Beauty (SB) transposon system, which over the past decade has become a powerful functional genomics tool for identifying genes in vertebrates, including fish and mammals [55, 56, 57, 58, 59, 60, 61]. Transposon-mediated gene transfer has been explored for gene therapy because it avoids several disadvantages of viral delivery systems: (1) their preference for integrating into genes [62, 63, 64, 65]; (2) the difficulty of purifying them free of toxic or infectious agents; (3) their potential to elicit unwanted immune or inflammatory responses [67, 68]; (4) the constraint on therapeutic cargo size; and (5) the difficulty and expense of producing them in large quantities [69, 70]. In contrast to viral vectors, nonviral plasmid-based transposon vectors are relatively inexpensive to purify, are largely nonimmunogenic, and impose no hard constraints on the genetic sequences that can be delivered.
A drawback of DNA vectors is that they are harder to deliver. Delivering nonviral DNA into mammalian genomes involves avoiding or traversing numerous barriers, including enzymes in the blood and cellular environments, the endothelial lining of vessel walls, cellular plasma membranes, endosomal membranes, nuclear membranes, and chromosomal integrity.
Delivery approaches operate at three scales: nanoscale, microscale, and macroscale. Nanoscale delivery involves particles or complexes that are most often designed to be about 100 nm or less in diameter, although particles up to 1 μm fall into this category. The nanoscale approach delivers single or small numbers of DNA molecules, which most often are condensed by polycationic polymers (for example, polylysine and other modified amino acids, and various linear and branched forms of polyethylenimine, among others) or by lipids, with or without various ligands (for review, see the report by Wagner and coworkers). Some polycationic complexes are cytotoxic or unstable in the blood, problems that can be circumvented by coating the complexes with polyethylene glycol. At the microscale, DNA in packages up to 10 μm across is phagocytized; at the macroscale, DNA enters cells via fusion with other cells or entities larger than 10 μm.
In mice, the most effective method for in vivo gene transfer and expression has been demonstrated in hepatocytes, using simple infusion of naked plasmid DNA under increased pressure. This can be accomplished by hydrodynamic delivery of DNA using high-pressure, high-volume injection [74, 75]. In mice, the procedure involves injecting a large volume (10% volume/weight) of DNA/saline solution through the tail vein in less than 10 seconds. The injection expands and ruptures the liver endothelium, which in mice heals within 24 to 48 hours, and results in uptake of the infused DNA into as many as 10% of hepatocytes [74, 75]. Achieving a clinically feasible method of local delivery to the liver in large animals, including humans, is a challenge that is being addressed by more localized hydrodynamic delivery using specialized catheters or pressure cuffs [77, 78]. On the microscale, condensing DNA with polyamines such as polyethylenimine into complexes small enough to be taken up by cells into endosomes has been studied intensively [79, 80]. Our findings (Hackett PB, Podetz-Pedersen K, Bell JB, McIvor RS, unpublished data) suggest that gene expression following hydrodynamic delivery is about 100-fold more effective than delivery using polyethylenimine [81, 82], and only about 10-fold to 100-fold less effective than viral delivery to the liver. Ex vivo delivery by electroporation is an alternative under development and has been achieved in hematopoietic stem cells.
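The injection volume in the hydrodynamic procedure scales with body mass. As a rough illustration, the sketch below converts the 10% volume/weight rule and the 10-second limit into an injection volume and a minimum average flow rate. The 20 g body mass and the assumption that the DNA/saline solution weighs about 1 g per mL are ours, not values taken from the cited studies.

```python
def hydrodynamic_injection(body_mass_g: float,
                           volume_fraction: float = 0.10,
                           max_duration_s: float = 10.0) -> tuple[float, float]:
    """Return (injection volume in mL, minimum mean flow rate in mL/s).

    Assumes the DNA/saline solution has a density of about 1 g/mL, so a
    volume equal to 10% of body mass (in grams) is that many millilitres.
    """
    volume_ml = body_mass_g * volume_fraction
    # Delivering the whole volume within max_duration_s sets a floor on the mean rate.
    min_flow_ml_per_s = volume_ml / max_duration_s
    return volume_ml, min_flow_ml_per_s


# Hypothetical 20 g mouse.
vol, flow = hydrodynamic_injection(20.0)
print(f"{vol:.1f} mL in under 10 s (mean flow > {flow:.2f} mL/s)")
```

For a 20 g mouse this corresponds to about 2 mL delivered at more than 0.2 mL/s on average, which conveys why the procedure is difficult to scale directly to large animals.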
Since the development of the SB system, nonviral, integrating DNAs have established themselves as potential vectors for gene therapy. Following hydrodynamic delivery, transposons have been used in mice to cure hemophilias A and B [84, 85, 86, 87] and tyrosinemia type I [88, 89]. Other somatic delivery methods were used to ameliorate a blistering skin disease (junctional epidermolysis bullosa), retard glioma xenografts [91, 92], produce Huntingtin protein in a model of Huntington disease, and prevent lung allograft fibrosis. Based on the findings summarized above, we estimate that only about one in 10,000 SB transposons delivered to liver or lung actually transposes into chromatin (Hackett PB, unpublished data). Although this is a small fraction, it is possible to deliver more than 10⁸ therapeutic cassettes to an animal in order to treat as many as 10% to 20% of liver cells with a single injection of plasmids [84, 88, 95]. This is sufficient to cure diseases such as hemophilia and tyrosinemia type I, and to ameliorate others such as mucopolysaccharidoses types I and VII. The number of transposon insertions per cell has not been quantified, because insertion sites are difficult to clone from the mostly nondividing cells of most animal organs; nevertheless, the expression data are consistent with a single integration in most if not all transgene-expressing cells.
Properties of nonviral integrating vectors proposed for gene therapy
TA sites, random
Highly tested/cargo capacity decreases efficiency
Highly tested/induces chromosomal mutations and rearrangements
TTAA sites (genes)
Too new to evaluate/targets transcription units
Cured tyrosinemia type 1 in mice/may target genes, too new to evaluate
Cured PKU in mice/too new to evaluate
Too new to evaluate
Factors governing insertion site preferences and their variation among vectors
Viruses and transposons vary widely in their preference for genes and transcriptional units. Several studies have mapped hundreds to thousands of insertions into human or mouse genomes and correlated insertion positions with known genes. Many retroviruses exhibit a nonrandom preference for genes. This could be due to greater accessibility of the DNA in 'open' chromatin or to interaction of integrase enzymes with cellular factors bound to transcriptional regulatory elements. In the case of HIV, the transcription factor LEDGF/p75 may act as a tether between the integrase and transcriptionally active chromatin [100, 101, 102], an idea similar to one proposed previously for designer targeting of integrating vectors [103, 104, 105]. Using a similar genome-wide mapping approach with the SB transposon, Yant and coworkers found that SB exhibited a much lower (although nonrandom) preference for genes. Although a preference for transcriptional units might seem beneficial for functional genomics studies, the myriad recently identified noncoding RNA genes involved in gene regulation (as well as other genes with RNA products, such as those encoding rRNAs and tRNAs) may not be targeted by viral vectors that preferentially integrate into or near protein-coding genes. The targeting of these noncoding RNA genes by various vectors in gene therapy, and any resulting deleterious effects, has not been extensively examined.
Many vectors also appear to prefer specific genes. In insertional mutagenesis studies, recurrent viral insertions into a specific group of genes were taken to mean that viral activation of these putative oncogenes in individual cells led to clonal expansion within a pool of cells in which every host gene was an equal target for integration (as discussed above for LMO2). However, when MLV insertions were mapped in HeLa cells that had not undergone any type of selection, oncogenic or otherwise, many of these same genes harbored recurrent integrations, suggesting that vectors may inherently target specific genes. The basis of this preference is not understood, but it may resemble the tethering mechanism discussed above for HIV.
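Recurrent hits are meaningful only relative to what random integration would produce. One way to frame that baseline, sketched below, rests on a deliberately simple assumption: every base pair is an equally likely target. Under that assumption, the number of insertions landing in each gene is binomial, and the probability of two or more hits can be summed across genes. The gene count, gene length, genome size, and insertion number in the example are hypothetical round numbers.

```python
def expected_recurrent_genes(gene_lengths_bp: list[int],
                             genome_size_bp: int,
                             n_insertions: int) -> float:
    """Expected number of genes hit two or more times under uniform random integration.

    Assumes every base pair is an equally likely target, so the number of
    insertions in a gene of length L is Binomial(n_insertions, L / genome_size_bp).
    Real vectors violate this (site sequence, chromatin), so treat it only as a baseline.
    """
    expected = 0.0
    for length in gene_lengths_bp:
        p = length / genome_size_bp
        p_zero = (1 - p) ** n_insertions                          # gene never hit
        p_one = n_insertions * p * (1 - p) ** (n_insertions - 1)  # hit exactly once
        expected += 1 - p_zero - p_one
    return expected


# Hypothetical: 20,000 genes of 30 kb each, a 3 x 10^9 bp genome, 1,000 mapped insertions.
baseline = expected_recurrent_genes([30_000] * 20_000, 3_000_000_000, 1_000)
print(f"Genes expected to carry >= 2 insertions by chance: {baseline:.1f}")
```

Under these assumptions only about one gene would be expected to carry two or more insertions by chance. If a dataset from unselected cells, such as the HeLa study, shows many recurrently hit genes, the excess points to a preference of the vector itself rather than to chance.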
In addition to general preferences for genes, many viral vectors, including retroviruses, lentiviruses, and adeno-associated virus, preferentially target transcriptional units or their promoters. MLV retroviruses have a preference for integration proximal to transcriptional initiation sites [64, 65, 108, 109, 110, 111], which is a problematic trait, considering that MLV-based vectors are the most commonly used vectors in human gene therapy. HIV and adeno-associated viruses have preferences for entire transcriptional units [100, 108, 111, 112, 113] (see Note added in proof, below); this is in contrast to MLV, which targets only the region proximal to promoters. Additionally, expression array studies have shown that HIV has a preference for transcriptionally active genes as well as an avoidance of chromatin regions in which transcription is repressed.
In contrast to these viral vectors, SB transposons and avian leukosis virus (a retrovirus) apparently have only a slight preference for either transcriptional units or their regulatory elements [106, 115], with little or no preference for transcriptionally active genes. In one survey, SB exhibited an overall preference for microsatellite repeats, found primarily in noncoding regions, possibly because TA repeats contain preferred target sites. A study that correlated insertion sites with hundreds of genome annotations illustrated the degree to which genomic features and primary sequence influence the integration preferences of several vectors (for example, L1 and SB transposon insertions were much more influenced by primary sequence than were retroviral insertions). This study also found that preferences for elements such as CpG islands, DNase I-sensitive sites, and transcription factor binding sites varied between vectors. The recent identification of a periodic sequence pattern encoding nucleosome positioning may also correlate with vector integration patterns, because nucleosomes have been shown to affect patterns of retroviral integration. Similar studies of genome-wide integration preferences for piggyBac and Tol2 will be valuable in assessing the relative safety of these vectors for gene therapy.
Local insertional preferences: DNA sequence and structure
Although many vectors exhibit a preference for genes, and even for specific genes, few vectors repeatedly integrate into the same precise position with any significant frequency. Rather, most genes harboring frequent insertions show insertions distributed across several positions within the gene. Some integration systems, such as the integrases of phages φC31 [119, 120, 121] and φBT1 and the transposition machinery of the Escherichia coli Tn7 transposon, recognize specific or degenerate DNA sequences that exist in mammalian genomes. SB integrates specifically at a TA dinucleotide, and the piggyBac transposon integrates into the sequence TTAA. Because the oncogenic potential of a vector is related to its propensity to integrate in or near a select few genes, understanding the local parameters that affect integration may help us assess the risk associated with these vectors in gene therapy.
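Because SB and piggyBac require short target motifs, the candidate sites in any sequence can be enumerated directly. The sketch below lists every TA (SB) and TTAA (piggyBac) position in a made-up sequence. Overlapping matches are counted, so a TATA run contributes two TA sites. Both motifs are their own reverse complements, so scanning one strand finds every site. Enumerating sites says nothing about which of them are preferred; that requires the structural analysis described next.

```python
import re


def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every occurrence of motif, including overlaps."""
    # A zero-width lookahead lets successive matches overlap (TATA holds two TA sites).
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() for m in pattern.finditer(seq.upper())]


example = "GCTATAGGTTAACCTA"  # hypothetical sequence
print(find_sites(example, "TA"))    # candidate SB sites -> [2, 4, 9, 14]
print(find_sites(example, "TTAA"))  # candidate piggyBac sites -> [8]
```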
For retroviruses and the SB transposon, consensus sequences have been described surrounding the sites of integration [111, 124, 125, 126, 127]. Although the retroviral consensus is weak, the nonrandom pattern of integrations, and the observation that frequently hit sites did not match the consensus sequences, led investigators to examine other properties of the DNA surrounding target sites, including structural characteristics of the DNA itself. DNA structural characteristics are based on non-Watson-Crick interactions between nucleotides and encompass deformations of the regular double helix caused by interactions between adjacent, planar bases (Figure 2). Originally characterized from crystal structures of DNA bound to histones and other proteins, these characteristics include 'protein-induced DNA deformability', 'A-philicity', and trinucleotide 'bendability'. These properties underlie local variations in DNA structure that are probably relevant to the recognition of DNA by transposases and integrases. Early investigations of insertion preferences showed that viruses preferred 'bent' DNA [118, 128, 129], and several groups have investigated secondary structural patterns in the sequences flanking mapped insertion sites for both transposons [115, 124, 130, 131] and retroviruses [111, 126] to determine the general characteristics of sequences flanking 'preferred' integration sites. Similarly, the RAG1/2 protein complex, which has properties akin to those of cut-and-paste transposases, recognizes a specific sequence/structure for recombination of antigen receptor genes.
Different DNA sequences may produce highly similar patterns of DNA secondary structure, so structural patterns that are preferred for integration may be obscured by approaches that analyze sequence alone. Secondary structure analysis of a DNA sequence translates a sliding window of two or three bases into a structural value for each 'step'. For example, the tendency of a B-form helix to adopt the A-form (A-philicity; Figure 2) can be predicted by translating each consecutive (overlapping) dinucleotide into an A-philicity value; the 16 possible dinucleotide steps reduce to 10 distinct values because a step and its reverse complement on the opposite strand describe the same base pair transition [133, 134, 135]. Similarly, protein-induced deformability encompasses several changes in base pair orientation, relative to a perfect B-form double helix, in the transition between two consecutive base pairs (Figure 2c). All of these changes can be expressed as a single composite parameter of protein-induced DNA deformability known as V_step [136, 137, 138]. V_step represents the physical relationship of any two planar base pairs in terms of their relative shifts and angular orientation. In contrast to A-philicity and protein-induced deformability, DNA bendability is best modeled using a sliding window of three bases, with a bendability value for each of the 64 possible trinucleotides.
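The sliding-window translation described above can be sketched as follows. Each overlapping step is first reduced to a strand-independent form (the lesser of the step and its reverse complement), which is why the 16 dinucleotides collapse to 10 classes and the 64 trinucleotides to 32. The numeric scale is left as an input. The values used below are arbitrary placeholders for demonstration, not published A-philicity, V_step, or bendability parameters, which would have to be taken from the original tables.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def canonical(step: str) -> str:
    """A step and its reverse complement describe the same base-pair transition."""
    return min(step, revcomp(step))


def step_profile(seq: str, scale: dict[str, float], k: int = 2) -> list[float]:
    """Slide a k-base window one base at a time (k=2 for dinucleotide parameters,
    k=3 for trinucleotide bendability) and look up each step's canonical form in scale."""
    seq = seq.upper()
    return [scale[canonical(seq[i:i + k])] for i in range(len(seq) - k + 1)]


di_classes = sorted({canonical("".join(p)) for p in product("ACGT", repeat=2)})
tri_classes = sorted({canonical("".join(p)) for p in product("ACGT", repeat=3)})
print(len(di_classes), len(tri_classes))  # -> 10 32

# Placeholder scale: arbitrary numbers, one per dinucleotide class (NOT real parameters).
toy_scale = {step: float(i) for i, step in enumerate(di_classes)}
print(step_profile("GATTACA", toy_scale))
```

Keying the lookup table by canonical step means that a sequence and its reverse complement produce mirror-image profiles, as they should for a property of the double helix rather than of one strand.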
For SB, the observation of general structural trends surrounding insertion sites eventually led to the identification of a specific DNA structural pattern governing insertion preference. Vigdal and coworkers observed that increased DNA deformability and A-philicity were features of a consensus sequence flanking SB TA insertion sites. Subsequently, Liu and colleagues mapped about 200 integrations into a relatively small 7 kilobase plasmid sequence and observed that some common integration sites did not match the consensus sequence. These results identified several 'preferred' TA dinucleotides that harbored recurrent integrations. These preferred sites exhibited a striking, specific pattern of alternating high and low deformability (V_step) values that was absent from TA sites that were rarely, if ever, used. This led to the conclusion that SB transposase prefers a 'zigzag' V_step pattern of DNA deformability, which was later confirmed on a larger, genomic scale. It remains unknown whether these patterns influence recognition and binding by the SB transposase, catalysis of transposon integration, or some other step of the mechanism.
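The 'zigzag' criterion can be made concrete in several ways, and the published analysis is not reproduced here. As a hypothetical illustration only, the function below scores a window of per-step V_step values by the fraction of successive changes that reverse direction, so a strict high/low alternation scores 1.0 and a monotonic or flat profile scores 0.0. The scoring rule and the input numbers are our own inventions for demonstration.

```python
def zigzag_score(values: list[float]) -> float:
    """Fraction of consecutive changes in a step profile that reverse direction.

    1.0 = strict high/low alternation; 0.0 = monotonic or flat.
    A zero change is counted as breaking the alternation.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    if len(diffs) < 2:
        return 0.0
    reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return reversals / (len(diffs) - 1)


# Invented V_step-like profiles flanking two hypothetical TA sites.
print(zigzag_score([2.0, 5.0, 1.5, 4.8, 2.1, 5.2]))  # alternating -> 1.0
print(zigzag_score([2.0, 2.5, 3.1, 3.6, 4.0, 4.4]))  # monotonic   -> 0.0
```

In practice one would compare such scores between recurrently used and rarely used TA sites; whether this particular score separates them is untested here.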
This analysis was repeated for other vectors, including piggyBac, P elements, and several retroviruses. However, only weak structural signatures were detected, which were no more informative than the weak consensus sequences previously identified. A key difference in the SB screen was the saturation of a small target, which allowed highly preferred sites to be distinguished from nonpreferred TA dinucleotides. In contrast, the datasets for the other vectors were derived from relatively small numbers of insertions into mammalian genomes, which were insufficient to define an initial set of preferred sequences. Because nonpreferred sites are likely to vastly outnumber preferred sites in the genome for most vectors, any genome-wide screen will produce a mix of indistinguishable preferred and nonpreferred sites. For example, we have estimated that of the approximately 200,000,000 TA sites in a human genome, only about 10% fall into the preferred category, yet in the screen conducted by Yant and coworkers, 189 of 573 (33%) genomic SB insertions were classified as preferred sites; thus, even with a marked preference for these sites, most insertions still occurred at nonpreferred sites. Analysis of the bendability of all SB sites mapped in the screen reported by Yant and coworkers shows a peak at the center of the insertion site, defined by the central TA dinucleotide. However, when only the preferred sites are analyzed, the surrounding nucleotides exhibit a much greater level of bendability (Figure 3d). This occurs even though the preferred sites were identified on the basis of protein-induced deformability (V_step), which is distinct from DNA bendability. The lesson from these studies is that most genome-wide datasets (particularly those from experiments involving some form of genetic selection) will probably show a similar dilution of preferred sites by greater numbers of nonpreferred sites.
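The dilution effect can be quantified from the two figures given above. If about 10% of TA sites are preferred and 189 of 573 insertions (33%) fell at preferred sites, the relative per-site usage of preferred versus nonpreferred sites is the ratio of each class's share of insertions to its share of sites. The sketch assumes that the 10% genome-wide estimate applies to the sites available in that screen, which is our simplification.

```python
def per_site_enrichment(hits_preferred: int, hits_total: int,
                        preferred_site_fraction: float) -> float:
    """How many times more often a single preferred site is used than a nonpreferred one."""
    hit_frac = hits_preferred / hits_total
    preferred_rate = hit_frac / preferred_site_fraction
    nonpreferred_rate = (1 - hit_frac) / (1 - preferred_site_fraction)
    return preferred_rate / nonpreferred_rate


e = per_site_enrichment(189, 573, 0.10)
print(f"{189 / 573:.0%} of insertions at preferred sites; "
      f"per-site enrichment ~{e:.1f}x")
```

This gives roughly a 4.4-fold per-site preference, yet about two-thirds of the insertions still fall at nonpreferred sites simply because those sites are nine times more numerous. That is the dilution that blurs structural signatures in genome-wide datasets.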
Integration preference versus oncogenic selection
A second application of predicting vector insertion profiles may be as part of a risk assessment program. Although current understanding of integration site preferences for most vectors is still inadequate to predict the probability of integration into specific genes, genome-wide integration datasets may suggest the likelihood that a vector will integrate within the general vicinity of a specific gene. Similarly, analysis of DNA structural characteristics may be used to assess the likelihood that each vector will integrate within specific regions of genes. For example, although Braf can act as a potent oncogene, the pattern of SB integrations into Braf suggests that integrations into a relatively small region of the gene (introns 11 and 12) are the most highly selected for oncogenesis, even though hotspots are present across the entire gene. Thus, the range of insertion positions capable of generating an oncogenic transcript, combined with the relative 'attractiveness' of the sequence across these regions, will dictate the chances of insertional activation.
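The combination of an activation-competent region and sequence 'attractiveness' can be expressed as a weighted share. This is the fraction of a gene's total predicted integration weight that falls inside the region where an insertion could produce an oncogenic transcript. In the sketch below, the site positions, the weights (a hypothetical fourfold weight for preferred sites), and the window boundaries are all invented. In practice the weights might come from a structure-based predictor.

```python
def fraction_in_window(positions: list[int], weights: list[float],
                       start: int, end: int) -> float:
    """Share of total integration weight at candidate sites with start <= position < end."""
    total = sum(weights)
    if total == 0:
        return 0.0
    inside = sum(w for pos, w in zip(positions, weights) if start <= pos < end)
    return inside / total


# Hypothetical candidate sites along a gene, with preferred sites weighted 4x.
positions = [100, 400, 900, 1500, 2200]
weights = [1.0, 4.0, 1.0, 4.0, 1.0]
print(fraction_in_window(positions, weights, 1000, 2500))  # 5 of 11 units of weight
```

Shifting the hypothetical window changes the answer even though the sites themselves do not move. This is the point of the Braf example: hotspots outside the activation-competent region do not add to the oncogenic risk.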
An analysis of several structural characteristics is presented for the mouse c-myc gene (Figure 5), the human ortholog of which is activated in many cancers. The figure highlights the 3 kilobase region encompassing the promoter, which harbors the bulk of the oncogenic retroviral integrations at this locus deposited in the Retroviral-Tagged Cancer Gene Database (RTCGD). The sequence was divided into 50 base pair (bp) bins, and the values for V_step, A-philicity, jaggedness, and bendability were summed within each bin. At this 50 bp resolution, these structural parameters are highly variable across the sequence and vary independently of each other. Actual oncogenic retroviral insertions observed in insertional mutagenesis screens and deposited into the RTCGD are shown for comparison in Figure 5a. The profiles reveal two features of transposons under consideration for gene therapy. First, the most likely sites for SB transposon integration (Figure 5g) are shifted away from the most commonly found activation sites, as revealed by retroviral integrations (Figure 5a). Second, the profile of TTAA sites required by the piggyBac transposon (Figure 5f) is similar to that of the preferred SB sites, and it further shows that some regions harboring retroviral integrations contain no TTAA sequences, making piggyBac insertions into these regions impossible. Thus, to a first approximation, it would appear that the transposons are less likely than retroviral vectors to insert close to the c-myc promoter. In support of this, c-myc is infrequently hit in SB-based insertional mutagenesis screens; to date, only one SB integration in c-myc has been deposited into the RTCGD. In contrast, many retroviral insertions into c-myc have been mapped, although far more retroviral insertions than transposon insertions have been deposited overall.
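The 50 bp binning step can be sketched as follows. How steps that straddle a bin boundary were assigned in the original analysis is not stated. Here, as an assumption, each step is credited to the bin containing its first base. With dinucleotide steps, a 3 kb sequence yields 2,999 steps and therefore 60 bins, the last one holding 49 steps. The constant input values are placeholders.

```python
def bin_sums(step_values: list[float], bin_size: int = 50) -> list[float]:
    """Sum per-step structural values into consecutive, non-overlapping bins.

    Step i (which starts at base i) is credited to bin i // bin_size.
    """
    n_bins = -(-len(step_values) // bin_size)  # ceiling division
    sums = [0.0] * n_bins
    for i, value in enumerate(step_values):
        sums[i // bin_size] += value
    return sums


# Placeholder: a constant value of 1.0 per dinucleotide step over a 3,000 bp sequence.
profile = bin_sums([1.0] * 2_999)
print(len(profile), profile[0], profile[-1])  # -> 60 50.0 49.0
```

Because the last bin can hold fewer steps than the others, its raw sum is slightly biased downward. Dividing by the number of steps per bin is one alternative if bins of unequal occupancy are compared.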
The relative lack of SB insertions into c-myc may be due either to a paucity of favorable SB insertion sites in regions of the gene competent for oncogenic activation, or to an overall lack of oncogenic selection for insertions into this gene. In support of the former, transposon-free amplification of c-myc was one of the few genomic aberrations observed in tumors harboring mobile transposons (Largaespada DA, Collier LC, Hackett CS, unpublished observations), suggesting that c-myc activation plays a role in the biology of these tumors (that is, there was probably oncogenic selection for the genomic amplicon). A similar ProTIS analysis of the LMO2 locus revealed that the most preferred SB integration sites were considerably farther from the LMO2 promoter than the mapped integrations of activating retroviruses. That said, prediction of vector integration is clearly imprecise, and even rare integrations into unfavorable sites can promote oncogenic expansion, as indicated in Figure 6.
Vector behavior in risk/outcome assessment: lessons from intentional oncogenic insertional mutagenesis
Regardless of the inherent behavior of each integrating vector, existing evidence suggests that the oncogenic potential of any given vector can be attenuated depending on how it is used. As with retroviruses, the SB transposon has been used for functional genomics as well as for delivery of therapeutic genes in mouse models of inherited disease. The functional genomics studies were motivated by two limitations of retroviruses for insertional mutagenesis: viruses can infect only certain cell types, and many viral vectors tend to insert near, and activate, a possibly limited set of genes. In two recent SB mutagenesis screens, a transgenic concatemer of T2/Onc transposons carried in the germline of mice was remobilized in somatic cells by a trans-acting, transgenic SB transposase. The two screens differed in the expression level, expression domains, and activity of the SB transposase, as well as in the copy number of the transposon concatemers [58, 59]. An important finding from the two studies was that the oncogenic potential of the same T2/Onc transposon vector, which was engineered specifically to activate oncogenes and cause cancers in mice, ranged from no observable phenotype at one extreme to rapid development of severe cancer at birth at the other. The oncogenic effect was directly related to the number and types of cells at risk for transposon-induced mutations, and perhaps to remobilization rates. The same properties may be relevant for a wide range of other gene therapy vectors.
Given SB's lack of a preference to integrate near genes, the chance that an SB insertion carrying a therapeutic gene (in contrast to a genetic cassette designed to wreak havoc on transcriptional units) will activate a neighboring host gene would seem to be lower than for vectors that have an affinity for integrating into genes [65, 97]. This feature may be a disadvantage for SB-based functional genomics studies aimed at mutating genes, but an advantage for gene therapy.
Engineering safer vectors
As an alternative to finding vectors that do not target genes, several groups are attempting to target vector integration to specific regions of the genome by fusing integrases and SB transposase to DNA-binding domains that recognize specific DNA sequences [143, 144]. Such targeting appears to reduce activity without much increase in the specificity of integration into particular sites in a mammalian genome [144, 145]. This is not surprising if the ability of SB transposase to integrate promiscuously into TA sites is not curtailed. There are about 2 × 10⁸ potential TA-dinucleotide integration sites for SB transposons, of which an estimated 2 × 10⁷ are preferred. Consequently, the chance that a sequence-specific targeting motif added to SB transposase will actually guide transposition to a specific, low-copy target sequence is expected to be extremely low compared with the chance of integrating into any of the millions of other available TA sites. Similarly, to reduce the risk of activating neighboring genes following vector integration, self-inactivating vectors are being engineered with a diminished ability to activate genes over long distances [146, 147], although it is not clear whether these vectors will be safer. The φC31 phage integrase system targets relatively few sites in mammalian genomes [119, 149], but it appears to introduce a relatively high level of chromosomal recombination [149, 150, 151]. Thus, the development of safer vectors remains an open area of investigation.
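The argument against sequence-directed targeting is essentially a ratio. Suppose a fused DNA-binding domain raised the per-site usage of its intended target by some enrichment factor. The chance that a single integration lands there would then be that target's weight divided by the combined weight of the target and all competing sites. The sketch below uses the estimate of 2 × 10⁷ preferred TA sites as the competitor pool and ignores the roughly 1.8 × 10⁸ nonpreferred sites, which makes the numbers optimistic. The single-copy target and the enrichment factors are hypothetical.

```python
def target_probability(target_copies: int, enrichment: float,
                       competing_sites: float) -> float:
    """Chance that one integration event lands in the intended target.

    Each competing site has weight 1; each target copy has weight `enrichment`.
    """
    target_weight = target_copies * enrichment
    return target_weight / (target_weight + competing_sites)


PREFERRED_TA_SITES = 2e7  # estimate quoted in the text
for enrichment in (1, 100, 10_000):
    p = target_probability(1, enrichment, PREFERRED_TA_SITES)
    print(f"{enrichment:>6}x enrichment: P(target) = {p:.1e}")
```

Even a hypothetical 10,000-fold boost leaves the single-copy target receiving only about 1 in 2,000 integrations. This is consistent with the observation that fused binding domains add little specificity while the transposase remains free to use any TA site.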
Ultimately, functional genomics and gene therapy seek the answer to the same question for any given vector (while hoping for opposite outcomes): what are the chances of activating genes? Four major factors influence the answer, and each retroviral and transposon vector has different characteristics for each factor. First, what is the overall tendency of the vector to integrate into genes or promoters? Second, are there enough local target sites around the genes of interest to attract the vector? Third, over what distance can the vector activate a gene? Fourth, to what extent can integration activity be modulated to control the overall likelihood of hitting insertion sites close enough to activate specific genes? Theoretically, knowing each of these variables for every vector would allow researchers to choose the vector with the most utility and the lowest risk for the intended purpose. In gene therapy, these parameters translate into the risk of hitting a specific oncogene or tumor suppressor gene in a way that could lead to a severe adverse effect. If hotspots for integration of SB and other potential gene therapy vectors can be predicted in the future, we should be able both to assess the various risks of adverse effects from therapeutic vectors more accurately and to reduce them. This goal should be within reach in the coming years.
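One way to see how the four factors combine is a simple probabilistic sketch. It is a hypothetical model, not a method from the studies cited here: the number of target sites within the vector's activation distance of a gene (factors 2 and 3), weighted by the vector's relative preference for that region (factor 1), gives a per-event probability of landing near the gene; the number of independent integration events (factor 4) then sets the chance of at least one such hit. The function name, its parameters, and all numbers in the usage lines are illustrative assumptions.

```python
import math


def p_activating_insertion(local_sites: int,
                           local_preference: float,
                           total_weighted_sites: float,
                           integrations: int) -> float:
    """Probability that at least one of `integrations` independent events
    lands within activation distance of a given gene.

    Hypothetical toy model:
      local_sites          -- target sites within the activation distance (factors 2, 3)
      local_preference     -- vector's relative preference for those sites (factor 1)
      total_weighted_sites -- sum of preference weights over the whole genome
      integrations         -- number of independent integration events (factor 4)
    """
    if integrations < 0:
        raise ValueError("integrations must be non-negative")
    p_single = local_sites * local_preference / total_weighted_sites
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("per-event probability must lie in [0, 1]")
    if p_single == 1.0:
        return 1.0 if integrations > 0 else 0.0
    # 1 - (1 - p)^n, computed stably when p is very small
    return -math.expm1(integrations * math.log1p(-p_single))


# Hypothetical numbers chosen only to show how the factors combine.
neutral = p_activating_insertion(1_000, 1.0, 2e8, 10)
gene_biased = p_activating_insertion(1_000, 10.0, 2e8, 10)
print(f"neutral vector:     {neutral:.2e}")
print(f"gene-biased vector: {gene_biased:.2e}")
```

Holding the genome-wide weighted total fixed while raising the local preference is a further simplification; a real comparison would need measured integration profiles for each vector, which is exactly the information the text argues is still being gathered.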
Note added in proof
We thank the Arnold and Mabel Beckman Foundation for support of our work and all members of the Beckman Center for Transposon Research for a long history of contributions of ideas and results. We appreciate the help of Drs Nik Somia and Marina O'Reilly in determining the number of gene therapy trials reviewed by the RAC. We are especially grateful to Dr Darius Balciunas and Kirk Wangensteen for sharing their Tol2 dataset, and to Drs David Largaespada and Lara Collier, as well as two reviewers, for discussions about the manuscript. The authors were supported by DOD fellowship BC050930 (CSH), and NIH grants T32 HD007480 (AMG) and 1PO1 HD32652-07 and R43 HL076908-01 (PBH).
This article has been published as part of Genome Biology Volume 8, Supplement 1, 2007: Transposons in vertebrate functional genomics. The full contents of the supplement are available online at http://genomebiology.com/supplements/8/S1.
- 2. Mitchell KJ, Pinson KI, Kelly OG, Brennan J, Zupicich J, Scherz P, Leighton PA, Goodrich LV, Lu X, Avery BJ, et al: Functional analysis of secreted and transmembrane proteins critical to mouse development. Nat Genet. 2001, 28: 198-200. 10.1038/90074.
- 4. Edelstein ML, Abedi MR, Wixon J, Edelstein RM: Gene therapy clinical trials worldwide 1989-2004-an overview. J Gene Med. 2004, 6: 597-602. 10.1002/jgm.619.
- 6. Connelly JB: Lentiviruses in gene therapy clinical research. Gene Ther. 2002, 9: 1730-1743. 10.1038/sj.gt.3301893.
- 13. Yi Y, Hahm SH, Lee KH: Retroviral gene therapy: safety issues and possible solutions. Curr Gene Ther. 2005, 5: 25-35.
- 16. Kohn D, Sadelain M, Dunbar C, Bodine D, Kiem HP, Candotti F, Tisdale J, Riviere I, Blau CA, Richard RE, et al: American Society of Gene Therapy (ASGT) ad hoc subcommittee on retroviral-mediated gene transfer to hematopoietic stem cells. Mol Ther. 2003, 8: 180-187. 10.1016/S1525-0016(03)00212-0.
- 17. Hematti P, Hong BK, Ferguson C, Adler R, Hanawa H, Sellers S, Holt IE, Eckfeldt CE, Sharma Y, Schmidt M, et al: Distinct genomic integration of MLV and SIV vectors in primate hematopoietic stem and progenitor cells. PLoS Biol. 2004, 2: e423. 10.1371/journal.pbio.0020423.
- 18. Kiem HP, Sellers S, Thomasson B, Morris JC, Tisdale JF, Horn PA, Hematti P, Adler R, Kuramoto K, Calmels B, et al: Long-term clinical and molecular follow-up of large animals receiving retrovirally transduced stem and progenitor cells: no progression to clonal hematopoiesis or leukemia. Mol Ther. 2004, 9: 389-395. 10.1016/j.ymthe.2003.12.006.
- 19. Calmels B, Ferguson C, Laukkanen MO, Adler R, Faulhaber M, Kim HJ, Sellers S, Hematti P, Schmidt M, von Kalle C, et al: Recurrent retroviral vector integration at the Mds1/Evi1 locus in non-human primate hematopoietic cells. Blood. 2005, 106: 2530-2533. 10.1182/blood-2005-03-1115.
- 21. Themis M, Waddington SN, Schmidt M, von Kalle C, Wang Y, Al-Allaf F, Gregory LG, Nivsarkar M, Themis M, Holder MV, et al: Oncogenesis following delivery of a non-primate lentiviral gene therapy vector to fetal and neonatal mice. Mol Ther. 2005, 12: 763-771. 10.1016/j.ymthe.2005.07.358.
- 23. Center for Biologics Evaluation and Research: Cellular & Gene Therapy. [http://www.fda.gov/cber/gene.htm]
- 25. Hacein-Bey-Abina S, Le Deist F, Carlier F, Bouneaud C, Hue C, De Villartay JP, Thrasher AJ, Wulffraat N, Sorensen R, Dupuis-Girod S, et al: Sustained correction of X-linked severe combined immunodeficiency by ex vivo gene therapy. N Engl J Med. 2002, 346: 1185-1193. 10.1056/NEJMoa012616.
- 26. Hacein-Bey-Abina S, von Kalle C, Schmidt M, Le Deist F, Wulffraat N, McIntyre E, Radford I, Villeval JL, Fraser CC, Cavazzana-Calvo M, et al: A serious adverse event after successful gene therapy for X-linked severe combined immunodeficiency. N Engl J Med. 2003, 348: 255-256. 10.1056/NEJM200301163480314.
- 32. McCormack MP, Forster A, Drynan L, Pannell R, Rabbitts TH: The LMO2 T-cell oncogene is activated via chromosomal translocations or retroviral insertion during gene therapy but has no mandatory role in normal T-cell development. Mol Cell Biol. 2003, 23: 9003-9013. 10.1128/MCB.23.24.9003-9013.2003.
- 33. Dave UP, Jenkins NA, Copeland NG: Gene therapy insertional mutagenesis insights. Science. 2004, 303: 33. 10.1126/science.1091667.
- 38. Thrasher AJ, Gaspar HB, Baum C, Modlich U, Schambach A, Candotti F, Otsu M, Sorrentino B, Scobie L, Cameron E, et al: Gene therapy: X-SCID transgene leukaemogenicity. Nature. 2006, 443: E5. 10.1038/nature05219.
- 39. Schmidt M, Carbonaro DA, Speckmann C, Wissler M, Bohnsack J, Elder M, Aronow BJ, Nolta JA, Kohn DB, von Kalle C: Clonality analysis after retroviral-mediated gene transfer to CD34+ cells from the cord blood of ADA-deficient SCID neonates. Nat Med. 2003, 9: 463-468. 10.1038/nm844.
- 40. Aiuti A, Ficara F, Cattaneo F, Bordignon C, Roncarolo MG: Gene therapy for adenosine deaminase deficiency. Curr Opin Allergy Clin Immunol. 2003, 3: 461-466. 10.1097/00130832-200312000-00007.
- 41. Gaspar HB, Bjorkegren E, Parsley K, Gilmour KC, King D, Sinclair J, Zhang F, Giannakopoulos A, Adams S, Fairbanks LD, et al: Successful reconstitution of immunity in ADA-SCID by stem cell gene therapy following cessation of PEG-ADA and use of mild preconditioning. Mol Ther. 2006, 14: 505-513. 10.1016/j.ymthe.2006.06.007.
- 42. Ott MG, Schmidt M, Schwarzwaelder K, Stein S, Siler U, Koehl U, Glimm H, Kuhlcke K, Schilz A, Kunkel H, et al: Correction of X-linked chronic granulomatous disease by gene therapy, augmented by insertional activation of MDS1-EVI1, PRDM16 or SETBP1. Nat Med. 2006, 12: 401-409.
- 43. Gaspar HB, Parsley KL, Howe S, King D, Gilmour KC, Sinclair J, Brouns G, Schmidt M, Von Kalle C, Barington T, et al: Gene therapy of X-linked severe combined immunodeficiency by use of a pseudotyped gammaretroviral vector. Lancet. 2004, 364: 2181-2187. 10.1016/S0140-6736(04)17590-9.
- 48. Wu X, Luke BT, Burgess SM: Redefining the common insertion site. Virology. 2006, 344: 292-295. 10.1016/j.virol.2005.08.047.
- 50. Ivics Z, Izsvak Z: Transposable elements for transgenesis and insertional mutagenesis in vertebrates: a contemporary review of experimental strategies. Methods Mol Biol. 2004, 260: 255-276.
- 51. Hackett PB, Ekker SC, Largaespada DA, McIvor RS: Sleeping Beauty transposon-mediated gene therapy for prolonged expression. Adv Genet. 2005, 54: 187-229.
- 52. Hackett PB, Ekker SC, Essner JJ: Applications of transposable elements in fish for transgenesis and functional genomics. Fish Development and Genetics. Edited by: Gong Z, Korzh V. 2004, Hackensack, NJ, USA: World Scientific, Inc, 532-580.
- 73. Merdan T, Kunath K, Petersen H, Bakowsky U, Voigt KH, Kopecek J, Kissel T: PEGylation of poly(ethylene imine) affects stability of complexes with plasmid DNA under in vivo conditions in a dose-dependent manner after intravenous injection into mice. Bioconjugate Chem. 2005, 16: 785-792. 10.1021/bc049743q.
- 85. Ohlfest JR, Frandsen JL, Fritz S, Lobitz PD, Perkinson SG, Clark KJ, Nelsestuen G, Key NS, McIvor RS, Hackett PB, et al: Phenotypic correction and long-term expression of factor VIII in hemophilic mice by immunotolerization and nonviral gene transfer using the Sleeping Beauty transposon system. Blood. 2005, 105: 2691-2698. 10.1182/blood-2004-09-3496.
- 86. Baus J, Liu L, Heggestad AD, Sanz S, Fletcher BS: Correction of murine hemophilia A by hematopoietic stem cell gene therapy. Mol Ther. 2005, 12: 1034-1042. 10.1016/j.ymthe.2005.06.484.
- 89. Balciunas D, Wangensteen KJ, Wilber AC, Bell JB, Geurts AM, Sivasubbu S, Wang X, Hackett PB, Largaespada DA, McIvor RS, et al: Harnessing an efficient large cargo-capacity transposon for vertebrate gene transfer applications. PLoS Genet. 2006, 2: e169. 10.1371/journal.pgen.0020169.
- 90. Ortiz S, Lin Q, Yant SR, Keene D, Kay MA, Khavari PA: Sustainable correction of junctional epidermolysis bullosa via transposon-mediated nonviral gene transfer. Gene Ther. 2003, 10: 1099-1104. 10.1038/sj.gt.3301978.
- 92. Ohlfest JR, Demorest ZL, Motooka Y, Vengco I, Oh S, Chen E, Scappaticci FA, Saplis RJ, Ekker SC, Low WC, et al: Combinatorial anti-angiogenic gene therapy by nonviral gene transfer using the Sleeping Beauty transposon causes tumor regression and improves survival in mice bearing intracranial human glioblastoma. Mol Ther. 2005, 12: 778-788. 10.1016/j.ymthe.2005.07.689.
- 95. Aronovich EL, Bell JB, Belur LR, Gunther R, Koniar B, Erickson DC, Schachern PA, Matise I, McIvor RS, Whitley CB, et al: Sleeping Beauty transposon-mediated gene therapy in the murine models of mucopolysaccharidoses (MPS) Type I and MPS Type VII. J Gene Med. 2007, 9: 403-415. 10.1002/jgm.1028.
- 99. Berry C, Hannenhalli S, Leipzig J, Bushman FD: Selection of target sites for mobile DNA integration in the human genome. PLoS Comput Biol. 2006, 2: e157. 10.1371/journal.pcbi.0020157.
- 107. Eddy SR: Non-coding RNA genes and the modern RNA world. Nat Rev Genet. 2001, 2: 919-925. 10.1038/35103511.
- 108. Mitchell RS, Beitzel BF, Schroder AR, Shinn P, Chen H, Berry CC, Ecker JR, Bushman FD: Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2004, 2: 1127-1136.
- 109. Laufs S, Nagy KZ, Giordano F, Hotz-Wagenblatt A, Zeller WJ, Fruehauf S: Insertion of retroviral vectors in NOD/SCID repopulating human peripheral blood progenitor cells occurs preferentially in the vicinity of transcription start regions and in introns. Mol Ther. 2004, 10: 874-881. 10.1016/j.ymthe.2004.08.001.
- 114. Lewinski MK, Bisgrove D, Shinn P, Chen H, Hoffmann C, Hannenhalli S, Verdin E, Berry CC, Ecker JR, Bushman FD: Genome-wide analysis of chromosomal features repressing human immunodeficiency virus transcription. J Virol. 2005, 79: 6610-6619. 10.1128/JVI.79.11.6610-6619.2005.
- 116. Geurts AM, Yang Y, Clark KJ, Cui Z, Dupuy AJ, Largaespada DA, Hackett PB: Gene transfer into genomes of human cells by the Sleeping Beauty transposon system. Mol Ther. 2003, 8: 108-117. 10.1016/S1525-0016(03)00099-6.
- 117. Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, Moore IK, Wang JP, Widom J: A genomic code for nucleosome positioning. Nature. 2006, 442: 772-778. 10.1038/nature04979.
- 124. Vigdal TJ, Kaufman CD, Izsvak Z, Voytas DF, Ivics Z: Common physical properties of DNA affecting target site selection of Sleeping Beauty and other Tc1/mariner transposable elements. J Mol Biol. 2002, 323: 411-452. 10.1016/S0022-2836(02)00991-9.
- 127. Grandgenett DP: Symmetrical recognition of cellular DNA target sequences during retroviral integration. Proc Natl Acad Sci USA. 2005, 102: 5903-5904. 10.1073/pnas.0502045102.
- 142. Retrovirus Tagged Cancer Gene Database. [http://rtcgd.abcc.ncifcrf.gov/]
- 144. Yant SR, Huang Y, Akache B, Kay MA: Fusion proteins consisting of the Sleeping Beauty transposase and the polydactyl zinc finger protein hE2C direct transposon integration into a unique human chromosomal sequence. Nucleic Acids Res. 2007.
- 146. CPMP: Insertional mutagenesis and oncogenesis: update from non-clinical and clinical studies. Gene Therapy Expert Group of the Committee for Proprietary Medical Products (CPMP). J Gene Med. 2004, 6: 127-129. 10.1002/jgm.466.
- 147. Levine BL, Humeau LM, Boyer J, Macgregor RR, Rebello T, Lu X, Binder GK, Slepushkin V, Lemiale F, Mascola JR, Bushman FD, et al: Gene transfer in humans using a conditionally replicating lentiviral vector. Proc Natl Acad Sci USA. 2006, 103: 17372-17377. 10.1073/pnas.0608138103.
- 155. Mouse Retrovirus Tagged Cancer Gene Database. [http://rtcgd.abcc.ncifcrf.gov/]