1 Introduction

Agrobacterium tumefaciens is a soil-borne bacterium well known for its unique ability to carry out inter-kingdom horizontal gene transfer. In nature, this plant pathogen causes crown gall disease (Smith and Townsend 1907), which is characterized by galls appearing at the plant’s root, stem, or crown area. These galls are tumorous growths that form as a result of the transfer of a region of the Agrobacterium tumor-inducing (Ti) plasmid into a plant cell and the stable integration of this DNA into the plant genome (Zaenen et al. 1974; Chilton et al. 1977; Fig. 1). The transferred DNA (T-DNA) of the Ti plasmid carries genes that cause uncontrolled cell division by modifying the plant’s hormonal balance, as well as genes encoding proteins involved in the production of opines, compounds that are utilized by the Agrobacterium colonies surrounding the galls.

Fig. 1 Schematic illustration of a model describing DNA transfer from Agrobacterium to the plant cell (see relevant book chapters for further details). Adapted from Singer (2013), doctoral dissertation.

Agrobacterium has been extensively studied because of its role as a “natural genetic engineer.” Moreover, it has been harnessed by humans as a gene vector to genetically engineer plants. Agrobacterium-mediated transformation was the first method used to generate transgenic plants (Barton et al. 1983; Zambryski et al. 1983). Three decades later, this bacterium is still a key player in many of the plant molecular genetics techniques used in agricultural biotechnology (reviewed in Shiboleth and Tzfira 2012; Altpeter et al. 2016).

Whereas early events during Agrobacterium infection are relatively well studied, the final events in the plant nucleus are less well understood. Notably, the mechanism of T-DNA integration into the plant genome is still unknown. Consequently, investigators often adopt different models to explain T-DNA integration. For instance, T-DNA integration resulting in chromosome truncation has recently been explained in two different ways. In the model of Teo et al. (2011), T-DNA is a single-stranded (ss) molecule during integration, whereas in that of Nelson et al. (2011), T-DNA is a double-stranded (ds) molecule. Moreover, according to Teo et al. (2011), the bacterial protein VirD2 is involved in integration, whereas according to Nelson et al. (2011), integration is mediated by plant host proteins. Thus, three decades of research have produced evidence that leads to somewhat conflicting conclusions, and the literature consequently presents conflicting models of T-DNA integration. This review describes the open questions that underlie current models of T-DNA integration.

1.1 The Transfer of T-DNA

T-DNA integration requires the transfer of T-DNA from Agrobacterium into plant cells. T-DNA transfer is a process related to bacterial conjugation (for review, see Lessl and Lanka 1994; Christie et al. 2005). Transfer begins inside the bacterium, when T-DNA is separated from its parent plasmid: a Ti plasmid in natural strains or a binary plasmid in laboratory strains. Separation of the T-DNA is initiated when a protein complex of VirD1, a helicase, and VirD2, an endonuclease, binds the left border (LB) and right border (RB) of a T-DNA region (Durrenberger et al. 1989; Scheiffele et al. 1995; Relic et al. 1998) (Fig. 2). The LB and RB are 25-base-pair (bp) imperfect direct repeats (Yadav et al. 1982). VirD2 nicks the lower DNA strand between the third and fourth nucleotides of each of these repeats (Fig. 2) (Yanofsky et al. 1986; Wang et al. 1987). Consequently, a single-stranded T-DNA, termed the T-strand, is generated from the parent plasmid (Albright et al. 1987). A single VirD2 protein remains covalently attached to the 5′ end of the ss T-DNA (the RB side) (Herrera-Estrella et al. 1988; Ward and Barnes 1988; Young and Nester 1988; Vogel and Das 1992) (Fig. 2).

Fig. 2 Schematic illustration of T-DNA processing in Agrobacterium. a Only part of the plasmid (Ti or binary) is illustrated in the figure. The T-DNA region in the plasmid is marked in red (the lower DNA strand is processed and transferred). VirD1 (black circle) and VirD2 (purple circle) bind the left border (LB) and right border (RB) of the T-DNA region. b VirD2 nicks between the third and fourth nucleotides of each border (the 25-bp DNA sequence of the LB and RB is illustrated; the nicking site is indicated by the scissors). c After the single-stranded T-DNA is separated from the parent plasmid, VirD2 remains attached to the 5ʹ end of the T-DNA (the RB side). Adapted from Singer (2013), doctoral dissertation.
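The border-nicking step can be reduced to simple coordinate bookkeeping, as in the Python sketch below. It is a sketch under stated assumptions, not an established tool: the helper names (`reverse_complement`, `excise_t_strand`) are hypothetical, and the 25-nt `BORDER` string is a placeholder, not a real border sequence. It assumes the upper strand is written 5′→3′ with the LB to the left of the RB, that each border is numbered left to right so the nick falls after its third nucleotide, and that the lower strand is the one released; under those assumptions the T-strand's 5′ end lies on the RB side, as in the figure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def excise_t_strand(top: str, lb_start: int, rb_start: int,
                    border_len: int = 25, nick_after: int = 3) -> str:
    """Return the released lower strand, written 5'->3' (hypothetical helper).

    Assumes `top` is the upper strand written 5'->3' with the LB left of the RB,
    each border numbered left to right, a nick after nucleotide `nick_after` of
    each border, and transfer of the lower strand.
    """
    if not (0 <= lb_start and lb_start + border_len <= rb_start
            and rb_start + border_len <= len(top)):
        raise ValueError("expected LB before RB, both fully inside `top`")
    # Segment between the two nick sites, in upper-strand coordinates.
    region = top[lb_start + nick_after : rb_start + nick_after]
    # The lower strand is the reverse complement; its 5' end sits on the RB side.
    return reverse_complement(region)

BORDER = "ACGT" * 6 + "A"  # 25-nt placeholder, NOT a real border sequence
plasmid = "CCCC" + BORDER + "G" * 8 + BORDER + "TTTT"
t_strand = excise_t_strand(plasmid, lb_start=4, rb_start=37)
print(len(t_strand), t_strand[:3])  # prints: 33 CGT
```

Because the returned string is the lower strand written 5′→3′, its first nucleotides derive from the RB side, which is where VirD2 would be attached.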

VirD2 pilots the ss T-DNA from its 5ʹ end through the Agrobacterium type IV protein secretion (T4S) system into the plant cytoplasm (Vergunst et al. 2005; van Kregten et al. 2009) (Fig. 1). In the plant cytoplasm, before entering the nucleus, the transported ss T-DNA is thought to be coated by multiple molecules of the ss DNA-binding protein VirE2 (Citovsky et al. 1988; Das 1988; Abu-Arish et al. 2004), which is secreted into the plant cell from Agrobacterium independently of the T-DNA (Otten et al. 1984; Citovsky et al. 1992; Binns et al. 1995; Li and Pan 2017). In addition, other bacterial and plant proteins are believed to interact with VirE2 and VirD2 to form a “T-complex” (for reviews, see Lacroix et al. 2006; Gelvin 2010). The proposed role of the T-complex is to protect the ss T-DNA from degradation (Durrenberger et al. 1989; Tinland et al. 1995; Rossi et al. 1996) and to facilitate its transport into the nucleus. Several factors may facilitate nuclear transport of the T-complex. VirD2 facilitates nuclear transport through its nuclear localization signal (NLS) domain (Herrera-Estrella et al. 1990; Shurvinton et al. 1992; Howard et al. 1992; Tinland et al. 1992; Ziemienowicz et al. 2001; van Kregten et al. 2009). Similarly, VirE2 may facilitate T-complex transport via its two NLS domains (Citovsky et al. 1992, 1994; Zupan et al. 1996; Ziemienowicz et al. 2001). However, a number of groups have reported that VirE2 remains mostly in the cytoplasm (Bhattacharjee et al. 2008; Grange et al. 2008; Lee et al. 2008; Sakalis et al. 2014; Shi et al. 2014). Another proposed facilitator of nuclear transport of the T-complex is VirE2-interacting protein 1 (VIP1), a plant transcription factor that enters the nucleus upon activation of the defense response (Tzfira et al. 2001; Li et al. 2005a; Djamei et al. 2007; Pitzschke et al. 2009; Wang et al. 2017). VirE3, which binds VirE2 in the plant cytoplasm, may substitute for VIP1 to facilitate nuclear transport (Lacroix et al. 2005). Recently, however, Shi et al. (2014) concluded that VIP1 is not important for Agrobacterium-mediated transformation. Finally, nuclear transport of the T-complex may be facilitated by additional host nuclear transporters that interact with T-complex components (Ballas and Citovsky 1997; Lacroix et al. 2005; Bako et al. 2003; Bhattacharjee et al. 2008).

1.2 Stable and Transient T-DNA in the Plant Cell

In the plant nucleus, T-complex proteins must be stripped from the ss T-DNA before integration. It has been hypothesized that this stripping is mediated by VirF and the host proteasomal degradation machinery (Schrammeijer et al. 2001; Tzfira et al. 2004a, b; Zaltsman et al. 2013). Several T-DNA molecules can enter a plant cell simultaneously (Virts and Gelvin 1985). Although their number is unknown and likely varies with conditions, the fraction of nuclear T-DNA molecules that eventually integrate into the plant genome has been shown to be relatively low (Narasimhulu et al. 1996; Maximova et al. 1998; De Buck et al. 2000; Ghedira et al. 2013). Integration of T-DNA into the plant genome results in stable genetic transformation, whereas T-DNA that does not integrate results in transient genetic transformation.

Stable Agrobacterium-mediated transformation is the preferred method used by plant biologists to generate transgenic plants. Commonly, the desired DNA sequence is cloned between the T-DNA borders, so that a transgenic plant is produced once the T-DNA integrates into the plant genome. The Agrobacterium strains used for biotechnological applications are themselves genetically modified: the natural tumor-inducing genes have been removed from their T-DNA, so that the strains are “disarmed.” However, the ability of disarmed strains to transfer a modified T-DNA is unaffected, because the only elements on the T-DNA that are necessary for T-DNA transfer are the LB and RB (Fig. 2) (Hoekema et al. 1983; Ream et al. 1983; Wang et al. 1984). Moreover, to simplify genetic engineering, T-DNA is often placed on a smaller binary plasmid instead of the natural Ti plasmid, because the former is easier to work with and can replicate in E. coli as well as in Agrobacterium (Hoekema et al. 1983; for review, see Tzfira and Citovsky 2006). In addition to introducing desired genes into the plant genome, T-DNA has been instrumental in the creation of large mutant and enhancer trap libraries, because integration occurs randomly in the plant genome. These T-DNA insertion collections have been especially important for studies of Arabidopsis and rice (Sessions et al. 2002; Sallaud et al. 2004; O’Malley and Ecker 2010).

Transient expression of foreign genes is another method often used for manipulating plant genomes. Expression from nonintegrated T-DNA molecules lasts for a few days before the genes are silenced (Johansen and Carrington 2001). Plant genomes can be manipulated by transient expression of engineered nucleases such as meganucleases, ZFNs, TALENs, and Cas9. These nucleases have been used to target specific genomic sites and create the double-strand breaks (DSBs) required for gene editing (for review, see Kumar and Jain 2015; Yin et al. 2017). In addition to its use for genomic modifications, Agrobacterium-mediated transient expression is an important investigative tool for plant biologists. For example, it is commonly used to investigate the cellular localization of proteins or to produce and isolate proteins in planta (Sparkes et al. 2006). Recently, transient expression by Agrobacterium-mediated transformation has been applied commercially, using plants as factories for products such as vaccines and antibodies (for review, see Ko et al. 2009; Komarova et al. 2010).

In nature, T-DNA integration occurs during Agrobacterium infection of certain dicotyledonous plants and gymnosperms. Under laboratory conditions, however, scientists have harnessed Agrobacterium to transform an increasing variety of plants, including monocotyledonous species. Agrobacterium-mediated transformation has also been successfully applied to non-plant eukaryotes (for review, see Soltani et al. 2010), such as yeast (Bundock et al. 1995; Bundock and Hooykaas 1996; Rolloos et al. 2014, 2015; Ohmine et al. 2016) and other fungi (de Groot et al. 1998; Korn et al. 2015), as well as to human cells (Kunik et al. 2001). Although studying T-DNA integration in non-plant organisms may contribute to understanding T-DNA integration in plants, integration in non-plant organisms may involve mechanisms and enzymatic pathways that differ from those operating in plants. Therefore, this review focuses on T-DNA integration in plants.

2 The Mechanism of T-DNA Integration

Much of our understanding of T-DNA integration derives from post-integration sequence analysis of T-DNA/plant genome junctions. This approach was important for the development of early models of T-DNA integration because it revealed the general patterns of T-DNA insertions (Mayerhofer et al. 1991; Gheysen et al. 1991). One of the earliest observations was that T-DNA integrates at random locations in the genome (Chyi et al. 1986; Gheysen et al. 1987). This topic is further discussed in the section “Which is the genomic site prerequisite for T-DNA integration”.

DNA sequencing of the junctions between the integrated T-DNA and the surrounding plant genome also revealed no homology, or only a few homologous nucleotides (nt), at the junction point (Mayerhofer et al. 1991; Gheysen et al. 1991). It was therefore evident that homologous recombination is normally not involved in T-DNA integration, and the terms “illegitimate” recombination (IR) and nonhomologous recombination (NHR) have been used to describe T-DNA integration in plants (Mayerhofer et al. 1991; Gheysen et al. 1991; Bleuyard et al. 2006). More recently, the nonhomologous end-joining (NHEJ) DNA repair pathway of plants has frequently been mentioned as the pathway likely responsible for T-DNA integration. NHEJ is a DNA repair pathway that joins double-stranded DNA ends, such as those present at genomic double-strand breaks (DSBs). Therefore, NHEJ may not fit well with a model involving a single-stranded T-DNA intermediate. The question of whether T-DNA integrates into the plant genome as a single- or a double-stranded intermediate is discussed later. It should be noted that the NHEJ repair pathway is usually associated with key enzymatic components such as the Ku70/Ku80 heterodimer (Critchlow and Jackson 1998). Nevertheless, recent studies have revealed the existence of additional DSB repair pathways, often described as alternative NHEJ (A-NHEJ) or microhomology-mediated end joining (MMEJ), that employ different enzymes and mechanisms (for review, see McVey and Lee 2008; Bleuyard et al. 2006). Recently, van Kregten et al. (2016) showed that DNA polymerase theta (pol θ) has an important role in T-DNA integration. This topic is further discussed in the section “What are the bacterial and plant factors involved in T-DNA integration?”

T-DNA integration is neither a “precise” nor a “clean” process (e.g., Kwok et al. 1985; Spielmann and Simpson 1986). It is not precise because T-DNA seldom preserves both of its borders after integration in plants. It is not clean because insertions often include other DNA sequences from Agrobacterium. Commonly, the extra DNA sequences are derived from the parent plasmid (Ti or binary; Martineau et al. 1994; Kononov et al. 1997), but they may also include DNA from other known sources, such as Agrobacterium chromosomal DNA (Ulker et al. 2008; Kleinboelting et al. 2015) and plant DNA (Kleinboelting et al. 2015), as well as DNA of unknown origin. In addition, insertion sites commonly include two or more T-DNA molecules adjacent to each other (Cluster et al. 1996; Krizkova and Hrouda 1998; De Buck et al. 2000). Integration patterns differ among experimental conditions and plant species (Grevelding et al. 1993; De Buck et al. 2009). Moreover, transformed plants may contain more than one T-DNA insertion site; each insertion may contain a single T-DNA copy or a cluster of T-DNA copies (e.g., Alonso et al. 2003; Rosso et al. 2003). Finally, T-DNA integration may result in major chromosomal aberrations (Nacry et al. 1998; Tax and Vernon 2001; Clark and Krysan 2010). The question “Why and how do complex T-DNA insertions form?” is therefore integral to the question of T-DNA integration.

Finally, depictions of T-DNA in integration models vary among publications. Often T-DNA is depicted as a straight line. In addition, the timing at which T-DNA ends interact with the target genome varies between models (for recent reviews of T-DNA integration, see Ziemienowicz et al. 2010; Windels et al. 2010; Magori and Citovsky 2011; Gelvin 2017). This topic is further discussed in the section “What is the spatial and temporal arrangement of T-DNA during integration?”

This review approaches the broader question of T-DNA integration by breaking it into smaller questions for the purpose of discussion. It should be emphasized, however, that T-DNA integration may be mediated by different pathways under different conditions, perhaps simultaneously. It is most likely a complex, multistep process. As such, the different questions related to T-DNA integration are interrelated and may have more than one answer. Accordingly, although several major models have been proposed (reviewed in Tzfira et al. 2004a, b), combining different assumptions about the various open questions yields still more possible models.

3 The Major Unresolved Questions Related to the Mechanism of T-DNA Integration

3.1 Does T-DNA Integrate into the Plant Genome as a Single- or a Double-Stranded Intermediate?

T-DNA enters the plant nucleus as a single-stranded (ss) DNA but it is ultimately a double-stranded (ds) DNA when it becomes part of the host genome. However, without being able to visualize the integration process as it occurs, it is difficult to determine the timing of conversion from ss T-DNA to ds T-DNA. Mayerhofer et al. (1991) and Gheysen et al. (1991) discussed this question when proposing models for T-DNA integration via a mechanism of illegitimate recombination. According to the proposed ds T-DNA integration model, conversion from ss to ds T-DNA occurs extrachromosomally. Therefore, when T-DNA begins integration into the plant’s genome, it is already a ds T-DNA intermediate (Mayerhofer et al. 1991). On the other hand, according to the proposed ss T-DNA integration model, the integration process begins with an ss T-DNA intermediate and the conversion to ds T-DNA happens during integration (Mayerhofer et al. 1991; Gheysen et al. 1991).

The ss T-DNA integration model was refined by Tinland et al. (1995) and became widely accepted soon thereafter (Tinland et al. 1995; Tinland 1996). According to this model, integration begins when the LB side of the ss T-DNA (the 3ʹ end) anneals to homologous sequences in the plant DNA, possibly by invading A-T-rich regions of melted chromosomal DNA (Brunaud et al. 2002). Sequences at the distal 3ʹ end of the LB side may not take part in this annealing, in which case they are lost through exonuclease degradation. Next, the RB side of the ss T-DNA (the 5ʹ end) ligates to the 3ʹ end of the plant DNA. Unlike the 3ʹ end on the LB side, the RB end is protected from exonuclease degradation by VirD2, which may also be involved in ligating the 5ʹ ss T-DNA end to a 3ʹ end of the plant DNA. Several observations from early studies support the ss T-DNA integration model: (a) T-DNA enters the nucleus as an ss DNA molecule, and extrachromosomal recombination assays suggested that the T-DNA derivatives inside the plant nucleus are mainly ss T-DNA molecules (Tinland et al. 1994; Yusibov et al. 1994); (b) when ss DNA was introduced into plant protoplasts, its integration rate was comparable to (Furner et al. 1989) or higher than (Rodenburg et al. 1989) that of ds DNA; (c) T-DNA deletions after integration are usually more severe at the LB side than at the RB side (e.g., Tinland 1996; Kumar and Fladung 2002; Kim et al. 2003; Zhang et al. 2008); (d) junctions between the T-DNA LB side and plant DNA after integration have been shown to contain more microhomology than junctions involving the RB side of the T-DNA (e.g., Matsumoto et al. 1990; Tinland et al. 1995; Brunaud et al. 2002; Kim et al. 2003; Zhu et al. 2006; Thomas and Jones 2007); and (e) there is evidence, although inconclusive, that VirD2 is involved in T-DNA integration (Pansegrau et al. 1993; Scheiffele et al. 1995; Mysore et al. 1998).
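Observation (d) rests on measuring microhomology at junctions, that is, junction nucleotides that could have come from either partner. The Python sketch below illustrates that bookkeeping under simplifying assumptions: an ungapped, error-free junction read, a T-DNA reference that starts at the same position as the read, and a plant reference that ends at the same position as the read. The function names and toy sequences are hypothetical; real junction analyses must also handle sequencing errors, gapped alignments, and filler DNA.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count leading characters shared by `a` and `b`."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def microhomology(junction: str, tdna_ref: str, plant_ref: str) -> str:
    """Return junction nucleotides that match both the T-DNA and the plant reference.

    Assumes an ungapped junction read, `tdna_ref` starting at the read's first
    position, and `plant_ref` ending at the read's last position.
    """
    from_tdna = shared_prefix_len(junction, tdna_ref)                # matched from the left
    from_plant = shared_prefix_len(junction[::-1], plant_ref[::-1])  # matched from the right
    overlap = from_tdna + from_plant - len(junction)
    if overlap <= 0:
        # No microhomology; a negative overlap leaves unmatched (filler-like) bases.
        return ""
    return junction[len(junction) - from_plant : from_tdna]

# Toy example (hypothetical sequences): "ATC" is present on both sides of the junction.
print(microhomology("GGGGGATCTTTTT", "GGGGGATCAAAA", "CCCCATCTTTTT"))  # prints: ATC
```

When the stretch matched from the T-DNA side and the stretch matched from the plant side do not overlap, the function returns an empty string; any bases left between them would correspond to filler DNA.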

It should be noted that the ss T-DNA integration model could, in principle, apply to a T-DNA with ss DNA overhangs and a ds DNA internal body (Gheysen et al. 1991), although an ss T-DNA intermediate has usually been assumed (Gheysen et al. 1991; Tinland 1996; Brunaud et al. 2002; Meza et al. 2002). Moreover, because the role of VirD2 in integration is inconclusive, the possible T-DNA intermediates depicted in Fig. 3 are shown both with and without VirD2 attached to the 5ʹ end of the RB.

Fig. 3 Possible configurations of the T-DNA integration intermediate. a Single-stranded (ss) T-DNA (also termed T-strand). The 5ʹ end is always the RB side, while the 3ʹ end is always the LB side. Illustration underneath demonstrates VirD2 (in purple) attached to the 5ʹ end. b Double-stranded (ds) T-DNA with blunt ends. Illustration underneath demonstrates VirD2 (in purple) attached to the 5ʹ end of the RB side. c Double-stranded (ds) T-DNA internal body with 3ʹ single-stranded overhangs. Illustration underneath demonstrates VirD2 (in purple) attached to the 5ʹ end of the RB side and protecting the end from possible resection. Adapted from Singer (2013), doctoral dissertation.

The T-DNA integration model involving a ds intermediate is supported by evidence that T-DNA integration is linked to the repair of genomic DNA double-strand breaks (DSBs). The first evidence came from the observation that integration of foreign plasmid DNA is enhanced when genomic DSBs are induced in protoplasts by X-ray irradiation (Kohler et al. 1989). Salomon and Puchta (1998) showed that when site-specific genomic DSBs are induced by Agrobacterium-mediated transient expression of the homing endonuclease I-SceI, the DSBs are often repaired with a T-DNA captured within the repaired break. Moreover, early studies involving sequencing of the junctions between T-DNA and plant DNA (e.g., Gheysen et al. 1991; Mayerhofer et al. 1991; Ohba et al. 1995; Takano et al. 1997) revealed patterns similar to those found in later studies of the mechanisms of DSB repair in plants (Gorbunova and Levy 1997; Salomon and Puchta 1998). DSB repair in plants exhibited the characteristics of illegitimate/nonhomologous recombination: DNA deletions close to the breaks, repeated sequences or DNA from an unknown source (“filler” DNA), and little or no homology between the DNA sequences forming the junctions. The notion that T-DNA integrates at genomic DSBs favors a ds T-DNA intermediate, because DSB repair involves end joining between two ds DNA ends. Moreover, Tzfira et al. (2003) and Chilton and Que (2003) provided evidence that T-DNAs captured at genomic DSBs were already ds intermediates prior to integration.

Evidence supporting the ds T-DNA model also derives from the common formation of complex T-DNA insertions, in particular insertions that include two T-DNAs joined at their LB–LB or RB–RB sides without any microhomology at the junction. This arrangement is difficult to explain with an ss model, because direct LB–LB end joining (“tail-to-tail” ligation) or RB–RB end joining (“head-to-head” ligation) cannot occur between transferred ss T-DNAs: each always has a 3ʹ end at the LB side and a 5ʹ end at the RB side (Gheysen et al. 1991; Mayerhofer et al. 1991; De Neve et al. 1997). In addition, extrachromosomal double-stranded circular T-DNA structures (T-circles) have been isolated from Agrobacterium-infected plants (Singer et al. 2012). Analysis of the DNA sequences of these extrachromosomal structures showed that their internal junctions have the characteristic patterns of repaired DSBs. Importantly, it was possible to study the complete structure of the molecules (a feat that is more difficult to achieve in a genomic background when complex DNA repeats are involved). The structures included configurations such as multiple T-DNA copies arranged adjacent to each other or binary vector fragments attached to T-DNA sequences. Such structures are common after T-DNA integration in transgenic plants. For example, according to different reports (e.g., Castle et al. 1993; Rios et al. 2002; De Buck et al. 2009), integration of T-DNAs in clusters of two or more copies can account for about 50% of integration events, and about 30–70% of events include sequences from the T-DNA parent binary plasmid (e.g., Martineau et al. 1994; Kononov et al. 1997; Nicolia et al. 2017). Therefore, the capture of these same structures as ds DNA molecules before integration supports the notion that most T-DNA molecules integrate as ds T-DNA intermediates.
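The polarity argument can be stated as a small check, sketched below in Python. It encodes only what the text states (in a T-strand, the RB side is always the 5ʹ end and the LB side the 3ʹ end) plus the general rule that covalently joining two strand ends links a 3ʹ end to a 5ʹ end. The function names are hypothetical, and the model deliberately ignores end processing and fill-in synthesis; it asks only whether two ends could be joined directly.

```python
from typing import Dict, Literal

Polarity = Literal["5'", "3'"]

# From the text: in a T-strand, the RB side is always the 5' end and the LB side the 3' end.
T_STRAND_END: Dict[str, Polarity] = {"RB": "5'", "LB": "3'"}

def ss_join_possible(end_a: str, end_b: str) -> bool:
    """Could these two single-stranded T-strand ends be joined directly?

    Covalent joining links a 3' end to a 5' end, so 3'-3' or 5'-5' pairs fail.
    """
    return {T_STRAND_END[end_a], T_STRAND_END[end_b]} == {"5'", "3'"}

def ds_join_possible(end_a: str, end_b: str) -> bool:
    """Every ds end carries both a 5' and a 3' terminus, so any pairing is possible."""
    return end_a in T_STRAND_END and end_b in T_STRAND_END

for a, b, label in [("RB", "LB", "head-to-tail"),
                    ("LB", "LB", "tail-to-tail"),
                    ("RB", "RB", "head-to-head")]:
    print(f"{label}: ss={ss_join_possible(a, b)}, ds={ds_join_possible(a, b)}")
```

Only the RB–LB pairing passes the single-stranded check, whereas all three pairings pass the double-stranded check, which is why LB–LB and RB–RB junctions without microhomology point toward ds intermediates.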

If T-DNA integrates as a ds intermediate, an important question is which mechanism accounts for synthesis of the complementary strand. Liang and Tzfira (2013) showed that oligonucleotides can efficiently interact with the ss T-DNA and convert ss T-DNA to ds T-DNA molecules. Although the mechanism of this conversion is still unknown, it has been shown that ss DNA introduced into protoplasts by either electroporation or polyethylene glycol undergoes rapid synthesis of the complementary strand (Rodenburg et al. 1989; Furner et al. 1989). Therefore, this process can be mediated entirely by the plant DNA repair machinery. There is also evidence for the existence of extrachromosomal ds T-DNA molecules after Agrobacterium infection. The first piece of evidence is the rapid and broad transient expression of T-DNA genes in infected leaves (Janssen and Gardner 1990), regardless of whether the transferred ss T-DNA is the coding or the noncoding strand (Narasimhulu et al. 1996). In addition, experiments involving homologous recombination between extrachromosomal T-DNA constructs delivered as noncomplementary strands suggested that at least one of the ss T-DNA constructs must have been converted to ds T-DNA before recombination (Offringa et al. 1990). Recently, Dafny-Yelin et al. (2015) showed that blocking the conversion of ss T-DNA to ds T-DNA reduced T-DNA gene expression. Therefore, although there is no question that extrachromosomal ds T-DNAs exist in plants immediately after Agrobacterium infection, it remains unclear whether they are the only, or the predominant, intermediates in the integration process.
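Why strand-independent expression points to ds molecules can be illustrated with a short Python sketch. It models complementary-strand synthesis simply as taking the reverse complement of the transferred strand and then asks whether a coding sequence reads 5ʹ→3ʹ on either strand. The toy open reading frame and all function names are hypothetical, and the sketch does not model transcription; it only shows that, after conversion to ds, the same coding sequence is available whichever strand was transferred.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def convert_to_ds(transferred: str) -> Tuple[str, str]:
    """Model complementary-strand synthesis; both strands are returned 5'->3'."""
    return transferred, reverse_complement(transferred)

def carries_coding_sequence(strands: Tuple[str, ...], cds: str) -> bool:
    """True if the coding sequence reads 5'->3' on any of the given strands."""
    return any(cds in strand for strand in strands)

TOY_CDS = "ATGGCTTCTAAAGGTTAA"          # hypothetical toy open reading frame
coding = "CC" + TOY_CDS + "GG"          # transferred strand = coding strand
noncoding = reverse_complement(coding)  # transferred strand = noncoding strand

for label, strand in [("coding", coding), ("noncoding", noncoding)]:
    single = carries_coding_sequence((strand,), TOY_CDS)
    double = carries_coding_sequence(convert_to_ds(strand), TOY_CDS)
    print(f"{label} strand transferred: ss={single}, after ds conversion={double}")
```

Only the coding strand carries the coding sequence on its own, but both cases carry it once the complementary strand has been synthesized.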

Identification of plant components that are important for T-DNA integration can provide more clues regarding the form, or the predominant form, of the T-DNA intermediate during integration. For example, evidence that the Ku70/80 heterodimer is important for integration may support the ds T-DNA model, because Ku70/80 is involved in nonhomologous end joining (NHEJ) between ds DNA ends (Critchlow and Jackson 1998). However, studies to identify the plant components important for T-DNA integration are still ongoing (discussed below).

3.2 What Are the Bacterial and Plant Factors Involved in T-DNA Integration?

The major approaches to identifying the proteins involved in Agrobacterium-mediated transformation are forward and reverse genetics. These approaches have led to the identification of bacterial and plant factors involved in the transformation process and in its last step, T-DNA integration. Experimental assays can distinguish a block in T-DNA transfer, a step prior to integration, from a block in T-DNA integration itself. The principle underlying this distinction is that mutants blocked in stable T-DNA integration, but not in T-DNA transfer, can transiently express genes in plant cells but cannot generate stable transgenic plants or plant calli. Many of the proteins involved in T-DNA integration, including specific protein domains important for this process, have been identified using this principle, coupled with protein localization and protein–protein interaction studies. In addition, large-scale screens in Arabidopsis have been conducted to identify host proteins involved in Agrobacterium transformation (Zhu et al. 2003; Anand et al. 2007a; Gelvin 2010). When considering the commonly used experimental methods to identify genes involved in stable T-DNA integration, one possible scenario should be noted. If a mutation in a plant gene (for example, one caused by a random T-DNA insertion) leads to increased gene silencing, fewer stable transgenic events would be recovered through selection, because the selection gene may have been silenced. Thus, a mutant may show reduced stable transformation even though stable T-DNA integration occurs at the same rate as in wild-type plants, or even at a higher rate. This was demonstrated by Park et al. (2015), who analyzed T-DNA integration biochemically. Therefore, T-DNA integration does not necessarily equate to stable transformation.
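The logic that separates a transfer block from an integration block, together with the silencing caveat, can be written as a small decision function. The Python sketch below is a simplification with hypothetical names: it reduces each assay to a yes/no outcome, whereas real assays compare rates with wild type, and it labels an integration block only as a candidate, because reduced recovery of stable transformants may instead reflect silencing of the selection gene.

```python
from enum import Enum

class Outcome(Enum):
    """Hypothetical labels for the outcome of a transformation assay on a mutant."""
    TRANSFER_BLOCK = "blocked at or before T-DNA transfer"
    CANDIDATE_INTEGRATION_BLOCK = (
        "transfer works but stable transformants are not recovered; "
        "confirm integration directly, since selection-gene silencing gives the same result"
    )
    NO_BLOCK_DETECTED = "stable transformants recovered"

def classify_mutant(transient_expression: bool, stable_transformants: bool) -> Outcome:
    """Classify a mutant from yes/no transient and stable transformation readouts."""
    if stable_transformants:
        # Stable events imply that both transfer and integration took place.
        return Outcome.NO_BLOCK_DETECTED
    if transient_expression:
        # T-DNA reached the nucleus and was expressed, but no stable events were recovered.
        return Outcome.CANDIDATE_INTEGRATION_BLOCK
    # No transient expression either: consistent with a block in T-DNA transfer or earlier.
    return Outcome.TRANSFER_BLOCK

print(classify_mutant(transient_expression=True, stable_transformants=False).value)
```

Keeping the integration call as a candidate mirrors the point made with Park et al. (2015): only a direct measurement of integration can separate a true integration defect from a silencing effect.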

Several lines of evidence suggest that plant factors mostly, if not entirely, mediate T-DNA integration. First, there are few Agrobacterium proteins that could be involved in the process, because T-DNA itself does not encode proteins required for T-DNA integration and only a few Vir proteins are known to be transported into the plant nucleus. Second, DNA sequencing of T-DNA/plant DNA junctions suggests that integration occurs through the same pathways responsible for DNA end-joining repair in the host plant cell (i.e., illegitimate/nonhomologous recombination). Further support for the notion that the host cell is responsible for T-DNA integration comes from Agrobacterium-mediated transformation of yeast (Bundock et al. 1995). In yeast, T-DNA can integrate via homologous recombination, a major pathway that this organism uses to repair DSBs, if sufficient homology exists between T-DNA and yeast sequences. Third, foreign DNA can be introduced into plant cells by methods that do not involve Agrobacterium, such as electroporation, polyethylene glycol, and particle bombardment. DNA introduced by these methods integrates into the genome through illegitimate/nonhomologous recombination, demonstrating that the plant’s own DNA repair machinery can potentially accomplish T-DNA integration without the assistance of foreign genes (for review, see Somers and Makarevitch 2004). As ongoing studies improve our understanding of the mechanisms and pathways of DNA DSB repair in plants, we anticipate a better understanding of how plant factors facilitate T-DNA integration.

DNA end joining during DSB repair is less well understood in plants than in yeast or mammalian cells. However, it is known that the major pathway of DSB repair in plants, as in other higher eukaryotes, is the nonhomologous end-joining (NHEJ) pathway. The NHEJ pathway includes the key heterodimer Ku70/Ku80, which binds the double-stranded DNA ends formed by DSBs (Critchlow and Jackson 1998). Several studies have investigated the role of Ku80 both in repairing DNA DSBs in plants and in T-DNA integration. Friesner and Britt (2003) reported that Ku80-deficient plants are more sensitive to DSB-inducing gamma radiation and show reduced rates of T-DNA integration, results that supported the involvement of the NHEJ repair pathway in T-DNA integration. Li et al. (2005b) further demonstrated that Ku80 is important for T-DNA integration. First, overexpression of Ku80 in plants enhanced T-DNA integration, whereas Ku80-deficient plants were deficient in T-DNA integration. In addition, immunoprecipitation experiments showed that Ku80 interacts with ds T-DNA in planta. Both of these studies were done in Arabidopsis, and results by Jia et al. (2012) and Mestiri et al. (2014) also support the notion that Ku proteins are involved in stable T-DNA integration in Arabidopsis. Moreover, in rice, knockdown of Ku70/80 also reduced stable transformation rates (Nishizawa-Yokoi et al. 2012). On the other hand, other research groups have presented contradictory results. Gallego et al. (2003) found that although Ku80 has a role in NHEJ in Arabidopsis, a Ku80-deficient plant was not deficient in T-DNA integration. Park et al. (2015) examined a set of Arabidopsis NHEJ mutants, including Ku80 and Ku70 mutants, and found that deficiency in NHEJ proteins increased the rate of T-DNA integration. According to the authors of that study, this apparent contradiction can be explained by an increase in random DNA DSBs in the plant genome resulting from the deficiency in NHEJ proteins, which gives T-DNA more available target sites for integration. The T-DNA integration rate could therefore be affected in either direction by a deficiency in NHEJ factors: it may be enhanced by the increased availability of genomic DSBs, but it may also be reduced by an impaired ability to ligate T-DNA into those DSBs.
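The two opposing effects described by Park et al. (2015) can be written as a simple toy model. As a simplifying assumption that is not stated in the source, let the integration rate R be proportional to the number of available genomic DSBs, D, multiplied by the per-break efficiency of joining a T-DNA into a break, L:

```latex
R \propto D \cdot L
\qquad\Longrightarrow\qquad
\frac{R_{\mathrm{mut}}}{R_{\mathrm{wt}}}
  = \frac{D_{\mathrm{mut}}}{D_{\mathrm{wt}}} \cdot \frac{L_{\mathrm{mut}}}{L_{\mathrm{wt}}}
```

An NHEJ deficiency is expected to raise the first ratio above 1 and lower the second below 1, so the net effect depends on which dominates: integration increases when the product exceeds 1 and decreases when it falls below 1. For a hypothetical example, a twofold increase in available breaks combined with a 60% loss of joining efficiency gives 2 × 0.4 = 0.8, a net reduction, whereas the same twofold increase with only a 20% loss gives 2 × 0.8 = 1.6, a net increase.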

Similar conflicting results have been obtained for Ligase IV, another key component of the NHEJ pathway. Although the importance of Ligase IV for NHEJ DNA repair in plants has been demonstrated (Friesner and Britt 2003; van Attikum et al. 2003), Ligase IV has been reported to be either dispensable (van Attikum et al. 2003; Park et al. 2015) or involved but nonessential (Friesner and Britt 2003; Nishizawa-Yokoi et al. 2012) for T-DNA integration. Other components of the NHEJ pathway have likewise been shown to be either required (Jia et al. 2012) or dispensable (Park et al. 2015; Vaghchhipawala et al. 2012; Mestiri et al. 2014) for T-DNA integration (reviewed in Saika et al. 2014).

Alternative NHEJ pathways, such as microhomology-mediated end joining (MMEJ), have received increasing attention in recent years (reviewed in Wang and Xu 2017). Mestiri et al. (2014) showed that mutations in several genes of alternative NHEJ pathways reduced T-DNA integration. In addition, a quadruple Arabidopsis mutant disabling several end-joining pathways, including NHEJ, was severely compromised in Agrobacterium-mediated transformation. However, the fact that T-DNA integration still occurred in this mutant suggests that additional pathways exist. On the other hand, Park et al. (2015) found that T-DNA integration was not reduced, but increased, in a parp1 mutant. Finally, according to van Kregten et al. (2016), disabling another MMEJ component, polymerase theta (pol θ), completely eliminated T-DNA integration; however, Gelvin’s group found that these same mutants can still be transformed at ~20% of wild-type levels (personal communication). Van Kregten et al. (2016) also analyzed filler DNA at T-DNA junctions and found further support for an MMEJ mechanism acting at the LB end of the T-DNA.

Some of the discrepancies among these studies may result from the different experimental conditions used to measure transient and stable transformation, as well as from conflating stable transformation with T-DNA integration. More interestingly, these discrepancies may point to alternative pathways that are active under different conditions, such as in different tissue types or at different developmental stages.

Other plant proteins identified as involved in, or affecting, T-DNA integration include proteins involved in chromatin structure and proteins that direct the T-DNA to chromatin (for review, see Magori and Citovsky 2011). In particular, evidence suggests that histones play an important role in T-DNA integration. Several studies demonstrated that plants deficient in different histones had reduced T-DNA integration rates, whereas overexpression of histones increased stable transformation and T-DNA integration (Mysore et al. 2000; Yi et al. 2002, 2006; Anand et al. 2007b; Iwakawa et al. 2017). In addition, a domain of VIP1 has been shown to be important both for interaction with histone proteins and for stable transformation (Li et al. 2005a). Finally, VIP2, a transcriptional regulator that influences histone mRNA levels, has been shown to be important for T-DNA integration (Anand et al. 2007b).

Bacterial factors that are potential candidates for involvement in T-DNA integration are limited to those secreted into plant cells: VirE2, VirE3, VirF, VirD5, and VirD2 (Vergunst et al. 2000; Schrammeijer et al. 2003; Vergunst et al. 2005). Indirect evidence for the involvement of Vir proteins in T-DNA integration can be derived from integration patterns. T-DNA integration is usually much more efficient than integration of foreign DNA delivered by non-Agrobacterium methods. In addition, although T-DNA integration can result in complex insertions, T-DNA insertions are usually considered simpler and more precise than insertions produced by other methods (Hu et al. 2003; Makarevitch et al. 2003; Travella et al. 2005). These observations may suggest that, in contrast to DNA delivered by other methods, T-DNA integration uses one or more bacterial factors in addition to the host DNA repair machinery. However, these differences may instead merely reflect more efficient nuclear localization due to VirD2 piloting the T-DNA, or protection of the T-DNA from degradation by its VirE2 coat.

The most extensively studied bacterial candidate for a role in T-DNA integration is VirD2, because it is transferred into the nucleus attached to the 5ʹ end of the single-stranded T-DNA (T-strand). The earliest support for VirD2 involvement in integration came from an in vitro assay showing that VirD2 can rejoin the DNA ends generated by its own cleavage reaction (Pansegrau et al. 1993). It was therefore suggested that VirD2 ligates the 5ʹ end of the T-DNA to the plant DNA. Potentially supporting a ligase-like activity of VirD2 in planta, Tinland et al. (1995) reported that a VirD2 mutant (R129G) reduced the precision of the RB junction after T-DNA integration. However, this mutation did not reduce the efficiency of T-DNA integration, suggesting that the loss of precision may reflect only VirD2’s role in protecting the 5ʹ end. Moreover, a different in vitro study found no general ligation activity for VirD2 (Ziemienowicz et al. 2000).

A better understanding of the potential role of VirD2 in T-DNA integration required investigation of the different VirD2 domains. Whereas the N-terminal region of VirD2 contains a relaxase domain that is important for border nicking in Agrobacterium (Ward and Barnes 1988), the C-terminal region contains three domains: a DUF domain, a bipartite NLS, and an omega (Ω) domain. The DUF domain has been shown to function in delivery of the ss T-DNA through the T4S system (van Kregten et al. 2009), whereas the bipartite NLS functions in nuclear transport (Howard et al. 1992; Shurvinton et al. 1992; Tinland et al. 1992, 1995; Rossi et al. 1993; Bravo-Angel et al. 1998; Mysore et al. 1998; van Kregten et al. 2009). The Ω domain has been shown to be important for tumorigenesis (Shurvinton et al. 1992; Bravo-Angel et al. 1998); however, its involvement in T-DNA integration remains unresolved. Several reports have shown that deletion or substitution mutations in the Ω domain reduced T-DNA integration to about 1–4% of the wild-type rate (Shurvinton et al. 1992; Narasimhulu et al. 1996; Mysore et al. 1998), whereas T-DNA transfer was reduced only to 20–30% of the wild-type rate (Narasimhulu et al. 1996; Bravo-Angel et al. 1998; Mysore et al. 1998). However, Bravo-Angel et al. (1998) and van Kregten et al. (2009) concluded that the Ω domain has no role in integration. In addition, inducible expression of VirD2 in plants reduced transformation efficiency (Hwang et al. 2006). Therefore, whether VirD2 or any of the other bacterial Vir proteins has a direct role in T-DNA integration remains controversial.
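A rough calculation shows why these numbers were taken to point to an integration defect beyond reduced transfer. Assuming, as a simplification, that the observed integration frequency is the product of the transfer frequency and the per-molecule integration efficiency, the per-transferred-T-DNA efficiency of the Ω mutants relative to wild type lies between the worst and best combinations of the reported ranges:

```latex
\frac{\text{integration}_{\Omega} / \text{integration}_{\mathrm{wt}}}
     {\text{transfer}_{\Omega} / \text{transfer}_{\mathrm{wt}}}
\in \left[ \frac{0.01}{0.30},\ \frac{0.04}{0.20} \right]
\approx [0.033,\ 0.20]
```

Under this assumption, each transferred mutant T-DNA would integrate at roughly 3–20% of the wild-type efficiency. Because the two ranges come from different studies, this is only an order-of-magnitude illustration; the opposite conclusions of Bravo-Angel et al. (1998) and van Kregten et al. (2009) show that such estimates depend on how transfer and integration are measured.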

Recently, Zhang et al. (2017) showed that in yeast, VirD5 localizes to centromeres/kinetochores in the nucleus and causes chromosome instability. The authors also showed that VirD5 inhibits cell growth in both yeast and plants. Whether VirD5 is somehow involved in T-DNA integration is therefore an interesting open question.

3.3 What Is the Genomic Site Prerequisite for T-DNA Integration?

Large-scale analyses of T-DNA insertions have shown that insertions are distributed randomly among plant chromosomes (e.g., Alonso et al. 2003; Sallaud et al. 2004). At the chromosomal and gene level, there may be a distribution bias, although this is controversial. It has been suggested that T-DNA preferentially integrates into actively transcribed genomic regions, because T-DNA insertions are generally found more frequently in the 5ʹ and 3ʹ regions of genes and less frequently in regions closer to centromeres and telomeres (Brunaud et al. 2002; Szabados et al. 2002; Alonso et al. 2003; Chen et al. 2003; An et al. 2003; Sallaud et al. 2004; Li et al. 2006; Zhang et al. 2007). A plausible explanation is that during transcription genomic DNA is more “open” and therefore more accessible to incoming T-DNA molecules. Indeed, T-DNA integration sites have been shown to be preferentially located in A-T-rich regions, which have relatively low DNA duplex stability (e.g., Brunaud et al. 2002; Chen et al. 2003). In addition, a component of the T-complex may interact with host factors involved in transcription, such as a TATA-binding protein (Bako et al. 2003), thereby guiding the T-DNA to actively transcribed regions. Active regions may also be more prone to DNA damage, such as DSBs, which may create “hot spots” for DNA repair factors and T-DNA integration.
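The A-T richness argument can be made concrete with a short sketch that scans a sequence for windows of high A+T content. This is an illustration only: the window size, step, and threshold are arbitrary, hypothetical parameters, the example sequence is invented, and A+T fraction is used as a crude proxy for lower duplex stability rather than a thermodynamic calculation.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def sliding_at_content(seq: str, window: int, step: int = 1) -> list[tuple[int, float]]:
    """Return (start, A+T fraction) for every full window along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, at_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def at_rich_windows(seq: str, window: int, threshold: float, step: int = 1) -> list[int]:
    """Start positions of windows whose A+T fraction is at least `threshold`."""
    return [start for start, frac in sliding_at_content(seq, window, step) if frac >= threshold]


if __name__ == "__main__":
    # Hypothetical sequence around an insertion site (not real data).
    flank = "GCGCGCATATTTAAATATGCGCGC"
    print(at_content(flank))
    print(at_rich_windows(flank, window=8, threshold=0.75))
```

In a real comparison, the same scan would be run over insertion-site flanks and over randomly sampled genomic positions, and the two distributions compared.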

On the other hand, results from earlier large-scale studies may have been biased by their reliance on marker-based selection and regeneration of plants, because integration events that are not selected are excluded from the collection studied. In most studies, selection relies on the T-DNA’s own marker gene, such as an antibiotic or herbicide resistance gene. However, if T-DNA integrates but the marker gene is not expressed, the plants will not survive selection and will therefore not be included in the collection. In this regard, Francis and Spiker (2005) showed that T-DNA genes are not transcribed in about 30% of transformed plants. Furthermore, when T-DNA insertions are identified without selection, they are distributed randomly and are equally represented in centromeric and telomeric regions (Francis and Spiker 2005; Kim et al. 2007; Shilo et al. 2017). In addition, T-DNA integration can be mutagenic and can therefore disrupt genes that are essential for the survival and recovery of plants during selection, although this likely occurs in a relatively small number of cases.

Different events have been proposed to stimulate integration of T-DNA into specific sites in the plant DNA. These include single-strand nicks in the plant DNA, regions of relaxed duplex DNA that allow “invasion” of the T-DNA into the plant DNA, and genomic double-strand breaks (DSBs). Early models by Gheysen et al. (1991) and Mayerhofer et al. (1991) suggested that a nick is first generated in the plant DNA (Fig. 4a). This nick is later converted into a gap by 5ʹ to 3ʹ exonuclease activity (the “single-strand gap-repair” model). The LB and RB sides of a single-stranded T-DNA can then anneal to the plant DNA at this gap through microhomologies and initiate integration. A revision of this model postulated that, instead of annealing to DNA within a gap, the LB side invades the plant DNA and anneals to regions of microhomology. This may happen more often in A-T-rich regions because of their lower duplex stability (Tinland et al. 1995; Brunaud et al. 2002) (Fig. 4b). Recently, a link between T-DNA integration and genomic DSBs has become increasingly accepted (for review, see Magori and Citovsky 2011), and it has been suggested that genomic DSBs are a prerequisite for T-DNA integration (Fig. 4c). The breaks may be spontaneous and may occur randomly in the genome under natural conditions. Extrachromosomal T-DNA molecules may be directed to DSBs, likely guided by host DNA repair proteins and possibly also by Agrobacterium proteins of the T-complex. Direct support for this model comes from the finding that T-DNA can be directed to integrate into artificially induced genomic DSBs (Salomon and Puchta 1998; Tzfira et al. 2003; Chilton and Que 2003; Zhang et al. 2018). Muller et al. (2007) reported that T-DNA insertions are found more frequently near or at palindromic sequences in the plant genome. This observation supports integration of T-DNA into genomic DSBs, because palindromic regions are often found at sites of DSB repair in plants (Muller et al. 1999) and may be more susceptible to breaks because of their secondary structure. In addition, induction of genomic DSBs by irradiation increases integration of foreign DNA into the plant genome (Kohler et al. 1989). Another possibility is that under natural conditions Agrobacterium induces DSBs to facilitate T-DNA integration, as microbial pathogens have been shown to trigger host DNA DSBs in plants (Song and Bent 2014). However, there is currently no evidence for such an activity by any of the Agrobacterium virulence factors.
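Palindromic (inverted-repeat) DNA sequences read the same on both strands; that is, they equal their own reverse complement. A minimal sketch of how such sites could be located in a flanking sequence follows. It finds only perfect palindromes, whereas genome-wide analyses would typically also need to consider imperfect or interrupted inverted repeats; the minimum length and the example sequence are arbitrary, hypothetical choices.

```python
# Watson-Crick pairing used to test whether two bases are complementary.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (other characters kept as-is)."""
    return "".join(PAIR.get(base, base) for base in reversed(seq.upper()))


def find_palindromes(seq: str, min_len: int = 6) -> list[tuple[int, int]]:
    """Return (start, length) of maximal perfect palindromes of at least min_len bases.

    A perfect DNA palindrome equals its own reverse complement, so it has even
    length and can be grown outward from a centre lying between two bases.
    """
    seq = seq.upper()
    hits = []
    for centre in range(1, len(seq)):
        left, right = centre - 1, centre
        # Extend outward while the bases at both edges can pair with each other.
        while left >= 0 and right < len(seq) and PAIR.get(seq[right]) == seq[left]:
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, length))
    return hits


if __name__ == "__main__":
    # Hypothetical flanking sequence containing the palindrome GAATTC.
    print(find_palindromes("ACGAATTCTG"))
```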

Fig. 4

Adapted from Singer (2013) doctoral dissertation

Possible genomic pre-conditions for T-DNA integration. a Nick (later expanded into a gap). b Relaxed duplex DNA region. c Double-strand break (DSB).

3.4 What Is the Spatial/Temporal Arrangement of T-DNA During Integration?

The T-DNA region of natural Agrobacterium strains is 5–25 kbp long (Barker et al. 1983; Suzuki et al. 2000). Engineered T-DNA constructs used in laboratory strains may fall in a similar size range, but T-DNA constructs of up to 150 kbp can also be transferred and integrated into the plant genome (Hamilton et al. 1996). Despite this size, T-DNA can integrate into the plant genome with remarkably little damage to the plant DNA. For example, Meza et al. (2002), Windels et al. (2003), and Kleinboelting et al. (2015) reported that, in the T-DNA collections they analyzed, most sequenced integration events (sites at which both the LB and RB junctions with the genomic DNA had been sequenced) showed a deletion of 100 bp or less of plant genomic DNA flanking the integration site. This result raises the question of how T-DNA is spatially arranged during integration, considering that T-DNA is a large molecule compared with the small integration site (Fig. 5a). Currently popular models do not address this question. The temporal order of T-DNA integration, on the other hand, was discussed in early models. The model proposed by Tinland (1996) suggests that the LB side of the T-DNA interacts with the plant DNA first, after which the RB side attaches to the plant DNA (Fig. 5b). The proposal that the LB initiates integration is based on the observation that T-DNA insertions share a higher degree of microhomology with the plant DNA at the LB junction than at the RB junction (Tinland et al. 1995; Brunaud et al. 2002; Kim et al. 2003; Zhu et al. 2006; Thomas and Jones 2007). However, several studies reported similar frequencies of microhomology at both ends (Meza et al. 2002; Windels et al. 2003; Forsbach et al. 2003; Kleinboelting et al. 2015). Therefore, Meza et al. (2002) proposed that in some cases the RB side is the first to initiate integration into the plant DNA (Fig. 5c).
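The LB and RB microhomology comparisons rest on a simple idea: at a junction, some bases can be assigned equally well to the T-DNA or to the flanking plant DNA. A minimal sketch of that count follows. It assumes a hypothetical, idealized setup (an indel- and mismatch-free junction read, with references already anchored at the read ends) and is not the procedure of any cited study; the sequences are invented.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading positions at which two sequences are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def junction_microhomology(junction: str, tdna_ref: str, plant_ref: str) -> int:
    """Count bases at a T-DNA/plant junction that match both partners.

    Assumes (for illustration) an indel- and mismatch-free read that begins in
    T-DNA and ends in plant DNA, with tdna_ref aligned to the read's first base
    and plant_ref aligned to the read's last base.
    """
    # Bases explained by T-DNA, reading from the left end of the read.
    tdna_span = shared_prefix_len(junction, tdna_ref)
    # Bases explained by plant DNA, reading from the right end of the read.
    plant_span = shared_prefix_len(junction[::-1], plant_ref[::-1])
    # Positions claimed by both spans form the microhomology; no overlap
    # (for example, when filler DNA separates the two spans) gives zero.
    return max(0, tdna_span + plant_span - len(junction))


if __name__ == "__main__":
    # Hypothetical sequences: GTAC is shared by the T-DNA and plant references.
    read = "AAAACCCCGTACGGGG"
    print(junction_microhomology(read, "AAAACCCCGTACTTTT", "AAAAGTACGGGG"))
```

Applying such a count separately to LB and RB junctions across a collection of insertions would give the kind of per-end microhomology frequencies that the studies cited above compare.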

Fig. 5

Adapted from Singer (2013) doctoral dissertation

Possible spatial arrangements of T-DNA during integration. a A T-DNA a few kbp in size integrates, in most cases, without causing major deletions at the target genomic site. b The LB-first model suggests that the LB anneals first via microhomology. c The RB-first model suggests that the RB anneals first via microhomology. d The LB and RB are in close proximity during integration.

Interestingly, analysis of T-DNA/plant junctions by Muller et al. (2007) revealed that T-DNA integration also involves microhomologies in inverted orientation. Based on this finding, these authors proposed that the LB and RB sides of a T-DNA strand anneal to plant DNA simultaneously via microhomologies and that the T-DNA ends are in close proximity during integration. However, the model of Muller et al. (2007) does not explain how and when the two ends of a T-DNA are brought into close proximity. The discovery of T-DNA circles (T-circles) provides a possible explanation (Singer et al. 2012): circular double-stranded T-DNA molecules in plants contain LB and RB sequences ligated extrachromosomally. These results suggest that the LB and RB ends are recognized by plant DNA repair factors and that these factors pull the two ends of a T-DNA toward each other before integration (Fig. 5d). Therefore, it is possible that double-stranded T-DNA approaches the plant genome with the LB and RB sides already in close proximity (Figs. 5d and 6).

Fig. 6

Illustration adapted from Singer (2013) Doctoral dissertation

A proposed model for T-DNA integration. Schematic illustration of double-stranded T-DNA (gray lines) and double-stranded plant DNA (black lines) during T-DNA integration into the plant genome. T-DNA is arranged in a looped mode in which the left border (LB) end and the right border (RB) end are in close proximity. The LB end has a 3ʹ single-stranded overhang that aligns via short microhomologies to the plant 3ʹ overhang through the Mre11, Rad50, and Xrs2 (MRX) complex of the MMEJ repair pathway. The dashed line represents a region of template-dependent DNA synthesis of the complementary strand. The RB end, with VirD2 covalently attached to the 5ʹ end, aligns to the plant DNA through the Ku70/80 mediated NHEJ DNA repair pathway.

3.5 Why and How Do Complex T-DNA Insertions Form?

As mentioned above, T-DNA integration is not a “precise” or “clean” process. Early studies indicated that T-DNA integration can often result in complex T-DNA insertions (Ooms et al. 1982; Kwok et al. 1985; Spielmann and Simpson 1986; Gheysen et al. 1987; Grevelding et al. 1993; Ohba et al. 1995). In addition to T-DNA, a complex T-DNA insertion can include DNA sequences from various sources. These sequences can be derived from the Agrobacterium binary or Ti plasmid (Martineau et al. 1994; Kononov et al. 1997) or even from bacterial chromosomal DNA (Ulker et al. 2008; Kleinboelting et al. 2015). In addition, plant DNA at the integration site may be rearranged and may include duplications of plant DNA sequences that were not part of the original pre-integration genomic site (Gheysen et al. 1987; Takano et al. 1997; Windels et al. 2003; Kleinboelting et al. 2015). Also, several copies of the T-DNA, or parts of the T-DNA sequence, are often clustered together at the integration site (Jorgensen et al. 1987; De Neve et al. 1997). In some cases, the additional DNA found at the insertion site has no homology to any known DNA sequence. Such DNA is termed “filler” DNA, a term also used for additional DNA at DSB repair sites. The term “filler” is also applied to additional sequences that are homologous to known DNA, such as plant or Agrobacterium DNA (Gheysen et al. 1987; Gorbunova and Levy 1997; Windels et al. 2003; Kleinboelting et al. 2015).

The different DNA sequences that accompany complex T-DNA insertions can be explained in several ways; each complex insertion may therefore reflect a different mechanism or a combination of mechanisms. Nevertheless, it is possible to distinguish two general types of potential sources for the DNA found in complex T-DNA insertions. The first type includes DNA fragments present in the nucleus at the time of integration, which may be described as “free-floating” DNA fragments. These free fragments can ligate to T-DNA before or during integration and form complex insertions. The second type includes DNA synthesized during DNA repair in the plant nucleus. During DNA repair and ligation, DNA synthesis can occur using random DNA sequences as templates. This process is also known as synthesis-dependent strand annealing (SDSA). It involves a single-stranded DNA end invading a random DNA sequence in cis (on the same molecule) or in trans (on a different molecule), using it as a template, and often switching between different templates. Filler DNA is also characteristic of ds DNA end joining in higher eukaryotes (Gorbunova and Levy 1997; Salomon and Puchta 1998). It is difficult to determine whether a specific DNA sequence in a complex insertion results from ligation of pre-existing free DNA fragments or from DNA synthesis. However, as discussed below, the origin of a DNA sequence can often be inferred from its sequence identity, length, and overall arrangement within the complex structure.

Ligation between free extrachromosomal DNA fragments is likely when the DNA sequence can be traced to Agrobacterium chromosomal DNA, pTi, or binary plasmid DNA. In many instances, T-DNA is transferred together with the backbone of the parent plasmid, termed “read-through” transfer, because of incorrect processing of the T-DNA borders in Agrobacterium (Kononov et al. 1997; Wenck et al. 1997). However, non-read-through Agrobacterium DNA, which is often found in complex T-DNA insertions, may be transferred from Agrobacterium independently and ligated to T-DNA molecules in the plant before integration, or alternatively may be transferred from Agrobacterium already linked to T-DNA. Clusters of two or more T-DNAs probably result from T-DNA molecules that were transferred independently, ligated in the plant nucleus, and then integrated. De Neve et al. (1997) provided compelling evidence for this notion by transforming plants simultaneously with different Agrobacterium strains carrying different T-DNA constructs and showing that the two types of T-DNA can integrate adjacent to each other. Similarly, Singer et al. (2012) isolated extrachromosomal T-DNA structures composed of T-DNAs originating from two different Agrobacterium strains.

Synthesis-dependent strand annealing (SDSA) is likely the mechanism that accounts for other DNA regions at the junctions between end-joined DNA fragments. This mechanism can sometimes generate a patchwork of short sequences resulting from consecutive template switches (Gorbunova and Levy 1997; Salomon and Puchta 1998; van Kregten et al. 2016). These sequences can be identical to those of T-DNA or plant DNA; it is therefore debatable whether a specific DNA fragment is a piece of a broken molecule joined to another DNA or a newly synthesized product. The recent discovery that DNA polymerase θ (pol θ) is involved in T-DNA integration (van Kregten et al. 2016) supports the latter possibility, as pol θ is associated with microhomology annealing and low-fidelity DNA synthesis (Wang and Xu 2017). The shorter and more “scrambled” a sequence is, the more likely it is to be a synthesis product.
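How a patchwork of short template-derived segments might be recognized can be illustrated with a greedy tiling sketch: starting at the left end of a filler sequence, repeatedly take the longest stretch that occurs exactly in any candidate template, on either strand. The greedy strategy, the `min_len` cutoff, the template names, and the example sequences are all hypothetical choices for illustration. A greedy tiling is not guaranteed to find the most parsimonious set of template switches, and this is not claimed to be how published filler analyses (e.g., van Kregten et al. 2016) were performed.

```python
# Complement table for A/C/G/T; other characters are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def segment_filler(
    filler: str, templates: dict[str, str], min_len: int = 4
) -> list[tuple[str, str, str]]:
    """Greedily tile `filler` with the longest exact matches to candidate templates.

    Returns (segment, template name, strand) tuples, where strand is "+" for the
    template as given and "-" for its reverse complement. Runs of bases with no
    match of at least `min_len` are merged into (run, "unassigned", ".").
    """
    filler = filler.upper()
    strands: dict[tuple[str, str], str] = {}
    for name, seq in templates.items():
        strands[(name, "+")] = seq.upper()
        strands[(name, "-")] = reverse_complement(seq.upper())

    segments: list[tuple[str, str, str]] = []
    i = 0
    while i < len(filler):
        best_len, best_src = 0, None
        for src, seq in strands.items():
            # Only try lengths that would beat the current best; any prefix of a
            # matching substring also matches, so extending one base at a time works.
            length = best_len + 1
            while i + length <= len(filler) and filler[i:i + length] in seq:
                best_len, best_src = length, src
                length += 1
        if best_len >= min_len:
            name, strand = best_src
            segments.append((filler[i:i + best_len], name, strand))
            i += best_len
        elif segments and segments[-1][1] == "unassigned":
            # Extend the current unassigned run by one base.
            segments[-1] = (segments[-1][0] + filler[i], "unassigned", ".")
            i += 1
        else:
            segments.append((filler[i], "unassigned", "."))
            i += 1
    return segments


if __name__ == "__main__":
    # Hypothetical templates and filler (not real sequences).
    templates = {"tdna": "GGGGAAAC", "plant": "CTCTCTCT"}
    for segment in segment_filler("GAAACAGAGAG", templates):
        print(segment)
```

In line with the reasoning above, many short segments drawn alternately from different templates or strands would be consistent with repeated template switching, whereas a single long segment matching one template is more easily explained by ligation of a pre-existing fragment.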

Insertions in which T-DNA copies are clustered adjacent to each other may also result from T-DNA replication after transfer. In that case, the replicated T-DNA copy integrates adjacent to its template, as proposed by Van Lijsebettens et al. (1986) and Jorgensen et al. (1987) based on the structures of integration events that include adjacent T-DNAs. In some cases, a pair of adjacent T-DNAs shared the same truncation point at their ends, suggesting that a truncated T-DNA was replicated to produce a second, identical copy with the same truncation. T-DNA replication has also been supported by statistical analysis of co-transformation and integration of different T-DNAs at the same locus (De Buck et al. 2009).

Truncation of T-DNA ends, especially at the LB side, is another common pattern of T-DNA integration. There may be several reasons why T-DNA insertions have more severe truncations at the LB side. First, T-DNA is transferred RB end first, piloted by VirD2; the LB side may therefore be more prone to incorrect processing or breaks during transfer. In addition, VirD2 attached to the 5ʹ end of the T-DNA may protect the RB side from exonuclease activity, whereas the LB is exposed to such activity. Second, in the plant nucleus, some of the LB side of the T-DNA may be lost during synthesis of the complementary strand (Liang and Tzfira 2013). Synthesis of the complementary strand cannot start at the very end of the LB, because the LB lies at the 3ʹ end of the T-strand, and DNA polymerases require a primer and extend new strands only in the 5ʹ to 3ʹ direction. Third, the LB side may be lost during integration when the single-stranded LB anneals through microhomologies to the plant genome (or to another T-DNA). Microhomology usually lies internal to the LB end; in that case, the unannealed terminal portion of the LB side may be degraded and lost (Tinland 1996).

4 A Proposed Model

A T-DNA that is transferred as a linear DNA molecule may circularize via end joining between its LB and RB ends, thus generating a T-circle with a head-to-tail junction (Singer et al. 2012). Across different experiments, most of the detected T-circles resulted from simple end joining between the LB and RB sides of a single T-DNA, with some small deletions or additions of DNA (Singer and Gelvin, unpublished data). In other cases, T-circles were multimers composed of several T-DNA molecules or other complex structures. Interestingly, when two T-DNAs are involved in an end-joining event, T-DNA ends preferentially ligate tail-to-tail and head-to-head (unpublished data). These results suggest that when the LB and RB sides of a T-DNA are brought into close proximity, a process that is likely mediated by plant DNA repair pathways, the LB and RB ends are not preferentially ligated to each other. This condition can favor T-DNA integration if the T-DNA is situated next to a double-strand break in the plant DNA, because the T-DNA ends may ligate to the plant DNA in preference to each other. An LB–RB ligation event may not occur if the RB side is a blunt end that preferentially uses an NHEJ DNA repair pathway, whereas the LB side is a 3ʹ overhang that preferentially uses an MMEJ DNA repair pathway.
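Whether same-end joins are over-represented among junctions between two T-DNAs can be framed as a simple counting problem. In the sketch below, the end labels, the example junctions, and the null model are assumptions for illustration, and no data from the unpublished experiments are reproduced. If each of the two joined T-DNAs contributed its LB or RB with equal probability, LB–LB, RB–RB, and LB–RB joins would occur in a 1:1:2 ratio, so same-end joins would be expected in half of all intermolecular junctions, and an excess can be assessed with a one-sided exact binomial test. The model applies only to joins between two molecules, not to circularization of a single T-DNA.

```python
from collections import Counter
from math import comb


def classify_junction(end_a: str, end_b: str) -> str:
    """Label a join between two T-DNA ends as "LB-LB", "LB-RB" or "RB-RB"."""
    pair = sorted((end_a.upper(), end_b.upper()))
    if any(end not in ("LB", "RB") for end in pair):
        raise ValueError(f"unknown T-DNA end in {end_a!r}, {end_b!r}")
    return "-".join(pair)


def tally_junctions(junctions: list[tuple[str, str]]) -> Counter:
    """Count junction types in a list of (end, end) pairs."""
    return Counter(classify_junction(a, b) for a, b in junctions)


def same_end_excess_p_value(counts: Counter) -> float:
    """One-sided exact binomial P-value for an excess of LB-LB plus RB-RB joins.

    Null model (an assumption): each of the two joined molecules contributes
    its LB or RB with probability 1/2, so same-end joins are expected in half
    of all intermolecular junctions.
    """
    same_end = counts["LB-LB"] + counts["RB-RB"]
    n = same_end + counts["LB-RB"]
    # P(X >= same_end) for X ~ Binomial(n, 1/2).
    return sum(comb(n, i) for i in range(same_end, n + 1)) / 2**n


if __name__ == "__main__":
    # Hypothetical junctions for illustration only (not experimental data).
    example = [("LB", "RB"), ("RB", "LB"), ("LB", "LB"), ("RB", "RB"), ("RB", "RB")]
    counts = tally_junctions(example)
    print(counts)
    print(same_end_excess_p_value(counts))
```

An equal-probability null is the simplest choice; if LB and RB ends were known to survive transfer or processing at different rates, the expected same-end fraction would have to be adjusted accordingly.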

It is unlikely that circularized (and ligated) T-DNAs are intermediates of T-DNA integration, because integrated T-DNAs generally retain the original linear left and right borders, whereas integration of a circular molecule would inevitably produce T-DNA with circularly permuted, random borders. On the other hand, an integration model in which a linear T-DNA has its two ends positioned at opposite poles of the molecule is also unlikely, because precise and efficient T-DNA integration may require the two ends of the T-DNA to be in close proximity.

Therefore, in the proposed model, T-DNA integrates as a double-stranded DNA molecule that is spatially arranged in a looped form (Fig. 6). A looped configuration in which the T-DNA ends are in close proximity can explain how T-DNA is often inserted into the genome without major deletions at the target genomic site (Meza et al. 2002; Windels et al. 2003). The exposed T-DNA ends are likely brought together during the initial stage of the repair process by the DNA repair factors coating them. VirD2 may also help bring the T-DNA ends together, as purified VirD2 has been shown in vitro to catalyze end-joining reactions with single-stranded T-border DNA (Pansegrau et al. 1993). These factors may also facilitate targeting of the T-DNA ends to chromosomal sites where DNA repair occurs, such as sites of random genomic DSBs. Integration into these sites occurs when the ends of a T-DNA do not join to each other to generate a T-circle, but instead join with the chromosomal DNA (Fig. 6). It should be noted that the proposed model is simplified and does not explain other outcomes of integration. For example, the frequent formation of filler DNA can be explained by the synthesis activity of pol θ (van Kregten et al. 2016). More complex structures may be integrated in a similar way after they form extrachromosomally.

In addition to suggesting a spatial arrangement of T-DNA during integration, the proposed model speculates that the two T-DNA ends use different DNA repair pathways for integration into the plant genome. The involvement of different repair pathways could explain the conflicting evidence regarding the importance of some key components of DNA repair pathways. It could also explain the tendency of T-DNA ends to form LB–LB and RB–RB junctions. However, testing this model will require further biochemical and genetic experiments.

5 Conclusions

Agrobacterium tumefaciens remains the main vector used by plant biologists to genetically transform plants. However, many questions must still be answered before the mechanism of T-DNA integration is understood. Because most of the questions presented in this review are interrelated, understanding T-DNA integration will require a range of experimental approaches. In particular, because T-DNA integration most likely relies largely on plant host factors, a better understanding of DNA repair pathways in plants will be important for understanding T-DNA integration.