L1 retrotransposition in the soma: a field jumping ahead
Retrotransposons are transposable elements (TEs) capable of “jumping” in germ, embryonic and tumor cells and, as is now clearly established, in the neuronal lineage. Mosaic TE insertions form part of a broader landscape of somatic genome variation and hold significant potential to generate phenotypic diversity, in the brain and elsewhere. At present, the LINE-1 (L1) retrotransposon family appears to be the most active autonomous TE in most mammals, based on experimental data obtained from disease-causing L1 mutations, engineered L1 reporter systems tested in cultured cells and transgenic rodents, and single-cell genomic analyses. However, the biological consequences of almost all somatic L1 insertions identified thus far remain unknown. In this review, we briefly summarize the current state-of-the-art in the field, including estimates of L1 retrotransposition rate in neurons. We bring forward the hypothesis that an extensive subset of retrotransposition-competent L1s may be de-repressed and mobile in the soma but largely inactive in the germline. We discuss recent reports of non-canonical L1-associated sequence variants in the brain and propose that the elevated L1 DNA content reported in several neurological disorders may predominantly comprise accumulated, unintegrated L1 nucleic acids, rather than somatic L1 insertions. Finally, we consider the main objectives and obstacles going forward in elucidating the biological impact of somatic retrotransposition.
Keywords: LINE-1, L1, Genomics, Retrotransposon, Mosaicism, Neurobiology
Abbreviations
CNV: Copy number variation
CRISPR: Clustered regularly interspaced short palindromic repeats
EGFP: Enhanced green fluorescent protein
LINE-1 (or L1): Long interspersed element-1
ORF: Open reading frame
SNV: Single nucleotide variant
TPRT: Target-primed reverse transcription
TSD: Target site duplication
Transposable elements (TEs) and their mobilization in somatic cells were first described in Barbara McClintock’s celebrated research on Ac/Ds loci in maize. In the intervening 70 years, somatic transposition (“cut-and-paste”) and retrotransposition (“copy-and-paste”) of TEs have been reported throughout the tree of life, including, for example, in plants [2, 3], insects [4, 5, 6, 7], rodents [8, 9, 10] and primates. By definition, mosaic TE insertions are present in at least one, but not all, cells from an individual. New TE insertions, or the deletion of existing TE insertions, may generate germline as well as somatic mosaicism. Indeed, the primary milieu for heritable LINE-1 (L1) retrotransposition in mammals is the early embryo, where new L1 insertions can enter the germline and contribute genetic diversity to offspring [14, 15, 16, 17] whilst potentially also causing somatic mosaicism in the original host [8, 10, 11, 18]. As embryonic development continues, L1 mobilization appears to become more lineage-restricted, perhaps to the extent that only neurons and their progenitor cells support endogenous L1 activity [19, 20, 21]. Somatic L1 retrotransposition may therefore be an evolutionary byproduct of TEs being active in the developmental niches most likely to spread new copies of themselves to as many germ cells as possible, combined with an inability to prohibit L1 activity in some committed lineages [20, 21, 22]. We presently lack compelling evidence to reject the null hypothesis that somatic retrotransposition in normal cells is of little consequence to human biology. Intriguing experimental data do, however, show that L1 activity is elevated coincident with environmental stimuli [23, 24, 25] and, more extensively, in psychiatric and neurodevelopmental disorders [26, 27, 28, 29].
As a summary view, we propose that retrotransposons can cause somatic mosaicism in mammals, yet the frequency, spatiotemporal extent, biological impact, and molecular processes regulating this phenomenon remain poorly defined.
The vast majority of highly active, or “hot”, human donor L1s belong to the L1-Ta subfamily [33, 38, 39, 80, 81, 82, 83] and fewer than 10 hot L1s are present in each individual. These hot elements are usually highly polymorphic, with millions of donor L1 alleles potentially yet to be found in the global population [14, 38, 39, 76, 83, 84, 85]. Approximately 1 in 150 individuals harbors a new L1 insertion. By contrast, three L1 subfamilies (TF, GF, A), defined by their monomeric 5′ promoter and ORF1 sequences, remain retrotransposition-competent in the mouse germline [16, 17, 40, 41, 42, 43, 87, 88, 89, 90]. At least 1 in 8 pups carries a new L1 insertion in inbred C57BL/6J mice [13, 18]. As for human L1s, internal mutations can strongly influence the mobility of individual mouse L1s [40, 72, 91, 92]. Although the mouse genome contains many more full-length L1s with intact ORFs than the human genome, it is unknown whether mouse L1 retrotransposition potential is concentrated in a similarly small proportion (<10%) of elements. The distinct promoter sequences driving L1 transcription in mouse and human, and associated differences in their regulation, may also result in divergent spatiotemporal patterns of L1 expression.
Host factors that regulate L1 mobilization
Several proteins inhibit L1 transcription. MeCP2 binds methylated cytosines in the CpG island core of the L1 promoter [27, 47, 75, 105]. MeCP2 occupancy prevents cytosine hydroxymethylation and L1 de-repression by the activator TET1, and facilitates the recruitment of methyltransferases affixing the repressive chromatin mark H3K9me3 [104, 285, 286]. Other factors such as KAP1, the HUSH complex and MORC2 bind and silence full-length L1s, including those located in euchromatic genomic regions, again via deposition of repressive marks [96, 103, 106, 285]. Another key repressor, SOX2, is a transcription factor that inhibits neuronal gene and L1 expression during development. Neuronal maturation requires SOX2 down-regulation, which may explain the potential specificity of L1 mobilization in neurons [20, 22]. By contrast, the transcription factors RUNX3 and YY1 assist L1 transcription and retrotransposition [69, 207]. Although the mechanism for L1 activation by RUNX3 is unclear, YY1 appears to direct transcriptional initiation to the correct (+1) L1 start site, and may also support loops involving the L1 promoter and enhancer elements [69, 287]. Numerous factors also repress L1 at the post-transcriptional level, and may each do so in multiple ways. For example, the adenosine deaminase ADAR1 inhibits L1 mobilization via editing-dependent and editing-independent activities, which could involve binding to the L1 RNP [57, 245, 288, 289]. The exonuclease TREX1 has been shown to inhibit retrotransposition in vitro by depleting L1 ORF1p and altering its subcellular localization, whereas SAMHD1 inhibits L1 mobilization by limiting the availability of intracellular nucleotides and by other mechanisms [244, 246]. Finally, during L1 integration, TPRT intermediate DNA-RNA hybrids are targeted by host factors, such as APOBEC3A, which deaminates transiently exposed single-stranded DNA.
Engineered L1 mobilization during neuronal differentiation
Engineered L1-EGFP insertions lacking an intact EGFP sequence due to severe 5′ truncation, as well as those affected by epigenetic silencing of the heterologous promoter driving EGFP expression [19, 21, 126], can result in GFP− cells where retrotransposition has actually taken place (Fig. 2) . As a further caveat, an EGFP-tagged human L1 introduced as a transgene is also likely not subject to the same host factor control as exerted in its native genome. Engineered L1 reporter systems [9, 10, 46, 71, 90, 127, 128] can still provide proof-of-principle evidence that the L1 machinery may enact retrotransposition of L1 and other TEs [31, 32, 34, 79, 129] in a given spatiotemporal context, although, to our knowledge, Alu or SVA trans mobilization by L1 is yet to be demonstrated in primary neurons or neuronal precursor cells. Engineered L1 systems have nonetheless predicted, with substantial success, L1 activity in cells where endogenous L1 mobilization was later confirmed by genomic assays, as for example in the case of the brain.
What is the frequency of endogenous L1 retrotransposition in neurons?
Endogenous L1 retrotransposition is established to occur in mammalian neurons (for reviews, see [35, 67, 130, 131, 132]). This conclusion is based on genomic analysis of “bulk” brain tissue [20, 133] and individual neural cells, with the latter requiring whole-genome amplification (WGA) [134, 135, 136, 137] or reprogramming via nuclear transfer followed by clonal cell amplification . Exemplary somatic L1 insertions reported to date include two events carrying 5′ or 3′ transductions [36, 68], which were recovered from individual human cortical neurons through WGA followed by whole-genome sequencing (WGS) . Subsequent insertion site-specific PCR amplification and capillary sequencing revealed structural hallmarks consistent with retrotransposition by TPRT . Analyses employing WGA and targeting human L1-genome junctions have also recovered neuronal L1 insertions [134, 135, 137]. Using an orthogonal approach, and in mouse, Hazen et al. applied WGS to stem cell clones reprogrammed via nuclear transfer of olfactory neuron nuclei, and again found somatic L1 insertions mediated by canonical TPRT . Impressively, this work identified 4 somatic L1 insertions in only 6 reprogrammed neuronal clones, with a false negative rate of at least 50%  as mouse L1 3′ ends are depleted in Illumina sequencing [18, 35]. These and other genomic analyses of neuronal genomes have thus far yielded results highly congruent with experiments employing the L1-EGFP reporter in vitro and in transgenic animals [19, 20, 21]. Together with somatic L1 insertions that may accumulate earlier in development [11, 18, 136], these data suggest that L1 mosaicism occurs relatively often in the mammalian brain. 
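As a rough illustration of how a false negative rate changes a per-neuron estimate, the short sketch below applies a simple sensitivity correction to the mouse figures quoted above (4 insertions detected in 6 clones, false negative rate of at least 50%). The function name and the correction itself are illustrative assumptions, not the method used in the cited study; the sketch ignores false positives and assumes the false negative rate applies uniformly to all insertions.

```python
def corrected_rate(observed_insertions, cells_surveyed, false_negative_rate):
    """Per-cell insertion rate after a naive correction for missed insertions.

    Hypothetical helper for illustration only: divides the observed per-cell
    rate by the detection sensitivity (1 - false negative rate).
    """
    if cells_surveyed <= 0:
        raise ValueError("cells_surveyed must be positive")
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    sensitivity = 1.0 - false_negative_rate
    return (observed_insertions / cells_surveyed) / sensitivity


# Figures quoted in the text: 4 insertions in 6 reprogrammed neuronal clones.
observed = corrected_rate(4, 6, 0.0)   # no correction applied
adjusted = corrected_rate(4, 6, 0.5)   # false negative rate of 50%

print(f"observed rate: {observed:.2f} insertions per neuron")
print(f"adjusted rate: {adjusted:.2f} insertions per neuron")
```

Because the false negative rate is stated as "at least" 50%, the adjusted value under this simple model would be a lower bound rather than a point estimate.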
The expected frequency of L1 retrotransposition in neurons is however debated [35, 132, 134, 137] and depends on multiple factors, such as the methods used for WGA, library preparation and sequencing, how false positive and false negative rates are calculated, how insertions are validated, as well as the species, brain region and neuronal subtype being analyzed. Importantly, L1 insertion mapping strategies only find completed retrotransposition events. Host factors may eliminate TPRT intermediates in neurons before integration is fully executed (Table 1) [29, 139, 140] and, for this reason, the frequency of attempted somatic L1 retrotransposition events may be higher than what is found by studies of either endogenous or engineered L1 mobilization.
Current estimates of the L1 retrotransposition rate in human neuronal cells range from 0.04 to 13.7 L1 insertions per neuron. In this context, what is a “low” or “high” frequency? If we assume that the typical human brain contains ~90 billion neurons, and conservatively apply the current lowest estimate of 0.04 unique events per neuron, we would still expect at least 3.6 billion somatic L1 insertions per human brain, and many more events may be shared by multiple cells. Should this be considered a low rate? Firstly, brain cells are far more physiologically and functionally interdependent than myocytes, hepatocytes, fibroblasts and other somatic cell types found in the body. Highly interconnected neuronal networks may hence be disproportionately impacted by mutations in “node” cells [142, 143]. Secondly, rather than occurring randomly throughout the genome, somatic L1 insertions may be found at a significantly higher rate in neuronally expressed genes [21, 133, 137], although at this stage the separation of potential endogenous L1 insertional preference from post-insertion selection and detection bias is challenging. Thirdly, neurodevelopmental disorders may be caused by somatic mutations penetrating less than 10% of neurons from a given brain region [144, 145, 146] and, moreover, of the two neuronal L1 insertions to undergo lineage tracing thus far, one was found in up to 1.7% of neurons sampled from the cortex. Fourthly, L1 insertions are only one of several types of genomic variant encountered in the brain. These include aneuploidy and other forms of copy number variation (CNV) [148, 149, 150], as well as single nucleotide variants (SNVs) [151, 152]. Analyses of bulk genomic DNA extracted from brain tissue have identified somatic Alu and SVA insertions [133, 153], while a single-cell WGS analysis of a relatively small set of cortical neurons did not find somatic variants attributed to either trans mobilized retrotransposon family.
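The brain-wide arithmetic in the preceding paragraph can be reproduced with a few lines of code. The sketch below multiplies the per-neuron rate estimates quoted in the text by the assumed neuron count of ~90 billion. The names are illustrative, and applying the same calculation to the upper estimate of 13.7 is our extension of the worked example, which treats every insertion as unique to one cell and so should be read as a naive projection only.

```python
NEURONS_PER_BRAIN = 90e9  # ~90 billion neurons, as assumed in the text


def insertions_per_brain(rate_per_neuron, neurons=NEURONS_PER_BRAIN):
    """Naive projection: unique insertions per neuron times neuron count."""
    return rate_per_neuron * neurons


# Lowest and highest per-neuron estimates quoted in the text.
low = insertions_per_brain(0.04)
high = insertions_per_brain(13.7)

print(f"lowest estimate:  {low:.2e} somatic L1 insertions per brain")
print(f"highest estimate: {high:.2e} somatic L1 insertions per brain")
```

The lower figure corresponds to the 3.6 billion insertions stated above; events shared by multiple cells through clonal inheritance are not modelled here.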
L1 insertions are far larger than an SNV and perhaps carry an average effect size more similar to that of a copy number or structural variant, depending on the genomic and biological context where the variant occurs. These considerations suggest that, with the improving resolution and expanding scale of single-cell genomic analysis applied to brain tissue, somatic L1 insertions causing a neuronal or cognitive phenotype will be identified in the coming years. At present, however, very few neurons, almost exclusively from a handful of neurotypical individuals, have been interrogated for endogenous L1 retrotransposition events. Single-cell genomic experiments that exhaustively survey neuronal subtypes, from numerous individuals and brain regions, are required to define the typical range of neuronal L1 retrotransposition frequency in humans . By also elucidating the genomic locations of new L1 insertions, and their functional effects, these future studies should greatly inform our view of whether L1-driven mosaicism has the potential to be a phenomenon of biological importance, building on foundational evidence now showing that endogenous L1s can jump in the brain.
L1 retrotransposition in non-neuronal brain cells
Somatic L1 insertions have been found in hippocampal glia by recent single-cell genomic analyses [134, 137]. By contrast, experiments based on cultured glial cells and the L1-EGFP system have suggested that retrotransposition in glia is uncommon . One possible explanation for the presence of somatic L1 insertions in glia is that neural stem cells can accommodate retrotransposition events prior to neuronal commitment, leading to occasional L1 insertions in multipotent precursor cells that ultimately commit to the glial lineage . Unlike most neuronal populations, glia can also divide and regenerate in response to injury [154, 155] and this capacity for cell cycling may facilitate retrotransposition [59, 156, 157, 158]. Comparisons of L1 retrotransposition rate in glia versus neurons are, for these reasons, not straightforward. Even if, on average, they accumulate fewer L1 insertions than neurons , individual glia can oversee more than 100,000 synapses  and impact the functional output of the neurons they support . To speculate, one can therefore envisage a situation where a somatic L1 insertion in a glial cell that supports or protects a large number of neurons could, by extension, alter the functional properties of at least some of those neurons, potentially adding to any direct impact of neuronal L1 insertions . This may be disproportionately likely in pathologic conditions, such as autoimmune diseases where L1 expression in astrocytes for example may be unusually high . It should again be noted, however, that a molecular or biological phenotype is yet to be demonstrated for any somatic L1 insertion arising in a neural cell. Moreover, glial proliferation and regeneration may buffer cells from the potential consequences of somatic L1 insertions, lessening the likelihood of downstream changes to neuronal circuits. 
Further experimental evidence is required to conclusively demonstrate that somatic L1 insertions can arise in committed glia, as opposed to multipotent progenitor cells. Similarly, L1 retrotransposition is heavily influenced by cellular host factors (Table 1), but we know little about the host factors that regulate L1 in neurons, as compared to those active in glial cells. Thus, it is likely that the L1 mobilization rate in glia and neurons, including neuronal subtypes, may be reliant upon the differential expression of L1 regulatory proteins in these cells.
Somatic retrotransposition outside of the brain?
To our knowledge, no single-cell genomic analysis of somatic retrotransposition has been reported for mammalian organs other than the brain, although a few immortalized skin cells have been surveyed by WGS without a specific search for mosaic TE insertions . This presents a major gap in the field as, at present, we cannot ascertain whether endogenous L1 retrotransposition really is enriched in the brain or occurs, for instance, in liver, heart or skin at a rate resembling that observed for neurons. Bulk sequencing approaches have found isolated examples of likely somatic L1 insertions in normal liver  and gastrointestinal tract [162, 163, 164, 165] tissues of cancer patients, as well as mosaic L1 insertions found in various adult mouse tissues but arising prior to gastrulation . By contrast, a bulk WGS analysis of 10 clonal cell populations expanded from single skin fibroblasts identified no somatic L1 insertions that could be traced to a parental cell . Transgenic L1-EGFP animals also present very few GFP+ cells outside of the brain and gonads [9, 21] and, when employed in vitro, the L1-EGFP reporter retrotransposes consistently in neural progenitor cells and post-mitotic neurons [19, 20, 21] but not mesenchymal or hematopoietic stem cells .
Taken together, these observations support a model where L1 insertions arising in the early embryo may generate low complexity mosaicism in multiple organs, complemented by ongoing retrotransposition in brain cells. Other adult cell types may also support somatic retrotransposition. However, single-cell genomic analyses of post-mortem, non-brain tissues from human individuals not affected by cancer or other relevant diseases will be required in the future to definitively assess endogenous L1 retrotransposition outside of the brain. That L1 mobilizes frequently in many epithelial tumors [72, 161, 162, 164, 165, 167, 168, 169, 170, 171, 172, 173, 174], but rarely in brain tumors [168, 169, 175, 176], suggests that dysplastic epithelial cells may specifically support L1 activity. The discovery of somatic L1 insertions in the pathologically normal cells of organs where tumorigenesis has occurred reinforces this conclusion [161, 162, 163, 164, 165] but falls short of demonstrating retrotransposition in a healthy organ. Nonetheless, cancer has provided the only examples thus far of somatic retrotransposition causing a clinical or molecular phenotype [161, 163, 167, 170, 171], and has greatly informed our understanding of L1 regulation in vivo (for relevant reviews, please see [109, 114, 177]).
Transposition in the fly brain
L1 and L1-like retrotransposons are found throughout the eukaryotic tree of life . In animals, somatic TE insertions have been almost exclusively reported in human and rodent tissues and experimental systems . The main exception is Drosophila, where R2, a highly site-specific, L1-like retrotransposon, and gypsy, an endogenous retrovirus found to often integrate into specific genomic hotspots, have been found to mobilize in somatic cells, including neurons [4, 5, 6, 7, 178, 179, 180] (for a review, see ). Targeted PCR and resequencing, and orthogonal reporter assays, have each indicated retrotransposon integration (e.g. R2 into rRNA genes [64, 182], gypsy into the ovo gene [183, 184]). However, in contrast to mammalian systems, genome-wide attempts to map endogenous TE mobilization in fly somatic cells have to date not corroborated the aforementioned data obtained from reporter assays. For example, Perrat et al. applied a shallow WGS analysis to pooled fly embryos, brain tissue, and pooled olfactory (αβ) neurons purified from mushroom body, generating an estimate of 129 somatic TE insertions per αβ neuron . However, a subsequent and thoughtful WGS analysis of additional αβ neurons, using improved sequencing depth but still incorporating pooled neuronal material, and analyzing the evolutionary age of mobilized TEs, found no evidence for somatic TE transposition in the fly brain . This second study reversed the earlier conclusion of widespread transposon-mediated genomic heterogeneity in the fly brain  and leaves the question of somatic transposition rate in fly unresolved. Interestingly, through additional analyses, the authors also challenged previous findings of increased transposition rate in ageing neurons  and ovaries obtained from dysgenic hybrids  but did not reanalyze the Perrat et al. sequencing data . 
Given the aforementioned R2 and gypsy experiments [4, 5, 6, 7], we would postulate that a single-cell genomic analysis of fly neurons, with appropriate genotypic controls (i.e. non-brain tissue from the same fly) would identify somatic transposition events. These would likely occur at a lower frequency than first reported by Perrat et al. but, given the extensive array of mobile TE families in the Drosophila genome , perhaps at a higher frequency than seen in mammalian neurons thus far, and with the caveat that somatic transposition in different fly strains may vary greatly in incidence . Aside from the available data obtained from some mammals and insects, it is currently unknown whether TEs can mobilize in the brain (or other somatic tissues) of other animals. The future discovery of somatic retrotransposition in additional species may greatly assist in elucidating any functional consequences of TE-derived mosaicism in neurons.
Donor L1s active in somatic cells: Different LINEs to retrotransposition
Given these caveats and considerations, we would propose multiple proven or hypothetical scenarios for L1 to escape epigenetic repression and contribute to somatic genome mosaicism. Firstly, many donor L1s are indeed likely to be active in the early embryo (Fig. 3, red scenario) and then repressed in somatic cells, based on DNA methylation patterns observed for the human L1-Ta family overall [19, 20, 118, 121] and, consistently, for several individual hot L1s. Embryonic L1 insertions arising from these elements can be carried through development to generate somatic mosaicism [11, 18]. Secondly, a given donor L1 may be expressed in the embryo and never fully repressed in mature tissues (Fig. 3, blue scenario). One potential example of this was provided by an L1 on Chromosome 17 that was demethylated and expressed in a colorectal tumor, and also in the matched normal colon. This donor L1 is a relatively new polymorphism (minor allele frequency 0.0036), is hot for retrotransposition in vitro and is therefore likely to still be mobile during embryogenesis or in the committed primordial germline. Thirdly, a donor L1 may be repressed in the embryo but be found in a genomic locus that does not undergo methylation in differentiated tissues (Fig. 3, orange scenario). A likely example of this is an L1 found on Chromosome 22 that is very active in epithelial tumors [72, 171, 174, 197, 198] but almost inactive in the human germline and in cultured cells [39, 85]. Interestingly, this element is intronic to the gene TTC28, which is highly transcribed in epithelial cells and organs where neoplasia often supports retrotransposition of the donor L1 [174, 199] alongside its hypomethylation and transcription in normal and tumor cells [72, 84, 171, 174]. Finally, a donor L1 may be repressed in most contexts (Fig. 3, yellow scenario) but, if located downstream of an active endogenous promoter, transcription directed by this external promoter may initiate upstream of, and read through into, the L1, thereby generating an intact L1 mRNA. This arrangement could yield somatic L1 insertions with 5′ transductions [36, 69, 73] and may explain one of the examples described above in cortical neurons. In principle, these scenarios present mechanistic bases for individual L1s escaping repression, being transcribed [84, 163, 195, 196], and producing somatic variants that are carried by mature differentiated cells where mobile L1 subfamilies are, overall, marked by epigenetic and transcriptional silencing [19, 20, 22, 27].
Non-canonical L1-associated somatic genome variation
Despite proof of somatic retrotransposition in mammalian brain cells, L1 could impact neuronal phenotype via other routes. For example, a single-cell genomic analysis  of L1 insertions in the human hippocampus identified TPRT-mediated retrotransposition events, corroborating a previous study . The authors also reported examples of somatic genome deletions flanked by germline L1 copies that were detectable in single cells but could also be PCR amplified in bulk hippocampus DNA via digital droplet PCR and PCR reactions performed on very high (500 ng) input template quantities . These deletions were attributed to DNA damage associated with L1 endonuclease activity independent of retrotransposition . Notably, the aforementioned WGS analysis of mouse olfactory neuron clones obtained by nuclear transfer  did not report L1-associated deletions, but also studied fewer neurons from a different species and neuroanatomical region. The frequency and distribution of L1-driven genomic deletion events in humans and other mammals therefore remain to be determined.
More recently, a WGS analysis of bulk human brain tissues  reported thousands of somatic L1 insertions although, surprisingly, the vast majority of these were found nested within L1 insertions annotated on the reference genome. This “L1-within-L1” scenario  presents a significant bioinformatic challenge as sequencing reads can align unreliably to highly repetitive regions , and for this reason insertions into existing younger L1 subfamily (e.g. L1-Ta, L1PA2) copies are usually filtered by TE insertion calling software . Moreover, the putative somatic L1 insertions appeared to not involve L1 ORF2p endonuclease activity , and were 3′ truncated, a feature of L1 integration not encountered for canonical TPRT-mediated L1 insertions in normal cells, where 5′ truncation is instead common [205, 206]. The authors of this study verified a set of nested germline L1 insertions identified by their approach and a publicly available long-read sequencing dataset but, importantly, did not present a similar analysis of long-read sequencing applied to the same brain samples already analyzed by WGS, or sequence matched non-brain tissues . Finally, the proprietary analysis tools required to identify TE insertions in sequencing data generated by this study, and other studies based on the Complete Genomics platform , significantly complicate data sharing and critical re-analysis. L1 may therefore alter the neuronal genome via unexpected pathways, but studies in this area require further investigation and replication, including additional validation and single-cell genomic analyses.
Non-integrated L1 sequences in neural cells
Full-length L1 mRNA transcription can occur in the normal brain [19, 20, 195, 196]. As well as via DNA methylation, the L1 promoter is in this context regulated by a variety of transcription factors, including SOX2 (Fig. 1, Table 1) [20, 22, 27, 47, 69, 105, 207]. An antisense promoter is also present in the human L1 5′UTR , is conserved in primates, and has independent protein-coding potential . This antisense promoter initiates transcription in numerous spatiotemporal contexts and can provide canonical promoters to protein-coding genes [117, 196, 208, 209, 210, 211, 212]. 5′ truncated L1s can also act as promoters in the brain, perhaps regulated by the Wnt signaling pathway [22, 196]. Thus, mobile and immobile L1 copies, where the latter are far more numerous, contribute various L1-initiated RNAs to the cellular environment. These can fulfill cis-regulatory roles and act globally to regulate chromatin structure [213, 214]. L1 transcription, protein abundance and mobilization rate may become uncoupled in vitro upon high L1 mRNA expression . The production of diverse sense and antisense L1 RNAs, and their cellular abundance, may therefore in itself impact neuronal phenotype, independent of retrotransposition.
Similarly, L1 DNA sequences not integrated into the host genome, perhaps generated by ectopic reverse transcription primed from other cellular RNAs, aborted retrotransposition events, or another process involving the L1 machinery, may be relevant to cellular function [216, 217, 218]. Human and mouse L1 CNV assays applying multiplex qPCR to template DNAs extracted from tissue have repeatedly shown variation in L1 DNA content, when brain regions are compared to each other, and when brain samples are compared to non-brain tissue [20, 24, 25, 27, 133, 137, 219]. These studies suggest that i) the hippocampus is a hotspot for L1 CNV and ii) brain tissues are generally enriched for L1 DNA, versus non-brain tissues. As has been proposed previously [112, 220], qPCR-based L1 CNV assays cannot alone demonstrate retrotransposition because they do not discriminate L1 sequences that are, or are not, integrated into the genome. Host factor defenses against retrotransposition very likely include the degradation of single-stranded DNA intermediates produced during TPRT (Table 1) [112, 139] and, where this process is deficient, cells may accumulate single-stranded L1 DNA molecules . Control experiments, such as enzymatically treating qPCR input templates to degrade single-stranded DNA, or selecting only high molecular weight DNA via gel electrophoresis, may reduce, but cannot exclude, the potential for non-integrated L1 DNA to dominate qPCR-based L1 CNV assays . Indeed, these qPCR-based assays can also return absolute L1 CNV values reflecting hundreds of new L1 insertions per cell, depending on normalization approach, when all single-cell genomic analyses performed to date have shown retrotransposed products at a rate far lower than this [35, 67]. It is possible that the qPCR-based assays are simply confounded by unanticipated technical issues and are quantitatively unreliable. 
In our view, it is more plausible that, alongside L1 RNA expression, neurons can accumulate L1 DNA molecules that are not integrated into the nuclear genome.
The origin, composition and cellular impact of non-integrated L1 DNA sequences remain unclear. They may arise due to a failure to resolve or degrade TPRT intermediates, ectopic L1 reverse transcription where the products are sequestered in the cytosol, or another mechanism by which L1 could form stable, extrachromosomal DNA sequences in vivo [216, 217, 218, 221, 222, 223, 224, 225, 226, 227]. Are these L1 DNAs predominantly single- or double-stranded? Are they predominantly full-length or heavily truncated? Notably, qPCR assays targeting L1 at its 5′UTR, ORF2 or 3′UTR regions can in some cases generate different L1 CNV results [25, 27], suggesting that the additional L1 DNA sequences are shorter on average than genomic L1 copies of the same subfamily. This supports the hypothesis that interrupted, or unusually inefficient, reverse transcription may be involved in the biogenesis of non-integrated L1 DNA molecules. Along these lines, when the L1 qPCR assay was applied to brain tissue obtained from i) Rett syndrome (RTT) patients, where mutations in the L1 transcriptional repressor MeCP2 (Table 1) [27, 75, 228, 229] cause a severe neurodevelopmental disorder, and ii) an MeCP2-mutant RTT mouse model, significant L1 copy number gain relative to controls was observed in both species when L1 DNA content was measured at ORF2. L1 CNV was not, however, observed when measured at the 5′UTR. It is relevant that conditional restoration of MeCP2 function in MeCP2-mutant mice leads to robust reversal of the neurological phenotype. In work performed recently in our laboratory, we found that phenotypic reversal in these animals was accompanied by L1 DNA content returning from elevated to wild-type levels after rescue, when measured by qPCR against ORF2 (Morell et al., unpublished data).
These observations altogether suggest that at least some of the additional L1 DNA content reported in RTT brain samples may not be incorporated into the nuclear genome. More broadly, the increased presence of L1 and other TEs in neurological disorders [6, 27, 28, 29, 231, 232, 233, 234] elucidated by qPCR-based assays therefore may not involve new TE insertions, and any associated potential toxicity  may not be due to retrotransposition. It is tempting to speculate that the accumulation of non-integrated L1 DNA, for example via failed or incomplete elimination of TPRT intermediates [52, 139, 236], could still cause genomic lesions in neuronal genes  or otherwise “distract” host factors which, in addition to guarding against L1 integration, often regulate other cellular processes . L1 activity in the brain is potentially relevant to neuronal physiology and genome stability beyond any impact of somatic retrotransposition, although further experiments are required to demonstrate the biogenesis of non-integrated L1 DNA sequences in neurons and other cells.
Does elevated L1 content in the brain trigger autoimmunity?
Endogenous and exogenous nucleic acids may trigger immune responses mediated by various sensor pathways (for reviews, see [238, 239]). As well as in RTT, elevated L1 DNA content has been reported in neurological disorders associated with autoimmunity, immunodeficiency and maternal infection, including Aicardi-Goutières syndrome [29, 137, 221], ataxia telangiectasia and schizophrenia. As for normal individuals, the magnitude of L1 CNV reported in these disorders appears to far exceed what would plausibly be due to somatic retrotransposition and could be due to an accumulation of L1 DNA molecules that are not integrated into the nuclear genome. This scenario would have major implications for the treatment of any condition proven to be caused by L1 activity because the reversal of any associated symptoms would no longer be dependent on the challenging excision of somatic L1 insertions from neuronal genomes. Instead, processes leading to an accumulation of non-integrated single- or double-stranded L1 DNA could be targeted, for example, with reverse transcriptase inhibitors or through targeted silencing of heavily transcribed L1 copies.
Aicardi-Goutières syndrome (AGS) is a very rare interferonopathy that provides arguably the best developed example of a neurological phenotype potentially linked to L1-associated autoimmunity. Genetic analyses of AGS patients have revealed mutations most commonly in the genes TREX1, SAMHD1, ADAR1, RNASEH2A, RNASEH2B, RNASEH2C and IFIH1 [239, 243]. Most of these genes encode factors that have been shown to regulate retrotransposon activity (Table 1) [221, 234, 244, 245, 246, 247, 248, 249, 250, 251], supporting the hypothesis that the cytosolic accumulation of endogenous nucleic acids in AGS generates an interferon response [239, 252, 253, 254]. TREX1, for example, is an established exonuclease of aberrant single-stranded intermediates generated during DNA replication. An abundance of single-stranded L1 DNA has been reported in human and mouse TREX1-deficient cells [29, 221], whilst a single-cell genomic analysis of neurons obtained from one AGS patient carrying SAMHD1 mutations indicated that somatic L1 insertions occurred at a rate similar to that of controls. Whilst these experiments suggest L1 might play a role in AGS, the mechanism via which single-stranded L1 DNA could generate an abnormal neuronal phenotype is largely unclear, and it remains plausible that the accumulation of L1 DNA in AGS is a largely inconsequential result of nuclease mutations.
Intriguingly, a recent study demonstrated that medium obtained from TREX1-deficient human astrocytes was toxic to healthy neurons, whereas medium from TREX1-deficient astrocytes treated with L1 reverse transcriptase inhibitors was significantly less toxic. The authors ascribed this toxicity to an interferon response due to an accumulation of cytosolic single-stranded L1 DNA in astrocytes [29, 256]. By contrast, another recent work found that treatment of TREX1 mutant mice with L1 reverse transcriptase inhibitors had no impact on interferon response or the retrotransposition frequency of an engineered L1 reporter gene in vivo. Previously, different reverse transcriptase inhibitors have been shown either to rescue or not to rescue the lethal myocarditis phenotype of TREX1-deficient mice. These findings raise the prospect that a biochemical mechanism apart from the inhibition of L1 reverse transcriptase activity, perhaps instead targeting inflammation, is responsible for the amelioration of the AGS phenotype.
At this stage, the etiological role of TREX1 in controlling L1 and other endogenous retrotransposons in AGS requires further study. It should however be noted that i) the somewhat opposing results detailed above for L1 were obtained using different species and cell types, ii) assays measuring engineered and endogenous L1 activity can provide different results [29, 221, 247, 257], iii) engineered L1 retrotransposition frequency and potentially immunogenic single-stranded L1 DNA content are not equivalent, and iv) host factors and reverse transcriptase inhibitors may act via multiple direct and indirect pathways to limit L1 activity. For example, instead of restricting L1 primarily by exonuclease activity, TREX1 may alter the subcellular localization of L1 ORF1p, and thereby reduce opportunities for cells to accumulate L1 DNA, whether via retrotransposition or another mechanism [221, 247].
As for TREX1, RNaseH2 has been variously reported as being a negative or positive regulator of L1 retrotransposition [249, 250, 260]. Some eukaryotic TEs encode ribonuclease proteins to facilitate the removal of their template RNA after reverse transcription [261, 262, 263], and also degrade other cellular DNA:RNA hybrids, supporting a positive role for RNaseH2 in L1 retrotransposition. Alternatively, biochemical assays using the Bombyx mori R2 retrotransposon previously revealed that the RNA in a hybrid DNA:RNA molecule generated during TPRT could be displaced during second strand DNA synthesis without the apparent involvement of a ribonuclease. Ribonuclease-mediated degradation of the RNA strand of hybrid L1 DNA:RNA molecules prior to second strand synthesis has been demonstrated in vitro to expose the L1 cDNA to deamination, suggesting that ribonuclease activity may facilitate editing or 5′ truncation of L1 cDNAs in vivo. Nonetheless, we favor the view that the ribonuclease activity of RNaseH2 assists L1 mobility in vivo, even if other RNaseH2 functions are ultimately shown to inhibit retrotransposition. Overall, the available literature points to a potential role for L1 in the etiology and clinical management of AGS and other neurodevelopmental disorders associated with autoimmunity. Significant work is required to reconcile the somewhat opposing results reported for the use of reverse transcriptase inhibitors in disparate AGS experimental models, and thereby clarify whether L1 activity is a pathogenic or coincidental feature of this disease.
Somatic mosaicism represents an intriguing and underexplored form of genetic and biological variation in mammals. Although L1 retrotransposon-driven mosaicism is now established to occur in brain cells, any impact of this phenomenon upon normal and abnormal neurobiological processes remains undemonstrated. Despite the recent development of tools, including single-cell genome, epigenome and transcriptome sequencing [151, 265, 266, 267, 268, 269, 270, 271, 272], in some cases employed in parallel, as well as CRISPR-Cas9 based genetic and epigenetic engineering [242, 274, 275, 276, 277], conclusive proof is yet to be provided of any individual somatic L1 insertion arising in the neuronal lineage that has generated a molecular, biochemical or behavioral phenotype in vivo. Given the effect size of L1 insertions in genes, and the frequency of endogenous L1 insertions arising during neurodevelopment, adult neurogenesis or in post-mitotic neurons, it is likely that some L1 insertions could induce a biologically relevant neuronal phenotype. We believe such examples will be found in future studies. It is also plausible that L1 may impact neurobiology primarily through mechanisms not involving resolved retrotransposition events, given recent observations from neurological diseases, such as RTT and AGS.
Experiments to test the impact of individual somatic L1 insertions present a major challenge. Work in this area could be greatly accelerated through: i) the development of methods to reliably survey genome structural variation and transcription, genome-wide and from the same cell, using human brain tissue obtained post-mortem, or from tissue obtained during brain surgery [278, 279], or from animal models, ii) the large-scale production of WGS data from individual brain cells, retaining neuronal subtype information, as well as from non-brain cells, and iii) the ability to introduce, via CRISPR-Cas9 or another approach, L1 insertions found in vivo into cultured neurons, organoids or even animal models, to assess their impact upon the transcriptional and regulatory landscapes when established in a homogeneous cellular population. Long-read sequencing approaches, such as those developed by PacBio and Oxford Nanopore, which can identify TPRT hallmarks ab initio by resolving L1 integration sites in full, may also prove particularly useful, even if simply applied at high depth to DNA extracted from brain tissue [280, 281, 282, 283, 284]. Beyond surveying the spatiotemporal extent and potential immediate functional impact of L1 mosaicism, we also need to be able to modulate endogenous retrotransposition and evaluate the consequences, if any, upon behavior. In neurological disorders where elevated L1 activity is apparent, it would be valuable to assess the impact that restricting this activity has upon symptoms. These are long-term and challenging experiments. However, neuronal genome mosaicism driven by engineered L1 retrotransposition was first reported in 2005 and has only been definitively shown to be recapitulated by endogenous L1s in vivo quite recently [133, 134, 135, 136, 137, 138]. Therefore, equipped with foundational knowledge, and improving tools, the field is well positioned to move rapidly towards establishing any functional impact of L1 mosaicism in the soma.
The authors thank Jose Garcia-Perez, Adam Ewing and members of the Faulkner laboratory for critical reading of the manuscript, as well as Alysson Muotri, Rayk Behrendt, Joshua Dubnau and Scott Waddell for comments received on the draft manuscript.
GJF acknowledges the support of a CSL Centenary Fellowship, Australian NHMRC Project Grants GNT1106206, GNT1125645, GNT1126393 and GNT1137895, and the Mater Foundation. Funding bodies played no role in the preparation of, or decision to submit, the manuscript.
GJF wrote the manuscript with input from VB. GJF and VB prepared the figures. Both authors approved the final manuscript.
The authors declare that they have no competing interests.
- 7. Wood JG, Jones BC, Jiang N, Chang C, Hosier S, Wickremesinghe P, Garcia M, Hartnett DA, Burhenn L, Neretti N, Helfand SL. Chromatin-modifying genetic interventions suppress age-associated transposable element activation and extend life span in Drosophila. Proc Natl Acad Sci U S A. 2016;113:11277–82.
- 13. Richardson SR, Faulkner GJ. Heritable L1 retrotransposition events during development: understanding their origins. Bioessays. 2018;40:e1700189.
- 29. Thomas CA, Tejwani L, Trujillo CA, Negraes PD, Herai RH, Mesci P, Macia A, Crow YJ, Muotri AR. Modeling of TREX1-dependent autoimmune disease using human stem cells highlights L1 accumulation as a source of neuroinflammation. Cell Stem Cell. 2017;21:319–31.
- 37. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62.
- 49. Dmitriev SE, Andreev DE, Terenin IM, Olovnikov IA, Prassolov VS, Merrick WC, Shatsky IN. Efficient translation initiation directed by the 900-nucleotide-long and GC-rich 5′ untranslated region of the human retrotransposon LINE-1 mRNA is strictly cap dependent rather than internal ribosome entry site mediated. Mol Cell Biol. 2007;27:4685–97.
- 51. Doucet AJ, Hulme AE, Sahinovic E, Kulpa DA, Moldovan JB, Kopera HC, Athanikar JN, Hasnaoui M, Bucheton A, Moran JV, Gilbert N. Characterization of LINE-1 ribonucleoprotein particles. PLoS Genet. 2010;6:e1001150.
- 57. Taylor MS, Altukhov I, Molloy KR, Mita P, Jiang H, Adney EM, Wudzinska A, Badri S, Ischenko D, Eng G, et al. Dissection of affinity captured LINE-1 macromolecular complexes. Elife. 2018;7:e30094.
- 59. Mita P, Wudzinska A, Sun X, Andrade J, Nayak S, Kahler DJ, Badri S, LaCava J, Ueberheide B, Yun CY, et al. LINE-1 protein localization and functional dynamics during the cell cycle. Elife. 2018;7:e30058.
- 74. Coufal NG, Garcia-Perez JL, Peng GE, Marchetto MC, Muotri AR, Mu Y, Carson CT, Macia A, Moran JV, Gage FH. Ataxia telangiectasia mutated (ATM) modulates long interspersed element-1 (L1) retrotransposition in human neural stem cells. Proc Natl Acad Sci U S A. 2011;108:20382–7.
- 85. Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, Mills RE, 1000 Genomes Project Consortium, Devine SE. The mobile element locator tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017;27:1916–29.
- 91. Martin SL, Bushman D, Wang F, Li PW, Walker A, Cummiskey J, Branciforte D, Williams MC. A single amino acid substitution in ORF1 dramatically decreases L1 retrotransposition and provides insight into nucleic acid chaperone activity. Nucleic Acids Res. 2008;36:5845–54.
- 94. Soares ML, Edwards CA, Dearden FL, Ferron SR, Curran S, Corish JA, Rancourt RC, Allen SE, Charalambous M, Ferguson-Smith MA, et al. Targeted deletion of a 170-kb cluster of LINE-1 repeats and implications for regional control. Genome Res. 2018;28:345–356.
- 96. Robbez-Masson L, Tie CHC, Conde L, Tunbak H, Husovsky C, Tchasovnikarova IA, Timms RT, Herrero J, Lehner PJ, Rowe HM. The HUSH complex cooperates with TRIM28 to repress young retrotransposons and new genes. Genome Res. 2018.
- 97. Berrens RV, Andrews S, Spensberger D, Santos F, Dean W, Gould P, Sharif J, Olova N, Chandra T, Koseki H, et al. An endosiRNA-based repression mechanism counteracts transposon activation during global DNA demethylation in embryonic stem cells. Cell Stem Cell. 2017;21:694–703.e697.
- 114. Scott EC, Devine SE. The role of somatic L1 retrotransposition in human cancers. Viruses. 2017;9:e131.
- 115. Grassi DA, Jonsson ME, Brattas PL, Jakobsson J. TRIM28 and the control of transposable elements in the brain. Brain Res. 2018.
- 120. MacLennan M, Garcia-Canadas M, Reichmann J, Khazina E, Wagner G, Playfoot CJ, Salvador-Palomeque C, Mann AR, Peressini P, Sanchez L, et al. Mobilization of LINE-1 retrotransposons is restricted by Tex19.1 in mouse embryonic stem cells. Elife. 2017;6:e26152.
- 125. Dombroski BA, Scott AF, Kazazian HH Jr. Two additional potential retrotransposons isolated from a human L1 subfamily that contains an active retrotransposable element. Proc Natl Acad Sci U S A. 1993;90:6513–17.
- 138. Hazen JL, Faust GG, Rodriguez AR, Ferguson WC, Shumilina S, Clark RA, Boland MJ, Martin G, Chubukov P, Tsunemoto RK, et al. The complete genome sequences, unique mutational spectra, and developmental potency of adult neurons revealed by cloning. Neuron. 2016;89:1223–36.
- 147. McConnell MJ, Moran JV, Abyzov A, Akbarian S, Bae T, Cortes-Ciriano I, Erwin JA, Fasching L, Flasch DA, Freed D, et al. Intersection of diverse neuronal genomes and neuropsychiatric disease: the brain somatic mosaicism network. Science. 2017;356:eaal1641.
- 171. Nguyen THM, Carreira PE, Sanchez-Luque FJ, Schauer SN, Fagg AC, Richardson SR, Davies CM, Jesuadian JS, Kempen MHC, Troskie RL, et al. L1 retrotransposon heterogeneity in ovarian tumor cell evolution. Cell Rep. 2018;23:3730–40.
- 186. Treiber CD, Waddell S. Resolving the prevalence of somatic transposition in Drosophila. Elife. 2017;6:e28297.
- 199. GTEX Consortium. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–13.
- 214. Percharde M, Lin CJ, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, Biechele S, Huang B, Shen X, Ramalho-Santos M. A LINE1-Nucleolin partnership regulates early development and ESC identity. Cell. 2018.
- 219. Giorgi G, Virgili M, Monti B, Del Re B. Long INterspersed nuclear elements (LINEs) in brain and non-brain tissues of the rat. Cell Tissue Res. 2018.
- 234. Ahmad S, Mu X, Yang F, Greenwald E, Park JW, Jacob E, Zhang CZ, Hur S. Breaching self-tolerance to Alu duplex RNA underlies MDA5-mediated inflammation. Cell. 2018;172:797–810.
- 243. Crow YJ, Chase DS, Lowenstein Schmidt J, Szynkiewicz M, Forte GM, Gornall HL, Oojageer A, Anderson B, Pizzino A, Helman G, et al. Characterization of human disease phenotypes associated with mutations in TREX1, RNASEH2A, RNASEH2B, RNASEH2C, SAMHD1, ADAR, and IFIH1. Am J Med Genet A. 2015;167A:296–312.
- 251. Chung H, Calis JJA, Wu X, Sun T, Yu Y, Sarbanes SL, Dao Thi VL, Shilvock AR, Hoffmann HH, Rosenberg BR, Rice CM. Human ADAR1 prevents endogenous RNA from triggering translational shutdown. Cell. 2018;172:811–24.
- 253. Herzner AM, Hagmann CA, Goldeck M, Wolter S, Kubler K, Wittmann S, Gramberg T, Andreeva L, Hopfner KP, Mertens C, et al. Sequence-specific activation of the DNA sensor cGAS by Y-form DNA structures as found in primary HIV-1 cDNA. Nat Immunol. 2015;16:1025–33.
- 256. Akwa Y, Hassett DE, Eloranta ML, Sandberg K, Masliah E, Powell H, Whitton JL, Bloom FE, Campbell IL. Transgenic expression of IFN-alpha in the central nervous system of mice protects against lethal neurotropic viral infection but induces inflammation and neurodegeneration. J Immunol. 1998;161:5016–26.
- 260. Benitez-Guijarro M, Lopez-Ruiz C, Tarnauskaite Z, Murina O, Mian Mohammad M, Williams TC, Fluteau A, Sanchez L, Vilar-Astasio R, Garcia-Canadas M, et al. RNase H2, mutated in Aicardi-Goutieres syndrome, promotes LINE-1 retrotransposition. EMBO J. 2018.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.