The cancer-associated CTCFL/BORIS protein targets multiple classes of genomic repeats, with a distinct binding and functional preference for humanoid-specific SVA transposable elements
A common aberration in cancer is the activation of germline-specific proteins. The DNA-binding proteins among them could generate novel chromatin states, not found in normal cells. The germline-specific transcription factor BORIS/CTCFL, a paralog of chromatin architecture protein CTCF, is often erroneously activated in cancers and rewires the epigenome for the germline-like transcription program. Another common feature of malignancies is the changed expression and epigenetic states of genomic repeats, which could alter the transcription of neighboring genes and cause somatic mutations upon transposition. The role of BORIS in transposable elements and other repeats has never been assessed.
ChIP-chip and ChIP-seq analysis of BORIS and CTCF binding to DNA repeats in K562 cancer cells, which depend on BORIS for self-renewal, revealed three classes of occupancy by these proteins: elements cohabited by BORIS and CTCF, CTCF-only bound, or BORIS-only bound. The CTCF-only enrichment is characteristic of evolutionarily old and inactive repeat classes, while BORIS and CTCF co-binding predominantly occurs at uncharacterized tandem repeats. These repeats form staggered cluster binding sites, which are a prerequisite for CTCF and BORIS co-binding. At the same time, BORIS preferentially occupies a specific subset of the evolutionarily young, transcribed, and mobile genomic repeat family, SVA. Unlike CTCF, BORIS prominently binds to the VNTR region of the SVA repeats in vivo. This suggests a role of BORIS in SVA expression regulation. RNA-seq analysis indicates that BORIS largely serves as a repressor of SVA expression, alongside DNA and histone methylation, with the exception of promoter capture by SVA.
Thus, BORIS directly binds to and regulates SVA repeats, which are essentially movable CpG islands, via clusters of BORIS binding sites. This finding uncovers a new function of the global germline-specific transcriptional regulator BORIS in regulating and repressing the newest class of transposable elements that are actively transposed in the human genome when activated. This function of BORIS in cancer cells is likely a reflection of its roles in the germline.
Keywords: K562 Cell; Electrophoretic Mobility Shift Assay; CTCF Binding; CTCF Binding Site; Knock Down
CAGE: cap analysis of gene expression
ChIP-chip: microarray analysis of ChIP
ChIP-seq: NGS analysis of ChIP
CT: cancer testis (genes)
CTSs: CTCF target sites
EMSA: electrophoretic mobility shift assay
PCA: principal component analysis
rDNA: ribosomal RNA genes
RT-qPCR: reverse transcription and quantitative polymerase chain reaction
SVA: SINE, VNTR, and Alu (transposable element)
SVD: singular value decomposition
TADs: topologically associated domains
TRF: tandem repeat finder
TSS: transcription start sites
VNTR: variable number tandem repeat
Transposable elements (TEs) play active roles in normal genome evolution in humans and in primates in general, as well as in sporadic genome rearrangement [3, 4, 5], including deleterious events associated with pathology [6, 7, 8, 9, 10, 11, 12]. Multiple polymorphisms and intron evolution in normal human populations are largely facilitated by TE insertions [13, 14]. A substantial and distinct role of satellite repeats was also recently demonstrated for the incidence of double-strand breaks (DSBs) upon replication stress. Active families of TEs (L1, Alu, and SVA) account for a large number of germline mutations. Insertions of mobile elements and the recombination between them have been identified as causes of many cancers [12, 17, 18], with some repeats shown to become aberrantly expressed [17, 19], to acquire a potential to change the regulation of neighboring genes [17, 20, 21], and to destabilize chromosomes [7, 22]. The effect of repeated DNA in the origins and progression of cancer and tumor cell physiology could be two-pronged: the induced change of expression in neighboring or targeted genes [22, 23, 24] and the structural destabilization of the epigenetic landscape of chromosomes [2, 25]. These two effects are interrelated, as epigenetic changes in the repeats open chromosomal domains for both aberrant changes in gene expression and elevated somatic recombination. Some elements were also shown to act as bona fide enhancers.
The presence of a strong epigenetic component in such repeat- and TE-mediated genome regulation and instability is well established [20, 27, 28, 29, 30]. In cancer cells, there is likely a higher epigenetic impact of TEs, compared to the norm, as promoters of expressed mobile elements become hypomethylated and their transcription elevated [22, 31, 32].
The array of epigenetic changes leading to repeat deregulation in cancer cannot be understood without molecular analysis of repeats’ chromatin. This brings to light the role of CTCF and its paralog CTCFL/BORIS in these processes. In addition to serving as a bona fide transcription factor, CTCF reads the epigenetic marks [33, 34, 35, 36] and plays a key role in the formation of topologically associated domains (TADs) in chromatin [37, 38, 39], in remodeling chromatin structure, and in the formation of chromatin boundaries [29, 41]. CTCF was also shown to have multiple binding sites embedded in TEs [42, 43]. CTCF target sites (CTSs) are also important for telomere repeat stability [44, 45]. Furthermore, the fact that CTCF control of gene expression and recombination requires physical contacts between different CTSs via looping [46, 47, 48, 49] indicates that CTCF sites in repeats are not inert in the chromatin architecture, as indeed was demonstrated in some instances [50, 51, 52, 53].
Taking into account the important role of CTCF in regulating TE expression and epigenetic maintenance, it is possible that the aberrant activation of its germline paralog CTCFL/BORIS in cancer has an impact on repeat physiology and genome stability. BORIS is a cancer testis (CT) gene, and its ectopic expression could be lethal or inhibitory for somatic cells because BORIS, being a germline transcription factor, activates gene expression of germline-specific genes on its own or in cooperation with CTCF. Nevertheless, some cancer cells undergo adaptation/addiction to BORIS activation and incorporate the BORIS protein into their physiology [55, 56]. BORIS also interferes with a variety of other CTCF-specific functions in somatic cells, such as in the organization of chromatin loops that are alternative to the chromatin configuration of normal cells. The ultimate molecular and physiological role of BORIS in cancer is still poorly understood, however, beyond the association with stemness, phenocopying of germline-specific gene expression pattern, and the corresponding 3D chromatin organization. In particular, it is not clear how some cancer cells became dependent on BORIS for their proliferation, making BORIS a potential anticancer target [57, 58].
While many genomic repeats are heavily methylated and BORIS has a probable role in DNA demethylation [57, 59, 60, 61], the role of BORIS in repeat biology has not been studied. Incidentally, even the most comprehensive genome-wide studies on CTCF tended to ignore the possible simultaneous presence of BORIS in the cells studied, be it cancer or embryonic stem cells [48, 50, 62, 63, 64]. In the present study, we attempted to assess the specific pattern of BORIS recognition of genomic repeats in cancer cells and to link it to TE expression. As a result, we uncovered a surprising association of BORIS with one of the evolutionarily youngest families of actively transcribed and mobile repeats in the human genome, the SVA family of TEs. Follow-up analysis of the modulation of BORIS expression revealed that it predominantly acts as one of the mechanisms repressing the expression of these elements.
BORIS expression in K562 forms a specific pattern of repeat binding
For the initial analysis by ChIP-chip, anti-CTCF and anti-BORIS immunoprecipitations were conducted and microarray hybridization was performed as described in Methods. The plot of normalized ChIP-chip fluorescence intensities showed indications of distinct binding patterns for BORIS and CTCF on highly enriched tiles (Fig. 1c). Significance analysis of microarray (SAM) indicated that over 40,000 tiles were enriched differentially by CTCF and BORIS, but provided little clue about the occupancy of the rest of the repeats. The principal component analysis (PCA) of arrays hybridized to CTCF and BORIS ChIP samples confirmed the presence of differentially bound genomic repeats (Fig. 1d). The PCA also revealed the three expected scenarios of occupancy: binding by BORIS only, binding by CTCF only, and BORIS and CTCF co-binding, the last being by far the largest group (Fig. 1e). As CTCF and BORIS have essentially the same DNA-binding specificities in vitro, the differences in occupancy observed in vivo are presumably driven largely by epigenetic factors.
Prior to proceeding further with analyses of repeat binding sequences, we validated the ChIP-chip data using an alternative high-throughput procedure, ChIP-seq, as conventional qPCR validation methods are not applicable or scalable to the TRs genome-wide. We set out to validate the three identified subsets: first, repeats preferentially enriched by CTCF (Fig. 1e), second, repeats preferentially enriched by BORIS (Fig. 1e), and, third, repeats equally enriched by both CTCF and BORIS (Fig. 1e, a subset of the middle group). Based on detailed PCA analysis, an additional cutoff across the three groups was applied to make uniform criteria for selecting the representative subsets for validation. For co-bound repeats we chose 4× enrichment for both proteins in all three ChIP-chip replicates, while for the CTCF-only and BORIS-only groups we used 4× enrichment for one protein, with no enrichment for the other, also in all three replicates. Drawing the threshold at such a relatively high level also significantly reduced repeat redundancy in the TR dataset. For the ChIP-seq validation, we considered a ChIP-chip-positive repeat validated if any tile from that repeat was reproducibly enriched at least twofold in ChIP-seq datasets with 95 % DNA match. Thus, all the repeats discussed below are repeats identified by ChIP-chip and validated by ChIP-seq.
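The selection rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the function name, the input layout (one fold-enrichment value over input per ChIP-chip replicate), and the 1.0-fold ceiling used to represent "no enrichment" are assumptions. Only the 4× cutoff and the requirement that it hold in every replicate come from the text.

```python
from typing import Sequence


def classify_repeat(ctcf: Sequence[float], boris: Sequence[float],
                    high: float = 4.0, none_max: float = 1.0) -> str:
    """Assign a repeat to a validation subset from per-replicate fold enrichments.

    high     -- enrichment cutoff from the text (4x in every replicate)
    none_max -- hypothetical ceiling for "no enrichment" (an assumption)
    """
    if len(ctcf) != len(boris) or len(ctcf) == 0:
        raise ValueError("both proteins need the same non-zero number of replicates")

    ctcf_high = all(x >= high for x in ctcf)
    boris_high = all(x >= high for x in boris)
    ctcf_none = all(x <= none_max for x in ctcf)
    boris_none = all(x <= none_max for x in boris)

    if ctcf_high and boris_high:
        return "co-bound"
    if ctcf_high and boris_none:
        return "CTCF-only"
    if boris_high and ctcf_none:
        return "BORIS-only"
    # Anything in between is left out of the representative subsets.
    return "unassigned"
```

A repeat that reaches 4× for one protein in only two of three replicates falls into "unassigned", which mirrors the stringency of requiring agreement across all replicates.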
Co-binding of BORIS and CTCF is characteristic for the simple tandem repeats
Analysis of CTCF and BORIS co-binding at repeated DNA would have been incomplete without assessing the least characterized region of the human epigenome, the chromatin of the nucleolar organizer (NOR, or rDNA repeats). The bona fide human genomic rDNA has a very complex structure with multiple intervening sequences, and the NOR sequence from any human chromosome still remains to be determined. Therefore, human rDNA was not represented in the TRF database and was not present on our microarrays. While we did not validate rDNA binding by CTCF and BORIS in ChIP-chip, it is known that the repeat unit contains a strong hotspot for CTCF binding, facilitating CTCF’s interaction with the PolI transcription machinery. We used a previously described “consensus” human rDNA repeat to align ChIP-seq reads and assess the potential differences between CTCF and BORIS binding (Additional file 2: Figure S2B). Comparing BORIS and CTCF binding showed that CTCF has a single binding site upstream of the rDNA PolI promoter, consistent with published data in mice. At the same time, BORIS appeared to have some enrichment at additional sites (Additional file 2: Figure S2A). These locations, however, corresponded to low-complexity regions (Additional file 3: Table S1), which were also present elsewhere in the genome. Unlike the established CTCF binding site, the two selected BORIS sequences that appeared to be enriched in ChIP-seq were not confirmed to bind BORIS by EMSA in vitro (Additional file 2: Figure S2C). Thus, one may assume that such sites likely represent an artifact of short-read alignment to tandemly repeated DNA, and additional such sites were not tested. The presence of BORIS at the main Pol I regulatory site in rDNA, however, indicates that BORIS might be involved in ribosome biogenesis in cancer cells by virtue of co-regulating rDNA transcription with CTCF.
CTCF-only enrichment is found in older repeat classes
A movable and evolutionary youngest class of TEs is specifically enriched in BORIS binding
What could be BORIS activity at SVA elements? Our previous results on the genome-wide consequences of modulation of BORIS expression indicated that BORIS could serve as an activator as well as a repressor. The distinct preference of the aberrantly expressed BORIS for SVA elements may potentially indicate that BORIS has some regulatory activity at these elements in germline and/or in cancer cells. As there is little doubt that SVA mobilization is detrimental to genome stability, given that these elements are under strong repression in primates [73, 74, 75, 76], a possible BORIS involvement in the regulation of SVA transcription is likely to be biologically important. Indeed, transcription is required for SVA transposition, and it could also have a regulatory role in the expression of neighboring genes.
BORIS acts as a transcriptional co-repressor of a significant proportion of SVAs in K562 cells
While the transcription unit of SVAs is not well characterized [76, 77], the Alu-derived sequences are the chief drivers of transposition in SVA. Thus, SVAs contain sequences potentially transcribed by both RNA Pol III and Pol II, either of which can drive retrotransposition. At the same time, based on structural considerations, it is unlikely that SVA elements are actually transcribed by Pol III. We tested whether there was a difference in the occupancy of RNA Pol III factors at SVA elements between the publicly available ChIP-seq datasets for BORIS-positive K562 and BORIS-negative NHEK. Incidentally, we found no notable enrichment at any SVA elements for POLR3G, BDP1, BRF1, BRF2, or RPC155 (data not shown).
At this point, one may hypothesize that the affinity of BORIS to VNTRs of SVA elements demonstrated in K562 is a reflection of its role in germline pertaining to these elements and that this role is likely a repressive one. Indeed, we recently showed that despite BORIS being previously perceived as an activator, BORIS upregulation was linked to the repression of some genes and, vice versa, BORIS downregulation resulted in some genes being activated. Therefore, we investigated the K562 cells with downregulated BORIS. As SVA elements might be rapidly repressed by some other mechanism in the absence of BORIS, we could not rely on BORIS KO data, as the points of comparison there were separated by a long period of time. Instead, we experimented with the downregulation of BORIS expression in K562 cells for a short period of time using inducible shRNA. This approach enabled us to assess immediate downstream effects of BORIS downregulation. We constructed K562 cell lines with two alternative inducible anti-BORIS shRNA constructs stably integrated into the genome and conducted RNA-seq experiments after BORIS KD for 48 h. Neither the degree of BORIS depletion nor the time span of the experiment was sufficient to induce differentiation, as was described for BORIS KO. While genome-wide expression of genes responding to BORIS KD was almost evenly divided between up- and downregulation of transcription (data not shown), SVA elements longer than 1 kb were notably activated (Fig. 6d). In order to address whether any SVAs were actually downregulated upon BORIS KD, we isolated the subclass of SVA elements that were already expressed in K562 and compared their expression to BORIS KD cells. As shown in Additional file 5: Figure S3A, the 70 SVA elements that were expressed did not significantly change their expression upon the downregulation of BORIS.
In order to better understand the nature of SVA activation, we treated control K562 cells with 5-AzadCyD (5-Aza-2′-deoxycytidine), an inhibitor of DNA methylation [27, 80, 81, 82], and DZNep (3-deazaneplanocin A), which indirectly suppresses EZH2 that catalyzes histone H3 lysine 27 methylation [83, 84]. These drugs result in the removal of inhibitory epigenetic marks from DNA and chromatin, respectively. RNA-seq analysis of K562 cells treated with these DNA methylation or H3K27me3 inhibitors indicated that SVA elements that were already active were upregulated slightly (Additional file 5: Figure S3B, S3C), while the group as a whole was preferentially activated. The 5-AzadCyD effect was similar to BORIS KD, and the DZNep effect was more pronounced (Fig. 6d). Thus, we next asked whether these treatments could be preferentially affecting the same subset of SVA elements as BORIS KD or a distinct one. Using the DZNep treatment as an example (Fig. 6e, f), we showed that BORIS KD largely acted concordantly with DZNep (correlation 0.77) to activate SVA transcription of the elements that were silent in the control. It was also evident that the BORIS KD-dependent activation was not specific to any particular subclass of SVA repeats (Fig. 6g), indicative of a common pathway.
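As an illustration of how a concordance figure of this kind can be computed, the sketch below calculates a Pearson correlation between two lists of per-element expression changes, for example log2 fold changes of each SVA element under BORIS KD and under DZNep. The text does not state which correlation measure was used, so the choice of Pearson, the function name, and the input layout are assumptions.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired per-element values.

    x and y are hypothetical inputs, e.g. log2 fold changes of the same
    SVA elements under two treatments, listed in the same order.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A rank-based measure such as Spearman correlation would be less sensitive to a few strongly induced elements; which is more appropriate depends on the distribution of the expression changes.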
A distinct type of BORIS function at the SVA-F1 TEs
In conclusion, it appears that BORIS acts as a co-repressor of SVA transcription in K562 cells, alongside DNA methylation and heterochromatinization. It is therefore likely that BORIS plays a similar role in the germline, with the exception of promoter-trapping events. These findings indicate a potential biological role of BORIS as a regulator of active TEs in the human genome.
The “explosive” chromosome instability is confirmed to be one of the defining features of the cancer genome [90, 91]. This notion has sparked multiple attempts to find either a unifying mechanism or a set of concurrent mechanisms for this process [92, 93]. The early onset of chromosome instability in cancer and pre-cancer cells strongly indicates the epigenetic roots of the destabilization. In this context, the roles of chromatin states of genomic repeats in cancer are of significant interest because they directly bridge the epigenetic landscape with a potential to destabilize the genome via transposition and/or recombination. TEs that can pose a danger to genome integrity tend to be silenced for recombination and retrotransposition by epigenetic mechanisms [17, 73, 94]. Here, we found evidence of BORIS involvement in the co-regulation of TEs. The established role of BORIS as a transcriptional regulator in cancer [55, 95] and as an activator of testis-specific genes [70, 96, 97] might also be applicable to the states of genomic repeats in cancer cells. Nevertheless, the role of BORIS with respect to genomic repeats was previously unknown, despite the significant recent progress in understanding transposition as the primary avenue of genome evolution pertaining to the distribution of CTCF binding sites.
In this study, we established that BORIS, upon its activation at a relatively high level in cancer cells, has a substantial capacity to occupy the same sites in the repeated elements as CTCF (Fig. 1e). We can presume, with a high level of certainty, that this is a manifestation of the co-functions of BORIS with CTCF in the normal germline [55, 70]. While co-binding is generally expected due to the DNA-binding properties of the two proteins in vitro, the recent discovery of cluster sites being a prerequisite for CTCF and BORIS co-binding or binding of BORIS alone suggests that a significant fraction of such repeats have a cluster site configuration. Indeed, the assessment of the DNA consensus characteristic for BORIS and CTCF co-bound repeat sites (Fig. 2c) showed no significant deviation from the basic unit of CTCF consensus derived from the genome-wide binding studies (Fig. 2b), but revealed the presence of a staggered arrangement (Fig. 2d), which potentially enables such TR locations to become super-cluster sites with ample co-binding capacity. The characterization of repeats that are co-occupied by CTCF and BORIS showed that the bulk of co-binding seems to be associated with the low-copy simple TRs (Fig. 2a). These elements have a relatively narrow length distribution, and most are longer than 50 nt, indicating that they are under selection, possibly by the requirement to bind CTCF or BORIS. While expansion of short TRs is known to cause disease in a number of studied cases [98, 99], their genome-wide biological role is obscure. Thus, it is likely that BORIS and CTCF co-binding there uncovered a putative regulatory role for these elements in germline and/or cancer transcription.
The few repeat types that show a significant bias toward CTCF-only binding are rather enigmatic, as the function of CTCF-only sites genome wide is not well characterized. The most notable case here is the centromeric repeats, where recombination is highly undesirable, but the transcription was nevertheless found to be of paramount importance for normal kinetochore formation. While CTCF’s binding at alpha-satellites and its involvement in centromeric transcription were not studied, the interaction between CTCF and some centromeric proteins has been invoked at ectopic sites.
The most distinctive result generated by this study is the high preference of SVA repeats for BORIS binding, as compared to binding by CTCF in K562 (Fig. 4). Unfortunately, in the absence of ChIP data for BORIS from human testis, one cannot be absolutely sure that this is also the situation in normal testis. The functions for SVA that are described so far are attributed to the disruption/features of insertion sites rather than to the transcription originating within the insertion [103, 104]; yet the finding of BORIS binding hints at the regulatory role of SVA VNTRs themselves. The presence of several BORIS binding sites within the VNTR repeats (Figs. 4c, f, 5), which are actually required for SVA transposition, indicates that the BORIS protein and SVA elements may have even undergone co-evolution, as has been recently suggested for other ZF proteins. Thus, one may expect the SVA elements to play a notable regulatory role in germline development and genome evolution in primates. In that regard, the recent studies on the gibbon genome [2, 105] provided some invaluable insight into the new level of plasticity that the SVA-like LAVA elements infused into primate genomes. At present, one cannot conclude whether SVA TEs merely represent a genetic load or actually have a physiological role in germline. Despite human SVAs being associated with at least some chromosomal breaks, we could probably exclude the direct contribution of SVA elements to meiotic recombination, as DSB maps of human meiosis did not correspond to SVA locations (not shown).
By applying RNA-seq analyses to the K562 cells, we found strong evidence of a substantial fraction of SVA elements being transcriptionally activated upon BORIS KD (Fig. 6d–f). This was a strong indication that BORIS acted as a repressor of SVA transcription for that repeat group. This conclusion is further reinforced by the finding that this repressive activity is additive with DNA methylation and with the formation of repressive chromatin structure (Fig. 6e, f). Therefore, we could conclude that BORIS participates in the repression of SVA elements that are located in the heterochromatin-like regions of the epigenome. This BORIS-mediated tier of SVA repression could have an exceptional significance in the male germline, where the rounds of DNA demethylation could potentially open SVA retrotransposons for a transient activation leading to germline mutations, as has been found in pluripotent cells.
The addition of BORIS to cancer cells’ chromatin constitutes a potent epimutation, as it could introduce a substantial change into CTCF’s functions. Some of these changes were recently documented, particularly with respect to recapitulating the germline pattern of gene regulation. With respect to the genomic repeats, the associated rewiring of the epigenetic regulatory network, which is normally embodied by CTCF alone in somatic cells, may greatly alter the functional role of inserted repeats themselves, e.g., their expression and transposition, as well as their propensity to regulate neighboring genes and chromatin domains.
In this study, by employing ChIP-chip and ChIP-seq approaches, we characterized CTCF and BORIS binding patterns at genomic repeats upon aberrant BORIS expression in the K562 cancer cell line, which is dependent on BORIS for proliferation. This study showed that, while CTCF-only enrichment is found in most known repeat classes, BORIS and CTCF bind together predominantly to the uncharacterized simple TRs, which likely form compound cluster binding sites. We discovered that the SVA elements, a presently active family of TEs in the human genome with a strong mutagenic potential and a role in transcription regulation, are specifically enriched in BORIS, with binding concentrated at the VNTR region. Furthermore, RNA-seq analysis of BORIS KD in K562 showed that BORIS acts to repress multiple SVAs, alongside the transcriptionally repressive histone modification and DNA methylation. These findings uncovered a novel function of BORIS in controlling the levels of TE transcription in cancer cells and likely in the germline.
Cell culture, transfection, and lentiviral infection
K562, Delta-47, and HL60 cell lines were grown in IMDM (Hyclone) supplemented with 10 or 20 % Tet-approved FBS. The HEK293T/17 cell line was grown in DMEM (Hyclone) supplemented with 10 % FBS. Transfection was done according to the manufacturer’s instructions using X-tremeGENE 9 DNA Transfection Reagent (Roche). To package lentivirus, HEK293T/17 cells were cotransfected with the vector Tet-pLKO-Neo (Addgene) or anti-BORIS shRNA derivatives and two packaging plasmids, psPAX2 and pMD2.G. Lentivirus stocks were collected 72 h post-transfection and used to infect K562 at 40–50 % confluence using 500 µl lentivirus stock and 8 µg/ml polybrene (Sigma). The media were then changed 12 h after infection to include 600 µg/ml G418, and the cells were selected for G418 resistance for at least 4 weeks. The resistant clones were selected in 96-well plates and analyzed by RT-qPCR and immunoblotting. The stable clones were induced by 200 ng/ml doxycycline to activate the Tet-On promoter.
The tiling repeat microarray
The design for this custom array was conducted at Roche/Nimblegen using a tiling approach. As a source for the design, we used a catalogue of human TRs generated by TR finder [110, 111]. The version of the TRF algorithm used for the design of the array generated 947,696 distinct repeat instances based on the human genome. The tentative estimate of redundancy conducted by applying the most stringent versions of TRF suggests that the repeat dataset had about 40 % sequence redundancy. The repeats were broken into 50-base tiles using the following rules: tiles were picked based on the predicted hybridization normalization; when the repeat was shorter than 50 nucleotides, it was extended in tandem fashion. Our tiling approach generated some additional redundancy within the tiles themselves because long homogeneous repeats produced a number of identical tiles. The redundancies within the array did not interfere with microarray data analysis, as the primary hybridization signal was recorded for each tile independently of any other. The final array design contained 2,166,672 features, including two control sets: 29,161 random sequence tiles and 181 tiles from the rDNA locus of Saccharomyces cerevisiae.
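The tiling rules above can be illustrated with a short Python sketch. It is a simplified stand-in, not the vendor's design procedure: the actual tile selection relied on predicted hybridization behavior, which is not reproduced here, and the fixed 50-base step and the function name are assumptions. The 50-base tile length and the tandem extension of short repeats follow the text.

```python
from typing import List


def tile_repeat(seq: str, tile_len: int = 50, step: int = 50) -> List[str]:
    """Break one repeat into fixed-length tiles.

    Repeats shorter than tile_len are extended in tandem, as described in
    the text. The fixed step between windows is a simplifying assumption.
    """
    if not seq:
        raise ValueError("empty repeat sequence")
    if len(seq) < tile_len:
        copies = -(-tile_len // len(seq))  # ceiling division
        return [(seq * copies)[:tile_len]]

    tiles = [seq[i:i + tile_len]
             for i in range(0, len(seq) - tile_len + 1, step)]
    # Add a final window if the regular windows did not reach the 3' end.
    if (len(seq) - tile_len) % step != 0:
        tiles.append(seq[-tile_len:])
    return tiles
```

Calling `tile_repeat("A" * 120)` returns three tiles that are all identical, which is the within-array redundancy that long homogeneous repeats introduce.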
ChIP-chip and ChIP-seq
For the ChIP-chip and ChIP-seq, anti-CTCF and anti-BORIS ChIP were conducted from at least 50 million cells growing asynchronously. ChIP-seq preparation and analysis were done essentially as described previously. The specificity of ChIP reactions was validated by qPCR for known targets: the TSP50 and CST promoters for BORIS, and the MYC promoter sites for CTCF as in [96, 97].
For ChIP, cells growing asynchronously were cross-linked (10 min, 1 % formaldehyde, 23 °C), quenched for 10 min with 200 mM glycine, washed three times with PBS, and then resuspended in chromatin buffer (150 mM NaCl, 1 % Triton X-100, 0.1 % SDS, 20 mM Tris–HCl pH 8.0, and 2 mM EDTA). DNA was sheared using Covaris S220, so that most fragments were in the 300- to 500-bp range. Chromatin was immunoprecipitated overnight with magnetic beads (DiaMag, Diagenode, Inc.) loaded with anti-CTCF or anti-BORIS antibodies as described previously. The immunoprecipitate was washed, cross-links were reversed, the protein component was digested with proteinase K, and DNA was extracted using phenol/chloroform/isoamyl alcohol. DNA concentration was measured by Qubit (Life Technologies) and/or Nanodrop (Thermo Scientific) fluorimeters. For ChIP-chip, the immunoprecipitated DNA was amplified using the Phi29 strand-displacement procedure (GE Bioscience) following the concatemerization of precipitated DNA fragments via ligation to double-strand adaptors containing BamHI overhangs and internal SapI sites. Both amplified and non-amplified samples showed essentially the same relative enrichment for known sites of CTCF and BORIS binding. Following the amplification, adaptors were removed by SapI digestion and agarose gel purification. Input DNA was used as a reference for the hybridization of amplified ChIP DNA to a set of custom TR arrays (Roche-Nimblegen). Raw intensities for each channel were centered against the mean of the control feature set, including random oligonucleotides and yeast rDNA. Then, Lowess smoothing was applied to two-channel data to generate corrected M values that were used in subsequent analyses. The Lowess normalization, SAM, and PCA calculations were done using publicly available R scripts.
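The first normalization step described above, centering each channel on its control features and forming M values, can be sketched as follows. The analysis in the study was done with R scripts; this Python version is only an illustration under stated assumptions: centering is performed on log2 intensities by subtracting the mean log2 signal of the control features, M is defined as the centered ChIP signal minus the centered input signal, and all names are hypothetical. The subsequent Lowess correction is deliberately left out; it would subtract an intensity-dependent trend of M fitted against the average signal A.

```python
import math
from typing import Dict, Iterable, Tuple


def center_log2(intensity: Dict[str, float],
                controls: Iterable[str]) -> Dict[str, float]:
    """Log2-transform one channel and center it on its control features."""
    ctrl = [math.log2(intensity[name]) for name in controls]
    if not ctrl:
        raise ValueError("no control features given")
    offset = sum(ctrl) / len(ctrl)
    return {name: math.log2(value) - offset
            for name, value in intensity.items()}


def ma_values(chip: Dict[str, float], inp: Dict[str, float],
              controls: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Return (M, A) per tile: M = ChIP - input, A = their mean (log2 scale)."""
    control_names = list(controls)  # reused for both channels
    chip_c = center_log2(chip, control_names)
    inp_c = center_log2(inp, control_names)
    return {name: (chip_c[name] - inp_c[name],
                   0.5 * (chip_c[name] + inp_c[name]))
            for name in chip_c}
```

With this definition, control features end up with M values near zero, so a positive M for a repeat tile indicates enrichment in the ChIP channel relative to input.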
For downstream analysis of ChIP-seq data, the Illumina reads (50 bp) were aligned to the human repeat subgenome generated by TRF using BLAT (allowing 95 % identity) and/or Bowtie (with parameters -v 2 --best --strata --tryhard). seqMINER was used to analyze and plot CAGE expression data from published datasets. Motif Elicitation (MEME) software was used to derive consensus sequences from genomic repeats with parameters (-mod oops -revcomp -w 20) to identify motifs on both DNA strands.
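For readers who want to see how the quoted parameters fit into full command lines, the sketch below assembles argument lists for Bowtie (version 1 syntax) and MEME in Python. Only the flags quoted in the text (`-v 2 --best --strata --tryhard` and `-mod oops -revcomp -w 20`) come from the study. The index prefix, file names, output directory, and the extra flags `-S` (SAM output), `-dna`, and `-oc` are assumptions added to make the commands complete, and the helper function names are hypothetical.

```python
from typing import List


def bowtie_cmd(index_prefix: str, reads_fastq: str, out_sam: str) -> List[str]:
    """Bowtie 1 command with the alignment flags quoted in the text."""
    return ["bowtie",
            "-v", "2", "--best", "--strata", "--tryhard",  # from the text
            "-S",                                         # assumption: SAM output
            index_prefix, reads_fastq, out_sam]


def meme_cmd(fasta: str, out_dir: str) -> List[str]:
    """MEME command with the motif-search flags quoted in the text."""
    return ["meme", fasta,
            "-dna",                                       # assumption: DNA alphabet
            "-mod", "oops", "-revcomp", "-w", "20",       # from the text
            "-oc", out_dir]                               # assumption: output dir
```

Either list can be passed directly to `subprocess.run(cmd, check=True)` once the corresponding tool and input files are available, for example `bowtie_cmd("trf_repeats", "chip_reads.fastq", "chip_reads.sam")`, where all three names are placeholders.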
Analysis of public high-throughput genomic data
ENCODE/RIKEN data (GSE34448) for K562 and NHEK cell lines were used in this study. The DSB maps of human meiosis were derived from previously published data.
Protein extracts were prepared by lysing cells in SDS-PAGE sample buffer after washing with PBS supplemented with 1× protease inhibitor cocktail (Roche Applied Science). Protein samples were separated by SDS-PAGE, transferred to a PVDF membrane, and incubated with the appropriate primary antibodies, followed by detection using LiCor secondary antibodies fused to fluorochromes. Photoluminescent images were captured by scanning and processed for quantification using a LiCor workstation.
Immunofluorescent cell staining
K562 and HL60 cells were spun down in Cytospin centrifuge (Thermo Scientific) onto poly-Lysine-coated coverslips and fixed with 4 % paraformaldehyde for 10 min, followed by cold methanol for 10 min. Cells were permeabilized with 0.1 % Triton X-100/PBS for 10 min and then blocked with BSA for 30 min, after which they were incubated with primary antibodies. After washes, the anti-rabbit or anti-mouse secondary antibodies conjugated to either Alexa Fluor 647 or Alexa Fluor 488 were applied. Cells were mounted for microscopy in mounting media containing DAPI and images captured using either confocal (Zeiss) or wide-field (Olympus) inverted microscopes.
Electrophoretic mobility shift assay (EMSA)
To map CTCF and BORIS binding sites in SVA repeats, the SVA subfamily D repeat (chr11: 107,782,495–107,784,189, GRCh37/hg19) was covered with nine overlapping DNA probes either amplified by PCR or synthesized as oligonucleotides (Additional file 3: Table S1). PCR-amplified products were cloned into the pCR2.1 TOPO vector (Invitrogen), and the sequence was confirmed by DNA sequencing. DNA fragments were labeled with [γ-32P] ATP at the 5′ ends by T4 polynucleotide kinase per the Invitrogen protocol. Labeled DNA fragments were gel purified, and equal amounts of each fragment were used for EMSAs. FL human CTCF, the 11ZF domain of CTCF, and FL human BORIS were synthesized from pCITE expression vectors (EMD Millipore), using the reticulocyte lysate-coupled in vitro transcription-translation system (TNT, Promega). Binding reactions for EMSA were carried out for 1 h at 23 °C with 4 µl of in vitro synthesized DNA-binding proteins in binding buffer [25 mM HEPES pH 7.6, 100 mM KCl, 2 mM MgCl2, 10 % glycerol, 0.5 µg poly(dIdC) × poly(dIdC)]. DNA–protein complexes were resolved on 5 % non-denaturing polyacrylamide gels in 0.5× Tris-borate-EDTA buffer. A Gal3ST1 promoter fragment was used in EMSA as a positive control for both CTCF and BORIS binding. To test the methylation sensitivity of protein binding, all labeled probes used in EMSA were methylated using SssI methyltransferase (New England BioLabs) by the following protocol: 200 ng of each oligonucleotide was combined with 2.7 μl of NEBuffer 2, 3 μl (12 U) of SssI methylase, and 1 μl of S-adenosylmethionine (32 mM). After 3 h of incubation at 37 °C, 0.5 μl of NEBuffer 2, 3 µl (12 U) of SssI methylase, and 1 μl of S-adenosylmethionine (32 mM) were added, and the reaction was incubated for an additional 3 h at 37 °C. The completion of methylation was assessed by digesting the probes with the methylation-sensitive enzyme AciI (Additional file 2: Figure S2B).
RT-PCR and quantitative PCR
Total RNA was prepared using Trizol (Invitrogen). cDNA was prepared using the Primescript™ RT Reagent Kit with genomic DNA Eraser (perfect real time) (TaKaRa) according to the manufacturer’s protocol. Quantitative PCR (qPCR) was performed using SYBR Premix Ex Taq™ (TaKaRa) and the Mx3005P QPCR System (Agilent).
For the RNA-seq experiments, inducible BORIS knockdown (KD) and control cell lines were created by infecting K562 cells with three different Tet-On lentivirus constructs: the empty vector pLKO-Tet-On-neo and two alternative anti-BORIS shRNA constructs. Several stable clones of each infected cell line were selected using 600 µg/ml G418. BORIS KD vectors were constructed to express the following shRNA templates: GGAAATACCACGATGCAAATT (Site 1) and GGTGTGAAATGCTCCTCAACA (Site 2). For lentivirus vector construction, the annealed oligonucleotides were inserted into the pLKO-Tet-On-neo vector between the AgeI and EcoRI restriction sites. After 72 h of induction with doxycycline, BORIS mRNA reproducibly showed a 2.5- to 3-fold reduction, while BORIS protein levels were robustly decreased over fivefold (Fig. 6c, d). For RNA analysis, these K562-inducible stable shRNA cells were plated in 10-cm plates at 40–50 % confluence in DMEM media and left to grow in the presence of doxycycline (200 ng/ml) for 96 h. For the 5-aza-deoxycytidine and DZNep experiments, cells were identically pretreated with doxycycline, harvested, and re-plated at 50–60 % confluence to grow for 48 h in the presence of either 500 nM 5-aza-2′-deoxycytidine, 1 µM DZNep, or DMSO. The degree of genomic DNA demethylation was assessed using DNA IP with the anti-5-methylcytosine mAb MABE146, clone 33D3 (EMD Millipore), and qPCR against known targets. The effectiveness of DZNep treatment was assessed by immunoblotting against the EZH2 protein with the D2C9 rabbit mAb (Cell Signaling Technology). The cells were then collected, frozen, and outsourced for Illumina sequencing to RiboBio (Guangzhou). The amount of RNA submitted for each individual run was on average 85 µg (Nanodrop). The quality of RNA was assessed with the Agilent 2200 TapeStation. About 20 million reads were obtained for each individual experiment. Four biological replicates were produced and analyzed for each set of experimental conditions.
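As a small sanity check on the two shRNA templates given above, the following sketch reports their length, GC fraction, and reverse complement (the sequence of the opposite hairpin arm). The helper names are hypothetical, and the sketch makes no assumption about the loop sequence or the cloning overhangs, which are not specified in the text.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# shRNA templates exactly as listed in the Methods text.
SHRNA_TEMPLATES = {
    "Site 1": "GGAAATACCACGATGCAAATT",
    "Site 2": "GGTGTGAAATGCTCCTCAACA",
}

for name, sense in SHRNA_TEMPLATES.items():
    print(
        f"{name}: length={len(sense)} nt, "
        f"GC={gc_fraction(sense):.2f}, "
        f"reverse complement={reverse_complement(sense)}"
    )
```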
The results of all RNA-seq experiments were analyzed for consistency and reproducibility using Cufflinks 2.0.0 following read alignment to the human reference genome (hg38) using TopHat2 with the default parameter settings. After that validation, to align RNA-seq data to SVAs, a sub-genome file of 2223 SVA elements was assembled from elements mapped in hg38 that were longer than 1 kb, to ensure that VNTRs were included. RNA-seq reads were aligned to the SVA elements with Bowtie (-v0), and read counts per element were normalized to the total number of reads in each experiment. Then, fold-enrichment ratios relative to the averaged normalized reads in the empty vector experiments were calculated.
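The normalization and fold-enrichment steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the per-million scaling factor, the function names, the treatment of elements with no control signal (reported as NaN), and the toy counts are all assumptions.

```python
import math
from statistics import mean


def normalize_counts(counts, total_reads, scale=1_000_000):
    """Scale per-element read counts by the total reads of the experiment.

    `scale` (reads per million) is an assumed convention; the text only
    states that counts were normalized to total read numbers.
    """
    return {sva: c * scale / total_reads for sva, c in counts.items()}


def fold_enrichment(sample_norm, control_norms):
    """Ratio of a sample's normalized counts to the mean of the controls.

    `control_norms` is a list of normalized-count dicts from the empty
    vector experiments. Elements with zero mean control signal yield NaN
    (an assumption; no pseudocount is applied here).
    """
    ratios = {}
    for sva, value in sample_norm.items():
        baseline = mean(ctrl.get(sva, 0.0) for ctrl in control_norms)
        ratios[sva] = value / baseline if baseline > 0 else math.nan
    return ratios


# Toy data: hypothetical SVA identifiers and counts, 20 million reads each.
controls = [
    normalize_counts({"SVA_1": 10, "SVA_2": 0}, 20_000_000),
    normalize_counts({"SVA_1": 20, "SVA_2": 0}, 20_000_000),
]
knockdown = normalize_counts({"SVA_1": 30, "SVA_2": 0}, 20_000_000)

print(fold_enrichment(knockdown, controls))
```

Averaging the empty vector replicates before taking the ratio gives a single baseline per SVA element, so each knockdown or drug-treated sample is compared against the same reference.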
AS, EMP, DL and VL conceived and designed the experiments; EMP, QFW, JJL, CC, CCM, JL, and AB performed experiments; AS, EMP, ET, JL, and APH conducted data analysis; SR and DL contributed reagents and tools; and AS, EMP, APH, DL and VL wrote the paper. All authors read and approved the final manuscript.
The authors would like to acknowledge the Drug Discovery Center of the Guangzhou Institutes of Biomedicine and Health for logistical support. It was funded by the Guangzhou sciences and technology Grant 201508020131.
The authors declare that they have no competing interests.
Availability of supporting data
NGS data were deposited to the Gene Expression Omnibus (GEO) repository with the accession number GSE70764. The TRF microarray design and the ChIP-chip datasets were deposited at the GEO with accession number GSE84326.
This work was supported by the PRC government’s “1000 Talents Program” grant to AS, the Guangdong provincial government’s “Guangdong High Talent” award to AS, and the Intramural Program of the National Institute of Allergy and Infectious Diseases for VL.
- 34. Landan G, Cohen NM, Mukamel Z, Bar A, Molchadsky A, Brosh R, Horn-Saban S, Zalcenstein DA, Goldfinger N, Zundelevich A, et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat Genet. 2012;44:1207–14.
- 37. Gomez-Marin C, Tena JJ, Acemel RD, Lopez-Mayorga M, Naranjo S, de la Calle-Mustienes E, Maeso I, Beccari L, Aneas I, Vielmas E, et al. Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proc Natl Acad Sci USA. 2015;112:7542–7.
- 45. Stong N, Deng Z, Gupta R, Hu S, Paul S, Weiner AK, Eichler EE, Graves T, Fronick CC, Courtney L, et al. Subtelomeric CTCF and cohesin binding site organization using improved subtelomere assemblies and a novel annotation pipeline. Genome Res. 2014;24:1039–50.
- 55. Pugacheva EM, Rivero-Hinojosa S, Espinoza CA, Mendez-Catala CF, Kang S, Suzuki T, Kosaka-Suzuki N, Robinson S, Nagarajan V, Ye Z, et al. Comparative analyses of CTCF and BORIS occupancies uncover two distinct classes of CTCF binding genomic regions. Genome Biol. 2015;16:161.
- 57. Vatolin S, Abdullaev Z, Pack SD, Flanagan PT, Custer M, Loukinov DI, Pugacheva E, Hong JA, Morse H III, Schrump DS, et al. Conditional expression of the CTCF-paralogous transcriptional factor BORIS in normal cells results in demethylation and derepression of MAGE-A1 and reactivation of other cancer-testis genes. Cancer Res. 2005;65:7751–62.
- 59. Bhan S, Negi SS, Shao C, Glazer CA, Chuang A, Gaykalova DA, Sun W, Sidransky D, Ha PK, Califano JA. BORIS binding to the promoters of cancer testis antigens, MAGEA2, MAGEA3, and MAGEA4, is associated with their transcriptional activation in lung cancer. Clin Cancer Res. 2011;17:4267–76.
- 70. Sleutels F, Soochit W, Bartkuhn M, Heath H, Dienstbach S, Bergmaier P, Franke V, Rosa-Garrido M, van de Nobelen S, Caesar L, et al. The male germ cell gene regulator CTCFL is functionally different from CTCF and binds CTCF-like consensus sites in a nucleosome composition-dependent manner. Epigenet Chromatin. 2012;5:8.
- 90. Moncunill V, Gonzalez S, Bea S, Andrieux LO, Salaverria I, Royo C, Martinez L, Puiggros M, Segura-Wang M, Stutz AM, et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat Biotechnol. 2014;32:1106–12.
- 96. Kosaka-Suzuki N, Suzuki T, Pugacheva EM, Vostrov AA, Morse HC 3rd, Loukinov D, Lobanenkov V. Transcription factor BORIS (brother of the regulator of imprinted sites) directly induces expression of a cancer-testis antigen, TSP50, through regulated binding of BORIS to the promoter. J Biol Chem. 2011;286:27378–88.
- 97. Suzuki T, Kosaka-Suzuki N, Pack S, Shin DM, Yoon J, Abdullaev Z, Pugacheva E, Morse HC 3rd, Loukinov D, Lobanenkov V. Expression of a testis-specific form of Gal3st1 (CST), a gene essential for spermatogenesis, is regulated by the CTCF paralogous gene BORIS. Mol Cell Biol. 2010;30:2473–84.
- 101. Bergmann JH, Jakubsche JN, Martins NM, Kagansky A, Nakano M, Kimura H, Kelly DA, Turner BM, Masumoto H, Larionov V, Earnshaw WC. Epigenetic engineering: histone H3K9 acetylation is compatible with kinetochore structure and function. J Cell Sci. 2012;125:411–21.
- 106. Vogt J, Bengesser K, Claes KB, Wimmer K, Mautner VF, van Minkelen R, Legius E, Brems H, Upadhyaya M, Hogel J, et al. SVA retrotransposon insertion-associated deletion represents a novel mutational mechanism underlying large genomic copy number changes with non-recurrent breakpoints. Genome Biol. 2014;15:R80.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.