Background

Telomeres, the ends of linear chromosomes, consist of repetitive DNA sequences bound by the shelterin protein complex [1, 2]. This protein assembly protects the DNA ends from degradation and accidental recognition as DNA double-strand breaks [3,4,5]. The progressive shortening of the telomere repeats that accompanies normal replication limits the number of cell divisions. Thus, it needs to be circumvented by cancer cells for unlimited proliferation. This is accomplished by activation of a telomere maintenance (TM) mechanism. It involves either the reactivation of the reverse transcriptase telomerase normally repressed in somatic cells via different mechanisms [6,7,8,9], or activation of the alternative lengthening of telomeres (ALT) pathway [10,11,12,13]. ALT activity in human cancer cells occurs via DNA repair and recombination pathways but details on the mechanism remain elusive. Thus, TM is a complex process that involves proteins that are part of the shelterin complex at telomere repeats [14, 15] or in close proximity [16, 17]. Factors that regulate transcription of telomere repeats and the activity of telomerase are also relevant [18, 19] as well as features of the ALT pathway like PML (promyelocytic leukemia) nuclear bodies at telomere repeats that are associated with a variety of proteins and referred to as APBs (ALT-associated PML nuclear bodies) [20,21,22,23]. Furthermore, studies of telomere shortening have linked a number of proteins to telomere crisis [24].

A well-studied model organism for telomere biology is the budding yeast Saccharomyces cerevisiae [25]. Several independent deletion screens with subsequent direct measurements of telomere length (TL) have identified a comprehensive list of yeast genes involved in TL regulation [26,27,28]. Since telomere structure and function are highly conserved between organisms, mammalian homologues exist for most of the genes identified in the various yeast screens. Thus, it is informative to relate TM phenotypes found in yeast to human homologues [29]. In S. cerevisiae, telomerase is constitutively active and its deletion leads to cellular senescence [30]. Survivor cells that overcome cellular senescence in the absence of telomerase use a mechanism based on homologous recombination for telomere elongation [31]. Interestingly, similar to ALT in human cells, so-called type II survivors are characterized by heterogeneous TLs [32, 33].

To compile telomere-relevant information, several databases have been created. The Telomerase database (http://telomerase.asu.edu/overview.html) is a web-based tool for the study of structure, function, and evolution of the telomerase ribonucleoprotein [34]. It is a comprehensive compilation of information on the telomerase enzyme and its DNA substrate. In addition, MiCroKiTS (Midbody, Centrosome, Kinetochore, Telomere and Spindle; http://microkit.biocuckoo.org) provides information on the cellular localization of proteins relevant for cell cycle progression and also includes telomere proteins [35]. The TeloPIN (Telomeric Proteins Interaction Network) database was a collection of interaction data in human and mouse cells from available literature and GEO (Gene Expression Omnibus) data [36], but it is no longer active. The same is true for the TeCK database, which was previously published as a collection of telomeric and centromeric sequences as well as telomerase, centromere and kinetochore binding proteins [37].

The above-mentioned databases cover telomere-related information but lack an annotation of genes with respect to the TM mechanism. Accordingly, we here introduce the TelNet database as a compilation of information on TM relevant genes. TelNet currently comprises more than 2000 human and over 1100 budding yeast genes that are involved in TM pathways. The annotation of these genes includes the classification of TM mechanisms (TMM) along with a significance score as well as TM specific functions and homology assignments between different organisms. Furthermore, links to the relevant literature sources are given. Thus, TelNet provides an integrative resource for dissecting TM networks and elucidating the alternative lengthening of telomeres pathway.

Construction and content

Implementation

The TelNet database was constructed using FileMaker Pro version 13. It is accessible at http://www.cancertelsys.org/telnet and is distributed with FileMaker Server version 16 via its WebDirect module. In addition, the TelNet webpage provides general information about TelNet as well as instructions on how to use it. Links to other databases and contact information are given as well.

Data source

To compile an initial set of TM relevant genes, we selected screening studies on genes or proteins that play a role in telomere biology (Fig. 1, Table 1) and included the following: (i) proteins that were purified with a telomere probe in an ALT- and a telomerase-positive cell line [14], (ii) proteins from the analysis of telomeric chromatin of telomerase-positive cells [38], (iii) proteins in close proximity to shelterin components [16, 17], (iv) proteins that affected ALT-associated PML nuclear bodies [23, 39], (v) deregulated proteins linked to telomere shortening [24], (vi) genes identified from telomerase activity signatures derived from gene expression data [40], (vii) telomerase regulators identified in a kinase screen and transcription factors compiled in a review [18, 19], and (viii) a gene set with potential relevance to telomeres and the ALT pathway [41]. In addition, more than 1100 budding yeast genes were included in TelNet. For yeast, the initial gene list was obtained from the following sources: (i) deletion screens identifying TL associated genes [26,27,28], (ii) post-senescent survivor screening after telomerase knockout [42], (iii) transcription factors of telomerase [43], and (iv) all human and budding yeast genes with a GO annotation containing the term “telo” [44].

Fig. 1

Data sources of TM genes included in TelNet. Selected screening studies and other references that served as sources for TM genes are shown. In total TelNet currently includes over 2000 human genes and more than 1100 budding yeast genes. Histograms of the TelNet scores are displayed for the complete gene sets per species and colored by their TM significance annotation. Color scheme: blue, predicted TM genes; beige, genes from screening studies; orange, validated genes

Table 1 Screening studies and database information included in TelNet for identification of TM genes

To classify the relevance of a gene or corresponding protein for TM we introduced the three categories “predicted”, “screened” and “validated”. The factors collected from the above-mentioned screening or review sources were initially classified as “screened”. Genes with a suggested role in telomere maintenance but lacking experimental validation were assigned the TM significance “predicted”. Those with gene-specific experimental evidence for a connection to telomere maintenance were ranked as “validated”. Orthologues of genes classified as “screened” or “validated” in one organism were included in the TelNet database as “predicted” in the other organism if no further information was available. In this manner, we compiled an initial list of human and budding yeast genes that was further curated and annotated manually.
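The orthologue rule can be illustrated with a minimal Python sketch. The function name and the dictionary layout are hypothetical and do not reflect the internal TelNet schema; the "screened" label given to yeast ARF1 is only an illustrative starting value, with ARF1/ARL4D taken from the orthologue pair discussed later in the text.

```python
# Hypothetical sketch of the orthologue rule; names and data layout are
# illustrative assumptions, not the TelNet implementation.

def propagate_orthologues(annotations, orthologues):
    """Return a copy of `annotations` with orthologues added as 'predicted'.

    annotations: dict mapping (organism, gene) -> TM significance
    orthologues: iterable of ((organism, gene), (organism, gene)) pairs
    """
    result = dict(annotations)
    for a, b in orthologues:
        # Check the pair in both directions (yeast -> human, human -> yeast).
        for source, target in ((a, b), (b, a)):
            evidence = annotations.get(source)
            # Carry over only experimentally supported entries, and only
            # when the target has no annotation of its own.
            if evidence in ("screened", "validated") and target not in result:
                result[target] = "predicted"
    return result


# Illustrative input: the significance label for ARF1 is an assumption.
annotations = {("S. cerevisiae", "ARF1"): "screened"}
orthologues = [(("S. cerevisiae", "ARF1"), ("H. sapiens", "ARL4D"))]
updated = propagate_orthologues(annotations, orthologues)
print(updated[("H. sapiens", "ARL4D")])
```

An entry that already carries its own annotation is left unchanged, which mirrors the condition "if no further information was available".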

General information from external databases

For a standardized nomenclature [45], the converter system from DAVID Bioinformatics Resources (https://david.ncifcrf.gov/) [46] or the BioMart tool from Ensembl [47] was used to provide gene and protein identifiers for Entrez, HUGO, Ensembl, RefSeq and UniProt. To account for organism-specific differences such as the lack of splicing isoforms in yeast or the absence of locus tags in human, the identifiers were selected differentially for each species. General gene information was retrieved from designated external databases and repositories, such as the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov) [48], HUGO Gene Nomenclature Committee (HGNC, http://www.genenames.org) [49], Ensembl (http://www.ensembl.org/index.html) [50, 51], and the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) [52]. The approved gene symbol, full name, and synonyms were taken from NCBI. UniProt [53] and YeastMine [54] were consulted for the description of the cellular function in human and yeast, respectively, and assignment of orthologues was done with YeastMine. Based on the Gene Ontology (GO, http://www.geneontology.org) annotations [44] and in line with general biocuration guidelines [55] as well as SGD practice [56], we generated a list of cellular functions. Every gene was manually annotated with the respective term that was most representative for its cellular function. In this manner, general information for every gene entry was compiled from a variety of external databases.

Telomere maintenance annotation with literature information and scoring

Genes were further annotated with TM information from peer-reviewed literature for assigning them to functional categories (Fig. 2). Up to five TM functions can be selected from an assembled list that comprises molecular functions as well as cellular processes and structures with regard to TM. A knock-out or knock-down phenotype related to TM features such as alterations in TL, increased or decreased ALT hallmarks, or effects on telomerase was described as free text in the field “TM phenotype”. Details from the literature were summarized in the field “TM comment”. To quantify the significance of a given gene for TM we introduced the TelNet score ranging from 1 (low) to 10 (high) that was automatically calculated from information entered into the TelNet database (Table 2). Scoring criteria included the cellular function, number and relevance of assigned TM functions and the amount of experimental data associated with the TM function of a given gene. Information on the protein's activity was collected in the “TMM annotation” field. For human genes, we distinguished between “alternative lengthening of telomeres (ALT)” and “telomerase-mediated” regulation with the associated activities “repressing”, “enhancing” or “ambiguous”. The latter was used where literature information was inconsistent or where genes were mentioned in the context of ALT or telomerase without further details of regulation activity. Budding yeast genes were annotated as survivors using “type I recombination” or “type II recombination” or associated with “telomerase-mediated” regulation. Thus, the annotation of a given factor in TelNet assigns it to ALT or a telomerase-mediated TMM and provides information on how it affects this process. Furthermore, the corresponding TelNet score provides an assessment of the significance of this assignment.

Fig. 2

TM part of the TelNet gene card view. Annotation fields and possible entries for TM significance, associated TelNet score, TM comment, TMM annotation, TM functions and TM phenotype are depicted

Table 2 Calculation of the TelNet score

Utility

TelNet user interface

On the start layout of TelNet, the user selects the organism, i.e. either H. sapiens or S. cerevisiae (Fig. 3). The default selection is H. sapiens. All genes can be browsed by clicking on the “show all” button. Furthermore, various search modes are available and are described in more detail below. A navigation panel at the top allows switching between different views and returning to the front search page. Gene sets can be displayed as a scrollable list and the complete information of an individual gene is given by selecting the “card view”. A short explanation of each annotation field is given by clicking on the corresponding info button. Orthologous genes are connected via database hyperlinks. Furthermore, every gene is linked to selected publications.

Fig. 3

Typical TelNet workflow. Top: On the front page, the organism is selected. Middle: Three different search options, namely “quick search”, “list search” and “advanced card search” are available to retrieve a set of genes. Bottom: The resulting genes can be displayed as a scrollable list or as a series of single gene cards. In addition, an overview of the associated TM annotations is provided on the statistics page

Search and statistics

The TelNet database can be used with three different search modes (Fig. 3) named “quick search”, “list search” and “advanced card search”. For a quick search throughout selected fields, one keyword can be entered into the search bar. If a user wishes to constrain the results (e.g., to a gene symbol), the selection of fields can be adapted. By performing an advanced card search, the user can enter more and different search terms in the respective fields. Furthermore, a complete list of gene identifiers can also be pasted into the list search. The organism and the identifier type are mandatory to perform a list search. Genes found are then displayed and can be selected for further analysis and TM network identification within TelNet or exported in various file formats. The statistics page gives a graphical overview of the distributions of various TelNet annotations such as a histogram of the TelNet score and the distribution of TM significance categories. Furthermore, TelNet statistics can be employed for a more detailed pathway analysis regarding TM functions. A predicted wild-type TMM is computed by evaluation of the TMM annotations retrieved. The wild-type phenotype of a given gene is used for predicting the likely active TMM for a set of genes. Every protein contributes with its TelNet score to one of the groups “ALT”, “telomerase-mediated” or “ambiguous”, which refers to its wild-type form. For example, a gene that is recurrently mutated in ALT positive tumors like ATRX (alpha thalassemia/mental retardation syndrome X-linked protein) would represent an ALT suppressor. It is thus classified as “telomerase-mediated” for the predicted TMM associated with its wild-type phenotype. The attribute “ambiguous” is used for genes lacking TMM information as well as genes with conflicting associations. Thus, TelNet informs about known and predicted TM features for the genes of interest via its different search and summary analysis tools.
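The score-weighted grouping behind the predicted wild-type TMM can be sketched in Python as follows. This is a sketch under stated assumptions rather than the TelNet implementation: only the ALT repressor case (ATRX) is described in the text, so the remaining rows of the mapping, the field names and all scores are hypothetical placeholders.

```python
from collections import defaultdict

# Assumed mapping from (mechanism, activity) to the group supported by the
# wild-type gene. Only the ALT-repressing row is stated in the text (ATRX);
# the other rows are illustrative assumptions.
WILD_TYPE_GROUP = {
    ("ALT", "enhancing"): "ALT",
    ("ALT", "repressing"): "telomerase-mediated",
    ("telomerase-mediated", "enhancing"): "telomerase-mediated",
    ("telomerase-mediated", "repressing"): "ALT",
}


def predict_tmm(genes):
    """Sum the scores per wild-type group for a list of gene records."""
    totals = defaultdict(float)
    for gene in genes:
        key = (gene.get("mechanism"), gene.get("activity"))
        # Missing or conflicting annotations fall into "ambiguous".
        group = WILD_TYPE_GROUP.get(key, "ambiguous")
        totals[group] += gene["score"]
    return dict(totals)


# Placeholder records: scores and GENE_A/GENE_B are invented for illustration.
example = [
    {"symbol": "ATRX", "mechanism": "ALT", "activity": "repressing", "score": 8},
    {"symbol": "GENE_A", "mechanism": "ALT", "activity": "enhancing", "score": 5},
    {"symbol": "GENE_B", "mechanism": None, "activity": None, "score": 2},
]
totals = predict_tmm(example)
predicted = max(totals, key=totals.get)
print(totals, predicted)
```

In this toy input the ALT suppressor contributes its score to the “telomerase-mediated” group, so that group receives the largest total.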

Application of TelNet for telomere maintenance analysis

The added value of TelNet in comparison to existing databases lies in the straightforward annotation of genes with respect to a TM function without pre-existing knowledge on the user side. For example, the Yeastract database lists 22 transcription factors (TFs) as “documented” regulators of the yeast Est2 gene, encoding the telomerase catalytic subunit [43]. When submitting these TFs to the Saccharomyces Genome Database (SGD) with YeastMine all 22 genes were identified as transcription factors by the GO pathway analysis [54]. However, no enriched GO terms or publications related to telomeres/telomerase were returned because these TFs were not annotated with a telomere-associated GO term. In contrast, all 22 TFs were included in the TelNet database as Est2 regulators.

The information provided by TelNet is particularly useful for the evaluation of gene lists obtained from large-scale data sets, as illustrated in the following for a pan-cancer correlation analysis of gene expression data with TL estimates. It is based on The Cancer Genome Atlas (TCGA) study of Barthel et al. [40] and uses TL data calculated from whole genome sequencing (WGS) and gene expression data (stdata_2016_01_28, file uncv2.mRNAseq_RSEM_normalized_log2) downloaded via the Firehose data repository (https://gdac.broadinstitute.org/). A reduced patient data set (n = 281) was selected that comprised all samples where non-malignant control samples of matching tumor tissue were available. In order to normalize for tissue- and age-specific effects, we calculated the ratio of tumor over normal tissue for TL and the corresponding log2 ratio for gene expression. For the two ratios, the Spearman correlation coefficient was computed. For 87 genes, a significant correlation (p < 0.01 and Rho < − 0.184 or Rho > 0.186) of TL with gene expression was found, and 940 genes were differentially expressed (p < 0.01 and log2 ratio < − 0.782 or log2 ratio > 0.852) (Fig. 4, Additional file 1: Table S1). For 5 genes, namely NTN1, PTGER3, ARL4D, PLAU and NOSTRIN, both a correlation with TL and a differential expression were found. It is noted that most of the tumor samples had shorter telomeres than the respective normal control sample. This could be the result of a higher tumor proliferation rate being only partly compensated by the active TMM. This confounding factor as well as the tissue-specific expression programs in the different tumor entities are likely to lead to false negative results. For example, TERT (telomerase reverse transcriptase) expression did not show a significant correlation with TL. Thus, it might also be informative to examine deregulated genes that did not display an (anti-)correlation with TL with respect to potential TM activities.
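The normalization and correlation step for a single gene can be sketched in plain Python as follows. This is an illustrative reimplementation, not the code used for the analysis. It assumes that the expression values are already log2-transformed, so that the log2 ratio is a difference. The function names and toy numbers are hypothetical, and p-values are not computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's Rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tl_expression_rho(tl_tumor, tl_normal, expr_tumor_log2, expr_normal_log2):
    """Correlate the tumor/normal TL ratio with the log2 expression ratio."""
    tl_ratio = [t / n for t, n in zip(tl_tumor, tl_normal)]
    # Assumption: inputs are log2 values, so the log2 ratio is a difference.
    expr_log2_ratio = [t - n for t, n in zip(expr_tumor_log2, expr_normal_log2)]
    return spearman_rho(tl_ratio, expr_log2_ratio)


# Toy values for four matched tumor/normal pairs (invented for illustration).
rho = tl_expression_rho([4, 3, 5, 2], [5, 5, 5, 5], [3, 2, 4, 1], [1, 1, 1, 1])
print(rho)
```

In practice a library routine that also returns a p-value would replace the hand-written rank correlation.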

Fig. 4

Application of TelNet for a correlation analysis of telomere length and gene expression. Scatter plot showing the log2 ratio (tumor/normal) of gene expression versus the Spearman correlation coefficient Rho for gene expression and telomere length. For histograms of Rho and log2 ratio a Gaussian fit is shown with significance values defined from the 1%-tail of the fit. Genes that were either significantly (p < 0.01) up- (log2 ratio > 0.852) or downregulated (log2 ratio < − 0.782) or significantly (p < 0.01) correlated (Rho > 0.186) or anti-correlated (Rho < − 0.184) were colored in black. Genes above the significance thresholds that were present in the TelNet database are shown in red color
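One way to derive cut-offs from the 1%-tail of a Gaussian fit is sketched below in Python. Two assumptions go beyond the caption: the fit is approximated by the sample mean and standard deviation rather than by a fit to the histogram, and 1% is taken in each tail. The function name and input values are hypothetical.

```python
from statistics import NormalDist


def tail_thresholds(values, tail=0.01):
    """Return (lower, upper) cut-offs at the given tail of a fitted Gaussian."""
    # Assumption: moment-based fit (sample mean and standard deviation).
    fit = NormalDist.from_samples(values)
    return fit.inv_cdf(tail), fit.inv_cdf(1 - tail)


# Invented example values standing in for a distribution of Rho.
lower, upper = tail_thresholds([-0.2, -0.1, 0.0, 0.1, 0.2])
print(lower, upper)
```

Values below `lower` or above `upper` would then be flagged, analogous to the black points in the scatter plot.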

To further analyze the 1022 genes for which correlations or deregulations on the gene expression level were detected, we consulted the HumanMine database and its GO enrichment analysis [57]. The enriched GO terms did not return a telomere-related pathway. Without an enrichment threshold, 13 genes (RAD51, CCNE1, BRCA2, HIST1H4H, RECQL4, RFC4, FEN1, EXO1, BLM, HIST2H4A, PPARG, KLF4 and PARM1) were annotated with one of the GO terms “telomere maintenance”, “telomere organization” or “regulation of telomerase activity”. In contrast, we retrieved a set of 132 genes when using the TelNet “list search” option (Additional file 1: Table S1). TelNet finds more genes because it includes homology assignments in both directions (30 of the 132 genes have a yeast homolog with a TM phenotype) as well as genes that do not have a GO term related to telomeres but have telomere-related activities according to the papers referenced in TelNet. Out of the 132 genes found in the TelNet database, 12 showed a significant (anti-)correlation (Rho > 0.186 or Rho < − 0.184) between TL and gene expression (Table 3). Only one gene, ARL4D (ADP-ribosylation factor-like protein 4D), additionally had a significant deregulation of gene expression in tumors (log2 ratio = − 1.02). ARL4D was included in TelNet since the deletion mutant of its yeast orthologue ARF1 (ADP ribosylation factor 1) has shorter telomeres than the wild-type reference [27, 42]. Furthermore, 4 of the genes listed in Table 3 were annotated in TelNet as “screened” or “validated” and had TelNet scores > 1 (Table 3). CTNNA1 (catenin alpha 1) and GIGYF2 (GRB10 interacting GYF protein 2) were found in a screen for genes that were upregulated upon telomere shortening [24]. This finding is consistent with the phenotype of the budding yeast homologue of GIGYF2 (SYH1), the deletion of which has been shown to lead to a TL increase [26].
In addition, SUMO3 (small ubiquitin-like modifier 3) and ERCC5 (excision repair 5 endonuclease) were included in TelNet as having a validated human TM phenotype. SUMO3 is attached to key proteins of the ALT pathway, and its expression shows a positive correlation with TL. Sumoylation of PML and shelterin components is known to be essential for the formation of PML nuclear bodies and APBs [23, 58]. The ERCC5 endonuclease is involved in DNA recombination and repair by annealing single-stranded DNA. Furthermore, ERCC5 regulates the activity of the Werner syndrome helicase (WRN) [59] that is required for telomere maintenance in some ALT cell lines [60] and is involved in telomeric D-loop digestion in ALT cells [61]. We conclude from this TelNet supported analysis that a further investigation of ARL4D, GIGYF2, CTNNA1, SUMO3 and ERCC5 with respect to their role for telomere maintenance in tumor cells might be warranted.
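The gene counts and the list lookup used in this analysis can be reproduced with a short Python sketch. The total of 1022 genes follows from 87 + 940 − 5, since the five genes found in both sets are counted once. The helper names and placeholder gene identifiers are hypothetical. The small annotation table is a stand-in for TelNet that contains only the four significance labels stated in the text.

```python
def combine_hits(correlated, deregulated):
    """Return the union of both hit sets and the genes present in both."""
    return correlated | deregulated, correlated & deregulated


def list_search(query_genes, annotation_table):
    """Return annotations for the query genes found in the table."""
    return {g: annotation_table[g] for g in query_genes if g in annotation_table}


# The five shared genes are named in the text; all other identifiers are
# placeholders that only reproduce the reported set sizes (87 and 940).
shared = {"NTN1", "PTGER3", "ARL4D", "PLAU", "NOSTRIN"}
correlated = shared | {f"corr_{i}" for i in range(82)}
deregulated = shared | {f"dereg_{i}" for i in range(935)}
union, both = combine_hits(correlated, deregulated)
print(len(union), len(both))

# Tiny stand-in for the TelNet annotation table (not the real database).
annotation_table = {
    "SUMO3": "validated",
    "ERCC5": "validated",
    "CTNNA1": "screened",
    "GIGYF2": "screened",
}
hits = list_search(["SUMO3", "PLAU", "GIGYF2"], annotation_table)
print(hits)
```

Genes without an entry in the table are simply omitted from the result, which corresponds to the reduction from 1022 submitted genes to 132 retrieved genes.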

Table 3 Genes with (anti-)correlations between telomere length and gene expression listed in TelNet

Discussion

The TelNet database offers a fast identification of genes from different “omics” approaches, e.g., WGS and RNA-seq data, with respect to their potential activities for telomere maintenance. It is designed as an open-ended database for the collection of TM relevant genes in different organisms. An extension of TelNet in its next release will include compilations of TM genes from two additional organisms, namely S. pombe and M. musculus. Accordingly, new information on telomere maintenance will be added continuously. We encourage other researchers working on telomeres to communicate suggestions for missing genes or additional information on already existing entries via the link integrated in the database to telnet@dkfz.de.

A gene set derived from a preceding bioinformatics analysis pipeline can be directly used for a TelNet list search to get more detailed insight into the corresponding TM associated genes. Possible TM links can be explored in an iterative manner. This approach is particularly useful for the large data sets generated in current genome and transcriptome sequencing studies as illustrated here for the TCGA pan-cancer data analysis from ref. [40]. In a similar manner, a current study of the ICGC (International Cancer Genome Consortium) made use of TelNet to characterize genomic features of the active TM in cancer [62]. It is noted that some well-established associations like mutations in ATRX and DAXX (death-domain associated protein) for ALT as well as TERT promoter mutations for telomerase-positive cells are absent in many tumor samples. Thus, one would expect that for these cases the mutation status of a given cancer sample and its active TM are linked via other genes, possibly as a combination of multiple factors. Consistent with this expectation, an integrative genome and transcriptome analysis of leiomyosarcoma applied TelNet for the TMM annotation and identified recurrent mutations in RBL2 (RB transcriptional corepressor like 2) and SP100 (SP100 nuclear antigen) as linked to ALT [63].

Conclusion

The gene annotations provided by TelNet greatly facilitate a distinction between different TM mechanisms for a gene set of interest by providing corresponding functional terms and a significance ranking. With these features, TelNet supports the identification of TM networks in various ways. As illustrated here by an exemplary application, TelNet can be integrated into the annotation of genes identified from bioinformatics analysis pipelines to determine possible connections with TM networks. Accordingly, we anticipate that TelNet will prove to be a helpful analysis tool for revealing this type of connection and will support the identification of active TM networks in different tumor entities.