Reanalysis of Chinese Treponema pallidum samples: all Chinese samples cluster with SS14-like group of syphilis-causing treponemes
Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Genetic analyses of TPA reference strains and human clinical isolates have revealed two genetically distinct groups of syphilis-causing treponemes, called Nichols-like and SS14-like groups. So far, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Recently, Sun et al. (Oncotarget 2016. https://doi.org/10.18632/oncotarget.10154) described a new “phylogenetic group” (called Lineage 2) among Chinese TPA strains. This lineage exhibited a “mosaic genomic structure” of Nichols-like and SS14-like lineages.
We reanalyzed the primary sequencing data (Project Number PRJNA305961) from the Sun et al. publication with respect to the molecular basis of Lineage 2. While Sun et al. based the analysis on several selected genomic single nucleotide variants (SNVs) and a subset of highly variable but phylogenetically poorly informative genes, which may confound the phylogenetic analysis, our reanalysis primarily focused on a complete set of whole genomic SNVs. Based on our reanalysis, only two separate TPA clusters were identified: one consisted of Nichols-like TPA strains, the other was formed by the SS14-like TPA strains, including all Chinese strains.
KeywordsTreponema pallidum Syphilis Genome sequencing Phylogenetic analysis Single nucleotide variant
Treponema pallidum subsp. pallidum
Treponema pallidum subsp. pertenue
Treponema pallidum subsp. endemicum
sequence read archive
single nucleotide variant
The bacterium Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Other subspecies comprise Treponema pallidum subsp. pertenue (TPE) and Treponema pallidum subsp. endemicum (TEN), the causative agents of yaws and bejel, respectively. Since the pathogenic treponemes cannot be continuously cultivated under in vitro conditions, much of our understanding of these pathogens comes from accumulation of genetic and genomic data . As previously shown by whole genome fingerprinting , analysis of several treponemal specific loci [3, 4, 5], whole genome sequence alignments of TPA reference strains [1, 6], and targeted whole genome sequencing of clinical isolates [7, 8], there is the evidence for two separate genetic subclusters within TPA treponemes, called Nichols-like and SS14-like, differing considerably at the DNA level (~ 400 nt differences).
Recently, Sun et al.  sequenced and analyzed eight TPA samples (propagated in rabbit testes) from syphilis-positive Chinese patients and compared them to the available treponemal genomes. Based on their results, a new “phylogenetic group” of TPA strains was identified and named Lineage 2 (Lineage 1 = Nichols-like, Lineage 3 = SS14-like). This Lineage 2 exhibited a “mosaic genomic structure” characterized by the insertion of Lineage 1-derived genes into the Lineage 3-derived genomic backbone. The authors also indicated that the analyzed Chinese TPA strains (Lineage 2) might be derived from recombination or lateral gene transfer events between Lineage 1 and Lineage 3. Until today, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Therefore, evidence for a new phylogenetic lineage would provide a new insight into the diversity and the phylogenetic relationships of TPA strains.
We reanalyzed the primary sequencing data from Sun et al.  with respect to the molecular basis of Lineage 2 using available SRA data (Illumina HiSeq 2500, 151 bp paired-end; Project Number PRJNA305961) and available TPA reference genomes. In contrast to Sun et al. , we used BWA MEM (instead of Bowtie2) and both Nichols and SS14 reference genome sequences for the genomic alignments and SNV analysis, supplemented with the de novo genome assembly analysis. Briefly, sequencing data were pre-processed with Trimmomatic. Sequencing reads were mapped to the reference genomes using BWA MEM. Resulting mappings were post-processed with Samtools to exclude low quality (MAPQ < 10), secondary, and not properly paired mappings. SNV for individual sequenced samples were called using FreeBayes. Hard-filters were applied to keep only high quality variants as recommended by FreeBayes authors; with a minimal depth of at least 5 (DP > 5) and variant call quality of at least 50 (QUAL > 50). Multiple whole genome alignment SNVs were called using NUCmer. Results were used in the phylogenetic analysis and processed with a custom R script. Only SNV detected in all analyzed samples were used in the analysis. For more details on the data collection and analysis, see Additional file 1.
In our analysis, we mapped 1.65–35.61% of all input read pairs to either the SS14 or the Nichols reference genome (see Additional file 2), with average coverage depth ranging between 58 and 1184× for both reference genomes (see Additional file 3). The remaining read pairs mapped primarily to the rabbit reference genome (57.01–89.80%; see Additional file 2). We achieved 99.19–99.35% and 98.87–99.08% genome coverage in the SS14 and Nichols genome alignments, respectively (see Additional file 3). The genomic regions which cannot be covered by uniquely mapped reads were located mainly in paralogous tpr genes (C, D, E, F, G, I, J, and K), RNA operons, and genes containing repetitive sequences, i.e. tp0433 (arp) and tp0470 (see Additional file 4). Overall mapping and genome coverage statistics calculated for the individual Chinese strains are summarized in additional files (see Additional files 2, 3 and 4).
In addition, we identified the tprD2 allele-specific sequence  among the sequencing reads from the Chinese SRA data in all samples (see Additional file 6). While the Nichols reference genome harbors identical copies of tprC and tprD genes, the SS14 reference genome carries the tprD2 allele which is not identical to the tprC gene and differs from the tprD allele by more than 300 nucleotides. The sequence alignment of these alleles comprising the most variable region is shown in Additional file 7.
Moreover, all Chinese samples were found to contain indels identical to those identified in the SS14 genome (see Additional file 6) when compared to the Nichols-like TPA strains.
The analysis of two loci, tp0136 and tp0548, potentially differentiating Nichols-like and SS14-like groups of TPA strains and isolates [11, 12], revealed that these genes were identical to the corresponding SS14 orthologous genes (see Additional file 8). In addition, the analysis of selected genes differentiating Nichols and SS14 reference strains at three or more nucleotide positions (tp0131, tp0136, tp0179, tp0304, tp0346, tp0462, tp0488, tp0515, tp0548, tp0558, and tp1031), showed them to be identical or nearly identical to the SS14 strain (data not shown).
The identification of two genetically distinct TPA lineages has been described by earlier genetic studies [5, 6] and these findings have been supported by recent whole genome sequencing studies [7, 8]. In Arora et al. , the phylogenetic analysis of 28 clinical isolates from different countries showed a clear separation of TPA isolates into two clusters, SS14-like and Nichols-like, although the Nichols-like cluster revealed greater variability. Moreover, Pinto et al.  described three clades—SS14-like (clade I), Nichols-like (clade II) and clade III (represented by only a single genome of the TPA Sea 81-4 strain ), it remains to be clarified whether this putative clade III will be supported by additional strains in the future. However, the Sea 81-4 strain shares the same phylogenetic cluster as the Nichols-like TPA strains (Fig. 1). Until now, no genetic intermediates having a mosaic structure of Nichols-like and SS14-like nucleotide sequences within TPA strains have been identified.
Sun et al.  described a new Lineage 2 of TPA based on a phylogenetic analysis of several genomic SNVs and sequences of tpr genes, presenting the Chinese strains as SS14-like strains containing recombined sequences originating from Nichols-like strains. This mosaic structure of Lineage 2 was characterized by the insertion of Lineage 1-derived genes (in particular tprC, D, G, and J genes) into the Lineage 3-derived genomic backbone.
There are, however, several issues in the presented analyses of Sun et al. . Sun et al. reported the achievement of 99.99% of the genome coverage for all samples using TPA Nichols reference genome (original Table 1 in Sun et al. ). However, TPA genomes have several repetitive regions (representing ~ 1% of the genome length) which cannot be covered by uniquely mapped reads. These regions comprise mainly tpr genes and RNA operons. Long-read sequencing, such as Pacific Biosciences, Oxford Nanopore or even Roche/454, could help to sequence repeat-containing and paralogous regions. However, this sequencing was not performed by Sun et al. . The Bowtie2 settings used by the authors without proper post-processing caused mappings of treponemal reads to wrong genomic locations as well as mapping of the host genome (rabbit) reads to the reference genome. The used read mapping stringency together with the use of inappropriate reference sequence (TPA Nichols) resulted in false chimeric sequences, designated as “Lineage 2”. Inappropriate consensus assembly conditions, combined with the absence of de novo assembly, resulted also in the overlooking of the presence of tprD2 allele-specific sequences in the raw data and filtering out of these tprD2 allele-specific sequences during assembly. Unlike the SS14 genome containing tprC and tprD2 alleles, the Nichols reference genome harbors identical alleles in the tprC and tprD genes.
Sun et al. (, original Figure 4A) used alignment settings and phylogenetic trees of tprC/D loci to draw evolutionary inferences. The use of tpr genes alone to disentangle evolutionary relationships is problematic since these genes are likely subject to intra-strain genomic recombination events [3, 14, 15, 16] as well as selection pressures, which may confound phylogenetic analyses.
To date, all clinical isolates typed using tp0136 and tp0548 genes, routinely used in sequencing-based molecular typing scheme of syphilis-causing strains, grouped with either Nichols-like or SS14-like TPA groups [11, 12, 17, 18]. Moreover, more widely used enhanced CDC typing scheme sequencing an 83 nt-long fragment of the tp0548 gene , showed that 94.5% of 1974 characterized clinical isolates from different countries belong to the SS14-like group , which is consistent with the findings related to a recent spread of an epidemic cluster . For all the Chinese strains, tp0136 and tp0548 genes together with 11 other loci (differentiating Nichols and SS14 reference strains at three or more nucleotide positions) were shown to be identical or nearly identical to the SS14-like TPA strains.
Our reanalysis was based on all whole genomic SNVs rather than a subset of several genomic SNVs and highly variable but phylogenetically poorly informative genes (i.e. tpr genes) that confounded the Sun et al.  analysis and resulted in misleading conclusions. Based on the whole genome SNV reanalysis, only two separate clusters were identified: one consisted of Nichols-like TPA strains, the other was formed by the SS14-like TPA strains, including all the Chinese strains. Our data clearly showed that all Chinese strains clustered within SS14-like TPA strains.
Only available SRA data deposited in the NCBI SRA database (Project Number PRJNA305961) were reanalysed.
Only ~ 99% of genome coverage can be achieved for all TPA Chinese strains due to several repetitive regions which cannot be covered by uniquely mapped reads.
Paralogous and repetitive genomic regions comprising tpr genes were excluded during the processing of the sequencing data, therefore mosaic structure identified in tpr genes by Sun et al.  cannot be proved.
Conceptualization: MS, JO, NA, KN, FGC, DS. Formal analysis: MS, JO, NA, KN, FGC. Funding acquisition: MS, LM, DS. Investigation: MS, JO, LM, NA, KN, FGC, DS. Writing—original draft: MS, JO, DS. Writing—review and editing: MS, JO, LM, NA, KN, FGC, DS. All authors read and approved the final manuscript.
Access to computing and storage facilities, owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated. We acknowledge the CF New Generation Sequencing Bioinformatics supported by the CIISB research infrastructure (LM2015043 funded by MEYS CR) for their support with obtaining scientific data presented in this paper. We also thank Thomas Secrest (Secrest Editing, Ltd.) for his assistance with the English revision of the manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
Sequencing data for eight Chinese TPA strains (SHC-0, SHD-R, SHE-V, SHG-I2, B3, C3, K3, and Q3)  analyzed during the current study are available in the NCBI SRA database, Project Number PRJNA305961 (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA305961). All data generated during this study are included in this article and its Additional information files. The datasets designated as “not shown” are available from the corresponding author on request.
Consent for publication
Ethics approval and consent to participate
This work was supported by grants from the Grant Agency of the Czech Republic to DS and MS (GA17-25455S, GJ17-25589Y), by grant from the Czech Health Research Council to DS (17-31333A), and by funds from the Faculty of Medicine of the Masaryk University to junior researchers LM and MS.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 6.Pětrošová H, Pospíšilová P, Strouhal M, Čejková D, Zobaníková M, Mikalová L, et al. Resequencing of Treponema pallidum ssp. pallidum strains Nichols and SS14: correction of sequencing errors resulted in increased separation of syphilis treponeme subclusters. PLoS ONE. 2013;8:e74319.CrossRefPubMedPubMedCentralGoogle Scholar
- 11.Flasarová M, Pospíšilová P, Mikalová L, Vališová Z, Dastychová E, Strnadel R, et al. Sequencing-based molecular typing of Treponema pallidum strains in the Czech Republic: all identified genotypes are related to the sequence of the SS14 strain. Acta Derm Venereol. 2012;92:669–74.CrossRefPubMedGoogle Scholar
- 12.Grillová L, Pětrošová H, Mikalová L, Strnadel R, Dastychová E, Kuklová I, et al. Molecular typing of Treponema pallidum in the Czech Republic during 2011 to 2013: increased prevalence of identified genotypes and of isolates with macrolide resistance. J Clin Microbiol. 2014;52:3693–700.CrossRefPubMedPubMedCentralGoogle Scholar
- 16.Giacani L, Brandt SL, Puray-Chavez M, Reid TB, Godornes C, Molini BJ, et al. Comparative investigation of the genomic regions involved in antigenic variation of the TprK antigen among treponemal species, subspecies, and strains. J Bacteriol. 2012;194:4208–25.CrossRefPubMedPubMedCentralGoogle Scholar
- 17.Gallo Vaulet L, Grillová L, Mikalová L, Casco R, Rodríguez Fermepin M, Pando MA, et al. Molecular typing of Treponema pallidum isolates from Buenos Aires, Argentina: frequent Nichols-like isolates and low levels of macrolide resistance. PLoS ONE. 2017;12:e0172905.CrossRefPubMedPubMedCentralGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.