Novel metrics for quantifying bacterial genome composition skews
Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences.
We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially-annotated genomes. The first two metrics, (dot-skew and cross-skew) depend on sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to 7738 available bacterial genomes, including partial drafts, and identified outlier species. A phylogenetically diverse set of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns but share lifestyle characteristics, such as intracellularity and biosynthetic dependence on their hosts.
KeywordsNucleotide skew Leading strand Lagging strand Obligate intracellular Compositional bias Genome metrics Lyme disease
General Feature Format
National Center for Biotechnology Information
Origin of replication
Root mean square deviation
Replication terminator site
Bacterial genomes display significant compositional biases, both in terms of G + C content and in compositional skews, i.e., strand asymmetries in ‘T’ vs. ‘A’ and ‘G’ vs. ‘C’ usage . These biases are proposed to arise from the complex interplay of differential mutation rates and multiple selective constraints [2, 3], particularly for energy efficiency , involving the replication, repair, and transcription enzymes. Bacterial chromosomes are replicated in both directions, from the origin of replication site to the terminator site; the “leading” strand is replicated continuously while the “lagging” strand is replicated in segments by different enzymes. Some genes are transcribed in the same direction as they are replicated (“leading strand genes”) while others are transcribed in the reverse direction (“lagging strand genes”). Each enzyme mediates both mutational and selective constraints, resulting in different compositional biases in different replicative, transcriptional and translational contexts . Analyses of skews in each context have the potential to expose multiple compositional constraints and their interactions, and ultimately inform about the DNA repair capacity, metabolism, and lifestyle of the species [5, 6].
Compositional bias and strand asymmetry have been measured and analyzed in a variety of ways and contexts [5, 7]. These methods include the original definitions (GC skew, (G-C)/(G + C); AT skew, (A-T)/(A + T)) , slight variants (e.g. G/(G + C)) , variants based on the three independent axes of Z Curves (x = R-Y, y = M-K, and z = S-W) [9, 10], ANOVA , correspondence analysis of codon bias metrics [12, 13], and competing mutational and selective parameters in an explicit evolutionary model , and have involved comparison of leading versus lagging contexts, transcribed versus intergenic regions, and restriction to each codon position.
Early examples of extreme compositional biases and asymmetries were found among species in the family Borreliaceae, tick-borne spirochetes including species causing Lyme disease (genus Borreliella, formerly Borrelia) as well as relapsing fever (genus Borrelia) . Since its discovery in 1982 , the Borreliella burgdorferi spirochete has been of particular interest in the United States as the primary causative agent of Lyme disease. The sequencing of B. burgdorferi B31 in 1997 allowed an in depth exploration of the many intriguing features of the genome of this bacterium, from its small size and unusual structure (one large linear chromosome, several linear and circular plasmids) to its very low G + C content .
Significant skews in the third position of codons have been reported on both the leading (increased G and T) and lagging (increased A and C) strands in B. burgdorferi [12, 17]. Among the first 43 genomes investigated , B. burgdorferi had the most extreme difference between leading and lagging strand nucleotide compositions. Both mutation and selection biases, variously induced by replication, transcription and translation constraints, have been suggested to play a role in B. burgdorferi [18, 19] and more generally, across all prokaryotes. The loss of some DNA repair genes may also contribute to the low G + C content and heightened skew seen in B. burgdorferi [6, 20].
Thanks to the much expanded availability of complete genome sequences of bacterial species, it is now possible to perform large-scale comparative genomics studies [2, 4, 21, 22]. A much larger number of bacterial genomes are in draft form, assembled to different levels of contiguity (contigs, scaffolds) and tentatively annotated using automated pipelines. Most of the existing methods for analyzing compositional biases and skews rely on fully or mostly contiguous genomic sequence and on the availability of precise and detailed annotation of genes and direction of replication; such methods are much less applicable to the study of incomplete draft genomes. Furthermore, existing methods largely assess the skews in individual genomes, without taking advantage of the vast knowledge available on the genomes of other bacterial species.
To address these difficulties, we present here three novel metrics for quantitative analysis of genome skews. Our metrics address dependence of skew metrics on G + C content and focus on the differences between nucleotide usage on the leading versus lagging strands, which underlie interpretation of nucleotide skews in terms of both selective and mutational processes. Our skews are also robust to assembly status and can be computed on incomplete genomes with draft annotation, greatly increasing the range of species that can be analyzed. Using these metrics, we analyzed a large collection of bacterial genomes—both complete and draft. We identified several groups of species and genera as outliers on one or more metrics. These outlier species include many pathogens and tend to have unusual lifestyles, like B. burgdorferi.
In addition, we have generated and made publicly available an interactive online resource for exploring the skew metrics for thousands of bacterial genomes, and a tool for generating visualizations of skew plots for any bacterial genome of interest with available annotation.
1581 “completed” genomes, (125 “reference”, 1456 “representative”),
3303 “incomplete” genomes, (2 “reference”, 3301 “representative”), and
3064 additional genomes, not repeating species names from the previous two sets, and prioritizing more advanced levels of completion where multiple assemblies are available for a given species.
For each genome, we included in the analysis all chromosomes, plasmids and sequence contigs at least 100 kb long. We removed from further analysis 210 genome assemblies for which the longest available sequence was shorter than 100 kb. The final set of genomes analyzed included 7738 assemblies.
Identification of origins of replication and terminator sites
For each sequence (chromosome, plasmid, scaffold and contig) in each genome assembly, we computed likely origins of replication and replication terminator sites using the GC disparity method [23, 24], namely by identifying the minimum and maximum difference between the cumulative count of G and C along the genome. This method works well as long as replichores are long  and is independent of gene annotation and arbitrary window sizes; it can also efficiently determine the likely direction of replication for sequence fragments (scaffolds and contigs), whether or not they include an origin of replication or a terminator site (see below for evaluation by simulation). When the resulting origin or terminator site lay within 1% of either end of the sequence, we corrected the location to coincide with the nearest sequence end. Where available, we used existing annotations of origin of replication. We obtained the most recent version of the DoriC database of origins of replication . We compared the locations of origins of replication as predicted using the GC disparity method to those annotated in DoriC and evaluated the discrepancy between the two as a fraction of chromosome size, i.e., 0 for no discrepancy and 0.5 for diametrically opposite annotations. DoriC includes 2733 annotations of origins of replication. For 532 species, there are multiple origins annotated on the same chromosome, in which case we retained the one with lowest discrepancy score, i.e., nearest the location predicted by GC disparity. We observed 85 species (3.87%) with discrepancy larger than 0.25 (red points, Additional file 1: Figure S1) and an additional 88 species (4%) with discrepancy larger than 0.1 (green points, Additional file 1: Figure S1). Since DoriC does not include annotations of terminator sites, when using DoriC annotations for origins of replication we assumed the terminator site to be located 0.5 chromosome lengths away from the origin.
Segmentation and analysis
We used available gene annotation (in the GFF files) to segment each sequence 100 kb or longer into a series of contiguous and disjoint segments (of variable lengths) which can be genes (including CDS, tRNA, and rRNA) or intergenic segments. We stratified intergenic segments by considering the relative orientations of the flanking genes: between two genes in the same orientation, or between two genes in opposite orientations (“head to head” or “tail to tail”). Infrequently, consecutive gene segments may be annotated as overlapping. We excluded such overlapping segments from computation of skews since they have overlapping and likely contradictory constraints.
We computed for each segment (genic or intergenic) its length, G + C content, GC skew, and TA skew. We further determined for each oriented segment (namely genes and intergenic segments between genes transcribed in the same orientation) whether their orientation is the same or opposite to the direction of replication, i.e., whether they are on the leading or lagging strand, relative to origin and terminator sites predicted as described above. As described below, skew computations are done relative to the sense strand of each transcript, but stratified by whether the transcript is on the leading strand during replication (by Pol III) or the lagging strand (replicated by Pol I).
Computation of characteristic skews
Given a set of comparable segments in a genome assembly (e.g., all genes on the leading strand), we computed skews (GC skew = (G-C)/(G + C) and TA skew = (T-A)/(T + A)) for the set as the average of the corresponding individual segment skews, weighted by segment length. We thus computed four characteristic skews for each species: leadGC and leadTA for leading strand genes, and lagGC and lagTA for lagging strand genes. We also evaluated weighted medians instead of weighted averages, which yielded very similar results (not shown).
Computation of the cross-skew and dot-skew
The four characteristic skews for a species can be interpreted as two characteristic skew vectors: one for the leading strand genes (leadGC, leadTA) and the other for the lagging strand genes (lagGC, lagTA). We computed the cross-skew as:
Computation of the residual skew
We modeled each of the four characteristic skews (leadGC, leadTA, lagGC and lagTA) as a function of the G + C content for 7738 bacterial genome assemblies. For each characteristic skew we separated the genome assemblies with G + C content below or above 50% G + C (3635 and 4103 genomes, respectively), and fitted a robust regression line to each subset using the R function MASS::lqs() (least trimmed sum of squares, ). We then computed a single skew deviation magnitude metric (the residual skew) for each genome as the root mean square deviation (RMSD) from the regression line across the four characteristic skews.
Identification of outliers
For dot-skew and cross-skew, we identified outliers at both ends of the distribution using the MAD-Median Rule  at a significance threshold of 1%. Residual skew has a non-negative distribution, and only atypically high values are of interest; we observed that distribution of residual skew is well approximated by a χ2-distribution with 5.8 degrees of freedom, divided by 100 (5.8/100 is the mean observed residual skew value). We therefore used the 99th percentile of this model distribution (0.1647) as a 1% significance threshold.
Simulation of draft genomes
Starting from a completed genome sequence, we simulated progressively less finished draft genomes by randomly choosing from 1 to 100 cut sites, subdividing the sequence and annotation into contigs based on the cut sites, and computing all metrics as above. We ignored annotations straddling cut sites and, as above, resulting contigs shorter than 100 kb.
Overview of the strategy
Robustness to fragmentary status of draft sequences
A significant advantage of our method for determining origins of replication and terminator sites based on GC disparity is that it can be applied to finished and draft sequences alike. Since the GC disparity changes nearly monotonously along chromosomes, particularly for highly skewed species (Fig. 1a), it is possible to hypothesize a direction of replication for any fragmentary sequence (Fig. 1b).
We evaluated robustness by simulation: starting from the finished sequences of the large (4.6 Mb) E. coli genome and the small (910 kb) B. burgdorferi main chromosome, we simulated progressively less finished draft versions of these genomes. The resulting cross-skew and dot-skew metrics were well approximated from simulated draft versions; skew metric variation increased with decreasing simulated genome draft length (Additional file 1: Figure S2). Other skew parameters were also robust to significant fragmentation of the genome (Additional file 1: Figure S3).
The characteristic skews of B. burgdorferi genes
In B. burgdorferi, the majority of genes are transcribed in the same direction as they are replicated (‘leading strand genes’, blue in Fig. 1) while some are transcribed in the direction opposite to replication (‘lagging strand genes’, orange in Fig. 1). Leading strand genes tend to display stronger GC skew (Fig. 1c), while lagging strand genes have strong TA skews. In intergenic segments, the two skews tend to be positively correlated (not shown).
To quantify the unusual pattern observed in B. burgdorferi, we defined two metrics, which we call the cross-skew and dot-skew (see Methods). The cross-skew metric reflects the orthogonality and magnitude of the vectors, and thus is expected to be particularly strong for B. burgdorferi. The dot-skew metric emphasizes the collinearity of the vectors. We computed the cross-skew and dot-skew for B. burgdorferi: cross-skew = 0.0541, dot-skew = 0.0083. For other species within the Borrelia and Borreliella genera, these respectively ranged from 0.0509 to 0.0727 and from 0.0044 to 0.0314. These cross-skew values are much larger than observed for other bacteria, as detailed below. In contrast, we computed much smaller cross-skew values for S. thermophila (0.0051), M. tuberculosis (7.2 10− 5), F. periodonticum (− 0.0028) and A. phagocytophilum (0.0089), reflecting their small magnitude skew vectors and/or their angles.
Learning from thousands of genomes
The leadGC and lagTA values of Borreliaceae genomes are large and are clear outliers relative to the entire data set of 7738 genomes (Fig. 4). On the other hand, while the Borreliaceae leadTA and lagGC are close to zero and are not outliers relative to the entire data set, they are unusual for bacterial species with low G + C content, which tend to have negative values for these characteristic skews (Fig. 4). Thus, Borreliaceae genomes are unusual for all four characteristic skews. The deviations of characteristic skews for B. burgdorferi from the skews predicted by the fitted lines at the G + C content for B. burgdorferi are 0.091, 0.120, 0.106 and 0.124 for leadGC, leadTA, lagGC and lagTA, respectively. Borrelia species that cause relapsing fever have even larger deviations from the expected values.
Three novel metrics for analyzing genome skews
We described above several parameters for quantifying skews in individual bacterial genomes: the four characteristic skews and the magnitudes and angles of the vectors they define. Using these parameters, we defined two interrelated metrics for comparing and contrasting the skews of leading strand vs. lagging strand genes: the cross-skew and the dot-skew (see Methods). Furthermore, taking advantage of the availability of many thousand bacterial genome assemblies, we estimated expected values for each characteristic skew, as a function of the G + C content. We then used the observed deviations from these expected values to define a third metric: the residual skew.
We computed these three metrics for 7738 bacterial genome assemblies (available at ) and evaluated their relationship with G + C content (Fig. 5). All three are more independent of G + C content than the four characteristic skews; however, a large number of low G + C taxa have significantly negative dot-skew. Since G + C is not necessarily correlated with metabolism or lifestyle, these statistics may be more closely related to metabolism or lifestyle. For high G + C content bacteria, we observed that the cross-skew and the dot-skew are much more constrained than for lower G + C content species; these two metrics are most diverse for bacterial genomes under ~ 35% G + C. Compared to these two metrics, the residual skew is more diverse for all levels of G + C content. Borreliaceae genomes are clear outliers for all three metrics.
We compared dot-skew, cross-skew, and residual skew across 10 large clades on the same species discussed in  (Additional file 1: Figure S7). Dot-skew and cross-skew values were clustered at zero for the largest and most G + C diverse clades (e.g., Proteobacteria, Firmicutes); dot-skew clusters below zero only for certain very low G + C groups (Tenericutes, Thermotogae). Clade therefore provided little ability to explain dot- or cross-skew. In contrast, residual skew had a broader distribution within each clade, and clade-specific typical values. In contrast to results reported from the Z-curve correlation metric , which observed similar skews in Firmicutes, Tenericutes, and Thermotogae relative to other large clades, including Proteobacteria, dot-skew and cross-skew distributions for Proteobacteria are highly similar to their distributions in Firmicutes, and overlap significantly with Tenericutes and Thermotogae.
The landscape of bacterial skews
We identified outlier taxa for each skew metric independently at a 1% significance level (see Methods, Additional file 1: Figures S8, S9; identified taxa are listed in Additional file 3: Table S2). Even at this stringent significance level, 1666/7738 (21.5%) of all taxa were considered outliers with negative dot-skew values, with a clear trend toward more negative dot-skew with decreasing G + C content for a subset of taxa. In summarizing the relationship between genus and skews (Additional file 3: Table S2a), we excluded outliers with negative dot-skew from consideration, and focused on association between genus and outliers of any other type.
The most extreme outliers observed are Borreliaceae, particularly the group of Borrelia genomes that cause relapsing fever. Borreliaceae species have a low G + C content genome suggesting an increased indifference towards GC rich codons and thus energetically cheaper amino acids. This is further supported by the fact that they lack amino acid and nucleotide synthesis pathways. The observed strand specific nucleotide usage and skew patterns in these bacteria may thus be driven by a relaxation of the selection for energy efficiency . We computed selection coefficients for the genomes discussed in  and provide plots of these values against dot- and cross-skews (Additional file 1: Figures S10, S11). For Borreliaceae, we observe relaxation of selection on TA biases on the lagging strand.
Our finding on the difference in genome skew metrics among the relapsing fever and the Lyme causing species could be associated with their unique ecological and vector-related traits such as the variety of vectors, speed of transmission to a new host and maintenance in nature by transovarial transmission. For example, the vectors for the relapsing fever Borrelia species represent a variety of arthropods while for the Lyme disease causing group, the widely known competent vectors are certain species of the prostriate genus Ixodes . In comparison to the Lyme disease spirochetes, most relapsing fever Borrelia are efficiently transmitted to the host soon after feeding by the tick begins [31, 32]. In addition, most species in the relapsing fever Borrelia group exhibit transovarial transmission in their tick vectors . Genomes in the genus Ehrlichia (see example in Fig. 4) are also outliers in all three metrics and show similar skew values as Borreliella genomes. Ehrlichia are intracellular vector-borne pathogens of vertebrates; like Borrelia, they have diminished biosynthetic abilities . Ehrlichia are in the Rickettsiales order and are phylogenetically unrelated to Borreliaceae; the genome of Ehrlichia canis has a single circular chromosome and no plasmids . Multiple other genera became evident as outliers of interest, discussed below.
We have devised three novel metrics to study bacterial genome composition biases, integrating knowledge of the nucleotide skews in annotated genes, the direction of transcription relative to replication, and the G + C content of the genome.
The first two metrics (cross-skew and dot-skew) are computed based on an individual genome’s characteristic skew vectors, and they quantify the strength and relationship between the mutation and selection pressures on genes on the leading vs. lagging strands. Since the two metrics depend on the magnitudes of these vectors and the angle between them, identical skew metric values might be computed for bacterial species that differ, for example, in leading strand angle while maintaining a constant angle between the two vectors. We found this to be the case only for bacteria with very small skews, leading to cross-skew and dot-skew values near zero for all possible leading strand angles (Additional file 1: Figure S6).
Strong and positive dot-skew values (Fig. 6, right) indicate similar compositional constraints on all genes, relative to the direction of replication; an example of this pattern is observed in the obligate intracellular parasite Chlamydia pneumoniae  (Fig. 2). Conversely, strong and negative dot-skew values (Fig. 6, left) reflect opposite compositional constraints on leading and lagging strand genes (i.e., transcribed in the same or opposite direction as they are replicated); extreme examples of this pattern are observed in fusobacteria including Fusobacterium periodonticum , Leptotrichia buccalis , and Streptobacillus moniliformis , the causal agent of rat bite fever. Positive dot-skew values can thus be interpreted as reflecting constraints driven mostly by the replication process, while negative dot-skew values largely reflect transcriptional and translational constraints that arise from preference for nucleotides and amino acids that are energetically cheaper to synthesize .
The cross-skew quantifies the strength and orthogonality of the compositional skew vectors for leading and lagging strand genes. Genomes with high cross-skew values (Fig. 6, top) demonstrate skew patterns inconsistent with purely replicational or transcriptional constraints; Borreliaceae and Ehrlichia species are prime examples of this pattern. Borreliaceae and Ehrlichia species lack amino acid and nucleotide synthesis pathways; the observed elevated cross-skew values in these pathogens may thus reflect a relaxation of the selection for energy efficiency that drives nucleotide usage and thus skews , possibly combined with more complex constraints imposed by a life cycle that involves recurring transitions between mammalian and invertebrate (tick) hosts. We observed similar skew patterns in Kinetoplastibacteria (Fig. 6), which are endosymbionts of insect-infecting trypanosomatid flagellates  with multiple biosynthetic adaptations to life in the intracellular environment. Likewise, we observed distinct skew patterns among Blochmannia species (Fig. 6); these are also intracellular endosymbionts that lost multiple biosynthetic pathways and rely on the metabolic machinery of their carpenter ant hosts . Other groups with a more modest, but significant enrichment for high cross-skew include other host-associated anaerobes such as Bacteroides, Lachnospiraceae, Eubacterium spp., and other Clostridiales, and extremophiles such as Thermoanaerobacter, Thermosipho, and Thiomicrospira. In contrast, heavily sampled clades that appear to have fewer than expected cross-skew, residual skew, or positive dot-skew outliers include Actinobacteria (Streptomyces, Mycobacterium, Corynebacterium, Nocardia), γ-Proteobacteria (Pseudomonas, Vibrio), and Bacilli (Bacillus, Lactobacillus, Streptococcus, Paenibacillus).
The third metric (residual skew) capitalizes on the current availability of thousands of complete or draft bacterial genomes to empirically assess how unusual a genome’s skews are relative to the expected values as learned from other genomes. This analysis, which has not been possible until recent times, revealed that bacterial genomes with low G + C content typically have negative TA skews in leading strand genes and negative GC skews in lagging strand genes, and that these negative skews increase in magnitude as G + C content decreases (Fig. 4). On the background of these trends, the weakly positive skews observed in Borreliaceae species are highly unusual. This pattern is not evident relative to the global collection of genomes since the weakly positive Borreliaceae skews are comparable to those observed in high G + C content bacteria.
The presence or absence of chromosomal maintenance pathways can shape genome composition skews. Outlier genomes such as B. burgdorferi, Candidatus Kinetoplastibacterium crithidii, Ehrlichia chaffeensis, Buchnera aphidicola, and Blochmannia floridanus, have reduced genomes with more limited repair mechanisms. Yet the major pathways are all present, even if they tend to contain fewer genes than bacteria such as Escherichia coli and Yersinia pestis (see Additional file 4: Table S3). No single gene or simple combination of genes defines the outliers. The outliers do lack mismatch-repair mutH, recombination lexA, and base excision repair mug, but so do non-outlying genomes such as Francisella tularensis. Our regression analysis quantifies these deviations from expectation and integrates them into a unified metric that highlights the unusual skews in Borreliaceae species (Fig. 5) and also identifies other species as having skew patterns that are significantly unusual relative to the bulk of bacterial species. Of particular note are Phytoplasma species (Fig. 6); these are intracellular pathogens of multiple plant species that use insects as transmission vectors [41, 42], in similarity to Borreliaceae and Ehrlichia for mammals.
Through analysis of all genic regions of any conservation level, our metric measurements could accurately predict and/or support taxonomical distinctions among closely related genomes with shared biological and genetic features. An example is the Lyme-causing and relapsing fever groups of spirochaetes that have long belonged to the same genus Borrelia. The two groups have just recently been split into two distinct genera .
We described here three novel metrics for quantifying bacterial genome composition skews and presented examples of their application to identify bacterial species with unusual skew patterns. Our metrics take advantage both of information about the genome of a single species and of patterns discernable from studying genomes of thousands of species - even those not yet finished and fully annotated. While some of the genera identified as skew outliers are phylogenetically close (e.g., Fusobacterium, Streptobacillus and Leptotrichia), our metrics identified similar skew patterns in genera of bacteria that are phylogenetically unrelated, like Borrelia, Ehrlichia and Kinetoplastibacterium, and (when considering the residual skew) Phytoplasma. These very disparate bacterial species share lifestyle characteristics (intracellularity and transmission via insect vectors), suggesting that our novel metrics successfully capture effects on genome composition of biosynthetic constraints and of interaction with the hosts.
We wish to thank Arian Smit, Jeff Boore and an anonymous reviewer for helpful discussions.
This work was supported by generous donations from The Wilke Family Foundation, Jeff and MacKenzie Bezos, and The Steven & Alexandra Cohen Foundation. The donors were not involved in the in the design of the study, in the collection, analysis, and interpretation of data, nor in writing the manuscript.
Availability of data and materials
Software for computing skews for individual genomes, regression RMSD parameters, a table of skew characteristics for the 7738 bacterial genomes, and Vega-Lite  interactive visualizations are publicly available at  and on GitHub . The table of characteristics for 7738 bacterial genomes is also included as supplementary material. The sequences and annotations used in this study are publicly available from NCBI. Data on origins of replication are from the DoriC database , used with permission.
GG conceived of the study. LMJC, MR and GG designed the methods and implemented the software. LMJC, MR, NJ, CL and GG performed analyses and contributed to the writing of the manuscript. All authors approved its final version.
Ethics approval and consent to participate
Not applicable since this study has not directly involved humans, animals or plants.
Consent for publication
The authors declare that they have no competing interests.
- 26.Venables WN, Ripley BD. Modern Applied Statistics with S, 4th ed. New York: Springer-Verlag; 2002.Google Scholar
- 27.Wilcox R. Introduction to Robust Estimation and Hypothesis Testing, vol. 2016. 4th ed. Cambridge: Academic Press. p. 101.Google Scholar
- 28.Skew metrics of bacterial genomes. [cited 27 Nov 2017]. Available from: http://db.systemsbiology.net/gestalt/skew_metrics/
- 31.Barbour AG, Adeolu M, Gupta RS. Division of the genus Borrelia into two genera (corresponding to Lyme disease and relapsing fever groups) reflects their genetic and phenotypic distinctiveness and will lead to a better understanding of these two groups of microbes Margos et al. (2016) There is inadequate evidence to support the division of the genus Borrelia. Int. J. Syst. Evol. Microbiol. 2017;67:2058–67. https://doi.org/10.1099/ijsem.0.001717.CrossRefPubMedGoogle Scholar
- 39.Alves JMP, Serrano MG, Maia da Silva F, Voegtly LJ, Matveyev AV, Teixeira MMG, et al. Genome evolution and phylogenomic analysis of Candidatus Kinetoplastibacterium, the betaproteobacterial endosymbionts of Strigomonas and Angomonas. Genome Biol Evol. 2013;5:338–50.CrossRefPubMedPubMedCentralGoogle Scholar
- 44.Glusman G. GitHub repository for skew-metrics code and data. GitHub. [cited 11 Apr 2018]. Available from: https://github.com/gglusman/skew-metrics
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.