Phylogenetic relationships among Staphylococcus species and refinement of cluster groups based on multilocus data
- 6.6k Downloads
Estimates of relationships among Staphylococcus species have been hampered by poor and inconsistent resolution of phylogenies based largely on single gene analyses incorporating only a limited taxon sample. As such, the evolutionary relationships and hierarchical classification schemes among species have not been confidently established. Here, we address these points through analyses of DNA sequence data from multiple loci (16S rRNA gene, dnaJ, rpoB, and tuf gene fragments) using multiple Bayesian and maximum likelihood phylogenetic approaches that incorporate nearly all recognized Staphylococcus taxa.
We estimated the phylogeny of fifty-seven Staphylococcus taxa using partitioned-model Bayesian and maximum likelihood analysis, as well as Bayesian gene-tree species-tree methods. Regardless of methodology, we found broad agreement among methods that the current cluster groups require revision, although there was some disagreement among methods in resolution of higher order relationships. Based on our phylogenetic estimates, we propose a refined classification for Staphylococcus with species being classified into 15 cluster groups (based on molecular data) that adhere to six species groups (based on phenotypic properties).
Our findings are in general agreement with gene tree-based reports of the staphylococcal phylogeny, although we identify multiple previously unreported relationships among species. Our results support the general importance of such multilocus assessments as a standard in microbial studies to more robustly infer relationships among recognized and newly discovered lineages.
KeywordsBayesian Inference Bootstrap Support Lateral Gene Transfer Novobiocin Maximum Likelihood Analysis
The genus Staphylococcus currently contains more than 60 species and subspecies. Many are of clinical, agricultural, and economic interest because they lead to high levels of infection among human populations or agricultural loss within the dairy, swine, and poultry industries. Moreover, multiple species within this genus are common pathogens in non-human animals and thus should be monitored with concern as these animals provide reservoirs for pathogenic bacteria [1, 2, 3]. Although seemingly uncommon, host switching is an important mechanism in the evolution of Staphylococcus. For example, in S. aureus, human-to-poultry  and bovine-to-human  host switches have been observed. As such, a thorough understanding of species relatedness is a necessity for understanding host-pathogen interactions within this genus [5, 6, 7].
Many previous estimates of the staphylococcal phylogeny have been based on single locus gene trees, which in many cases exhibit differing topologies. As such, robust species tree estimations have proved to be difficult. Historically, staphylococcal species identification has been a laborious task, requiring multiple biochemical and genotypic methodologies [6, 8]. Fortunately, more efficient and reliable assays based on PCR and DNA sequencing have become commonplace as part of the identification process of novel species (and differentiating closely related species). As with most bacterial systems, the 16S rRNA gene continues to be the most common method for staphylococcal species identification, although its utility is limited due to high sequence similarity among different staphylococcal species [9, 10]. For this reason, increased emphasis has recently been devoted towards identifying additional genes for use in species identification that offer greater taxonomic resolution between closely related species, while also limiting the incidence of misidentification. Such genes as rpoB (β-subunit of RNA polymerase), tuf (elongation factor Tu), and dnaJ (heat shock protein 40), have been found useful for the identification of staphylococcal species. With the exception of one study where dnaJ and rpoB were concatenated and assessed under a single evolutionary model , each has only been analyzed singularly in a phylogenetic context.
We targeted two related primary goals in this study. First, we aimed to utilize a multilocus phylogenetic dataset to critically evaluate the proposed cluster groupings of species of Staphylococcus, and to amend these groupings to reflect estimates of phylogeny. Second, on a broader scale, we aimed to infer the deeper phylogenetic relationships among cluster groups of all Staphylococcus species using multilocus data analyzed under different strategies including concatenated and species-tree methods. We analyzed a large multilocus Staphylococcus dataset in multiple ways to thoroughly explore the phylogenetic signal in the data, and provide robust confirmatory evidence for the relationships among species. We first analyzed the combined four-gene dataset using partitioned Bayesian and maximum likelihood analyses, in which a single species tree was inferred. Such probabilistic methods of phylogeny are particularly powerful as they incorporate alternative models of character evolution into the analysis and search for a tree that ultimately maximizes the probability of data given the tree [12, 13]. Their accuracy, however, can be dependent on the complexity and biological realism of the models of sequence evolution used.
There is a tradeoff between having enough parameters to accurately capture the complexity of sequence evolution in a multilocus dataset, while not having more parameters than can be accurately estimated from the data [14, 15, 16, 17]. We therefore tested multiple differently partitioned model schemes to identify which best fit the multilocus dataset. Generally, we expect such partitioned model analysis of the combined (concatenated) dataset will have the best power for inferring the phylogeny of Staphylococcus, as long as basic assumptions of the approach are met. The most important of these assumptions is that all the underlying gene trees are the same as the species tree. There are, however, situations where gene trees and species tree are not the same [18, 19], or where systematic error in gene-tree estimation may lead to overconfidence in an incorrect species tree . There is some indication, however, that in such cases, maximum likelihood bootstrap support values may be more sensitive to conflicting phylogenetic signals in the data than Bayesian posterior probability support for nodes, although both concatenated data analysis approaches are likely to experience some error [21, 22, 23].
Therefore, we also used an alternative approach to estimate relationships among species of Staphylococcus in which gene trees are estimated separately, and jointly considered to estimate an underlying species tree. This approach, called Bayesian Estimation of Species Trees analysis , thereby avoids concatenation of multiple loci, and estimates a species tree based on a model that accounts for deep coalescence of gene trees. Although this approach does not specifically model all possible scenarios that may violate the assumptions of the concatenated analysis, comparisons of results between this approach and concatenated analyses provides added perspective on the relative robustness of species-level phylogenetic inferences.
DNA sequence acquisition and alignment
DNA sequences for a total of four genes from 57 staphylococcal species, and two outgroup species (Macrococcus caseolyticus - strain JCSJ5402, and Bacillus subtilis - strain 168) were downloaded from NCBI's GenBank. For each species included in the analysis, sequences were specifically downloaded from the type strain. The four loci collected included the non-coding 16S rRNA gene, and the three protein coding genes: dnaJ, rpoB, and tuf. The list of all species analyzed in this study with the accession numbers for each of the four gene fragments is given in Additional file 1: Table S1.
Nucleotide sequences were aligned using ClustalW in MEGA 4.1 , with manual adjustment to ensure that complete codons remained in tact for downstream analyses. Regions of high variability were omitted from the alignments because assessment of homology was questionable . This was only observed to be the case for dnaJ in which nucleotide positions 63–93 in the original sequence was omitted. Additional manual codon adjustment of this region did not improve the alignment and thus, was omitted. Secondary structure predictions (i.e. stem and loop regions) for 16S rRNA gene fragments were estimated using the RNAalifold approach [26, 27]. The data matrix and trees have been deposited in TreeBase (; http://purl.org/phylo/treebase/phylows/study/TB2:S12505). Analyses of incongruence length differences (ILD; ) among partitions of the dataset were performed using PAUP* 4.0 . Nucleotide diversities and species divergence calculations were performed using MEGA 4.1  and DnaSP v5 .
Nucleotide model selection
Models of nucleotide evolution for each gene and nominal partition of the data were estimated using jModelTest v0.1.1 [32, 33] based on Akaike Information Criterion (AIC). For the purpose of model testing (and later partitioned Bayesian analyses) we divided the dataset by gene, and into biologically relevant subsets: coding versus non-coding gene fragments, codon position, and stem versus loop secondary structures (for the 16S rRNA gene fragment). These individual partitions, and the best-fit evolutionary model selected for each partition, are shown in Additional file 2: Table S2.
Description of alternative model partitioning strategies tested for fit to the combined nucleotide data
# of partitions
# of free model parameters
Description of model partitions
Single model for concatenated dataset
16S; All protein coding gene fragments (dnaJ; rpoB; tuf)
Independent partition for each gene fragment (16S; dnaJ; rpoB; tuf)
16S; two partitions for each gene fragment (codon positions 1 and 2; codon position 3)
16S, stems; 16S, loops; two partitions for each gene fragment (codon positions 1 and 2; codon position 3)
16S; three partitions for each gene fragment (codon positions 1, 2, and 3, separately)
16S, stems; 16S, loops; three partitions for each gene fragment (codon positions 1, 2 and 3, separately)
16S, stems; 16S, loops; All protein coding gene fragments (dnaJ; rpoB; tuf)
16S, stems; 16S, loops; Independent partition for each protein coding gene fragment (dnaJ; rpoB; tuf)
Bayesian phylogenetic analysis
Bayesian inference (BI) was carried out using the Metropolis-Hastings coupled Markov chain Monte Carlo method in MrBayes v3.1.2 [34, 35] and BEST v2.3.1 . All Bayesian phylogenetic analyses performed in this study were carried out using the STOKES IBM High Performance Computing Cluster at the University of Central Florida. MPI-enabled versions of MrBayes v3.1.2 and BEST v2.3.1 were compiled and run in parallel . For each BI run, gaps in alignments were treated as missing data. For each MrBayes analysis, two independent BI runs were carried out using random starting trees with one cold chain and three heated chains (following program defaults). Each model was assessed in triplicate with summary statistics being estimated from all runs.
In addition to performing BI runs in MrBayes on the unpartitioned multilocus dataset (using the evolutionary model specified by AIC), eight additional models were assessed where independent models of evolution were applied to different nucleotide regions within the combined dataset (refer to nucleotide model selection section). This was achieved by using the “unlink” command in MrBayes v3.1.2. Each BI run consisted of 4 million generations with every 100 steps being sampled. As verified using Tracer v1.5 , stationarity was reached in all BI runs prior to 500 000 generations and a conservative burn-in of 1 million (25%) generations was performed. To verify that additional sampling (i.e., increasing the number of generations) for MrBayes runs would not affect the outcome of the data, a final run of 20 million generations with sampling every 1 000 steps and a burn-in of 4 million generations was performed.
In addition to reconstructing phylogenies using MrBayes v3.1.2, Bayesian phylogenetic reconstruction was also performed using BEST v2.3.1, which is a modified version of MrBayes. BEST was implemented by setting the prior for BEST = 1, and unlinking topologies, branch lengths, and mutation rates across loci. For each independent BEST analysis, four simultaneous runs consisting of 16 chains each were performed for 20 million generations with sampling every 1 000 generations. A prior for theta was set at 0.04, based on the mean estimates of theta for the dataset calculated in DnaSP and MEGA 4.1. Consistent with previous reports [39, 40], run convergence was only achieved by setting a uniform prior for branch lengths (prset brlenspr = clock:uniform). As with all BI runs, nucleotide regions were assigned nucleotide substitution models based on AIC, estimated in jModelTest.
Assessment of BI runs
All partitioning strategies employed using MrBayes were run in triplicate to verify reproducibility while BEST analyses were run two separate times. Subsequently, MrBayes and BEST runs, under each model, were assessed using multiple criteria to determine the success of each model and the overall best-fit model. Bayes factors (BF; 2ΔlnB10) were calculated from estimates of the harmonic mean of the posterior distribution of cold chain likelihoods. Consistent with previous reports [14, 16, 41], we set a cutoff of BF > 10 to support one model over another. Recently, multiple studies have suggested that biases introduced by using the harmonic mean estimator may practically affect model selection using BF [42, 43, 44, 45]. Based on our results however, we discuss why such biases are practically tolerable in this study (i.e., model choice has little effect on topology and nodal support).
Akaike weights (Aw)  were also used to identify best-fit partitioned models . Initially AIC values were calculated by the equation AIC = −2lnL + 2 k where k equals the total number of free parameters within the model. For small samples sets, where the sample size (n) to free parameter (k) ratio is <40, it has been suggested that a small-sample bias adjustment be applied to the AIC calculation, thus calculating AICc instead [47, 48]. The sample size of the staphylococcal dataset (with outgroups) is 59 and the minimum number of free parameters was 10 for model MB1. As such, the n/k ratio was always <40, so we calculated the AICc instead. The equation for . The ΔAICc was then calculated by subtracting the model with the minimum AICc (AICcmin) (i.e. highest lnL) from the ith model using the equation . Following calculations of the ΔAICc for each model, Aw were calculated using the equation . By this equation, the relative likelihood of a model given the data is normalized over all models and thus, the greater the Aw for a given model, the greater the relative support for that model .
Further assessment of model performance was based on examining the output of model parameters and carried out by analyses of multiple additional features. Posterior distributions of parameters and analysis of trace plots were assessed for failed convergence and stationarity using Tracer v1.5 . Also, because model overparameterization has been linked to estimates of tree length in partitioned Bayesian analyses , we also compared tree length estimates among runs.
Maximum likelihood analysis
Phylogenetic reconstruction using maximum likelihood (ML) analysis was carried out using the program GARLI v.2.0 , using default parameters except where specified. Phylogenetic estimates using ML were performed using both the combined, unpartitioned dataset as well as the combined dataset partitioned by locus (Additional file 2: Table S2). Five ML search replicates were run for each dataset using random starting trees, and up to five million generations were employed for each run unless the scoring topology lnL did not improve by ≥ 0.01 for 20 000 generations, in which case the run was terminated prematurely and the next bootstrap replicate was begun. Two hundred bootstrap replicates were conducted for each run and consensus trees were generated using the SumTrees v.3.0 software which is part of the DendroPy v.3.7 phylogenetic computing library . Likelihood ratio tests (LRTs) [13, 52] were performed to compare competing model partitioning schemes, M0 and M1. Statistical support for model M0 over M1 (or vice versa) was assessed using the Chi-square distribution for q degrees of freedom (df) where q equals the difference in the number of free parameters between model M0 and M1 (df = 19 in this study) .
Gene fragments used for analyses contain differing degrees of variability
Among the four gene fragments analyzed in this study, 3 521 nucleotides were included (1 481 from the 16S rRNA gene fragment, 816 from dnaJ, 474 from rpoB, and 750 from tuf) for 59 different taxa. The dataset contained 1 016 parsimony-informative sites and 2 142 conserved sites. The nucleotide diversity of the 16S rRNA gene fragment was 0.029 substitutions (subs.) per site, while that for dnaJ, rpoB, and tuf was 0.241, 0.147, and 0.097 subs. per site, respectively. The average theta per site for the combined dataset was 0.04. The lowest interspecies divergence was between S. pseudintermedius and S. delphini (0.014 subs. per site). The highest estimated evolutionary divergence within the complete dataset was between S. piscifermentans and the outgroup species, B. subtilis (0.266 subs. per site), while the highest level among staphylococcal taxa was between S. piscifermentans and S. vitulinus (0.182 subs. per site).
Individual gene tree analyses
Phylogenetic analysis of individual genes revealed that 16S rRNA and dnaJ fragments resolved similar major clades but different branching orders of these clades (Additional file 3: Figure S1). Similarly, most relationships and clusters of species within rpoB and tuf gene trees were in general agreement with the 16S and dnaJ gene trees, but multiple higher-level clades were present in these that were unique (Additional file 3: Figure S1). Thus, individual gene tree analyses supported similar clusters of species, but varying arrangements of these clusters relative to one another. As expected, nodal support values for individual gene trees were relatively low, particularly for nodes more deeply nested in the tree. Formal partition homogeneity (or incongruence length difference test ) tests indicated that there were significant differences between all partitions except for 16S rRNA gene and dnaJ. This test, however, is based on parsimony criteria and known to have highly variant type-1 and type-2 error rates depending on different tree structures, rates across sites, and informative site contents among datasets [53, 54]. We therefore interpret these results cautiously, as the potential for there to exist some conflicting phylogenetic signal among genes, and incorporate this caution in later interpretations of the combined data analyses. Such conflicting signal might result from multiple sources, including phylogenetic estimation error leading to different inferences from individual gene trees, and/or different underlying evolutionary histories among genes due to lateral gene transfer or lineage sorting effects.
Dataset partitioning improves likelihood estimates of Bayesian phylogenetic analyses
Bayes factors and Akaike weights reveal differences in model fitness for the different partitioning strategies applied to the concatenated, multilocus dataset
Inspection of TL identified that the more highly partitioned models (MB4-MB7) yielded TLs between two and four times longer than less partitioned models (MB1-3; MB8-9; refer to Additional file 5: Figures S3 and Additional file 6: Figure S4). The more highly-partitioned model runs with high TLs also tended to show very high TL variance among generations, resulting in quite broad TL posteriors (Additional file 5: Figures S3 and Additional file 6: Figure S4). Considering this evidence for unreliability in the more highly partitioned model runs, we tempered our choice of partitioning scheme. A combination of lnL (BF and Aw) and TL reliability criteria suggest that MB8 is the preferred partitioned model, since it had better lnL than other models (e.g., MB1-2) while resulting TL estimates were apparently uninflated and of low variance (Additional file 5: Figures S3 and Additional file 6: Figure S4). Hereafter, we discuss results based on the BI runs from model MB8, and identify any notable differences between this model and others (particularly MB5 and MB7). It is important to note however that while lnL and TLs differed between partitioning scheme models, tree topologies remained nearly identical (discussed below). It is possible that the inconsistency between the BF-based support for more highly partitioned models versus evidence for model overparameterization that we observed may be related to calculation of BFs based on harmonic mean approximations of marginal likelihoods, which has been shown previously [42, 43, 45]. Thus, while no substantial topology or support value differences were observed between results from different models, we have taken a conservative approach and chosen to use MB8 as the preferred model because more complex models exhibited excessive TL indicative of overparameterization.
Bayesian and maximum likelihood analyses of concatenated data
Bayesian concatenated phylogenetic estimates supported strongly (Pp = 1.00) the separation of staphylococcal species into two deeply-diverging major clades (Figure 2). One of the two clades contained all of the oxidase positive staphylococcal species (frequently referred to as the Sciuri group), with the second group containing all other oxidase negative staphylococcal species (Figure 2). The single lineage S. auricularis formed the sister group to all other members of this second group, with the next most basally-diverging lineage in this clade including the following species: S. simulans, S. condimenti, S. carnosus (both subspecies), and S. piscifermentans (Pp = 1.00). The subspecies of S. carnosus proved to cluster tightly together, as expected, and formed the sister group to S. condimenti.
The next major divergence within the staphylococcal tree was that of a strongly supported clade (Pp = 1.00) containing the pathogenic species S. saprophyticus (Figure 2). This clade contained many members of the polyphyletic group of coagulase negative, novobiocin resistant species, and included the recently described species S. massiliensis and S. pettenkoferi. Following this divergence, species of heightened clinical significance diverged, including S. aureus, S. epidermidis, S. warneri, S. haemolyticus and S. lugdunensis, which formed a well-supported clade (Pp = 1.00) (Figure 2). We also found that the most recently discovered Staphylococcus species, S. agnetis formed a strongly supported clade (Pp = 1.00) with S. hyicus, for which S. chromogenes was the sister lineage.
Concatenated and unconcatenated phylogenetic methods broadly agree on clustering of staphylococcal species
Estimation of staphylococcal phylogeny was also performed on the unconcatenated dataset using Bayesian Estimation of Species Trees (BEST) analysis . BEST analyses treated each locus as an independent gene, thereby inferring the likely species tree given four independent gene trees. Trees inferred from duplicate BEST runs were identical in topology with no nodes differing by Pp > 0.05, indicating that multiple runs converged on nearly the same posterior tree space. With three exceptions, the BEST tree resolved the same major clades as the concatenated BI and ML trees, although in some cases the relative branching order of major clades differed (Additional file 7: Figure S5). BEST estimated that S. auricularis formed a clade with S. sciuri (Pp = 0.99) as opposed to the later (less basal) divergence of S. auricularis observed in concatenated BI and ML estimates (Pp = 1.00 and BS = 100%, respectively; Additional file 7: Figure S5). BEST also estimated a more divergent relationship between S. kloosii and S. arlettae, whereas these two species formed an exclusive clade in concatenated data analyses. The concatenated BI and ML analyses estimated S. felis to diverge more basally than was inferred by BEST analyses, although support for the placement of S. felis was generally low among all methods (BEST, Pp < 0.50; concatenated BI, Pp = 0.75; concatenated ML, BS < 50%).
In order to achieve convergence using BEST method, we chose to enforce a molecular clock to reduce the number of parameters in the analysis. To evaluate the impact of this parametric restriction on the resulting inferences, we also conducted BEST analyses without enforcing a molecular clock. Only minor differences in cluster groupings between analyses were observed where S. agnetis and S. hyicus formed a clade with S. chromogenes as the sister taxon in Additional file 7: Figure S5, while the alternate (non-clock BEST) analysis estimated S. agnetis and S. chromogenes to form a clade, sister to S. hyicus (data not shown). However, because convergence was not achievable without application of a strict molecular clock, overall node supports for this tree tended to be lower than the clock-constrained BEST analysis. These results also suggest that differences in tree topology between concatenated methods and BEST analyses are not necessarily the result of applying a molecular clock to the dataset.
Using multilocus data to infer the Staphylococcus phylogeny
Staphylococcus is a species-rich genus of importance from both a human health and economic perspective. Relevant to the first goal of this study, our results provide strong evidence that the current groupings of Staphylococcus species require revision, and provide a clear consensus across analyses on how this could be done to reflect inferred evolutionary relationships among species groups. The second goal of the study was to infer higher-level relationships among species and cluster groups, and our results provide good evidence for much consensus across methods although there remains some alternative hypotheses for such higher-level relationships that differed between methods. To infer phylogenies relevant to both species-grouping and higher-level relationships, we used a combination of Bayesian and maximum likelihood analyses of multilocus data. We found that both Bayesian and maximum likelihood analysis of multilocus data yielded high-resolution species trees with overall high nodal support values for relationships among Staphylococcus species. We also found that partitioned-model analysis of the combined dataset, versus the concatenation-free analysis using BEST, produced near-identical estimates of the species composition of major clades and putative revised cluster groupings (i.e., more recent relationships).
In contrast to broad consensus across methods for resolution of relationships among more recent groupings of species, concatenated and gene-tree-based methods (BEST) inferred several alternative relationships among more ancient lineages of staphylococcal species. It is not entirely clear, however, what the source of these differences are (e.g., different evolutionary histories among genes being differentially resolved between methods, difference in how methods extracted signal from the multilocus data, etc.). It is notable that our finding that the BEST method inferred similar major clades as the concatenated methods, but inferred a different branching order among these major clades, has also been observed in other studies . Such an observation may be indicative that although the phylogenetic signal contained within gene trees affords robust estimates of membership of particular species to major clades, conflicting signal or simply very little signal for deeper relationships among major lineages is available from single gene tree inferences (as in BEST). Our analysis of individual gene trees further supports this hypothesis whereby there appears to be substantial disagreement about higher-level relationships, but individual gene trees are consistent with one another regarding the placement of species within clusters towards the tips of the tree. It is notable that the modified BEST analysis, in which only two gene fragments were incorporated, more consistently resolved higher order relationships with the concatenated BI and ML methodologies. This suggests further that conflicting signal within the other two gene fragments was contributing to the discord among the original BEST analysis, incorporating all four loci.
It has been shown that staphylococcal species routinely laterally transfer genes , and it is therefore reasonable to consider that lateral gene transfer might complicate inference of phylogeny in this study. For example, lateral transfer (potentially combined with phylogenetic inference error) may explain instances of disagreement between gene trees and multi-locus inferences. Particularly in the case of inferring bacterial phylogeny, generally high instances of gene transfer inherently complicate inference of species-level trees, and even raise philosophical questions about the meaning of such species-level inferences . Our results do, however, provide good evidence that there is indeed phylogenetic signal of an underlying species-level tree with many shared relationships across analytical methods, and this tree contrasts strongly with the existing higher-level classification scheme of the group that was based on less robust inferences methods. Our results largely agree across methods about the membership of species and subspecies to major clades, and thus provide new important confirmatory information sufficient for refining the nomenclature of the group.
Refined phylogeny and classification of Staphylococcus spp
Consistent with previous studies [55, 63, 64, 67, 68], our analyses identified the monophyletic group containing the novobiocin-resistant, oxidase positive species (Sciuri group; Figure 6, blue cluster group) as the sister group to all other Staphylococcus. This cluster group also contains the recently discovered species, S. stepanovicii . Within this group, we inferred a close relationship, with little sequence divergence, between S. vitulinus and S. pulvereri (BI and BEST Pp = 1.00; BS = 100%), potentially supporting the reclassification of S. pulvereri as a later synonym of S. vitulinus . After the basal divergence of the Sciuri group, the second lineage to diverge from the remaining staphylococcal lineages was the oxidase negative Auricularis group, containing only S. auricularis (Figure 6). Our phylogeny therefore suggests that cytochrome C oxidase was lost in Staphylococcus sometime in the common ancestor of S. auricularis and the remaining Staphylococcus species, after their divergence from the Sciuri group (Figure 6, red star).
Our phylogenetic placement of S. auricularis as the sister lineage to all non-Sciuri group staphylococci is unique to our study, and we find strong support for this inference (MrBayes Pp = 1.00 and ML BS = 99%). Based on the 16S rRNA gene alone, Takahashi et al.  estimated that S. auricularis shared a common ancestor with the S. saprophyticus, S. lugdunensis, S. haemolyticus, S. warneri, S. epidermidis and S. aureus cluster groups. More recently, Ghebremedhin et al. estimated a similar relationship to that of Takahashi et al. based on 16S rRNA gene alone. Analyses of subsequent gene fragments, however, yielded varying relationship estimates for S. auricularis, and no previous studies have found particularly strong support for the placement of this lineage. For example, Ghebremedhin et al.  recovered bootstrap support of 31% for a clade containing S. auricularis and S. kloosii based on the 16S rRNA gene, although average BS support across their tree was particularly low, at BS = 52%. Similarly, S. auricularis was placed as the sister lineage to S. kloosii plus the S. saprophyticus group, with BS = 25% based on analysis of the 16S rRNA gene by Takahashi et al. .
We inferred that the next lineage of Staphylococcus to diverge was the Simulans species group (Figure 6), which contains four species that are all novobiocin susceptible and coagulase negative. For consistency with previous nomenclature [6, 64], we refer to this clade as the Simulans-Carnosus cluster group and the species group as the Simulans group (Figure 6). Our estimate of relationships among species of this group agree with previous studies, although the inclusion of S. condimenti in our trees is novel [6, 64]. We inferred a single clade (Simulans-Carnosus cluster) containing the novobiocin susceptible, coagulase negative species, S. simulans, S. condimenti, S. carnosus and S. piscifermentans.
Following the split of these three early-diverging lineages, the remaining Staphylococcus species diverged into three large species groups. The first of these to diverge from the remaining was the Saprophyticus species group (Figure 6), which we inferred consists of four cluster groups. Within this species group, the Pettenkoferi-Massiliensis cluster group contains novobiocin susceptible species while all of the remaining members of the Saprophyticus group are novobiocin resistant. Thus, it seems that an alternative gyrase B gene conferring novobiocin resistance may have been acquired in this clade sometime after the Pettenkoferi-Massiliensis cluster group diverged from the rest of the Saprophyticus species group. Based on analysis of the 16S rRNA gene, Al Masalma et al.  reported the newly discovered species S. massiliensis to be a member of the Simulans group, although they failed to recover this relationship in analyses of the dnaJ, rpoB, and tuf genes, where they instead placed it with S. pettenkoferi as we have here. It is also notable that the close relationship between these coagulase-negative species was also suggested based on their phenotypic similarities across a range of biochemical tests . Additionally, in the Saprophyticus cluster group, we inferred a close relationship between S. equorum, S. succinus, S. saprophyticus, and S. xylosus with S. gallinarum as the sister lineage to these four species. The placement of S. gallinarum in other studies is variable, but on multiple occasions has clustered with the Arlettae-Kloosii group [6, 57, 60, 63, 64]. This alternative placement of S. gallinarum seems reasonable as we find the Arlettae-Kloosii cluster group to be closely related to the Saprophyticus cluster group (Figure 6).
The Epidermidis-Aureus species group contained five cluster groups, including the most common taxa of heightened clinical significance . In general, our estimates of relationships among these species are consistent with previous reconstructions [57, 64]. Relationships within the Haemolyticus cluster group also agree with previous estimates , with the placement of the recently discovered coagulase-negative bovine strain, S. devriesei, forming a clade with S. haemolyticus . Lastly, the Hyicus-Intermedius species group contained species of the "S. hyicus-S. intermedius cluster group" originally proposed by Takahashi et al.  based on a 16S rRNA gene dataset, and additional studies have found similar estimates of relationships based on analyses of other loci [1, 6, 60, 61, 63, 70]. The limited number of taxa assessed in these studies has, however, prevented a more detailed understanding of species relationships within this species group prior to our analysis here. Moreover, recent novel species discovery (in particular S. rostri, S. microti, and S. agnetis) has also contributed to the enhanced diversity of the Hyicus-Intermedius group. We have divided this species group into three cluster groups based on their phylogenetic relationships, which is also supported by their phenotypic diversities (Figure 6). Species among the Intermedius cluster group are all coagulase positive, excepting S. schleiferi schleiferi. Interestingly, S. schleiferi coagulans is coagulase positive, consistent with the other members of this cluster group, implying a recent loss in S. schleiferi schleiferi. In contrast, the Muscae cluster group contains only coagulase negative species (S. muscae, S. rostri, and S. microti). Within the last two years, both S. rostri and S. microti were discovered and found to cluster with S. muscae, thus altering previously known relationships within this species group. The Hyicus cluster group is coagulase-variable, including coagulase positive (S. hyicus), negative (S. chromogenes, S. felis), and variable (S. agnetis) species (Figure 6, red cluster group).
Through the analysis of multiple loci under a variety of phylogenetic methods, we achieved one of our main goals of inferring a robust estimate of the cluster groupings among staphylococcal species. We have used this estimate of cluster groupings to refine the current knowledge of the systematics and nomenclature for this important genus. Our results also contribute to a presumably more accurate understanding of the higher-level relationships among Staphylococcus species, although we do note that there are several outstanding questions left by the alternative resolutions of our concatenated versus species-tree-based inferences. We have attempted to present these yet unresolved inferences in a transparent fashion such that future work might directly test remaining alternative hypotheses and add further clarity to the relatively small number of remaining questions about relationships among staphylococcal species. The availability of such a comprehensive estimate of the evolutionary origins of, and relationships among, staphylococci provides an important context for understanding patterns of gain and loss of genetic and physiological attributes, and the potential role of lateral gene transfer in both pathologically-relevant phenotypes and in estimation of phylogenetic relationships among species. Such questions are of particular relevance considering the clinical and economical significance of some Staphylococcus species. Approaches such as this will provide a more natural classification of species based on phylogenetic inferences and lend support to future evolutionarily-informed studies of microbial diversity and physiology.
Funding for this study (including that for Open Access, stipend support, computational software/hardware, and usage of the STOKES IBM High Performance Computer Cluster) was provided in part by a doctoral postgraduate scholarship (PGS-D) from the Natural Sciences and Engineering Research Council of Canada (to R.P.L.), a National Institutes of Health grant AI060753 (to A.M.C.), and a National Science Foundation grant 0525429 (to C.L.P.). The authors would also like to thank Matthew P. Wood for assistance in editing the manuscript and the University of Central Florida STOKES Advanced Research Computing Center for providing computational resources and support that have contributed to results reported herein.
- 3.Sakwinska O, Giddey M, Moreillon M, Morisset D, Waldvogel A, Moreillon P: Host range and human-bovine host shift in Staphylococcus aureus. Appl Environ Microbiol. 2011Google Scholar
- 4.Lowder BV, Guinane CM, Ben Zakour NL, Weinert LA, Conway-Morris A, Cartwright RA, Simpson AJ, Rambaut A, Nubel U, Fitzgerald JR: Recent human-to-poultry host jump, adaptation, and pandemic spread of Staphylococcus aureus. Proc Natl Acad Sci U S A. 2009, 106 (46): 19545-19550. 10.1073/pnas.0909285106.PubMedPubMedCentralCrossRefGoogle Scholar
- 6.Ghebremedhin B, Layer F, Konig W, Konig B: Genetic classification and distinguishing of Staphylococcus species based on different partial gap, 16S rRNA, hsp60, rpoB, sodA, and tuf gene sequences. J Clin Microbiol. 2008, 46 (3): 1019-1025. 10.1128/JCM.02058-07.PubMedPubMedCentralCrossRefGoogle Scholar
- 8.Freney J, Kloos WE, Hajek V, Webster JA, Bes M, Brun Y, Vernozy-Rozand C: Recommended minimal standards for description of new staphylococcal species. Subcommittee on the taxonomy of staphylococci and streptococci of the International Committee on Systematic Bacteriology. Int J Syst Bacteriol. 1999, 49 (Pt 2): 489-502.PubMedCrossRefGoogle Scholar
- 28.Sanderson MJ, Donoghue MJ, Piel W, Eriksson T: TreeBASE: a prototype database of phylogenetic analyses and an interactive tool for browsing the phylogeny of life. Am J Bot. 1994, 81 (6): 183-Google Scholar
- 30.Swofford DL: PAUP*. Phylogenetic analysis using parsimony (*and other methods). 2003, Sinauer Associates, SunderlandGoogle Scholar
- 50.Zwickl DJ: Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. Dissertation. 2006, The University of Texas at Austin, AustinGoogle Scholar
- 61.Kloos WE, Schleifer KH: Genus IV Staphylococcus. Bergey's Manual of Systematic Bacteriology. Edited by Sneath PHA, Mair NS, Sharpe ME, Holt JG, vol. 2. 1986, Williams & Wilkins, Baltimore, 1013-1035.Google Scholar
- 65.Kloos WE, Schleifer KH, Gotz R: The genus Staphylococcus. The Prokaryotes: a Handbook on the Biology of Bacteria: Ecophysiology, Isolation, Identification, Applications. Edited by: Balows HG, Truper M, Dworkin M, Harder W, Schleifer KH. 1991, Springer, New York, 1369-1420. 2Google Scholar
- 67.Svec P, Vancanneyt M, Sedlacek I, Engelbeen K, Stetina V, Swings J, Petras P: Reclassification of Staphylococcus pulvereri Zakrzewska-Czerwinska et al. 1995 as a later synonym of Staphylococcus vitulinus Webster et al. 1994. Int J Syst Evol Microbiol. 2004, 54 (Pt 6): 2213-2215.PubMedCrossRefGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.