Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes
Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a wealth of information, but this needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, whether based on partial or full-length gene sequences, results in biased estimates of genetic differences and does not capture the true degree of similarity or difference between complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterization of the most probable causative strain in CDI patients.
Keywords: C. difficile; AAI; MLST; Community-acquired infections; Comparative genomics
Clostridioides difficile (C. difficile) is a Gram-positive, anaerobic bacillus that is responsible for pseudomembranous colitis; it is also a common cause of nosocomial diarrhea, conditions whose morbidity and mortality have dramatically increased in the past decade [1, 2, 3]. Nosocomial infections caused by C. difficile are a recurrent problem, and young individuals are increasingly recognized as being at high risk, in contrast with the historical C. difficile incidence trends, which pointed mainly to elderly hospitalized patients. In the traditional view, hospitalized patients are exposed to the spores of C. difficile by direct contact with medical staff, via contaminated utensils or from the hospital environment, although food can also be involved in the transmission process. In patients with an effective immune response, colonization of the gut by C. difficile can occur without the presentation of clinical signs. However, in individuals with a history of repeated exposure to certain antibiotics, or in those who are immunocompromised or suffer from underlying enteric diseases, C. difficile can dramatically increase in numbers. This has knock-on effects on the gut microbiome and leads to an imbalance of other bacterial species (dysbiosis), resulting in a semi-permanently altered gut micro-environment. Once this condition is established, treatment becomes very difficult. In such cases, a fecal transplant may be the only option, which has shown efficacy rates as high as 90% [7, 8].
The taxonomic description of the genus Clostridium, to which C. difficile used to belong, has undergone multiple revisions over the years. The genus expanded and was split up again, resulting in some confusion about its members. There have been at least 240 bacterial species that, at some time in the past, were accepted as a member of Clostridium. In 1994, a major revision of the genus was proposed, which moved a number of its members to novel genera. This proposal was based on phylogenetic analysis of 16S rRNA sequences and was later backed up by phylogeny of a selection of protein genes. Currently, there are 71 recognized species belonging to the genus Clostridium sensu stricto, whose type strain is C. butyricum. The species C. difficile is no longer part of this genus, as it was placed in the novel genus Clostridioides, together with C. mangenotii [9, 12].
The main virulence determinants of C. difficile are toxin A (ToxA) and toxin B (ToxB), which are encoded on the pathogenicity locus (PaLoc) by the tcdA and tcdB genes, respectively, together with three regulatory genes tcdC, tcdE, and tcdR. It has been suggested that PaLoc is a mobile element, in which case transfer of the complete locus can convert a non-toxigenic strain into a toxigenic one. In addition, a third toxin may be present, known as binary toxin or CDT (short for C. difficile toxin, not to be confused with the cytolethal distending toxin of Gram-negative bacteria). This toxin is encoded by the cdtA and cdtB genes and belongs to the Iota-family of toxins, to which Clostridium perfringens toxin also belongs, although the relevance of CDT in clinical disease is still debated.
Genetic differentiation of C. difficile strains is important to identify possible nosocomial outbreaks, in which multiple patients are infected by a single strain. A common method for genetic differentiation is multilocus sequence typing (MLST). By this method, C. difficile strains have been classified into six phylogenetically different clades (clades 1 to 5 and C1), although clade C1 is not defined in the MLST database collected at the University of Oxford, UK. These clades may contain both pathogenic and non-pathogenic strains, but the vast majority of strains produce one or more toxins. This suggests that MLST may not be an ideal approach to highlight differences among the toxin gene repertoire of the isolates under study.
Technical developments now allow routine whole-genome sequencing (WGS), instead of sequencing a limited number of gene fragments only. A large number of WGS from C. difficile strains are already available in the public domain and these sequences have been useful in multiple ways. WGS characterization has been helpful in the study of C. difficile infections (CDI), and it identified the true genetic diversity that exists among C. difficile isolates, which was originally assessed by identification of single-nucleotide variants (SNVs) within a limited set of known genes only. In addition, WGS has assisted epidemiological investigations, as it is superior in identifying CDI transmission sources, in particular in patients with recurrent CDI [19, 20, 21]. By means of WGS, it was recently shown that most cases of CDI in hospitalized patients are due to endogenous strains carried by patients who were asymptomatic prior to their hospitalization, while in the hospital, exposure to high doses of antibiotics may favor the growth of the bacteria, resulting in symptomatic CDI. In such a scenario, even non-toxigenic strains can pose a risk, since they can acquire the PaLoc by horizontal gene transfer, carry alternative virulence genes, or result in the condition of dysbiosis [15, 23]. WGS has further been applied to assess the effectiveness of fecal transplants in severe CDI cases. Lastly, metagenomics, in which all DNA present in a (clinical) sample is assessed, is increasingly being applied to fecal analysis and may become a commonly applied technique in medical microbiology in the near future. Its main advantage is the identification of microorganisms without the need for culture, saving time in an outbreak investigation.
WGS results in far more data than what is assessed by SNP-based analysis or MLST. Since complete proteomes can be predicted, these can be subjected to average amino acid identity (AAI) analysis, a method that compares all conserved protein-coding genes present in a given set of genomes, clustering strains into groups that share more than 95% AAI [25, 26]. This method has proven to have higher resolution power at the species level than comparison of 16S rRNA or MLST, since it assesses a far larger fraction of the genome.
Here, we compared all available C. difficile genomes by AAI analysis based on WGS data, starting with a comparison of taxonomic type strains of Firmicutes. The aims were to (1) determine if AAI analysis can confirm the status of C. difficile as a unique species and (2) assess if AAI produces groupings within the species that are of clinical relevance, to accurately identify pathogenic strains. For this second aim, the findings were compared to presence or absence of the toxin genes in different strains. As aim (3), we investigated whether metagenomics can identify CDI together with its associated strain(s), through a detailed analysis of published whole shotgun metagenomic sequences.
Average Amino Acid Identity Analysis
AAI analysis was performed as previously described and is briefly summarized here. Amino acid sequences of all proteins from the analyzed genomes were extracted from their original GenBank accessions. AAI analysis was then carried out for every possible pair of genomes. The conserved reciprocal best match for each protein from each genome pair was first identified using UBLAST with a cutoff of 30% sequence identity and a required minimum of 70% alignment length of the query sequence. For each pair of genomes, the average amino acid identity was then calculated based on the identities of all conserved reciprocal best matches, a calculation that is not always symmetrical. In such cases, the average of the two AAI values was assigned to each pair of genomes. Genomic clusters were then generated from the AAI values with a default cutoff of 95% (unless stated otherwise), meaning that members from different clusters cannot share more than 95% AAI, and members within a cluster are connected by paths consisting of edges between isolates with AAI > 95%. The exact clustering procedure is described elsewhere [25, 26].
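The pairwise AAI calculation and the cutoff-based clustering summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published pipeline: the function names, the dictionary-based input, and the toy identity values are hypothetical, and the reciprocal best matches are assumed to have been identified and filtered beforehand. Clustering is implemented here as connected components over edges with AAI above the cutoff, which corresponds to the path-based definition given in the text.

```python
from itertools import combinations
from statistics import mean


def directional_aai(identities):
    """Mean percent identity over the reciprocal best matches of one search direction."""
    return mean(identities)


def pairwise_aai(ids_a_vs_b, ids_b_vs_a):
    """Average the two directional values, since the calculation is not always symmetrical."""
    return (directional_aai(ids_a_vs_b) + directional_aai(ids_b_vs_a)) / 2.0


def cluster_by_cutoff(genomes, aai, cutoff=95.0):
    """Connected components of the graph whose edges join genomes with AAI > cutoff."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the component root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if aai[frozenset((a, b))] > cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Toy data (hypothetical values, for illustration only).
genomes = ["g1", "g2", "g3", "g4"]
aai = {
    frozenset(("g1", "g2")): pairwise_aai([99.0, 97.0], [98.0, 98.0]),
    frozenset(("g2", "g3")): 96.0,
    frozenset(("g1", "g3")): 94.0,
    frozenset(("g1", "g4")): 80.0,
    frozenset(("g2", "g4")): 81.0,
    frozenset(("g3", "g4")): 79.0,
}
clusters = cluster_by_cutoff(genomes, aai)
print(clusters)
```

In this toy example, g1 and g3 share only 94% AAI but still end up in the same cluster, because both are linked to g2 by edges above 95%. Raising the cutoff removes edges and can therefore only split clusters, never merge them.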
The first AAI tree was produced from all 25 taxonomically valid type strains of Firmicutes for which whole-genome sequences were available at the time of analysis. The list of type strains was obtained from the Names4Life website (www.namesforlife.com, accessed on 14 March 2017). The pairwise comparisons used to calculate AAI values for this set of genomes involved between 236 and 1766 genes (666 genes on average). The AAI tree was built by applying BIONJ to dissimilarities derived from the AAI values (100% minus AAI).
For comparison of all Clostridia members, the GenBank database was accessed on February 12, 2017, and all available complete genomes and chromosomes (n = 234) within the Clostridia class were downloaded. This selection was restricted to completely sequenced genomes and included 8 genomes from C. difficile. The AAI values of the Clostridia selection were based on comparisons of between 131 and 5017 genes (955 genes on average).
A third AAI tree was built using all 663 C. difficile genomes that had genome quality scores > 0.8 as defined elsewhere, this time including complete genomes as well as draft genome sequences available at the time of analysis. The dataset contained 653 sequences described as obtained from C. difficile, supplemented with 10 genomes with no species designation but presumed to be C. difficile based on gANI (genome-wide average nucleotide identity). Two genomes of Clostridioides mangenotii (GCA_000498755 and GCA_000687955) were added to serve as an outgroup. A tree that contains over 600 branches would be hard to read, and since many genomes are extremely similar, the tree would end in many very short branches. For graphical representation of such a large dataset, branches that contained identical or highly similar members were collapsed. We attempted to define a suitable cutoff for such a collapse by varying the required percentage of similarity within this dataset as described in the results. Bootstrap values were calculated in a way that is conceptually similar to that used for alignment-based trees: from the reciprocal conserved protein pairs identified for a given pair of genomes, we randomly drew, with replacement, as many pairs as were present in the original set and repeated this procedure 100 times, resulting in 100 bootstrap AAI values for that pair of genomes. Then, 100 bootstrap AAI trees were generated and bootstrap values were calculated, defined as the number of times each clade of the AAI tree occurred in the 100 bootstrap trees.
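The bootstrap procedure for AAI values can be illustrated with a short sketch. It assumes that the identities of the reciprocal conserved protein pairs for one genome pair are available as a list of percentages; the function names, the fixed random seed, and the representation of a tree as a set of clades are hypothetical choices made for this example, and tree building itself is omitted.

```python
import random
from statistics import mean


def bootstrap_aai(identities, n_replicates=100, seed=0):
    """Resample the protein-pair identities with replacement, drawing as many
    pairs as in the original set, and return one AAI value per replicate."""
    rng = random.Random(seed)
    n = len(identities)
    return [mean(rng.choices(identities, k=n)) for _ in range(n_replicates)]


def clade_support(reference_clade, bootstrap_trees):
    """Percentage of bootstrap trees (each given as a set of clades) that
    contain the reference clade."""
    hits = sum(1 for clades in bootstrap_trees if reference_clade in clades)
    return 100.0 * hits / len(bootstrap_trees)


# Toy identities for one genome pair (hypothetical values).
identities = [99.1, 98.4, 97.0, 100.0, 95.5, 99.8]
replicates = bootstrap_aai(identities)

# Toy bootstrap trees, each reduced to the set of clades it contains.
clade = frozenset({"g1", "g2"})
trees = [
    {frozenset({"g1", "g2"})},
    set(),
    {frozenset({"g1", "g2"})},
    {frozenset({"g1", "g3"})},
]
support = clade_support(clade, trees)
print(len(replicates), support)
```

Each replicate AAI value lies between the smallest and largest identity in the original list, since it is a mean of resampled values. With 100 bootstrap trees, the support percentage equals the occurrence count used in the text.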
In Silico MLST
The seven gene fragments of the housekeeping genes adk, atpA, dxr, glyA, recA, sodA, and tpi were extracted from the C. difficile genomes by BLASTN using the sequence of allele number 1 as the query; these query sequences were extracted from the MLST database (https://pubmlst.org/cdifficile/) collected by the University of Oxford. The best BLASTN hit with each genome was retrieved, sequences were concatenated, and a NJ tree was constructed by Muscle. Redundancy was removed by deleting multiple sequences per sequence type (ST), recording the number of members per ST. For comparison, the complete genes instead of MLST fragments were also concatenated and analyzed.
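The concatenation and redundancy-removal steps can be sketched as follows. This is an illustration only: it assumes that the best hit for each locus has already been retrieved per genome, and it treats genomes with identical concatenated sequences as members of the same ST, which is a simplification introduced here rather than a lookup in the MLST database. The function names and the toy sequences are hypothetical.

```python
LOCI = ("adk", "atpA", "dxr", "glyA", "recA", "sodA", "tpi")


def concatenate_loci(hits):
    """Join the best-hit sequences of the seven loci in a fixed locus order."""
    return "".join(hits[locus] for locus in LOCI)


def collapse_redundant(concatenated):
    """Keep the first genome seen for each distinct concatenated sequence
    and record how many genomes share that sequence."""
    groups = {}
    for genome, seq in concatenated.items():
        entry = groups.setdefault(seq, {"representative": genome, "members": 0})
        entry["members"] += 1
    return {e["representative"]: e["members"] for e in groups.values()}


# Toy best hits (hypothetical, much shorter than real MLST fragments).
base = {locus: "ACGT" for locus in LOCI}
variant = dict(base, tpi="ACGA")
concatenated = {
    "gA": concatenate_loci(base),
    "gB": concatenate_loci(base),
    "gC": concatenate_loci(variant),
}
representatives = collapse_redundant(concatenated)
print(representatives)
```

The resulting mapping holds one representative per distinct sequence together with its member count, which is the information needed to annotate a non-redundant tree.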
Identification of Toxin Genes by PFAM Domain Searches
Prodigal software was used to identify all protein-coding genes across all analyzed C. difficile genomes. The Pfam domains in the proteins of these genomes were identified using HMMER 3.1b2 to scan across the 16,306 profile hidden Markov models in the Pfam database version 30.0. Presence of genes coding for toxin A or toxin B was identified on the basis of presence of the Pfam domains PF12918, PF12919, PF11713, and PF12920. CDT protein A was identified on the basis of presence of the Pfam domain PF03496 and CDT protein B by presence of PF07691 and PF03495.
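The domain-based toxin calls can be expressed as simple set tests. The sketch below assumes that the HMMER results have already been parsed into a mapping from protein identifier to the Pfam accessions found in that protein, and that all listed domains must co-occur in a single protein for a call; both points are assumptions made for this example, as are the function names and the toy genome (PF00001 is a placeholder accession unrelated to toxins).

```python
TOXIN_AB_DOMAINS = frozenset({"PF12918", "PF12919", "PF11713", "PF12920"})
CDTA_DOMAINS = frozenset({"PF03496"})
CDTB_DOMAINS = frozenset({"PF07691", "PF03495"})


def count_proteins_with(domains_by_protein, required):
    """Number of proteins that carry every domain in `required`."""
    return sum(1 for found in domains_by_protein.values() if required <= set(found))


def toxin_profile(domains_by_protein):
    """Summarize toxin gene presence for one genome."""
    return {
        "toxAB_copies": count_proteins_with(domains_by_protein, TOXIN_AB_DOMAINS),
        "cdtA": count_proteins_with(domains_by_protein, CDTA_DOMAINS) > 0,
        "cdtB": count_proteins_with(domains_by_protein, CDTB_DOMAINS) > 0,
    }


# Toy genome (hypothetical protein identifiers and domain hits).
genome = {
    "prot_1": ["PF12918", "PF12919", "PF11713", "PF12920", "PF00001"],
    "prot_2": ["PF03496"],
    "prot_3": ["PF07691"],  # only one of the two CdtB domains
}
profile = toxin_profile(genome)
print(profile)
```

Counting matching proteins rather than returning a yes/no value keeps the distinction between genomes with one and with two copies of the toxin-specific architecture.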
Read Coverage Analysis from Metagenomic Data
To assess if metagenomics can be used for identification of CDI, we investigated the recently published shotgun metagenomes produced from stool samples of patients with suspected or confirmed CDI in two different hospitals located in Canada and Italy. A total of 228 metagenomic samples were available from the Canadian study, including CDI cases and controls. We blinded our analysis, not knowing which of these samples were from controls and which were from patients. Of the 15 Italian metagenomic samples available, we only analyzed two datasets from clinically confirmed CDI cases, as diagnosed by the authors. Reads were downloaded from NCBI BioProjects PRJNA297252 and PRJNA297269 at NCBI’s Sequence Read Archive (SRA) database. The SRA metagenomic data were converted into FASTQ format using ‘SRA Toolkit’ software and then aligned to the 663 C. difficile genomes using Burrows-Wheeler Aligner (BWA) software version 0.7.15 with default parameters. All reads were compared to C. difficile sequences, and the C. difficile genome to which the largest number of reads mapped was then identified. The genome coverage of the metagenomic reads against that reference genome was determined by using the genomecov function in BEDTools software. For four datasets that had a genome coverage of over 50%, the reads were plotted using the Integrative Genomics Viewer (IGV) to illustrate the presence or absence of toxin operon genes, using the complete genome of C. difficile NC_009089.1 as a template. A sample with lower genome coverage was included as a control. The five datasets presented here are SRR2565933, SRR2565934, and SRR2565548 (BioProject ID PRJNA297252) and SRR2582247 and SRR2582248 (BioProject ID PRJNA297269).
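Genome coverage, understood here as the percentage of reference bases covered by at least one read, can be derived from a depth histogram. The sketch below assumes tab-separated rows in the layout of the default genomecov histogram output (sequence name, depth, number of bases at that depth, sequence length, fraction); the exact options used in the study are not stated, so this layout, the function name, and the toy rows are assumptions.

```python
def breadth_of_coverage(histogram_lines):
    """Percentage of reference bases with depth >= 1, from depth-histogram rows."""
    covered = 0
    total = 0
    for line in histogram_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and the genome-wide summary rows, which would
        # otherwise be counted twice on top of the per-sequence rows.
        if len(fields) < 5 or fields[0] == "genome":
            continue
        depth, n_bases = int(fields[1]), int(fields[2])
        total += n_bases
        if depth > 0:
            covered += n_bases
    return 100.0 * covered / total if total else 0.0


# Toy histogram for a 1000-bp reference (hypothetical values).
rows = [
    "contig1\t0\t400\t1000\t0.4",
    "contig1\t1\t500\t1000\t0.5",
    "contig1\t2\t100\t1000\t0.1",
    "genome\t0\t400\t1000\t0.4",
    "genome\t1\t500\t1000\t0.5",
    "genome\t2\t100\t1000\t0.1",
]
coverage = breadth_of_coverage(rows)
print(coverage)  # 60.0
```

A dataset would pass the 50% threshold mentioned above if this value exceeds 50 for its best-matching reference genome.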
Results and Discussion
Average Amino Acid Analysis of Taxonomic Type Strains Belonging to Firmicutes
Average Amino Acid Analysis of Completely Sequenced Clostridia Members
Average Amino Acid Analysis of 663 C. difficile Genomes
By increasing the cutoff to 96% (panel b of Fig. 3), the number of branches increases to 20, of which 18 are clusters. The 17 strains that clustered together at 95% are maintained (now in cluster number 17), as is the cluster of four genomes (now cluster number 18). The remaining 640 genomes are now divided over 15 clusters, the largest one containing 168 genomes and the smallest 3. The genomes that had been derived from ‘Clostridium sp.’ were distributed over different clusters, so these did not form a group of their own. This trend is continued when the cutoff is increased to 97%, which maintains the clusters with 17 and 4 genomes (cluster numbers 23 and 24). However, at a level of 98%, only the 4-genome cluster remains intact, while the 17 strains are now divided over 4 small clusters and 4 single-genome branches, as shown in panel d of Fig. 3. The small clusters are only partly region-specific: for instance, three Canadian strains are combined (cluster number 32), but another cluster combines the two Irish strains with the Italian and an Austrian strain (cluster number 30). Continuing this analysis with a cutoff of 99% results in 92 clusters and 151 single-genome branches (results not shown).
From this analysis, we conclude that the included Clostridium sp. genomes can be considered to have been obtained from C. difficile members. Twenty-three submitted C. difficile genomes are clear outliers, which form two clusters and two single-genome branches. Further, with an increase of the chosen cutoff, larger clusters break up into smaller ones, and this is a continuous process.
MLST Analysis of C. difficile Genomes
Although clearly separated by MLST, clades 1 and 2 were not clearly visible in the AAI trees of Fig. 3. We therefore re-analyzed the seven MLST genes, this time including complete coding sequences instead of the typically used MLST fragments. The result (Fig. 4b) showed that clades 1 and 2 are distinct but closely related, which explains why this division was not visible by AAI. In fact, a number of STs that are part of clade 1 are more closely related to clade 2 members, based on their full-length MLST gene assessment, and four ST2 members are mixed with clade 1 STs (these 71 genomes are identified in Table S1). This illustrates that the distinction in clades based on MLST fragments is somewhat arbitrary and depends on how these fragments were chosen. Thus, the clades do not necessarily represent truly distant lineages.
Presence of Toxin Genes in C. difficile Genomes
We also searched for presence of Pfam domains indicative of the two proteins that make up CDT. Only 191 genomes contained (after translation) the Pfam domains typically present in CdtA, and two more contained CdtB domains. Interestingly, nearly all of the CDT-positive genomes also contained one or two copies of ToxA/B: 172 genomes contained ToxAB (deduced from presence of two copies of the toxin-specific Pfam domains) and 18 contained one copy. Since all genomes lacking ToxA/B also lacked CDT (with one exception), it seems the presence of these different toxins is highly overlapping. Only one genome analyzed here contained CDT but not ToxA/B (A−B−CDT+), while 310 genomes were A+B+CDT−; the latter is most often found in the investigated genome collection. The presence of CDT in strains lacking ToxA/B has been described before [46, 47, 48, 49, 50, 51], but apparently, this is rather uncommon, as we find this in only 0.15% of the strains for which a genome sequence is currently available.
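The percentages quoted for the toxin gene distribution follow directly from the reported counts, as the short check below shows. All counts are taken from the text; the variable and function names are introduced here for illustration only.

```python
TOTAL_GENOMES = 663


def percent(count, total=TOTAL_GENOMES, digits=1):
    """Share of `count` in `total`, as a rounded percentage."""
    return round(100.0 * count / total, digits)


# Counts quoted in the text.
tox_ab_positive = 535
tox_ab_negative = 128
cdt_positive = 191
cdt_with_two_toxin_copies = 172
cdt_with_one_toxin_copy = 18
cdt_without_tox_ab = 1
tox_ab_without_cdt = 310

# Internal consistency of the reported counts.
assert tox_ab_positive + tox_ab_negative == TOTAL_GENOMES
assert (cdt_with_two_toxin_copies + cdt_with_one_toxin_copy
        + cdt_without_tox_ab) == cdt_positive

print(percent(tox_ab_positive))               # 80.7
print(percent(tox_ab_without_cdt))            # 46.8, reported as 47%
print(percent(cdt_without_tox_ab, digits=2))  # 0.15
```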
Metagenomic Analysis Can Identify C. difficile Infection in Stool Samples
Table: Summary statistics for each of the metagenomic datasets for the four patients with C. difficile infection
Rows reported per dataset: SRA ID of human gut metagenomic sequences; total metagenomic reads (n); metagenomic reads mapped to C. difficile (n, %); C. difficile strain to which the reads map best (strain name, assembly ID, origin); genome size (Mb); coverage of reads (%) on best matching C. difficile genome; presence of tcd genes; presence of cdt genes.
Best-matching strains (strain name, assembly ID, origin):
Strain 5.3, GCA_000586575 (Australia)
Strain VL_0181, GCA_900012755 (Canada)
Strain VL_0083, GCA_900011925 (Canada)
Strain IT1118, GCA_001497755 (Italy)
Strain Y384, GCA_000451045 (USA)
In total, these data show that metagenomic analysis of stool samples can identify the presence of C. difficile, and the degree of genome coverage can be taken as a measure of the abundance of the organism. In the future, as metagenomic sequencing becomes cheaper and faster, this approach might become economically feasible for more routine analysis. We further conclude that, although patients may be diagnosed with CDI in the same hospital, their infection was unlikely to be due to a common nosocomially transferred strain. More likely, endogenous, community-acquired strains were responsible for the analyzed cases.
The cluster analysis presented here has shown that for different members of Firmicutes, AAI clustering provides valuable insights on similarities that broadly agree with taxonomic position. At the level of genera and species within Clostridia, the clusters are less well resolved, as various genera are mixed. The taxonomic classification of C. difficile has encountered difficulties in the past. The WGS analysis presented here was based on AAI, which captures a large fraction of protein gene content. It clearly showed that the vast majority of the 663 analyzed genomes (640) belong to a single species that is distinct from its closest relatives. At the level of strains within the C. difficile species, AAI analysis groups the vast majority of genomes within one cluster at the 95% cutoff. This cluster subdivides as the cutoff for similarity is increased, but a clear optimum cannot be identified. Most genomes with identical STs group together in AAI clusters, but there are exceptions. MLST clade 1 contains a number of STs that are more similar to members of clade 2. The analysis further showed that the toxin genes are unevenly distributed over the strains.
Metagenomic analysis of stool samples can identify cases of CDI, and CDI-causing strains can be atoxigenic. Detection of multi-copy RNA genes exclusively in metagenomic reads may be indicative of low numbers of C. difficile in stools. The detected sequences suggest that CDI cases may be caused by different strains in patients from the same hospital. These findings support the acquisition of the pathogen within the community, with autogenous strains causing the infection. The onset of symptoms during hospitalization may be a result of treatment rather than in-hospital spread of an epidemic strain.
We thank Dr. George Garrity for providing the selection of type strain genomes.
This project was partly supported by the Translational Research Institute (TRI), grant UL1TR000039 through the NIH National Center for Research Resources and National Center for Advancing Translational Sciences; by NIH/NIGMS grant 1P20GM121293; and from the Helen G. Adams & Arkansas Research Alliance Endowment in the Department of Biomedical Informatics, College of Medicine at UAMS. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Compliance with Ethical Standards
Conflict of Interests
The authors declare that they have no conflict of interests.
- 2. Peery AF, Dellon ES, Lund J, Crockett SD, McGowan CE, Bulsiewicz WJ, Gangarosa LM, Thiny MT, Stizenberg K, Morgan DR, Ringel Y, Kim HP, DiBonaventura MD, Carroll CF, Allen JK, Cook SF, Sandler RS, Kappelman MD, Shaheen NJ (2012) Burden of gastrointestinal disease in the United States: 2012 update. Gastroenterology 143:1179–1187.e1173. https://doi.org/10.1053/j.gastro.2012.08.002
- 5. Eyre DW, Cule ML, Wilson DJ, Griffiths D, Vaughan A, O'Connor L, Ip CLC, Golubchik T, Batty EM, Finney JM, Wyllie DH, Didelot X, Piazza P, Bowden R, Dingle KE, Harding RM, Crook DW, Wilcox MH, Peto TEA, Walker AS (2013) Diverse sources of C. difficile infection identified on whole-genome sequencing. N Engl J Med 369:1195–1205. https://doi.org/10.1056/NEJMoa1216064
- 6. Vincent C, Miller MA, Edens TJ, Mehrotra S, Dewar K, Manges AR (2016) Bloom and bust: intestinal microbiota dynamics in response to hospital exposures and Clostridium difficile colonization or infection. Microbiome 4:12. https://doi.org/10.1186/s40168-016-0156-3
- 7. Gianotti RJ, Moss AC (2017) Fecal microbiota transplantation: from Clostridium difficile to inflammatory bowel disease. Gastroenterol Hepatol (N Y) 13:209–213
- 9. Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, Garcia P, Cai J, Hippe H, Farrow JA (1994) The phylogeny of the genus Clostridium: proposal of five new genera and eleven new species combinations. Int J Syst Bacteriol 44:812–826. https://doi.org/10.1099/00207713-44-4-812
- 13. Monot M, Eckert C, Lemire A, Hamiot A, Dubois T, Tessier C, Dumoulard B, Hamel B, Petit A, Lalande V, Ma L, Bouchier C, Barbut F, Dupuy B (2015) Clostridium difficile: new insights into the evolution of the pathogenicity locus. Sci Rep 5:15023. https://doi.org/10.1038/srep15023
- 14. Dingle KE, Elliott B, Robinson E, Griffiths D, Eyre DW, Stoesser N, Vaughan A, Golubchik T, Fawley WN, Wilcox MH, Peto TE, Walker AS, Riley TV, Crook DW, Didelot X (2014) Evolutionary history of the Clostridium difficile pathogenicity locus. Genome Biol Evol 6:36–52. https://doi.org/10.1093/gbe/evt204
- 17. Cowardin CA, Buonomo EL, Saleh MM, Wilson MG, Burgess SL, Kuehne SA, Schwan C, Eichhoff AM, Koch-Nolte F, Lyras D, Aktories K, Minton NP, Petri Jr WA (2016) The binary toxin CDT enhances Clostridium difficile virulence by suppressing protective colonic eosinophilia. Nat Microbiol 1:16108. https://doi.org/10.1038/nmicrobiol.2016.108
- 18. Jia H, Du P, Yang H, Zhang Y, Wang J, Zhang W, Han G, Han N, Yao Z, Wang H, Zhang J, Wang Z, Ding Q, Qiang Y, Barbut F, Gao GF, Cao Y, Cheng Y, Chen C (2016) Nosocomial transmission of Clostridium difficile ribotype 027 in a Chinese hospital, 2012–2014, traced by whole genome sequencing. BMC Genomics 17:405. https://doi.org/10.1186/s12864-016-2708-0
- 20. Mac Aogáin M, Moloney G, Kilkenny S, Kelleher M, Kelleghan M, Boyle B, Rogers TR (2015) Whole-genome sequencing improves discrimination of relapse from reinfection and identifies transmission events among patients with recurrent Clostridium difficile infections. J Hosp Infect 90:108–116. https://doi.org/10.1016/j.jhin.2015.01.021
- 21. Culligan EP, Sleator RD (2016) Advances in the microbiome: applications to Clostridium difficile infection. J Clin Med 5(9). https://doi.org/10.3390/jcm5090083
- 22. Caroff DA, Yokoe DS, Klompas M (2017) Evolving insights into the epidemiology and control of Clostridium difficile in hospitals. Clin Infect Dis. https://doi.org/10.1093/cid/cix456
- 23. Roy Chowdhury P, DeMaere M, Chapman T, Worden P, Charles IG, Darling AE, Djordjevic SP (2016) Comparative genomic analysis of toxin-negative strains of Clostridium difficile from humans and animals with symptoms of gastrointestinal disease. BMC Microbiol 16:41. https://doi.org/10.1186/s12866-016-0653-3
- 24. Mahato NK, Gupta V, Singh P, Kumari R, Verma H, Tripathi C, Rani P, Sharma A, Singhvi N, Sood U, Hira P, Kohli P, Nayyar N, Puri A, Bajaj A, Kumar R, Negi V, Talwar C, Khurana H, Nagar S, Sharma M, Mishra H, Singh AK, Dhingra G, Negi RK, Shakarad M, Singh Y, Lal R (2017) Microbial taxonomy in the era of OMICS: application of DNA sequences, computational tools and techniques. Antonie Van Leeuwenhoek. https://doi.org/10.1007/s10482-017-0928-1
- 27. Jun SR, Wassenaar TM, Nookaew I, Hauser L, Wanchai V, Land M, Timm CM, Lu TY, Schadt CW, Doktycz MJ, Pelletier DA, Ussery DW (2015) Diversity of Pseudomonas genomes, including Populus-associated isolates, as revealed by comparative genome analysis. Appl Environ Microbiol 82:375–383. https://doi.org/10.1128/aem.02612-15
- 28. Edgar RC Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461. https://doi.org/10.1093/bioinformatics/btq461
- 35. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44:D279–D285. https://doi.org/10.1093/nar/gkv1344
- 36. Milani C, Ticinesi A, Gerritsen J, Nouvenne A, Lugli GA, Mancabelli L, Turroni F, Duranti S, Mangifesta M, Viappiani A, Ferrario C, Maggio M, Lauretani F, De Vos W, van Sinderen D, Meschi T, Ventura M (2016) Gut microbiota composition and Clostridium difficile infection in hospitalized elderly individuals: a metagenomic study. Sci Rep 6:25945. https://doi.org/10.1038/srep25945
- 44. Beaz-Hidalgo R, Hossain MJ, Liles MR, Figueras MJ (2015) Strategies to avoid wrongly labelled genomes using as example the detected wrong taxonomic affiliation for Aeromonas genomes in the GenBank database. PLoS One 10(1):e0115813. https://doi.org/10.1371/journal.pone.0115813
- 45. Eckert C, Coignard B, Hebert M, Tarnaud C, Tessier C, Lemire A, Burghoffer B, Noel D, Barbut F (2013) Clinical and microbiological features of Clostridium difficile infections in France: the ICD-RAISIN 2009 national survey. Med Mal Infect 43:67–74. https://doi.org/10.1016/j.medmal.2013.01.004
- 46. Kurka H, Ehrenreich A, Ludwig W, Monot M, Rupnik M, Barbut F, Indra A, Dupuy B, Liebl W (2014) Sequence similarity of Clostridium difficile strains by analysis of conserved genes and genome content is reflected by their ribotype affiliation. PLoS One 9:e86535. https://doi.org/10.1371/journal.pone.0086535
- 47. NIH HMP Working Group, Peterson J, Garges S, Giovanni M, McInnes P, Wang L, Schloss JA, Bonazzi V, McEwen JE, Wetterstrand KA, Deal C, Baker CC, Di Francesco V, Howcroft TK, Karp RW, Lunsford RD, Wellington CR, Belachew T, Wright M, Giblin C, David H, Mills M, Salomon R, Mullins C, Akolkar B, Begg L, Davis C, Grandison L, Humble M, Khalsa J, Little AR, Peavy H, Pontzer C, Portnoy M, Sayre MH, Starke-Reed P, Zakhari S, Read J, Watson B, Guyer M (2009) The NIH human microbiome project. Genome Res 19:2317–2323. https://doi.org/10.1101/gr.096651.109
- 48. He M, Sebaihia M, Lawley TD, Stabler RA, Dawson LF, Martin MJ, Holt KE, Seth-Smith HM, Quail MA, Rance R, Brooks K, Churcher C, Harris D, Bentley SD, Burrows C, Clark L, Corton C, Murray V, Rose G, Thurston S, van Tonder A, Walker D, Wren BW, Dougan G, Parkhill J (2010) Evolutionary dynamics of Clostridium difficile over short and long time scales. Proc Natl Acad Sci U S A 107:7527–7532. https://doi.org/10.1073/pnas.0914322107
- 49. Maiden MC (2006) Multilocus sequence typing of bacteria. Annu Rev Microbiol 60:561–588. https://doi.org/10.1146/annurev.micro.59.030804.121325
- 53. Wasels F, Barbanti F, Spigaglia P (2016) Draft genome sequence of Clostridium difficile strain IT1118, an epidemic isolate belonging to the emerging PCR ribotype 018. Genome Announc 4(4). https://doi.org/10.1128/genomeA.00717-16
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.