A method for automated pathogenic content estimation with application to rheumatoid arthritis
Sequencing technologies applied to mammalian microbiomes have revolutionized our understanding of health and disease. Hence, to assess disease progression as well as the long-term effects of therapies, the impact of maladies and drugs on the gastrointestinal (GI) microbiome has to be evaluated. Typical metagenomic analyses associate a condition (disease, therapy, diet) with a pool of bacteria, whose eubiotic/dysbiotic potential is assessed by α-diversity, a measure of the variety of organisms populating the microbiome; by the Firmicutes to Bacteroides ratio, associated with systemic inflammation; and by manual, direct inspection of the bacteria’s biological functions, when known. The results of these approaches are sometimes difficult to interpret in terms of the evolution towards a specific microbial composition, and are hampered by the large share of organisms whose functions are still unknown.
We propose to additionally evaluate a microbiome based on its global composition, by automatic annotation of pathogenic genera and statistical assessment of the net change in frequency of harmless versus harmful organisms. This approach is intuitive, quantitative and computationally efficient, and is designed to cope with the currently incomplete functional knowledge of species. Our results on human GI-microbiome data exemplify how this layer of information provides additional insights into the impact of treatments on the GI microbiome, allowing us to characterize the more physiological effects of Prednisone versus Methotrexate, two treatments for rheumatoid arthritis (RA), a complex systemic autoimmune disease.
Our quantitative analysis integrates with previous approaches, offering an additional systemic level of interpretation. It is applied here to the therapies for RA for its potential to translate into clinically relevant information.
Keywords: Microbiome, Pathogens, Rheumatoid arthritis
CRA: Chronic rheumatoid arthritis
NORA: New onset rheumatoid arthritis
NSAIDs: Nonsteroidal anti-inflammatory drugs
OTU: Operational taxonomic unit
PsA: Psoriatic arthritis patients
TMM: Trimmed mean of M-values
With the development of high-throughput technologies, large amounts of metagenomic data have been produced, especially by sequencing the 16S ribosomal RNA gene, used as a proxy for taxa abundances in a microbial community. These data have demonstrated how gastrointestinal (GI) microbes respond and adapt to different situations, how alterations of the microbial community affect the development and functioning of the immune and metabolic systems, and, globally, how divergences (dysbiosis) from homeostasis (eubiosis) in this district are predictive of disease. Typical approaches to analyzing these data evaluate the α-diversity of Operational Taxonomic Units (OTUs, computational proxies for species) within each sample, using the Shannon and Simpson indexes, to understand the structure of the microbial population. This is based on the observation that more variability offers a larger spectrum of microbial molecular functions and hence of responses to environmental variations; conversely, the criterion relies on the limited α-diversity observed in inflammatory bowel disease and obesity.
Along the same lines, the imbalance in the physiological abundances of Bacteroides and Firmicutes is observed to be a measure of the inflammatory state of the system and a proxy for dysbiosis, owing to the relative increase of facultative anaerobic microbes able to exploit byproducts of the host inflammatory processes.
From a different perspective, differential analyses compute microbial variations and highlight OTUs whose abundance changes significantly between two conditions. This is followed by annotation of OTUs to taxa and a manual search for known organisms whose functions within the host environment help shed light, for example, on the mechanisms that trigger or sustain a disease.
Worldwide, large efforts are ongoing to complete the taxonomy of mammalian microbes, with a particular focus on their effects on health and disease (Human Microbiome Project, HMP), in synergy with metatranscriptomic and metaproteomic analyses to elucidate functional information. Nevertheless, little is known to date. As a result, despite the possibility of screening GI microbiomes at relatively low cost and with minimal invasiveness, it remains difficult to gain a global understanding of the beneficial or deleterious effect of a condition, because interpretation is limited to the bacteria whose functions are known. This leaves unaddressed, for example, the impact of a novel therapy on the GI tract and, in the long run, on the immune and metabolic systems.
While awaiting a more complete characterization of the bacteria in the human GI microbiome, we propose to add a layer of interpretation by quantifying, in statistical terms, the change in pathogen composition with respect to a baseline. This provides an informed basis for further screening of specific strains.
In fact, microbiology has accumulated a remarkable amount of information on harmful bacteria. Beyond the long and well known Mycobacterium tuberculosis, more recent findings have shown that previously unsuspected noncommunicable diseases are also affected by bacterial alterations. This has led to the characterization of Porphyromonas gingivalis in the mouth microbiome and Prevotella copri in the GI microbiome as drivers of RA, while Lactobacilli-rich food has conversely been reported to improve RA symptoms.
As a result, it is possible to define bacteria as harmful when explicitly associated with a disease, or harmless (rather than beneficial, from a conservative perspective) otherwise. The collection of such information is not yet centralized, and we offer here a first curated database of this type of classification (part of the eudysbiome package, also provided as Additional file 1: Table S1 for convenience).
This approach addresses two current gaps: on one side, the lack of efficient and automated use of information on pathogenic potential; on the other, the lack of a genus annotation strategy able to compensate for the paucity of information available at the OTU level. Namely, we address these issues by (i) centralizing available pathogenic annotation resources and (ii) devising a definition of pathogenic genera, both implemented in a statistical pipeline available as a Bioconductor package that offers tabular and graphical output.
Two words of caution are in order regarding the use of this approach. First, to offer the most detailed annotation we rely on OTUs/species (see Methods); this, however, implies a number of unknown/unannotated elements, which are discarded from further analyses to avoid biasing the results. Second, the abundance of pathogens must be put into context. For example, healthy and long-lived hunter-gatherer populations are characterized by GI microbiomes with higher α-diversity than urban populations, and this diversity includes numerous pathogens; however, when comparing the effects of treatments on a clinically uniform set of patients, an increased abundance of pathogens represents an added risk of comorbidity in individuals whose general health is already debilitated. As in any omic analysis, it is recommended to follow up such global harmless/harmful trends with manual investigation of the emerging strains (as is done, for example, in transcriptomics with the manual inspection of the genes identified in a statistically significant Gene Ontology biological function).
Globally, this approach should be considered integrative and complementary to the existing ones, shedding additional light on the effects of maladies, treatments and other external inputs on the host-microbiome supra-organism. To present the usability and informativeness of this approach, we apply it to the analysis of the GI microbiome of patients affected by rheumatoid arthritis (RA), a model for chronic, inflammatory and autoimmune diseases that is spreading at a very fast pace and whose microbial composition is continuously being unveiled. Given its incidence (1 % worldwide) and its exemplary characteristics (model disease), our results represent not only an important example of application but are also meaningful per se.
National Center for Biotechnology Information (NCBI) Pathogen Detection system (http://www.ncbi.nlm.nih.gov/pathogens/), using information on human pathogens (not foodborne pathogens) of “Acinetobacter” and “Klebsiella”;
Genome Database of Pathogens (GeneDB) for prokaryotic and eukaryotic pathogens and closely related organisms, collected by downloading the bacteria information with “protein-coding” Gene Type, giving rise to 12 pathogenic genera;
Pathosystems Resources Integration Center (PATRIC), a bacterial information system with 2365 genomes of bacteria hosted by humans and involved in diseases;
Virulence Factor Database (VFDB), an integrated and comprehensive online resource for virulence factors of 30 pathogenic genera and related species;
Human Opportunistic Pathogens (HOPs) library, collected by the Gifu University, Genetic Information Genetic Resource Center of Human Pathogens (http://gtc.jpn.com/?p=1);
“Indigenous and pathogenic microorganisms by human body site”, by the Hardy Diagnostics company (https://catalog.hardydiagnostics.com/cp_prod/Content/hugo/IndigPathogOrganisms.htm) with two attributes: frequency (expected in a clinical specimen, from 1 to 3) and pathogenicity (expected when the organism is present, ≥ 2).
Additional missing species were searched in PubMed with the query terms <species name, human, pathogen>, followed by manual screening of the resulting literature and, finally, an update of the Genus-Species table above.
eudysbiome R package
The eudysbiome package is developed in the R statistical computing environment and is released under the GNU General Public License within Bioconductor. It performs the analysis, including species-level classification of unknown 16S rRNA sequences, annotation of genera as harmful or harmless based on the pathogenic Genus-Species table described above, and testing of the association between microbial variations and a given condition.
The package takes as input a list of variations in differential microbial abundances (reads), Δg = g1 – g2, defined as the difference between a genus’ abundance in condition 1 (g1) and in the baseline condition 2 (g2). The calculation of Δg is left to the user, given the different types of normalization and the considerations to be made on a case-by-case basis. We recommend limma for its good performance on small-sample data, and tools such as metagenomeSeq, LEfSe and metastats for more general cases.
As a genus can collect under its name both harmful and harmless species, the annotation of a genus as harmless or harmful can benefit from investigating the species actually present in each dataset: if a genus that by definition includes harmful species does not contain any of them in a specific sample, the genus can be annotated as harmless. By the same token, if none of the species of a genus appears in the data under study, the genus is discarded from the analysis, since the lack of (annotation on the) species makes it impossible to annotate the genus as harmful or harmless. eudysbiome allows this (optional) more careful species classification, and hence annotation, even when the input data are given as differential genera, by directly calling the Mothur command “classify.seqs” and mapping the unknown 16S rRNA sequences to a well-curated representative dataset of 16S rRNA reference sequences with Wang’s naïve Bayesian classifier, recognized as an efficient and accurate method [24, 25]. To guarantee a fast species-level classification and minimize the computational resources needed, the package relies on the latest SILVA representative set released by QIIME (16S/18S, SSU119, https://www.arb-silva.de/no_cache/download/archive/qiime/), created by clustering at 97 % sequence identity.
After the annotated Δgs are made available, the package groups the frequencies |Δg| into ∑|Δg|: the increase in abundance of harmless bacteria plus the decrease (in absolute value) in abundance of harmful bacteria form the eubiotic contribution, and vice versa for the dysbiotic contribution. This is represented visually in a Cartesian plane, with harmful/harmless microbes on the x-axis and ∑|Δg| on the y-axis, and summarized in a Condition × Impact table; both are outputs of the package.
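The annotation rule and the grouping of |Δg| described above can be illustrated with a minimal Python sketch. This is not the code of the eudysbiome package (which is written in R); the function names `annotate_genus` and `impact_frequencies`, the placeholder genera and all numbers are hypothetical and serve only to make the logic concrete.

```python
def annotate_genus(observed_species, harmful_species):
    """Label one genus from the species actually observed in the dataset.

    Returns "harmful" if any observed species is a listed pathogen,
    "harmless" if species were observed but none is listed, and None when
    no species could be assigned (the genus is then discarded).
    """
    if not observed_species:
        return None
    if any(species in harmful_species for species in observed_species):
        return "harmful"
    return "harmless"


def impact_frequencies(delta_g, annotation):
    """Sum |delta_g| into eubiotic and dysbiotic contributions."""
    totals = {"eubiotic": 0, "dysbiotic": 0}
    for genus, change in delta_g.items():
        label = annotation.get(genus)
        if label is None or change == 0:
            continue  # unannotated or unchanged genera contribute nothing
        # Eubiotic: harmless genus increasing, or harmful genus decreasing.
        is_eubiotic = (label == "harmless") == (change > 0)
        totals["eubiotic" if is_eubiotic else "dysbiotic"] += abs(change)
    return totals


# Hypothetical inputs (illustrative only, not data from the study).
harmful_species = {"Prevotella copri"}
observed = {
    "Prevotella": ["Prevotella copri"],
    "GenusA": ["GenusA species1"],
    "GenusB": ["GenusB species1"],
    "GenusC": [],  # no species-level assignment: discarded
}
delta_g = {"Prevotella": -40, "GenusA": 25, "GenusB": -5, "GenusC": 10}

annotation = {g: annotate_genus(s, harmful_species) for g, s in observed.items()}
totals = impact_frequencies(delta_g, annotation)
print(totals)
```

In this toy case the decrease of the harmful genus and the increase of a harmless one both count as eubiotic, the decrease of a harmless genus counts as dysbiotic, and the genus without species-level information is left out.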
The package further evaluates statistically the impact of a given condition on the microbiome, in terms of harmless/harmful variations, in comparison with the microbiome of the reference condition. To assess the significance of the association between conditions and eubiotic/dysbiotic impacts, Fisher's exact test is applied to the frequency counts, testing the null hypothesis that conditions are equally likely to lead to a mostly harmless microbiome when compared to the control, against either a two-sided alternative or the one-sided alternative that one condition is more likely than the other to be associated with a mostly harmless microbiome.
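For readers unfamiliar with the test, the following self-contained Python sketch computes Fisher's exact p-value for a 2 × 2 Condition × Impact table from the hypergeometric distribution. It is an illustration under stated assumptions, not the package's implementation: the function name `fisher_exact`, the requirement of integer counts and the example table are all hypothetical.

```python
from math import comb


def fisher_exact(table, alternative="two-sided"):
    """P-value of Fisher's exact test for a 2 x 2 table [[a, b], [c, d]].

    Rows are assumed to be conditions, columns eubiotic/dysbiotic
    frequencies; all four entries must be non-negative integers.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    low, high = max(0, col1 - row2), min(row1, col1)
    if alternative == "greater":
        return sum(prob(x) for x in range(a, high + 1))
    if alternative == "less":
        return sum(prob(x) for x in range(low, a + 1))
    # Two-sided: add up every table at most as probable as the observed one.
    observed = prob(a)
    tail = sum(p for p in map(prob, range(low, high + 1))
               if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical counts: rows = two conditions, columns = (eubiotic, dysbiotic).
example = [[3, 1], [1, 3]]
p_two_sided = fisher_exact(example)
p_greater = fisher_exact(example, alternative="greater")
print(p_two_sided, p_greater)
```

The one-sided call with `alternative="greater"` asks whether the first condition is more strongly associated with the eubiotic column than the second; the two-sided call only asks whether the two conditions differ.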
Application to rheumatoid arthritis (RA)
16S rRNA genes from the human samples collected by Scher et al. represent the GI microbiomes of RA patients, either newly diagnosed (new-onset RA, NORA) or chronically affected (chronic RA, CRA), as well as psoriatic arthritis (PsA) patients, treated with methotrexate (MTX), prednisone, opioids and, optionally for all treatments, nonsteroidal anti-inflammatory drugs (NSAIDs). In the original manuscript, these data are analyzed in search of disease-associated (NORA, CRA, PsA) variations of the GI microbiome in comparison to a healthy (HLT) baseline, independently of the therapy. Here, we extend the investigation in search of RA treatment-associated GI variations. Irrespective of NSAID intake, samples were selected and regrouped into five arms: 39 untreated new-onset rheumatoid arthritis (NORA), 11 untreated chronic rheumatoid arthritis (UCRA), 9 CRA samples treated with MTX (MTX), 3 CRA samples treated with prednisone (Prednisone) and 28 healthy controls (HLT). The only patient treated with opioids was removed from the analysis, as were the PsA patients. The representative sequences for each OTU and the OTU abundance table with read counts down to the genus classification were downloaded from https://github.com/polyatail/scher_et_al_2013/tree/master/16S_Analysis.
Microbial diversity and differential analysis
OTU-based diversity was evaluated on read counts by the Shannon and inverse Simpson indexes, calculated with the R vegan package and averaged among the samples in each arm for comparison. OTUs were grouped at the genus level before differential analysis, and OTUs lacking a genus classification were assigned to their higher-order taxonomy. To minimize the noise associated with low abundance, reads with small within-group variance, and genera with null abundance in more than 1 sample or with summed abundance among samples below 5, were filtered out. Abundances were further normalized with the trimmed mean of M-values (TMM) method of the edgeR package and converted to log2-cpm (counts per million) by voom, to make the data suitable for linear regression in the limma differential analysis. Significantly differential genera were selected by fold change (FC > 2) and p-value (p < 0.05); differential taxa with only higher-order classifications were removed from further analyses.
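As a reminder of what the two diversity indexes measure, the sketch below computes them for one sample of OTU read counts and averages an index over the samples of an arm. It is a plain-Python illustration rather than the vegan code used in the study; the natural logarithm in the Shannon index, the function names and the toy counts are assumptions made here.

```python
from math import log


def _proportions(counts):
    """Relative abundances of the OTUs with at least one read."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a sample needs at least one read")
    return [c / total for c in counts if c > 0]


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed OTUs."""
    return -sum(p * log(p) for p in _proportions(counts))


def inverse_simpson_index(counts):
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    return 1.0 / sum(p * p for p in _proportions(counts))


def arm_mean(samples, index):
    """Average a diversity index over the samples of one arm."""
    return sum(index(sample) for sample in samples) / len(samples)


# Hypothetical read counts: two samples of one arm, four OTUs each.
arm = [[10, 10, 10, 10], [40, 0, 0, 0]]
even_shannon = shannon_index(arm[0])          # evenly spread community
single_shannon = shannon_index(arm[1])        # a single OTU: no diversity
mean_invsimpson = arm_mean(arm, inverse_simpson_index)
print(even_shannon, single_shannon, mean_invsimpson)
```

Both indexes grow with the number of OTUs and with the evenness of their abundances, which is why a sample dominated by a single OTU scores lowest.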
Results and Discussion
The original analysis by Scher et al. focuses on the GI variations from a healthy baseline (HLT) in association with a (stage of the) disease (NORA, CRA, PsA). As drug interventions strongly affect the immune response, also via the modulation of the GI microbiome, we deepen the disease-wise characterization of the GI microbiomes and additionally explore the effects of RA on the GI microbiome therapy-wise (NORA, UCRA, MTX, Prednisone).
By the Firmicutes/Bacteroides criterion (Fig. 2c), the UCRA arm stands out with a ratio 2.4-, 2.9-, 3.3- and 2.8-fold higher than HLT, NORA, MTX and Prednisone, respectively (Fig. 2d), matching the well-known inflammatory/dysbiotic state of UCRA patients. Globally, we can conclude that the progression of the disease (NORA to CRA) is characterized by increasing diversity, where the increasing variety of OTUs falls into the Firmicutes phylum (at the expense of Bacteroides).
It seems that once UCRA patients receive treatment, MTX lowers the diversity (Fig. 2a-b) and the inflammatory environment (Fig. 2c-d) bringing the system back to levels characteristic of the earlier stage of the disease (NORA), while Prednisone allows for a more physiological gain of diversity (Fig. 2a-b) and inflammatory environment (Fig. 2c-d), seemingly bringing the state of the GI closer to the HLT samples.
Table 1 Contingency and contingency tests with HLT baseline: (a) contingency ∑|Δg|; (b) contingency test p-values
In particular, the eubiotic trend in Prednisone is due solely to the contribution of increasing harmless genera (1st quadrant in Fig. 4, Eubiotic frequency = 266 in Table 1a), and is limited by a dysbiotic contribution from the increase of pathogens (2nd quadrant in Fig. 4, Dysbiotic frequency = 102 in Table 1a). In contrast, MTX presents only eubiotic variations (Dysbiotic frequency = 0 in Table 1a), obtained by the twofold contribution of the increase of harmless genera (1st quadrant) and the decrease of pathogens (3rd quadrant), globally reaching an Eubiotic frequency of 1965 in Table 1a. Remarkably, in the MTX samples this leads to a reduction of the population of Prevotella, a well-known trigger of the disease, which conversely remains uncontrolled in Prednisone.
These results account for variations across a large number of species in the GI, suggesting a systemic effect broader than the anti-inflammatory action on the host metabolism known for Prednisone and the anti-proliferative effect on the host known for MTX. Indeed, despite the well-known limits of MTX, and although its therapeutic activity is known to be associated with adverse effects also in the GI district, not enough focus has yet been put on the broader impact of drugs on the patient as a whole, and only marginal attention is paid to compensating such detrimental events with GI-protective or GI-boosting strategies [13, 34].
To help elucidate the functionalities promoted or harmed in the GI district by diseases and other environmental triggers, we propose to integrate the study of the composition of the GI microbiome with an automated, statistical characterization of its pathogenic potential. This approach should be applied in synergy with current approaches such as the study of α-diversity and of the Firmicutes/Bacteroides ratio. In particular, we present an application to rheumatoid arthritis, a model malady for all autoimmune diseases (including diabetes), whose etiology and control at the microbiome level represent a critical topic in clinical research, and we show how the addition of pathogenic information can help differentiate the forces at work in the complex host-microbiome interaction system.
We would like to thank Yuanhua Liu and Youtao Lu for valuable discussion.
This work has been supported by the NSFC n. 31171277.
Availability of data and materials
eudysbiome is an R package released under the GNU General Public License within the Bioconductor project, freely available at http://bioconductor.org/packages/eudysbiome/.
XZ implemented the methods, analyzed the data and wrote the manuscript; CN designed the study, contributed to data analysis and wrote the manuscript. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
- 5. De Filippo C, Cavalieri D, Di Paola M, Ramazzotti M, Poullet JB, Massart S, et al. Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa. Proc Natl Acad Sci U S A. 2010;107(33):14691–6. doi: 10.1073/pnas.1005963107.
- 10. Ryan KJ, Ray CG, Sherris JC. Sherris medical microbiology: an introduction to infectious diseases. 4th ed. New York: McGraw-Hill; 2004.
- 23. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75(23):7537–41. doi: 10.1128/AEM.01541-09.
- 28. Rice JA. Mathematical statistics and data analysis, Duxbury advanced series. 3rd ed. Belmont: Thomson/Brooks/Cole; 2007.
- 29. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H. vegan: Community Ecology Package. 2016. Available at https://cran.r-project.org/web/packages/vegan/index.html.
- 30. Kinross JM, Darzi AW, Nicholson JK. Gut microbiome-host interactions in health and disease. Genome Med. 2011;3. doi: 10.1186/gm228.
- 34. Tieri P, Zhou X, Zhu L, Nardini C. Multi-omic landscape of rheumatoid arthritis: re-evaluation of drug adverse effects. Front Cell Dev Biol. 2014. doi: 10.3389/fcell.2014.00059.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.