Trait-based analysis of the human skin microbiome
The past decade of microbiome research has concentrated on cataloging the diversity of taxa in different environments. The next decade is poised to focus on microbial traits and function. Most existing methods for doing this perform pathway analysis using reference databases. This has both benefits and drawbacks. Function can go undetected if reference databases are coarse-grained or incomplete. Likewise, detection of a pathway does not guarantee expression of the associated function. Finally, function cannot be connected to specific microbial constituents, making it difficult to ascertain the types of organisms exhibiting particular traits—something that is important for understanding microbial success in specific environments. A complementary approach to pathway analysis is to use the wealth of microbial trait information collected over years of lab-based, culture experiments.
Here, we use journal articles and Bergey’s Manual of Systematic Bacteriology to develop a trait-based database for 971 human skin bacterial taxa. We then use this database to examine functional traits that are over/underrepresented among skin taxa. Specifically, we focus on three trait classes—binary, categorical, and quantitative—and compare trait values among skin taxa and microbial taxa more broadly. We compare binary traits using a Chi-square test, categorical traits using randomization trials, and quantitative traits using a nonparametric relative effects test based on global rankings using Tukey contrasts.
We find a number of traits that are over/underrepresented within the human skin microbiome. For example, spore formation, acid phosphatase, alkaline phosphatase, pigment production, catalase, and oxidase are all less common among skin taxa. As well, skin bacteria are less likely to be aerobic, favoring, instead, a facultative strategy. They are also less likely to exhibit gliding motility, less likely to be spirillum or rod-shaped, and less likely to grow in chains. Finally, skin bacteria have more difficulty at high pH, prefer warmer temperatures, and are much less resilient to hypotonic conditions.
Our analysis shows how an approach that relies on information from culture experiments can both support findings from pathway analysis, and also generate new insights into the structuring principles of microbial communities.
Keywords: Skin microbiome; Trait-based analysis; Enzyme activity; Substrate use; Temperature, NaCl, and pH range; Bergey’s Manual of Systematic Bacteriology
The development of rapid, cost-effective sequencing technology has resulted in an explosion of microbiome research over the past decade. Microbial communities are now being sampled in almost every environment imaginable, ranging from the depths of the ocean [1, 2] to outer space [3, 4]. Reflecting the tremendous scope and magnitude of microbiome research are recent initiatives such as the Human Microbiome Project (HMP) [5, 6, 7, 8, 9] and the Earth Microbiome Project (EMP) [10, 11, 12]. The former aims to characterize all microbes on and in the human body, and the latter seeks to describe microbiomes across the entire globe. Already, discoveries from these and other, similar efforts are proving invaluable for understanding human disease [13, 14, 15, 16], developing novel therapeutics [17, 18], and improving agricultural yields [19, 20, 21].
Existing microbiome research tends to focus on cataloging taxonomic diversity. Microbial function, by contrast, is less well studied [22, 23]. Unfortunately, without an understanding of microbial traits and, in particular, how traits differ among environments, it is virtually impossible to answer key biological questions, such as why certain microbes live where they do. Trait-based analyses, which have a long history in macroscopic ecology [25, 26, 27], allow researchers to connect ecological traits to environmental associations, helping to explain the mechanisms underlying observed microbial distributions. The sheer diversity of typical microbiomes, however, makes trait-based analysis daunting.
Several strategies have been developed to circumvent challenges associated with trait-based microbial ecology. Shotgun sequencing studies, for example, have been queried against reference databases, including COG/KOG, KEGG, eggNOG, Pfam, and TIGRFAM, to determine overrepresented genes, proteins, operons, and higher-order cellular processes [28, 29, 30, 31, 32, 33, 34, 35] that reflect microbial function. Meanwhile, similar efforts have been extended to amplicon sequencing using PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) and Tax4Fun, bioinformatics tools that infer microbial function based on reference databases, along with various assumptions about phylogenetic conservation. Although amplicon and shotgun sequencing approaches appear comparable [37, 38], neither performs particularly well, likely because of problems with the underlying reference databases, which are coarse-grained, represent only a minute fraction of microbial diversity, and are heavily biased toward a few organisms and environments. More recently, machine learning techniques have been applied in an attempt to correct for some of these problems and improve the accuracy of trait prediction [40, 41].
Despite ongoing improvements in functional reference databases, the gold standard for defining microbial traits remains culture experiments. Decades of lab-based analyses have led to an impressive understanding of the functions of diverse microbial taxa, including many of those prevalent in microbiome studies. This information, however, is largely available through journal articles and Bergey’s Manual of Systematic Bacteriology [42, 43, 44, 45], neither of which is methodical in its presentation of data. Recently, there has been an effort to catalog trait information in more manageable and centrally organized databases, including StrainInfo, which collects trait data from biological resource centers, and the JGI GOLD database, which allows users to enter known information on a handful of traits, including oxygen use, motility, and Gram stain. In addition, a recent text-parsing tool collects microbial descriptions from six separate sources and then uses this information to predict microbial traits, with confidence scores. The alternative, more precise but also more work-intensive approach is to link traits determined from lab- and culture-based experiments directly to output from microbiome sequencing studies, by manually curating every organism identified in a particular metagenomics sample. Although the effort involved is immense, if curation is done in a systematic fashion, then the resulting database has added, long-term value.
Here, we introduce such a trait database for human skin microbial communities, and then use it to characterize the bacterial residents of human skin in trait space. Bacterial traits are further compared to characteristics of bacteria more broadly using a similar database generated without any bias toward a particular habitat. Finally, we compare traits across different skin environments to determine whether dry, moist, and sebaceous skin sites have functionally different microbial constituents. Many of the traits that we observe in skin microbiomes are consistent with expectations. For example, skin bacteria prefer warmer habitats and have higher salt requirements, in keeping with abiotic conditions on the skin surface. Several findings, however, suggest novel biological insight. Cocci, for example, are overrepresented on skin. Bacteria that form spores and possess phosphatases, by contrast, are underrepresented. Finally, relative to bacteria as a whole, skin bacteria are more likely to be anaerobic, a feature that is reflected not only in patterns of oxygen use, but also in distributions of oxidase and catalase activity, both of which are primarily beneficial in oxygen-rich environments.
Trait composition of the human skin microbiome
Figure 1b–f presents categorical traits for skin microbes. The majority of skin microbes are facultatively anaerobic, although there are sizeable fractions of strictly aerobic and strictly anaerobic organisms as well. Most skin microbes are also non-motile, and this is particularly true of abundant taxa. Still, an unexpectedly large proportion—approximately 40%—have flagella. No other forms of motility are strongly represented. Most skin bacteria are rod-shaped and occur in clumps. Overall, skin microbes are predominantly Gram-negative, although abundant bacteria are split equally between Gram-negative and Gram-positive taxa.
Mean quantitative trait data for all skin bacteria (> 0.001% of reads in at least one sample) and abundant skin bacteria (> 0.1% of reads in at least one sample). Values are the mean for all taxa (mean for abundant taxa).
GC content (%): 50.6 (51.7)
NaCl concentration (%)
Comparing abundant versus rare skin bacteria, abundant taxa are more likely to use amino and organic acids. Eight amino acids (alanine, asparagine, aspartate, glutamate, glycine, leucine, proline, and serine; see Additional file 1: Supplemental Information II Table S2.3) are used more by abundant microbes than by the skin community as a whole. Similarly, nine organic acids (acetate, citrate, formate, gluconate, malate, malonate, pyruvate, succinate, and valerate; see Additional file 1: Supplemental Information II Table S2.3) are used more by abundant microbes. For both amino acids and organic acids, every significant difference points in the same direction: abundant skin taxa use these compounds more than skin taxa as a whole. Differences in consumption of other compounds, including alcohols and saccharides, are less biased toward overuse by abundant species. Indeed, two complex sugars (xylose and cellobiose) are used less by abundant taxa. Glucose, a simple sugar, on the other hand, is used more by abundant taxa (see Additional file 1: Supplemental Information II Table S2.3).
It is well known that certain taxonomic groups, for example Actinobacteria, are overrepresented among skin microbes and, in particular, among abundant skin microbes. While these groups are likely overrepresented because they have traits that make them uniquely adapted to the skin environment, it is possible that the traits that are important for living on skin are not those that we measured. Instead, the skin-relevant traits may be other traits, and the differences that we observe in the traits that we did measure may merely exist as a result of phylogenetic conservation. For this reason, we performed an additional analysis regressing the probability of a taxon being abundant versus rare against each trait individually, both for a naïve logistic regression and for a regression where phylogenetic relatedness was accounted for using the phylolm package in R. To test the overall significance of a fitted regression, we compared it to a null model using a likelihood ratio test. In general, we found that many of the differences between abundant and rare taxa were preserved when phylogeny was accounted for. For instance, oxygen use, spore formation, Gram stain, type of motility, H2S production, the presence of catalase, aesculin hydrolysis and urease, and use of succinate, acetate, gluconate (organic acids), serine, proline, and glutamate (amino acids) were significantly different among abundant and rare taxa, whether or not phylogeny was considered. A few traits were not significant once phylogeny was included, for example cell shape, the presence of alkaline phosphatase, pyrazinamidase and gelatinase, and use of xylose, glucose, cellobiose (saccharides), malonate, formate, valerate, pyruvate, citrate (organic acids), aspartate, asparagine, alanine, leucine, and glycine (amino acids).
Finally, use of 2-ketogluconate (organic acid) and the ability to perform nitrate reduction were only significant when accounting for phylogeny (see Additional file 1: Supplemental Information II, Table S2.1–S2.3).
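The naïve (non-phylogenetic) version of this regression can be sketched as follows. This is a minimal illustration only, not the code used in the study: it fits a single-predictor logistic regression by Newton's method and compares it to an intercept-only null model with a likelihood ratio test on one degree of freedom. The function names and the toy counts are hypothetical, and the phylogenetically corrected regression (phylolm in R) is not reproduced here.

```python
import math

def fit_logistic(x, y, n_iter=50):
    """Fit logit P(y=1) = b0 + b1*x by Newton's method.

    Returns (b0, b1, log_likelihood). Assumes the data are not
    perfectly separated (otherwise the estimates diverge).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # negative Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return b0, b1, ll

def likelihood_ratio_test(x, y):
    """Compare the fitted model to an intercept-only null model (df = 1)."""
    _, _, ll_full = fit_logistic(x, y)
    p_bar = sum(y) / len(y)
    ll_null = sum(y) * math.log(p_bar) + (len(y) - sum(y)) * math.log(1.0 - p_bar)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical toy data: x = trait present (1) or absent (0),
# y = taxon abundant (1) or rare (0).
trait = [1] * 50 + [0] * 50
abundant = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
stat, p_value = likelihood_ratio_test(trait, abundant)
```

A small p value here would indicate that the trait is associated with abundance class before any correction for shared ancestry.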
Trait overrepresentation on human skin
We do not consider differences in carbon substrate usage between skin and the world because this information was collected differently in the skin database relative to the world database, making comparison impossible (see “Materials and methods” section).
Phylum level differences
Summary of binary trait results across dominant phyla from the human skin microbiome. Black is used for traits that are over-represented in the world; red is used for traits that are over-represented in the human skin microbiome. (See Table S3.1 for more detail)
Summary of categorical trait results across dominant phyla from the human skin microbiome. Black is used for traits that are over-represented in the world; red is used for traits that are over-represented in the human skin microbiome. (See Table S3.2 for more detail)
Summary of quantitative trait results across dominant phyla from the human skin microbiome. Black is used for traits that take on higher values in the world; red is used for traits that take on higher values in the human skin microbiome. (See Table S3.3 for more detail)
Trait differences among skin sites
Human skin microbiomes generally structure according to skin environment, with three environments—dry, moist, and sebaceous—represented (see Additional file 1: Supplemental Information I, Table S1.1). Because taxonomic composition differs among these three environments, functional diversity may vary as well. To test this hypothesis, we performed pairwise comparisons (dry vs. moist, dry vs. sebaceous, and moist vs. sebaceous) for all traits and substrate utilizations in our database (see Supplemental Information V). Surprisingly, not one difference emerged among skin environments for enzyme activities, gas production, spore formation, pigment production, nitrate reduction, Gram stain, cell aggregation, or pH, temperature, and NaCl requirements (see Additional file 1: Figure S5.1i, iii, S5.2i, iii, S5.3i, iii). Abundant bacteria at sebaceous sites are less likely to be rods as compared to abundant taxa at moist sites (49% versus 68%, see Additional file 1: Figure S5.3iv). As well, anaerobes are slightly underrepresented at dry sites as compared to sebaceous sites (see Additional file 1: Figure S5.2ii), and GC content is slightly lower at dry sites as compared to moist sites (see Additional file 1: Figure S5.5), although these latter two trends only emerge when considering the full skin microbiome, not just abundant taxa. Unfortunately, when accounting for phylogeny, the model for cell shape was degenerate for abundant taxa. However, variation in oxygen use between dry and sebaceous sites was observed even with phylogenetic correction. We did not attempt to control for phylogeny for GC content, since this was a quantitative trait.
Substrate usage (see Additional file 1: Supplemental Information V, Figure S5.6–S5.11) is similarly constant among skin environments, and what few differences do exist only occur between moist and sebaceous sites. Specifically, bacterial use of three organic acids (quinate, malonate, and caprate) as well as glucosamine (a monosaccharide) is overrepresented at sebaceous sites. By contrast, bacterial use of three saccharides (rhamnose, xylose, and cellobiose) as well as glycine (an amino acid) and urea is overrepresented at moist sites.
Our finding of high similarity among skin sites is in keeping with previous studies, but contrasts with a KEGG analysis performed by Oh et al. The discrepancy between our trait database analysis and the KEGG analysis may be because we considered a different set of functions. Alternatively, it may be because of differences in our definition of function prevalence. In particular, Oh et al. quantified commonness of pathways across samples, whereas we quantify commonness of functions across taxa. Defining prevalence across species is not possible using pathway analysis, highlighting a distinction and benefit of our trait-based approach.
We have undertaken a comprehensive trait-based analysis of the microbial constituents of human skin. In doing so, we have built an extensive trait-based database that will benefit future endeavors to characterize the functional properties of the skin microbiome. Below, we discuss some of our findings in terms of biological insight and interpretations.
Catalase, oxidase, and oxygen tolerance
Catalase is the most broadly distributed enzyme across the entire skin microbiome, and the only enzyme present in a significantly higher fraction of abundant skin taxa as compared to skin taxa as a whole. This suggests that catalase may be particularly beneficial for survival on skin, which should not be surprising. The majority of human skin is exposed to oxygen, and the role of catalase is to protect cells against hydrogen peroxide (H2O2), an oxidant primarily generated by reaction between oxygen and growth substrates. Interestingly, however, catalase is still less common in skin bacteria as compared to bacteria as a whole. We speculate that this is because of the existence of one or more diverse, low-oxygen niches on human skin. Further evidence for such niches comes from the markedly lower prevalence of oxidase and the increased fraction of facultative and strict anaerobes and microaerophiles found on skin (see Additional file 1: Figure S3.1). One potential low-oxygen niche is sebaceous follicles. These house the classic skin anaerobe, Propionibacterium acnes, and have been previously shown to be dominated by anaerobic taxa. Sequencing studies, however, have pointed to low microbial diversity within follicles, which is not consistent with our finding that approximately one third of culturable bacterial diversity on skin is either anaerobic or microaerophilic. Thus, we hypothesize that there are additional, low-oxygen environments hosting anaerobic taxa. One potential candidate is mixed-species biofilms. Another is lower dermal layers, which may have been collected through scraping of the skin.
Several previous studies have considered the anaerobic portion of the skin microbiome, which is of interest because of its role in wound infections [65, 66]. These studies have found that counts of aerobes outnumber counts of anaerobes. Although this may seem to contradict our conclusions, our analysis is based on diversity, rather than absolute counts. Based on our work, we theorize that, though anaerobes and microaerophiles may be less abundant, they must still be quite diverse. Consistent with previous findings, we observe evidence of increased anaerobicity among microbes at sebaceous sites (see Additional file 1: Figure S5.2). Similarly, our conclusion that anaerobes are less common at dry sites (see Additional file 1: Figure S5.2) accords with the previously reported KEGG analysis, which found that dry sites harbored an abundance of citrate cycle modules.
Acid and alkaline phosphatases
Phosphatases enable bacteria to utilize certain components of soluble organic phosphorus, and thus are prevalent in environments where inorganic phosphorus is limiting. Almost 50% of microorganisms in soil and plant roots possess phosphatases [69, 70, 71]. By contrast, we find acid phosphatase in 7–8% of skin bacteria, and alkaline phosphatase in 12–13%; thus, we conjecture that phosphorus limitation is not significant in skin environments. This is surprising, because an experiment designed to measure loss of inorganic elements through healthy skin did not detect any phosphorus, nor is phosphorus abundant in human sweat [73, 74]. One explanation could be that skin bacteria rely on host-produced phosphatases [75, 76] to meet their needs. This would circumvent the metabolic cost of producing phosphatases, highlighting potentially unique aspects of microbial strategies in human-associated environments.
In a recent review article, Lennon and Jones outlined factors promoting bacterial dormancy, with spore formation being an extreme case. Unlike the human gut, where few microbial genomes (~15%) show evidence of sporulation, human skin satisfies many of the conditions for dormancy. Skin, for example, is a highly inhospitable, exposed environment, lacking in resource availability. By contrast, the gut is well-fed and generally protected. Furthermore, residence times on skin are long as compared to those in the gut. Despite these differences, we find that the prevalence of sporulation is similar on skin and in the gut, both of which are significantly lower than rates among bacteria more broadly (see Fig. 3). Only ~20% of skin taxa produce spores, and this number is drastically lower (3%) when considering abundant taxa. Clearly, then, human microbiomes favor species without sporulation. We surmise that this is a result of the constant environment provided by host homeostasis.
Cell shape and aggregation
Relative to the broader world, skin microbiomes are enriched for cocci and coccobacilli (see Fig. 3). There are several hypotheses for why this might occur. First, rods allow for increased surface-to-volume ratios, improving nutrient uptake by passive diffusion or when nutrients are directly acquired from a surface. The fact that relatively fewer skin bacteria are elongated may thus indicate that nutrients on skin are readily available or, at the very least, are not acquired by passive diffusion. Second, although rods and filamentous cells are predicted to perform better under shear stress, cocci may be better able to fit into small pockets and pores of the stratum corneum. This is an alternate strategy for protection that may be particularly advantageous on skin. Third, rod-shaped cells are more hydrodynamic, and thus can propel through liquid more efficiently. This, however, may be of minimal importance in skin environments (although it is worth noting that rods appear to be enriched in moist regions). By contrast, cocci move much faster under conditions of Brownian motion. Because skin bacteria frequently spread from one person to another through airborne release, a coccoid shape could facilitate interpersonal dispersal. Interestingly, coccoid cells can acquire some of the advantages of a rod shape (e.g., increased surface attachment) by growing in chains. Despite this, chains, like rods, are underrepresented on human skin, further supporting our conclusion that skin selects for a spherical, rather than elongated, shape.
Although many different substrates are consumed by skin bacteria, several stand out as being particularly important for success. Bacterial use of organic and amino acids, for example, shows enrichment in abundant skin bacteria. Interestingly, all eight of the amino acids that we find used significantly more by successful skin species have been positively identified in fingerprint samples. This is consistent with our conclusion that these are important skin nutrients. Similar to amino acids, many of the organic acids that are used by a greater fraction of abundant skin taxa also appear commonly on human skin. This includes lactate, pyruvate, formate, caprate, and valerate. In other cases, nutrients whose use is overrepresented among abundant taxa may not be produced by human skin, but rather, by dominant skin constituents. Succinate, for example, is a skin fermentation product of Staphylococcus epidermidis, meaning that it is likely widely available on the skin surface. Further analysis of the chemical composition of skin secretions, not only by the human host but also by the entire skin microbiome, will help elucidate our findings regarding preferential substrate use.
Substrates that are less used by abundant skin taxa tend to be plant sugars, for example cellobiose, rhamnose, and xylose. It is not difficult to understand why the ability to consume plant compounds provides little advantage on skin. Surprisingly, however, consumption of these sugars seems to be preferentially concentrated at moist sites, at least relative to sebaceous sites (see Additional file 1: Supplemental Information V, Figure S5.8 and S5.9). It is not obvious why there would be any benefit of plant sugar consumption in these regions. Urea use is also more common at moist sites (see Additional file 1: Supplemental Information V, Figure S5.11), again for reasons that are unclear. In fact, urea use in general is surprising. Despite being prevalent on human skin, urea is one of the least commonly used substrates in our study (see Figs. 1 and 2). Why urea is not used by more skin bacteria, and why it seems to be used most at moist sites, remain open questions. They illustrate how trait-based analyses can uncover new and unexpected trends, opening novel lines of inquiry that will ultimately help to elucidate factors governing skin microbiome composition.
Comparison to ProTrait
Both our database and the ProTrait database draw from a vast literature of culture-based experiments. Whereas we manually curate our data, the ProTrait database uses a text-mining algorithm. Not surprisingly, our database contains information on fewer bacterial species (971 vs. 3046, with 25 unique to our database). The range of traits covered, however, is similar. We include several enzymes and carbon sources (for example arylsulfatase, pyrazinamidase, tellurite reductase, caprate, itaconate, suberate, succinate, urocanate, valerate, 3-hydroxybutyric acid, 3-hydroxybenzoate, asparagine, ornithine, phenylalanine, proline, threonine, tryptophan, glucosamine, methyl-β-D-glucoside, butanol, xylitol, 2,3-butanediol, carnitine, phenethylamine, putrescine, thymidine, uridine, and 2-aminoethanol) that are not in ProTrait; however, the ProTrait database contains other enzymes and substrates that are not in our database. Interestingly, there do not appear to be significant differences in error rates between the two databases, at least for traits whose values are specified. The databases do, however, differ substantially in how completely trait values are filled in. In particular, our database specifies the values of traits for a greater number of organisms, whereas the ProTrait database is more likely to report traits as unknown, at least using a precision of ≥ 0.9 (see Supplemental Information VI for several example comparisons).
Our curated trait-based approach has many benefits, but also some drawbacks. First, we only consider well-defined taxa, ignoring detected taxa that have not been fully characterized, as well as all “dark matter”. This could bias some of our predictions. While functional database methods are not as restricted in this way, they still rely on detection of orthologous genes. Consequently, both approaches are likely to miss at least some traits, particularly when these arise from poorly characterized taxonomic groups. Another complication of our approach is that it relies on conservation of functional traits within a species. Though our assumptions are likely less severe than those of tools like PICRUSt, functional traits are not always conserved. In compiling our database, we recorded evidence of strain variation, which suggested that interstrain differences in carbon source utilization are most common (14% of taxa), followed by differences in enzyme activity (11% of taxa). Although such variability complicates our analyses, it is more likely to obscure patterns than create them. Thus, when a pattern is detected, it likely reflects true biology.
Many opportunities exist for increased trait-based analysis of microbiome communities. Future studies considering additional human and non-human environments will help elucidate the structuring principles and biological mechanisms driving patterns in worldwide microbial distributions. Meanwhile, extended analyses of skin microbiomes will further highlight the principles governing community assembly. Analyses that quantitatively account for microbial abundance, for example, could clarify differences among dry, moist, and sebaceous sites, while further gradation by body location is also possible. Another extension would be to consider functional trait differences between different people—something that would be particularly informative when comparing individuals with skin disease to healthy controls.
Trait-based analyses and functional comparisons are the next step in microbiome research. Although most studies attempting this have taken a functional database/pathway analysis approach, culture and lab-based studies afford unique benefits. Our analysis of the skin microbiome has elucidated some of these benefits, detecting patterns different from those observed using KEGG. This, in turn, has opened up a range of questions about why specific microbes exist in certain skin environments, and what they are doing to survive.
Materials and methods
Species list for the human skin microbiome
We defined a list of skin bacterial species using a recent study that employed shotgun sequencing (see Additional file 1: Supplemental Information I, Table S1.1). Specifically, whole genome shotgun data from the NCBI Sequence Read Archive (SRA) project SRP002480 were obtained from the SRA FTP site and converted to paired-end FASTQ format using the splitsra script in our Git repository hosted at: https://bitbucket.org/skinmicrobiome/metagenomics-scripts. FASTQ data originating from the same BioSample were consolidated into the same file using a custom shell script and the SRA RunInfo table found here: http://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP002480.
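The consolidation step can be illustrated with a short sketch. This is not the custom shell script referred to above; it only shows the grouping logic, under the assumption that the RunInfo table is a CSV export with `Run` and `BioSample` columns. The function name and the example accessions are hypothetical.

```python
import csv
import io
from collections import defaultdict

def runs_by_biosample(runinfo_file):
    """Group SRA run accessions by the BioSample they belong to.

    `runinfo_file` is any file-like object containing a RunInfo CSV
    with (assumed) column names 'Run' and 'BioSample'.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(runinfo_file):
        groups[row["BioSample"]].append(row["Run"])
    return dict(groups)

# Hypothetical example table with made-up accessions.
example = io.StringIO(
    "Run,BioSample\n"
    "RUN_A,SAMPLE_1\n"
    "RUN_B,SAMPLE_1\n"
    "RUN_C,SAMPLE_2\n"
)
groups = runs_by_biosample(example)
```

Each resulting group lists the runs whose FASTQ files would be concatenated into a single per-BioSample file.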
A reference database was constructed for the Kraken classifier using the complete genomes in RefSeq for the bacterial (2199 taxonomic IDs), archaeal (165 taxonomic IDs), and viral (4011 taxonomic IDs) domains, as well as eight representative fungal taxonomic IDs, the Plasmodium falciparum 3D7 genome, the human genome, and the UniVec Core database (ftp://ftp.ncbi.nlm.nih.gov/pub/UniVec). Low complexity regions of the microbial reference sequences were masked using the dustmasker program with a DUST level of 20 [http://www.ncbi.nlm.nih.gov/pubmed/16796549]. After masking, every 31-mer nucleotide sequence present in the collection of reference FASTA sequences was stored at the taxonomic ID of the lowest common ancestor among the leaf nodes that share that 31-mer (see the Kraken reference for details). The total size of the database plus index was 110 GB.
Each input read from SRA project SRP002480 was assigned a taxonomic ID using Kraken by finding exact matches between every 31-mer nucleotide sequence present in that read and the database of 31-mers constructed above. Because of the hierarchical storage of k-mers in the database, reads can be classified at more general taxonomic levels than the specific strain sequences that were used to build the database. Output from the Kraken classification was summarized by taxonomic ID along with the number of unique k-mers detected in the data using the kraken-report-modif script (present in the metagenomics-scripts repository linked above). The total number of unique k-mers for each taxonomic ID in the database was obtained using the count_kmers.pl script, and full taxonomic strings were generated using the taxid2taxstring script, both included in the metagenomics-scripts git repository linked above.
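The k-mer indexing and read assignment described above can be sketched on a toy scale. The following is a simplified illustration, not Kraken itself: it uses k = 4 instead of 31, a hypothetical three-node taxonomy, and made-up sequences, and it resolves a read by scoring each hit taxon with the hits along its path to the root, without the tie-breaking rules of the real classifier.

```python
from collections import Counter

def lca(a, b, parent):
    """Lowest common ancestor of taxa a and b; `parent` maps child -> parent."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)
    while b not in ancestors:
        b = parent[b]
    return b

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def build_db(genomes, parent, k):
    """Map each k-mer to the LCA of all taxa whose genome contains it."""
    db = {}
    for taxid, seq in genomes.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxid, parent) if km in db else taxid
    return db

def classify(read, db, parent, k):
    """Assign a read to the hit taxon with the best root-to-taxon hit count."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None  # no k-mer of the read is in the database

    def path_score(taxid):
        score = 0
        while taxid is not None:
            score += hits.get(taxid, 0)
            taxid = parent.get(taxid)
        return score

    return max(hits, key=path_score)

# Hypothetical taxonomy: taxa 2 and 3 are species under genus 1 (the root).
parent = {2: 1, 3: 1}
genomes = {2: "ACGTACGGA", 3: "ACGTTTTGA"}
db = build_db(genomes, parent, k=4)
```

Because the shared k-mer `ACGT` is stored at the common ancestor, a read that matches only that k-mer is assigned to the genus rather than to either species, which is the behavior the hierarchical storage is meant to provide.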
Two separate lists were constructed from the above output (see Additional file 1: Supplemental Information I, Table S1.1). The first list, representing all human skin taxa, was determined by recording any species that occurred in at least one sample with a relative abundance > 0.001% of reads. We set a lower bound on the percentage of reads because taxa with only a handful of reads may be spurious and/or may represent incorrect taxonomic assignments. The second list, representing abundant skin taxa, was determined by recording any species that occurred in at least one sample with a relative abundance > 0.1% of reads. We chose to consider abundance classes (all taxa vs. abundant taxa), rather than accounting for abundance quantitatively, because abundance estimation from shotgun sequencing data is notoriously difficult.
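The two thresholds amount to a simple filter over per-sample relative abundances. The sketch below assumes a hypothetical input layout (a mapping from sample to taxon to percent of reads); the function name and values are illustrative only.

```python
def taxa_above(rel_abundance, threshold_pct):
    """Return taxa exceeding threshold_pct (percent of reads) in at least one sample.

    rel_abundance: {sample_id: {taxon: percent_of_reads}}
    """
    return sorted({
        taxon
        for sample in rel_abundance.values()
        for taxon, pct in sample.items()
        if pct > threshold_pct
    })

# Hypothetical relative abundances, in percent of reads.
samples = {
    "s1": {"A": 0.5, "B": 0.002},
    "s2": {"A": 0.05, "C": 0.0005},
}
all_taxa = taxa_above(samples, 0.001)      # "all skin taxa" list
abundant_taxa = taxa_above(samples, 0.1)   # "abundant skin taxa" list
```

By construction, the abundant list is always a subset of the full list.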
Skin database compilation
Using the lists of taxa generated above, we compiled a database of microbial traits. For this, we relied on Bergey’s Manual of Systematic Bacteriology [42, 43, 44, 45] and the initial journal articles describing each species. We only considered validly described species and did not include Candidatus taxa, since little information was available for these. Our database contains information for 971 species.
World database compilation
We used a database compiled from species descriptions in the International Journal of Systematic and Evolutionary Microbiology. A full description of this database, including its availability, can be found at  (see also, Additional file 1: Supplemental Information I, Table S1.2).
Depending on the variable, we performed three types of comparisons: binary, categorical, and quantitative, across two sets of contrasts: skin vs. world and within skin bacteria, among the three skin environments: dry, moist, and sebaceous. These comparisons were conducted across all Bacteria and the four major phyla, separately considering abundant (> 0.1% of reads) and all taxa (> 0.001% of reads) respectively.
Binary comparisons were performed on variables that had two outcomes (e.g., positive and negative). When making two-way binary comparisons, we estimated proportion of occurrence with standard errors using a standard binomial model. For an overall test of difference in proportion, we used a Chi-square test. Pairwise comparisons were made using the standard errors of the binomial proportion. We visualized the comparisons with scatter plots of point estimates and error bars, using the 45° equality line as a guide for relative prevalence of the variables.
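The binary comparison can be sketched as follows, assuming the data for one trait reduce to a 2 × 2 table of positive and negative counts in two groups (for example, skin and world). The function names are hypothetical, the chi-square statistic is the Pearson form without continuity correction, and the pairwise z statistic is one plausible reading of "using the standard errors of the binomial proportion"; the analysis in the paper was run in R and may differ in these details.

```python
import math
from typing import Tuple


def proportion_with_se(positives: int, n: int) -> Tuple[float, float]:
    """Binomial point estimate and standard error, sqrt(p * (1 - p) / n)."""
    p = positives / n
    return p, math.sqrt(p * (1.0 - p) / n)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction).

    Table layout: group 1 has a positives and b negatives,
                  group 2 has c positives and d negatives.
    Returns (statistic, p value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def pairwise_z(pos1: int, n1: int, pos2: int, n2: int) -> float:
    """z statistic for the difference of two proportions from their standard errors."""
    p1, se1 = proportion_with_se(pos1, n1)
    p2, se2 = proportion_with_se(pos2, n2)
    return (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)
```

Plotting the two point estimates against each other with their error bars, as described above, places traits that are equally common in both groups on the 45° line.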
Categorical comparisons were performed on variables with multiple discrete, unordered outcomes (e.g., chain, clump, or singly). We compared the relative frequencies of the different outcomes in skin vs. world (or pairwise across skin environments) using a randomization test in which we resampled the data 10⁵ times. We computed a p value for the null hypothesis of equality of proportions as the fraction of randomized samples that were at least as extreme as the observed proportion.
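A randomization test of this kind can be sketched as a label-permutation test. The test statistic used here (the summed absolute difference in category proportions between the two groups), the function names, and the default number of iterations are assumptions chosen for illustration, since the text does not specify them at this level of detail.

```python
import random
from collections import Counter
from typing import Sequence


def proportion_distance(group_a: Sequence[str], group_b: Sequence[str]) -> float:
    """Sum over categories of |proportion in A - proportion in B|."""
    counts_a, counts_b = Counter(group_a), Counter(group_b)
    categories = set(counts_a) | set(counts_b)
    return sum(
        abs(counts_a[c] / len(group_a) - counts_b[c] / len(group_b))
        for c in categories
    )


def randomization_test(
    group_a: Sequence[str],
    group_b: Sequence[str],
    n_iter: int = 10_000,
    seed: int = 0,
) -> float:
    """p value for equal category proportions, by shuffling group labels."""
    rng = random.Random(seed)
    observed = proportion_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        shuffled = proportion_distance(pooled[:n_a], pooled[n_a:])
        if shuffled >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_iter + 1)
```

The p value is the share of shuffled data sets whose statistic is at least as large as the observed one, so a small value indicates that the observed difference in frequencies is unlikely under random group assignment.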
Quantitative outcomes (e.g., volume, pH tolerance) were compared using a nonparametric relative effects test based on global rankings using Tukey contrasts. We chose this test because it is robust to highly non-normal distributions and non-uniform variances and controls appropriately for multiple comparisons. We used box-and-whisker plots of each variable for visualization of the medians and deviations in the data.
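To give a sense of what a rank-based relative effect measures, the sketch below computes point estimates only. It pools all observations, assigns midranks, and estimates each group's relative effect as (mean rank − 0.5) / N, which weights groups by sample size; Tukey-type contrasts are then the pairwise differences. This is one common estimator and an assumption on our part: the exact definition, the variance estimation, and the multiplicity-adjusted inference used by the R package are not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def midranks(values: List[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def relative_effects(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Estimated relative effect per group from rankings of the pooled sample."""
    pooled = [v for values in groups.values() for v in values]
    ranks = midranks(pooled)
    n_total = len(pooled)
    effects: Dict[str, float] = {}
    start = 0
    for name, values in groups.items():
        group_ranks = ranks[start:start + len(values)]
        effects[name] = (sum(group_ranks) / len(values) - 0.5) / n_total
        start += len(values)
    return effects


def tukey_contrasts(effects: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """All pairwise differences (second group minus first) of relative effects."""
    return {(a, b): effects[b] - effects[a] for a, b in combinations(effects, 2)}
```

A relative effect above 0.5 means that values from that group tend to be larger than values drawn from the pooled sample, and a value of exactly 0.5 indicates no tendency in either direction.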
Finally, to explore the role of phylogenetic conservation as an explanation for observed trends, for all binary and qualitative traits, we regressed the probability of a taxon being abundant versus rare or being from skin versus the world against each trait individually, both for a naïve logistic regression and for a regression where phylogenetic relatedness was accounted for. For the latter, we used the phylolm package in R and the phylogenetic tree from Yarza et al. A handful of taxa were missing from the tree, and these were ignored in subsequent analysis. To test the overall significance of a fitted regression, we compared the regression to a null model using a likelihood ratio test. We then compared p-values for the naïve logistic regression and the regression with phylogenetic correction.
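The likelihood ratio test for the naïve regression can be sketched in closed form for the special case of a single binary trait. With an intercept and one binary predictor, the fitted probabilities of a logistic regression equal the observed proportions in each trait group, so the maximized log-likelihoods need no iterative fitting. The function names and the input layout are hypothetical, and the phylogenetically corrected regression is not covered by this sketch.

```python
import math
from typing import Tuple


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_loglik(successes: int, n: int) -> float:
    """Maximized Bernoulli log-likelihood for `successes` out of `n` outcomes."""
    p = successes / n
    return _xlogy(successes, p) + _xlogy(n - successes, 1.0 - p)


def likelihood_ratio_test(
    skin_with_trait: int,
    n_with_trait: int,
    skin_without_trait: int,
    n_without_trait: int,
) -> Tuple[float, float]:
    """LRT of a one-predictor logistic model against the intercept-only model.

    The outcome is "taxon is from skin" and the predictor is a binary trait.
    Returns (statistic, p value) with 1 degree of freedom.
    """
    # Full model: a separate fitted probability for each trait group.
    ll_full = (binomial_loglik(skin_with_trait, n_with_trait)
               + binomial_loglik(skin_without_trait, n_without_trait))
    # Null model: one shared probability for all taxa.
    ll_null = binomial_loglik(skin_with_trait + skin_without_trait,
                              n_with_trait + n_without_trait)
    stat = max(2.0 * (ll_full - ll_null), 0.0)  # guard against rounding below 0
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For multi-level qualitative traits the same comparison applies with more degrees of freedom, in which case the p value would come from the corresponding chi-square distribution rather than the one-degree-of-freedom shortcut used here.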
All statistical analysis was performed using the R programming language (R Core Team 2016), with the quantitative analysis performed using the nparcomp package.
Note that we have ignored several compounds (e.g., carnitine, phenylethylamine, methyl-pyruvate) where results were only reported for a handful (< 20) of species.
We thank Stuart Jones and Noah Fierer for guidance and for providing their database of world microbes.
This work was supported by the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number #W911NF-14-1-0490 (DK).
Availability of data and materials
SB, DK, and WF conceived of the paper, and SB oversaw trait database generation. EG and JW performed all statistical analyses. Each of JB, CD, and RF entered data on over 200 organisms. TM, PT, and FB performed all analyses related to generation of skin taxon lists. SB, DK, WF, FB, and EG wrote the paper.
Ethics approval and consent to participate
Consent for publication
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- 21. Barea J. Future challenges and perspectives for applying microbial biotechnology in sustainable agriculture based on a better understanding of plant-microbiome interactions. J Soil Sci Plant Nutr. 2015;15(2):261–82.
- 25. Schimper AFW. Plant Geography Upon a Physiological Basis, eds Groom P, Balfour IB, Fischer WR. Oxford: Clarendon Press; 1903.
- 32. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2011:gkr988.
- 33. Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, Kuhn M, et al. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 2010;38(suppl 1):D190–5. https://doi.org/10.1093/nar/gkp951.
- 42. Garrity G, Staley JT, Boone DR, De Vos P, Goodfellow M, Rainey FA, et al. Bergey's manual® of systematic bacteriology: volume two: the Proteobacteria: Springer New York; 2006.
- 43. Vos P, Garrity G, Jones D, Krieg NR, Ludwig W, Rainey FA, et al. Bergey's manual of systematic bacteriology: Volume 3: the Firmicutes: Springer New York; 2011.
- 44. Bergey DHB, Garrity GM, Boone DR, Brenner DJ, Castenholz RW, Goodfellow M, et al. Bergey's manual of systematic bacteriology: the Bacteroidetes, Spirochaetes, Tenericutes (mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes: Springer New York; 2011;4.
- 45. Goodfellow M, Kämpfer P, Busse H-J, Trujillo ME, Suzuki K-i, Ludwig W, et al. Bergey’s manual® of systematic bacteriology: volume five the Actinobacteria, part a: Springer New York; 2012.
- 47. Brbić M, Piškorec M, Vidulin V, Kriško A, Šmuc T, Supek F. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res. 2016:gkw964.
- 52. Yosipovitch G, Xiong GL, Haus E, Sackett-Lundeen L, Ashkenazi I, Maibach HI. Time-dependent variations of the skin barrier function in humans: transepidermal water loss, stratum corneum hydration, skin surface pH, and skin temperature. J Investig Dermatol. 1998;110(1):20–3.
- 69. Cosgrove D. Metabolism of organic phosphates in soil. Soil Biochem. 1967;1:216–28.
- 71. Khan AA, Jilani G, Akhtar MS, Naqvi SMS, Rasheed M. Phosphorus solubilizing bacteria: occurrence, mechanisms and their role in crop production. J Agric Biol Sci. 2009;1(1):48–58.
- 95. Konietschke F, Placzek M, Schaarschmidt F, Hothorn LA. Nparcomp: an R software package for nonparametric multiple comparisons and simultaneous confidence intervals. J Stat Softw. 2015;64(9):1–17.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.