Background

Artificial gut models have been used for decades to replicate the human intestinal environment and study the dynamics of resident microbes [1,2,3]. These systems have the advantage of being sampled with arbitrary frequency, house environments that can be precisely controlled, and often face fewer ethical concerns than in-human studies [4]. Artificial gut models have therefore been used to discover the effect of nutritional supplements on infant gut microbiota [5], mechanisms by how commensals repress Salmonella virulence [6], and dose-dependencies between microbiota-targeting therapies and metabolite production [7].

Yet, despite the utility and long development of artificial gut models, fresh questions regarding their fundamental biology remain. Such questions include the rapidity with which the composition of these microbial communities varies both in the presence and absence of perturbations [1, 8]. While it is well known that in vivo microbiota may change on sub-daily timescales due to host forcing, it is unclear whether such sub-daily dynamics may be seen in the absence of host effects [9]. The degree to which replicate ex vivo systems exhibit stochastic behavior, or conversely behave deterministically, remains another outstanding question [1, 10]. Finally, a deeper understanding of the reproducibility of artificial gut models may also have implications for our understanding of the relative importance of various factors in shaping the dynamics of host-associated microbiota [11].

Gaining greater insight into the biology of artificial gut models though requires addressing analytical and statistical challenges. One key challenge is the often unquantified impact of artificial sources of intra-study variation such as variation due to sequencing counting and technical variation from sample processing (e.g., unintended experimental errors and batch effects) [12,13,14,15,16,17,18]. In particular, while the impact of variation due to sequence counting is more commonly addressed and modeled [12, 19,20,21], the impact of technical variation from sample processing is often unquantified and less well understood. Such technical variation may alter inference of both the magnitude and direction of variation among bacterial taxa. In addition, it is well established that sequencing studies of microbiota provide information only on the relative amounts of taxa and not their absolute abundance [22,23,24]. The analysis of such relative (or compositional) data remains an open area of study and naive analyses can lead to a distorted view of the patterns of variation present in a community [21, 22, 25,26,27]. Notably, the variation due to sequence counting, technical variation from sample processing, and compositional effects are all challenges facing in vivo microbiota studies, in addition to artificial gut experiments.

The design of ex vivo microbiota studies is also confronted by the lack of appropriate benchmark datasets. For example, defining suitable sampling frequencies for artificial gut studies requires insight into the timescales on which human gut microbiota fluctuate [28]. Yet, even though gut microbiota dynamics in vivo and bacterial mock communities in vitro are known to behave on the timescale of hours [9, 29, 30], most longitudinal studies to date in ex vivo models are only sampled on the order of days [1, 8, 10, 31]. Additionally, while the collection of technical replicates could be used to quantify the effects of technical variation [32], such replicate sampling is generally not performed in longitudinal microbiota studies.

Here, we integrated model development and experimental design to address key challenges facing the analysis and design of longitudinal artificial gut studies. We collected longitudinal samples with up to hourly frequency from replicate artificial gut models over the course of 1 month. We combined this longitudinal sampling with the collection of technical replicates so that we could characterize the impact of technical sources of variation on observed microbiota dynamics. To isolate separate biological and technical sources of community variation in our dataset, we created a modeling framework called MALLARD that is based on a class of generalized dynamic linear models appropriate for microbiota time-series data. Together, our dataset and modeling framework allowed us to investigate the patterns and timescales of microbiota variation in an artificial human gut.

Results

Longitudinal modeling

To separate biological and technical variation in artificial gut time-series, we introduce an extension of dynamic linear models (DLMs) tailored for microbiota data. DLMs have widespread use including industrial applications such as commercial forecasting and engineering control systems [33]. At their core, DLMs model a system as a time-varying state that is observed through a noisy process. We extended DLMs to a class of multinomial logistic-normal dynamic linear models by building off of the work by Cargnoni C, Muller P, and West M [34]. We refer to this as the MALLARD class of models.

We analyzed the artificial gut dataset described below using a MALLARD model that is generative and assumes there exists an unobserved microbial composition (θt; the state) that evolves through time (Fig. 1a) due to stochastic biological variations (wt). We regard the state sequence (θ1, …, θt, …, θT) as the true microbial dynamics in a time-series. Random technical variations (vt) are then added to the true system state (θt) resulting in the composition ηt (Fig. 1b). We observe ηt through a multinomial counting process. This formulation is similar to the constant level model commonly used in Bayesian time-series analysis [34]. By separately modeling the process generating wt and vt with distinct covariance matrices (W and V, respectively), we can decouple biological and technical variations in artificial gut datasets (Fig. 1c). Visually, we found this model provided a good fit to artificial gut data (Additional file 1).

Fig. 1
figure 1

A generative model for microbial dynamics obscured by technical variation. a Microbial dynamics result from biological variation. The time series (θ1, , …, θt, …, θT) defines the dynamics of a microbial community and results from biological variations (w1, , …, wt, …, wT) which are assumed to be independent and identically distributed (i.i.d.) logistic-normal with mean zero and covariance W. b Technical variation obscures microbial dynamics. Technical variation (v1, , …, vt, …, vT) from sample processing introduces noise into measurements of microbial dynamics and are assumed to be i.i.d. logistic-normal with mean zero and covariance V. c Replicate sampling enables quantification of technical variation. Hypothetical collected samples are denoted by green squares. Differences between longitudinal samples reflect both biological and technical sources of variation. In contrast, differences between technical replicates (samples from the same time point) should reflect only technical sources of variation and can be used to estimate V (“Methods” section). d Overview of longitudinal model. The microbial dynamics and the confounding technical variation (a, b) are modeled in an isometric log-ratio (ILR) space such that the logistic-normal distribution transforms to a multivariate normal distribution for mathematical convenience (“Methods” section). The observed count data is assumed to be distributed multinomial from the compositions (η1, , …, ηt, …, ηT) (“Methods” section)

Daily and hourly gut microbiota time-series in an artificial human gut

We applied our model to an artificial gut that was constructed using continuous-flow anaerobic bioreactor systems that have been validated as models of human gut microbiota [1, 6, 14, 35, 36]. The same starting human fecal inoculum was seeded into replicate ex vivo vessels (n = 4) and cultured for 1 month (Fig. 2 and Additional files 2, 3, 4, and 5). Throughout the experiment, pH, temperature, media input rates, and oxygen concentration were all fixed (“Methods” section). To introduce microbial dynamics into our systems, a single bolus of Bacteroides ovatus isolated from the stool donor was administered to the system on Day 23 (“Methods” section). The B. ovatus bolus did not have discernable effects on microbial dynamics, but the media it was suspended in appeared to induce minor shifts in the relative abundances of select bacterial taxa that were visible with hourly sampling (Additional file 3). Additional microbial dynamics related to media input were generated by an inadvertent feed disruption in two vessels between days 11 and 13 of the study. Overall, similar to previous studies [1, 10], the artificial gut maintained much of the microbial diversity present in the inoculating stool: 91% of bacterial families present on days 1–5 of the study were detected between days 23 and 28. The 9% of bacterial families that did not persist represented only 0.06% of the total sequencing reads in the dataset.

Fig. 2
figure 2

Proportions of ten most abundant bacterial families over time. Four identical continuous-flow anaerobic bioreactor systems were each inoculated from a single human fecal specimen and cultured over the course of 1 month. Proportions of bacterial taxa at sampled time points were estimated by dividing observed read counts from 16S rRNA sequencing by the total number of counts observed for each sample. In addition to daily sampling (shown here), hourly samples were taken from each vessel (480 hourly samples total) as well as 20 technical replicate samples from the final time point of each vessel (Additional file 4). Proportions from the hourly sampling data are shown in Additional file 3. PCoA analysis using the Aitchison distance is shown in Additional file 5

To investigate the timescales on which microbial communities vary ex vivo and to characterize the impact of technical variation on our measurements, we developed a sampling scheme for our artificial gut that featured fine temporal resolution and numerous technical replicates (Additional file 4). As in previous studies, all four vessels of the artificial gut were sampled daily over the course of 1 month. To investigate potential sub-daily variation, we aimed to oversample the system and collected 120 sequential hourly samples, from each replicate vessel, during a 5-day period. To estimate the technical variation in our measurements, we collected 20 replicate samples from the final time-point of each artificial gut vessel (Fig. 1c; “Methods” section). As samples collected in the same vessel at the same time are expected to be biologically identical (i.e., feature no biological variation), any variation in the resulting community measurements between these technical replicates can be ascribed solely to technical sources (Fig. 1c). To ensure that the technical variation profile of the replicate sampled matched that of the longitudinal samples, all samples were randomized and processed together for sequencing.

The structure and magnitude of technical variation from sample processing

After fitting the MALLARD model to the resulting artificial gut dataset, we investigated the technical variation from sample processing (V) to our inference of variation due to biological sources (W). Variation from sample processing and variation due to sequence counting were represented as separate processes (Fig. 1d). Differing correlation structure between V and W would support our hypothesis that sources of technical variation could obscure biological forces acting on microbiota within an artificial gut. Indeed, a permutation analysis indicated it highly improbable that V and W had the same correlation structure (posterior probability < 1%; “Methods” section). Low-dimensional projections of the posterior distributions of V and W supported this conclusion and revealed how overall variation patterns involving bacterial families like Lachnospiraceae, Fusobacteriaceae, and Bacteroidaceae more strongly resembled patterns of technical variation than biological variation (Fig. 3a). Thus, some patterns of covariation among taxa in artificial gut studies may be due to technical sources of variation.

Fig. 3
figure 3

Structure and magnitude of biological and technical variation. a Ternary plot showing the 95% probability regions of the logistic-normal distributions corresponding to W, V, and W + V along the Bacteroidaceae, Fusobacteriaceae, and Lachnospiraceae subcomposition. To visualize posterior uncertainty, 100 posterior samples of each of these quantities are plotted. b Mean and 95% credible interval for total biological (Tr(W)) and total technical (Tr(V)) variation as a function of sampling interval in hours

We next investigated the relative magnitude of technical variation from sample processing and biological variation as a function of sampling frequency. Total variation is defined as the trace of a covariance matrix [37]. We therefore computed Tr(W)/Tr(V), which is analogous to the signal-to-noise ratio used in signal processing [38]. We found that for our time-series analyzed on an hourly basis (“Methods” section), the median ratio of biological to technical variation was 0.30 (0.26–0.36 95% credible interval). We therefore estimate that only 24% (21%–26%) of the total variation in relative abundances was due to biological sources. This result was insensitive to perturbation of our priors (Additional file 6). Moreover, because variance is additive between time-points within our model, we could estimate biological and technical variation as a function of sampling interval (Fig. 3b; “Methods” section). We found that at a sampling interval of 3.5 h, the total biological and technical variation was approximately equal. At sampling frequencies faster than 3.5 h, technical variation outweighed biological variation; at slower sampling frequencies, signal associated with biological variation exceeded technical noise.

Timescales of microbial dynamics

In order to explore the timescale of microbial dynamics within replicate artificial gut vessels, we visualized microbial dynamics using PhILR balances. PhILR balances represent the log-ratio between phylogenetically neighboring clades of taxa, providing a phylogenetically informed way to study microbial dynamics without compositional artifacts [22]. We quantified the magnitude of changes in balance values in units of evidence information (e.i.), which are a measurement of compositional change [39]. A 1 e.i. change is equivalent to an approximately 4-fold change in the ratio of two bacterial taxa and a 2 e.i. change is equivalent to an approximately 17-fold change in the ratio of two taxa (see “Methods” section for further discussion). Multiple balances exhibited sub-daily dynamics (Additional files 7 and 8), most notably the ratio of bacteria from the phylum Bacteroidetes to bacteria from the phyla Proteobacteria and Fusobacteria (Fig. 4). The balance between Bacteroidetes and Proteobacteria/Fusobacteria appeared to fluctuate on timescales shorter than 1 day with an amplitude of approximately 1 e.i. (0.5–1.5, 95% credible interval; Fig. 4b). Balance dynamics did not correspond to recorded environmental or technical variations (e.g., media changes, identity of the researcher collecting the sample, sequencing batch number, B. ovatus supplementation, or the feed disruption of vessels 1 and 2) and did not display an exact 24-h periodicity (Additional file 9). Balance fluctuations were observed in all four replicate artificial gut models, but did not appear to be synchronized as would be expected if these dynamics were driven by a shared environmental factor (Additional file 8). We ultimately could not identify a technical or environmental cause of fluctuating balance dynamics on sub-daily timescales.

Fig. 4
figure 4

Sub-daily microbiota dynamics within the artificial intestine. a An annotated phylogenetic tree defining PhILR balance between the phyla Fusobacteria, Proteobacteria, and Bacteroidetes. The balance reflects the scaled log-ratio of the geometric mean relative abundance of families in the Bacteroidetes phyla (numerator, +) to the geometric mean relative abundance of families in the Fusobacteria and Proteobacteria phyla (denominator, −) (“Methods” section). b Mean, 50%, and 95% credible interval of posterior distribution for θ (microbial dynamics in absence of technical variation) for the PhILR balance defined in (a). The full list of PhILR balances and corresponding posterior estimates are shown in Additional files 7 and 8

Exploratory analysis of biological variation

We next explored the biological variation captured in our model by investigating the pairwise variation between bacterial families in our dataset. Direct taxon-level analysis of the covariance matrix W is difficult though because the elements of this matrix are balances—not individual taxa. We therefore investigated the temporal variation of specific taxon pairs using a tool from compositional data analysis called the variation array [40]. A variation array represents the variance of the log-ratio between pairs of taxa (“Methods” section). When variance of a pairwise log-ratio is near zero, the two taxa positively covary; whereas when this variance is, high the two taxa exhibit either unlinked or exclusionary patterns [22, 41]. The resulting variation array (corresponding to W) revealed that most temporal variation was contained in log-ratios between four bacterial families: the Rikenellaceae, Synergistaceae, Enterobacteriaceae, and Fusobacteriaceae. By contrast, log-ratios between other bacterial families were approximately 1 order of magnitude smaller (Fig. 5a). Overall, we estimated that the Rikenellaceae, Synergistaceae, Enterobacteriaceae, and Fusobacteriaceae accounted for 72% (69–75%; CLR basis) of the total biological variation seen in the dataset (Additional files 10 and 11).

Fig. 5
figure 5

The decomposition of biological variation among bacterial families. a Heatmap of posterior distribution of log-ratio variance (ρ) between pairs of bacterial families. Heatmap color is given by the median of the posterior distribution of ρ. Columns and rows refer to the bacteria in the numerator and denominator of the corresponding log-ratios respectively. A similar decomposition of technical variation is shown in Additional file 11. bd The highest variance bacteria displayed distinct temporal patterns (95% posterior credible regions for θ). Hourly sampling results in smaller posterior credible intervals and more resolvable dynamics in the time around day 23. b The balance between the family Rikenellaceae versus all other bacterial families decreased substantially during the feed disruption of vessels 1 and 2. c The balance between the Fusobacteriaceae and Synergistaceae versus all other bacterial families slowly increased over the course of the study after an initial acclimation period. d The balance between the family Enterobacteriaceae versus all other bacterial families displayed fluctuating dynamics likely related to those shown in Fig. 4

We noticed an inverse relationship between the biological variation of taxa to their relative abundance in the starting inoculum. We observed this relationship by fitting a linear regression model to each posterior sample in a CLR-transformed space (95% posterior credible interval for regression slope: − 0.26 to − 0.18; Additional file 12; “Methods” section). Thus, our analyses suggested that more variable taxa in the artificial gut tended to be rarer at the time of inoculation. This result was insensitive to perturbation of our prior assumptions (Additional file 13), and there was only a 3.8% posterior probability that the observed inverse relationship was an artifact of low abundance families being more numerous than high abundance ones (“Methods” section).

To better understand the patterns of variation we observed from the above analysis, we manually created three balances which together highlighted patterns that we believe explain the high variability of the Rikenellaceae, Synergistaceae, Enterobacteriaceae, and Fusobacteriaceae families (Fig. 5bd). Together, these three manually curated balances explained 73% of the biological variation in the dataset (69–76%, 95% credible interval). This manual curation was informed by the inspection of microbial dynamics within the PhILR basis and a hierarchical cluster analysis which suggested that the Fusobacteriacae and Synergistaceae shared similar dynamic patterns and should therefore be grouped together for analysis (“Methods” section, Additional file 14). One highly variable balance, which represented the ratio of Rikenellaceae to the other nine bacterial families, exhibited a 3 e.i. decrease during days 11–13 in two artificial gut replicate vessels before recovering over the subsequent week (Fig. 5b). This decrease corresponded with the transient feed disruption in those replicate vessels. A second highly variable balance, the ratio between the Fusobacteriaceae and Synergistaceae to all other bacterial families, increased by approximately 8 e.i. between days 2 and 7 and the end of the experiment (Fig. 5c). The nadir of this balance coincided with an initial microbiota adaptation phase that is repeatedly reported in artificial gut studies [1, 10]. A third balance, representing the ratio of Enterobacteriaceae to the other nine bacterial families, exhibited irregular sub-daily oscillatory behavior (Fig. 5d and Additional file 15). These balance dynamics are likely related to those observed earlier between the phyla Bacteriodetes and Fusobacteria/Proteobacteria (Fig. 4), given the membership of Enterobacteriaceae within the phylum Proteobacteria.

Discussion

Our findings in concert offer several insights that could be useful for the design and analysis of ex vivo studies of human gut microbiota. Our study suggests that technical variation introduced during sample processing can affect observations of microbiota dynamics not just in scale but also in the patterns of covariation among taxa. To date, knowledge has been limited on the effects of technical variation in ex vivo studies of human gut microbiota. Select exceptions include prior bioreactor studies that found technical variation to be limited and dominated by biological variation [10].

Our methods and results also have implications for the choice of sampling frequency in ex vivo artificial gut studies. While oversampling can waste resources, under-sampling can bias inference through a mechanism known as signal aliasing [28, 42, 43]. Here, to balance between resource-intensive exploration of sub-daily dynamics while still capturing longer multi-day dynamics, we made use of MALLARD and mixed rate sampling (e.g., sampling at both daily and hourly timescales). Our findings suggest that future studies interested in tracking the full range of ex vivo microbiota dynamics or that focus on community responses to rapidly changing external factors (e.g., single dose small molecule or prebiotic supplementation studies) should consider collecting sub-daily measurements. By contrast, studies interested solely in longer term changes (e.g., overall nutritional studies) may find daily sampling sufficient. Additionally, by combining replicate sampling with purposeful oversampling, we were able to determine an effective signal-to-noise ratio as a function of sampling frequency. This ratio could be used to estimate an upper-limit sampling frequency above which the benefits of increases samples are diminished by the relatively high levels of technical variation.

Another methodological approach we develop here for artificial gut studies is an exploratory technique for discovering temporal patterns among microbial taxa. As gut microbiota time-series often involve many taxa, methods of dimension reduction can aid data interpretation. Yet, many standard time-series tools for dimension reduction, such as dynamic principal component analysis, are not well suited for microbiota data in the form of relative taxa abundances. We overcome this challenge and identify distinct patterns in our dataset using a tool from compositional data analysis called the variation array [40]. Hierarchical cluster analysis and manual curation then allowed us to identify three balances (Fig. 5bd), highlighting four bacterial families, that together accounted for over 70% of the variation in our dataset. These three balances revealed key dynamical patterns and aided in our interpretation of the forces acting on our artificial intestine.

Beyond methodology for designing ex vivo gut microbiota studies, our findings also have several implications for the underlying biology of these systems. First, we observed distinct differences in the replicability of dynamics between vessels at hourly compared to daily timescales. In keeping with previous reports, we found that replicate artificial guts display replicable dynamic patterns when analyzed on daily timescales [1]. Notably, even after a feed interruption, vessels 1 and 2 returned to a similar composition as the two continuously fed vessels after roughly 1 week and appeared to display similar overall dynamics thereafter (Fig. 5b and Additional file 7). Such observations suggest that these systems follow distinct and potentially predictable trajectories on the timescales of days. Conversely, dynamics appeared less synchronized at hourly timescales (Additional file 8). For example, balances involving the family Enterobacteriaceae (e.g., balance n12), which demonstrate oscillatory patterns in multiple replicate vessels, do not appear in phase with one another. Together, these observations suggest a conceptual model for our artificial gut systems in which strong deterministic forces such as resource availability drive predictable community dynamics on longer timescales while other, potentially more subtle and variable ecological forces, drive temporal variation on shorter timescales.

Another implication of our results to the biology of ex vivo gut microbiota involves sub-daily dynamics of bacterial families such as the Enterobacteriaceae. We acknowledge that despite our efforts to keep culture conditions constant in the artificial gut, we cannot exclude the possibility that these dynamics were caused by fluctuating and unmeasured aspects of our artificial gut setup (e.g., room temperature, ambient humidity, room lighting). Although, we consider it unlikely that such external driving forces would cause dynamics that were unsynchronized between replicate vessels (Additional file 9). Moreover, rhythmic dynamics within microbial communities have previously been observed or predicted in host-free environments [44,45,46], and bacteria have been speculated to harbor circadian clocks [47]. The dynamics we observed may have even underestimated the rate at which microbiota varied in our system as genomic material likely remains measurable for a period after cell death and our analysis at the family level could smooth dynamics at finer taxonomic scales. Thus, our results support hypotheses that dynamical properties inherent to microbiota could contribute to sub-daily oscillations observed in the mammalian gut [48].

We also found that rare bacterial taxa tended to harbor the greatest variation over time. Such variation may reflect a general property of artificial gut systems, in which dormant enteric microbes take advantage of differences between ex vivo and in vivo environments to expand [49, 50]. In particular, we found that in all four artificial gut vessels, the Fusobacteriaceae and the Synergistaceae underwent an approximately 4 e.i. drop in their relative abundance upon transplantation from the in vivo to ex vivo environment and then a subsequent increase to approximately 4 e.i. greater than their initial levels (Fig. 5c). These patterns may reflect a form of ecological succession in which bacterial families are well suited for the ex vivo environment, but are initially outcompeted by faster growing microbes. Such succession is well described in both in vivo and environmental ecosystems [51,52,53], and supports the general hypothesis that select theoretical frameworks from the field of environmental microbial ecology are also applicable to host-associated microbiota [54].

Still, insights from our study here into the methodology and underlying biology of artificial gut experiments have several limitations. First, some sources of technical variation, such as systematic biases introduced due to separation of samples into batches for DNA extraction, PCR, or sequencing, may have exploitable and reproducible structure that could be controlled for during modeling thus improving the resolution of longitudinal microbiome studies. Here, we have chosen to treat these sources of variation as random en batch and instead leave such extensions of our analysis for future work. We note however, that mixed effects models, which have proven to be a powerful method of accounting for such systematic biases or batch effects, and their dynamic extensions both represent special cases of dynamic linear models [38]. We therefore believe MALLARD will prove useful in modeling and controlling for such batch effects. Second, computational limitations restricted our analysis to relatively few dimensions. Due to this limitation, we analyzed only the ten most abundant bacterial families, which in turn could have masked dynamics occurring at finer taxonomic scales [28]. Two future refinements may improve computational efficiency: univariate filtering methods for multivariate time-series have been developed and would reduce the dimensionality of matrix manipulations used in MALLARD [55], and emulation methods that iteratively refine probabilistic models have been developed for high-dimensional count data and show promise [56]. Third, we chose to collect all technical replicates from the same time-point and relied on randomizing both longitudinal and replicate samples into batches to ensure that the replicate samples faithfully represented the technical variation profile in our study. As an alternative, we could have instead spread technical replicate samples out over the course of the experiment, collecting duplicate or triplicate samples at regular intervals. As the MALLARD framework enables technical replicate samples to be collected with any distribution within the study, distributed replicate samples may have estimated microbial dynamics with differing precision.

Despite our study’s limitations though, we believe that our methodological insights are useful for the design of artificial gut experiments and, more broadly, designing in vivo longitudinal studies of host-associated microbial communities. Longitudinal studies in humans have provided unique insights into disease and therapy [28, 43], including optimal antibiotic treatment regimens [57], mechanisms underlying disease and recovery from acute secretory diarrhea [58] or intestinal cleanout [59], as well as identifying vaginal microbiota signatures associated with preterm birth [60]. Aspects of our modeling framework here could be applied to future longitudinal analyses in humans. Specifically, attention has been given recently to issues arising in the analysis of relative microbiome data [22,23,24,25,26,27, 41, 61, 62] and there is growing awareness of how count variability influences microbiota surveys [20, 21, 63]. Missing data are also a common challenge in longitudinal microbiome analyses, particularly when the temporal evolution of observed data is modeled. MALLARD is unique in working at the intersection of compositional analysis, count variability, and linear longitudinal models that can account for technical variation in datasets with many missing observations [64,65,66,67].

Aspects of our artificial gut sampling design could also be applied to longitudinal microbiota studies in humans. Replicate sampling could be used to quantify the effects of technical variation for in vivo microbiota time-series. Moreover, choosing an appropriate sampling frequency remains an outstanding challenge in translational microbiome studies [68]. Pilot time-series experiments that intentionally sample in vivo systems more frequently than expected microbiota dynamics (i.e., oversampling) could be used to compute tradeoffs between technical and biological variation at higher sampling frequencies as we do here; aiming to sample at frequencies where technical variation does not exceed biological variation could help guide economical use of laboratory resources. Indeed, potentially oversampled longitudinal datasets for human gut microbiota already exist (albeit with limited replicate technical sampling) [69, 70]. Ultimately, by improving the design and analysis of microbial community dynamics in longitudinal in vivo studies, an improved understanding of the role of microbiota in human health and disease can be achieved.

Conclusions

Here, we created a modeling framework that allowed us to partition a densely sampled artificial human gut time-series into components associated with technical and biological sources of variation. Our results demonstrate that technical variation from sample processing can influence ex vivo microbiota dynamics, accounting for 76% of community variation on hourly timescales. Still, we observed evidence for bona fide microbiota dynamics on sub-daily timescales. Our integrated analyses also resulted in approaches for characterizing microbiota variation over time. By investigating the distribution of biological variation among taxa, we identified three distinct dynamic patterns that accounted for over 70% of the variation within our dataset. Together, our results contribute to our understanding of the dynamics of ex vivo artificial gut systems, as well as the design and analysis of future longitudinal microbiota studies.

Methods

Artificial intestine experiments

Collection and preparation of fecal inoculate

A fresh fecal sample was obtained from a healthy volunteer who provided written informed consent (Duke Health IRB Pro00049498). The sample was stored and prepared for inoculation into artificial gut systems within an anaerobic chamber (Coy). The fecal sample was weighed into 50 ml conical tubes, approximately 5 g per tube, and then pre-reduced MGM (McDonald Gut Media; [1]) was used to fill the tube. The fecal matter was homogenized briefly using a benchtop vortex and then centrifuged 10 min at a speed of 175×g. The supernatant was decanted into syringes for inoculation.

Artificial gut preparation

A four-vessel continuous flow artificial gut system (Multifors 2, Infors) was used to culture gut microbiota seeded from human stool. Vessels were sterilized and prepared with 300 ml of fresh MGM. Inoculation of reactors used 100 ml of fecal inoculate resulting in an overall volume of 400 ml. The media feed was started 24 h after inoculation at a constant rate of 400 ml per day to emulate the 24-h average passage time in the human gut. Media was changed 16 times throughout the course of the experiment, each time media was prepared fresh. On day 13, it was discovered that the feed line to vessels 1 and 2 was blocked; this blockage could have occurred any time after day 11.

In addition to media feed rate, oxygen, pH, temperature, and stir rate were controlled by the IRIS software (v6, Infors). Oxygen concentration in the vessels was kept below 1% via positive nitrogen pressure at 1 LPM. Oxygen concentration was measured continuously using Hamilton VisiFerm DO Arc 225 probes. The oxygen probes were calibrated using a two-point calibration performed with nitrogen flowing at 1 LPM as the zero-point calibration and room air flowing at 1 LPM as the 100% calibration point. pH was maintained between 6.9 and 7.1 using a 1 N HCl solution and a 1 N H3PO4 solution. pH was measured continuously with Hamilton EasyFerm Plus PH ARC 225 probes. The pH probes were calibrated with a two-point calibration with standardized pH buffers at 4.00 ± 0.1 and 10.00 ± 0.1 (BDH). Vessels were maintained at 37 °C via the Infors’ onboard temperature control system. Vessels were continuously stirred at 100 rpm using magnetic impeller stir-shafts.

Bacteroides ovatus delivery

To study community dynamics in response to changes in a single bacterial taxa, we supplemented replicate vessels 1 and 4 with 2 ml of isolated B. ovatus at 1010 cells/ml (estimated by optical density) suspended in anaerobic blood heart infusion (BHI) agar, and vessels 2 and 3 with 2 ml of anaerobic BHI as a control on day 23. No evidence of B. ovatus increase was detected via community composition. Longitudinal modeling suggest effects of adding delivery media were limited to a transient 0.5 e.i. shift in the balance between the families Bacteroidaceae and Porphyromonadaceae (Additional file 8).

Sampling

For each time-point sampling of the four replicate vessels was done as follows. Prior to sampling, sampling ports were cleared with a sterile syringe and wiped clean with ethanol. Samples were collected in the following order: vessel 1, vessel 2, vessel 3, and vessel 4. Sampling consisted of the collection of 3 ml of active artificial gut culture via sterile syringe and then immediate storage in labeled and barcoded cryovials in a –80 °C fridge. The full list of sampled time-points including daily, hourly, and technical replicate samples is shown in Additional file 4.

DNA extraction, PCR amplification, and sequencing

For all samples, 16S rRNA gene amplicon sequencing was performed using custom barcoded primers targeting the V4 region of the gene [71] and published protocols [71,72,73]. All samples were randomized into 7 sets of 96 for all sample preparation steps. Extractions were performed using the MoBio PowerMag Soil DNA Isolation kit (p/n 27100-4-EP) adapted for use without robotic automation. Due to the large number of samples, samples were split randomly into 2 pools of 336 samples for sequencing to ensure adequate read depth per sample. The final DNA concentration for pool 1 was 35.5 ng/μl and for pool 2 was 34.5 ng/μl as assessed by Picogreen assay. Both sequencing runs were standardized to 10 nM and sequenced using an Illumina MiSeq with paired end 250 bp reads using the V3 chemistry kits at the Duke Molecular Physiology Institute core facilities.

Identifying sequence variants

We used DADA2 to identify and quantify sequence variants in our dataset [74]. To prepare data for denoising with DADA2, 16S rRNA primer sequences were trimmed from paired sequencing reads using Trimmomatic v0.36 without quality filtering [75]. Barcodes corresponding to reads that were dropped during trimming were removed using a custom python script. Reads were demultiplexed without quality filtering using python scripts provided with Qiime v1.9 [76]. Bases between positions 10 and 180 were retained for the forward reads and between positions 10 and 140 were retained for the reverse reads based on visual inspection of quality profiles. This trimming, as well as minimal quality filtering, of the demultiplexed reads was performed using the function fastqPairedFilter provided with the dada2 R package (v1.1.6). Sequence variants were inferred by dada2 independently for the forward and reverse reads of each of the two sequencing runs using error profiles learned from a random subset of 40 samples from each sequencing run. Forward and reverse reads were merged for each of the two sequencing runs. Bimeras were removed using the function removeBimeraDenovo with tableMethod set to “consensus.” Finally, the two sequencing runs were merged together into a single count table.

Taxonomy assignment

Initially, taxonomy was assigned to sequence variants using a Naive Bayes classifier [77] trained using version 123 of the Silva database [78]. Initial taxonomic assignments were then augmented by searching for exact nucleotide matches to the Silva database. This resulted in 96% of sequence variants being classified at the family level, 85% at the genus level, and 15% at the species level.

Data preparation for modeling

After investigating the distribution of sample sequencing depth, we chose to retain only samples with more than 5000 read counts to remove outlying samples that may have been subject to library preparation or sequencing artifacts. This step retained 99.8% of total sequence variant counts. For computational tractability and to ensure a maximal number of retained sequence variant counts, we preformed our analysis at the family level and we retained only those families that were present with at least three counts in more than 90% of samples. While these filters yielded only ten bacterial families, they represented 97.7% of total sequence variant counts.

Construction of phylogenetic sequential binary partition

To use the PhILR transform [22], we manually created a sequential binary partition based on the phylogenetic relationships between the bacterial families in our dataset. This manual partition was created in accord with the phylogenetic relationships between bacterial families specified in Rajilic-Stojanovic M and de Vos WM [79]. The resulting sequential binary partition is given in Additional file 7.

The MALLARD framework

To accommodate time-series with replicate observations (as depicted in Fig. 1c), we refer to samples by sample index k ∈ {1, …, K} rather than time index t ∈ {1, …, T} such that K ≥ T. Further, let the function ϕ provide a mapping between the sample index and the time index such that ϕ(k) = t. As in standard time-series notation, we assume that these sample indices are temporally ordered such that ϕ(k) ≤ ϕ(k + 1) for all k. With this notation, denote a typical longitudinal microbiome dataset as a matrix Y with element Ykd representing the number of counts measured for taxa d ∈ {1, …, D} in sample k. We also denote the total sequence counts (sequencing depth) attributed to the kth sample as \( {n}_k=\sum \limits_{d=1}^D{Y}_{kd} \). In what follows, we will introduce the MALLARD framework in full generality and then demonstrate how features such as missing observations, multiple concurrent time-series, and replicate observations can be handled. After introducing the MALLARD framework, we will then introduce the MALLARD model used to analyze the artificial gut dataset in this work.

MALLARD overview

To introduce MALLARD, we first present the entire MALLARD framework and then discuss the motivation and intuition behind each component individually. MALLARD can be written as the following hierarchical model

$$ {Y}_k\sim \mathrm{Multinomial}\left({\pi}_k,{n}_k\right) $$
(1)
$$ {\pi}_k={\mathrm{ILR}}^{-1}\left({\eta}_k\right) $$
(2)
$$ {\eta}_k={F}_k^{\hbox{'}}{\theta}_k+{\nu}_k,\kern0.5em {\upsilon}_k\sim N\left(0,{V}_k\right) $$
(3)
$$ {\theta}_k={G}_k{\theta}_{k-1}+{w}_k,\kern1em {w}_k\sim N\left(0,{W}_k\right) $$
(4)
$$ {\theta}_0\sim N\left({m}_0,{C}_0\right) $$
(5)
$$ {V}_{1,}\dots, {V}_K,{W}_1,\dots, {W}_k\sim p\left(\zeta \right) $$
(6)

where ILR−1 represents the inverse Isometric Log-ratio transform [22, 80] and p(ζ) is a placeholder for a variety of potential priors.

Equation 1 models the process of sequence counting. High-throughput DNA sequencing does not measure the total number of target DNA transcripts in a biological system but only a random subset of this total. The size of this subset is represented by the sequencing depth of a sample. This feature of DNA sequencing leads to a competition to be counted between transcripts in which more abundant transcripts can exclude observations of less abundant transcripts. To capture this behavior, MALLARD models DNA sequencing as a multinomial counting process where a D-dimensional vector of counts from each sample Yk = (Yk1, …, YkD) provides a noisy measurement of the relative abundance of each taxa in the sequencing library (πk = (πk1, …, πkD) such that ∑dπkrd = 1).

While the multinomial component of our model accounts for uncertainty due to the counting process underlying DNA sequencing, the multinomial component alone is insufficient to account for the other sources of technical and biological variation in longitudinal microbiome studies. To allow for this type of extra-multinomial variability, MALLARD treats the multinomial parameters (π) as distributed logistic-normal. We chose the logistic-normal distribution for two reasons. First, the logistic-normal has greater flexibility than the more common Dirichlet distribution allowing for both positive and negative covariation between taxa [40, 81]. Second, the logistic-normal distribution represents the central limiting distribution over the space of multinomial parameters assuming multiplicative errors [37, 82]. We chose to model with a multiplicative error structure due to the multiplicative nature of bacterial growth and DNA amplification.

Although the logistic-normal can be difficult to work with in terms of relative abundances (πk), under the isometric log-ratio transform (ILR) [22, 80], the logistic-normal simplifies to a multivariate normal distribution on the transformed parameters ηk [37]. Additionally, while many log-ratio methods suffer from numerical problems when trying to take the log or ratio of zeros, by modeling the process of sequence counting and treating the relative abundances πk as an unknown parameter to be inferred, zeros are handled by MALLARD without the need for pseudo-counts or multiplicative replacement schemes.

To model the relationship between samples, we consider that the specific multivariate normal model relating the parameters η1, …, ηK is a class of linear Gaussian state-space models (Eqs. 3 and 4) often referred to as dynamic linear models (DLMs) [38]. The DLM can be thought of as modeling an unobserved system parameterized by a p-dimensional vector θk that evolves through time based on a deterministic linear model (θk = Gkθk − 1) where Gk is a p × p matrix of covariates. Additionally, the state evolution has a random component which is modeled as mean zero multivariate normal random perturbations wk with covariance Wk. Furthermore, the DLM models that the state parameters θk is translated into the parameters ηk through a similar, though independent, deterministic linear model with added multivariate normal variation (\( {\eta}_k={F}_k^{\prime }{\theta}_k+{v}_k\Big) \) where ηk and vk are D − 1 dimensional vectors and Fk is a p × (D − 1) matrix of covariates. In this work, we use this later component to model technical variation. Flexibility in the specification of Fk, Gk, Wk, and Vk, allows the MALLARD framework to encompass many different types of models as special cases including dynamic and static mixed effects or factor models, models with seasonal or polynomial trends, and even dynamic regression models [38]. A thorough review of the use of dynamic linear models, and MALLARD models by extension, is given in West M and Harrison J [38].

We specified two types of prior beliefs for parameters in the MALLARD likelihood model, the priors over the state vectors θ and the priors over the covariance components (V1, …, VK and W1, …, Wk). For the state vector component, Eq. 5 enables prior knowledge regarding the value of the state vector 1 time-step prior to the first observation to be encoded as a normal distribution with mean m0 and covariance C0. In contrast to the state vector component, we allow greater flexibility in prior form regarding V1, …, VK and W1, …, Wk. Classically, priors for these terms are based on the inverse Wishart distribution; however, in this work, we instead modeled using a decomposition of these covariance matrices for numerical stability (see subsection “MALLARD model specifications for analysis of the artificial gut dataset”). Finally, flexibility in the specification of these covariance terms allows MALLARD to model time-varying covariance structures as is commonly done in stochastic volatility models [38].

Handling multiple concurrent time-series with MALLARD

Many longitudinal microbiome studies involve multiple concurrent time-series either from different individuals or, as in this study, different artificial gut vessels. Denoting each of R concurrent time-series by an index r ∈ {1, …, R}, we introduce a state expansion as one method for modeling concurrent time-series with MALLARD. First, let \( {\theta}_k={\left[{\theta}_k^{(1)\prime },\dots, {\theta}_k^{(R)\prime}\right]}^{\prime } \) where \( {\theta}_k^{(r)} \) denotes a p-vector of state parameters for time-series r at sampling point k. Similarly, we can define \( {\eta}_k={\left[{\eta}_k^{(1)\prime },\dots, {\eta}_k^{(R)\prime}\right]}^{\prime } \). Using this state expansion, we can write the MALLARD model for multiple concurrent time-series just as in Eqs. 1, 2, 3, 4, 5 and 6 but with the replacement of Eqs. 1 and 2 by

$$ {Y}_k^{(r)}\sim \mathrm{Multinomial}\left({\pi}_k^{(r)},{n}_k\right) $$
$$ {\pi}_k^{(r)}={\mathrm{ILR}}^{-1}\left({\eta}_k^{(r)}\right). $$

With this specification, θk is a vector of length Rp and ηk is a vector of length R(D − 1). This state expansion also enables flexible modeling of covariation between and within concurrent time-series through an induced expansion of components V1, …, VK, W1, …, WK. For example, consider that if θk denotes an Rp vector of the combined states of R concurrent time-series, then the off-diagonal blocks of Wk model the covariation between concurrent time-series while the diagonal blocks represent the covariation within each time-series.

Posterior inference

The inference goal for the above model is to sample from the posterior distribution p(θ, η, V, W| D, B) where D = {Y, F, G}, B = {m0, C0, ζ} and we have denoted sets of parameters by dropping the associated subscripts (e.g., V represents the set {V1, …, VK}). The inference method we describe is based on the conditional decomposition of this posterior as

$$ p\left(\theta, \eta, V,W|D,B\right)=p\left(\theta |\eta, V,W,D,B\right)p\left(\eta, V,W|D,B\right). $$

In particular, we show that the density p(η, V, W| D) can be efficiently computed and sampled using MCMC in combination with the Kalman filter to marginalize over the state parameters θ and that, conditional on having samples from the conditional posterior p(η, V, W| D), we can efficiently sample from p(θ| η, V, W, D) using recursive samplers (i.e., the Kalman smoother or the backwards sampling algorithm; see below). Importantly, this approach removes all state parameters θ from the MCMC and instead samples them directly from the conditional posterior using more efficient recursive samplers. This is in contrast to prior work with multinomial conditionally Gaussian dynamic linear models which used a Metropolis-within-Gibbs sampling scheme but did not make use of such marginalization [34]. Additionally, in contrast to prior work, our sampling scheme allows us to use MCMC methods that incorporate posterior adaptation further improving MCMC efficiency.

The term p(η, V, W| D) can be efficiently computed using the Kalman filter as follows. Using Bayes rule and the conditional independence relationships Y ⊥ B, V, W, F, G ∣ η, η ⊥ ζ ∣ V, W, and V, W ⊥ m0, C0, F, G ∣ ζ, we can write

$$ p\left(\eta, V,W|D,B\right)\propto p\left(Y|\eta \right)p\left(\eta |V,W,F,G,{m}_0,{C}_0\right)p\left(V,W|\zeta \right). $$

Importantly, this first term is easily calculable and is given by \( p\left(Y|\eta \right)=\prod \limits_{k=1}^K\mathrm{Multinomial}\left({y}_k|{\pi}_k\right) \). The third term is also easily calculable as it is the density of the prior for the covariance matrices evaluated at V and W. Letting Ξ = {V, W, F, G, m0, C0} for notational convenience and by noting the first-order Markov structure of the DLM, we can simplify the second term as

$$ p\left(\eta |\varXi \right)=p\left({\eta}_1|\varXi \right){\prod}_{k=2}^Kp\left({\eta}_k|{\eta}_{k-1},\dots, {\eta}_1,\varXi \right). $$

This relation is directly calculable as the product of 1-step ahead predictive densities in the Kalman filter (see Additional file 16). As the above represents an efficient method of calculating the density of p(η, V, W| D, B) up to a proportionality constant, sampling from this density can be accomplished via MCMC. In this work, we choose to use adaptive Hamiltonian Markov Chain Monte Carlo (HMCMC) provided by the Stan modeling language to simulate from the density p(η, V, W| D, B) [83, 84]. Finally, the use of the Kalman filter also enables simple and efficient handling of missing observations (see Additional file 16).

Given samples from p(η, V, W| D, B), we now describe an efficient method for sampling from the conditional posterior distribution p(θ| η, V, W, D, B). Using the conditional independence relationship θ ⊥ Y, ζ ∣ H, V, W, we can simplify this conditional density to p(θ| H, V, W, F, G, m0, C0) which we show in Additional file 16 can be sampled from directly using either the Kalman smoother or the backwards sampling algorithm (see Additional file 16).

MALLARD model specifications for analysis of the artificial gut dataset

We describe the specific MALLARD model used in this work as a special case of the general MALLARD framework. We introduce these simplifications in four parts relating to model structure, prior specifications, handling of missing data, and posterior inference. Regarding model structure, here we analyzed four concurrent time-series (from four artificial gut vessels) using the state expansion mentioned above. As these vessels were physically isolated from each other, we modeled the four vessels as being conditionally independent (given shared covariance components) of each other such that we could rewrite Eqs. 3 and 4 as

$$ {\displaystyle \begin{array}{l}{\eta}_k=\left[{I}_R\otimes {F}_k^{\hbox{'}}\right]{\theta}_k+{\nu}_k,\kern0.5em {\nu}_k\sim N\left(0,\left[{I}_R\otimes \kern0.5em {V}_k\right]\right)\\ {}\end{array}} $$
$$ {\theta}_k=\left[{I}_R\otimes {G}_k\right]{\theta}_{k-1}+{w}_k,\kern1em {w}_k\sim N\left(0,\left[{I}_R\otimes {W}_k\right]\right). $$

where ⊗ represents the Kronecker product. This specification also had computational advantages as p(η| Ξ) could then be written as \( \prod \limits_{r=1}^Rp\left({\eta}^{(r)}|{\varXi}^{(r)}\right) \) and therefore sampled using R independent p-dimensional Kalman filters rather than a single Rp dimensional Kalman filter over the expanded state. Additionally, we model the state of each vessel as being identified with an unobserved microbial composition before the addition of confounding technical variation. With this specification, the dimension of the state space p is equal to 4(D − 1). As our primary interest was in retrospective inference, we chose to model the state evolution as a simple random walk (or constant level) such that \( {F}_k^{\prime }={G}_k={I}_{\left(D-1\right)\times \left(D-1\right)} \). Furthermore, as we randomized our samples through all steps of preprocessing, we assumed that the technical noise profile of each sample is identical such that we could write Vk=V. Finally, to simplify the model and mitigate overfitting, we assumed that there was a single biological variation profile that was identical across all non-equal time-points such that

$$ {W}_k=\left\{\begin{array}{c}W,\mathrm{if}\ \phi (k)\ne \phi \left(k-1\right)\\ {}0,\mathrm{if}\ \phi (k)=\phi \left(k-1\right).\end{array}\right. $$
(7)

The conditional relationship in Eq. (7) allowed us to account for the lack of temporal evolution between replicate samples. Importantly, while this assumption of a single biological variation profile does not directly model complex dynamic patters such as linear trend or oscillation, it is flexible enough to infer such patterns if they are strongly supported by the data (e.g., Figs. 4 and 5). Finally, we chose the phylogenetic basis defined in Silverman JD, Washburne AD, Mukherjee S, and David LA [22], without tip or branch weights as a default basis for all posterior computations. A table showing the total number of modeled parameters induced by these choices is given in Additional file 17.

Regarding the prior specifications, we specified two types of prior beliefs for parameters in the MALLARD likelihood model, the priors over the state vectors θ and the priors over the covariance components V and W. For the state vector component, we specified a distribution over the true composition of each of the replicate vessels 1-h prior to the first observed sample such that for r ∈ {1, …, 4}

$$ {\theta}_0^{(r)}\sim N\left({m}_0,{C}_0\right) $$

with m0 = 0(D − 1) and C0 = 25 · I(D − 1) × (D − 1). With this specification, there is an approximately 66% probability (1 standard deviation) that no single taxa was greater than approximately 200 times more abundant than the geometric mean of the remaining taxa. As we are only modeling bacterial families that are present with at least three counts in at least 90% of samples, we believed that such concentration of our prior about zero was warranted.

To quantify our uncertainty in the covariance components of our model, we also specified a prior distribution for V and W. While this is most commonly done using inverse Wishart distributions due to their conjugacy with the multivariate normal distribution, here we chose to use a reparameterization of the covariance matricies V and W with non-conjugate priors to improve numerical stability. A covariance matrix Σ ∈ {V, W} can be parameterized as

$$ \varSigma =\sigma \varLambda {\sigma}^{\prime } $$

where Λ denotes the correlation matrix corresponding to Σ, and σ represents the diagonal matrix with positive diagonal entries (σ1, …, σD − 1) which dictate the scale of the covariance matrix. Thus, for the covariance matrices V and W, we specified the components \( \left({\sigma}_1^V,\dots, {\sigma}_{D-1}^V\right) \), ΛV, \( \left({\sigma}_1^W,\dots, {\sigma}_{D-1}^W\right) \), and ΛW respectively. As the representation of ΛV and ΛW depends on the chosen ILR basis and we had no prior knowledge regarding how the technical and biological variation will decompose in our chosen basis, we chose to use a uniform prior over the space of symmetric positive semi-definite matrices such that

$$ {\varLambda}^V\sim \mathrm{LKJ}\left({\zeta}^V\right) $$
$$ {\varLambda}^W\sim \mathrm{LKJ}\left({\zeta}^W\right) $$

with ζV = ζW = 1 and where LKJ represents copula-based distribution over correlation matrices introduced by Lewandowski D, Kurowicka D and Joe H [85]. With regards to the terms \( \left({\sigma}_1^V,\dots, {\sigma}_{D-1}^V\right) \) and \( \left({\sigma}_1^W,\dots, {\sigma}_{D-1}^W\right) \), we chose independent log-normal priors to ensure that these terms were strictly positive and to allow parameterization of uncertainty with respect to multiplicative fold-changes. In particular, for each i ∈ {1, …, D − 1}, we specified

$$ {\sigma}_i^V\sim \mathrm{Log}-\mathrm{Normal}\left({\xi}_i^V,{\tau}_i^V\right) $$
$$ {\sigma}_i^W\sim \mathrm{Log}-\mathrm{Normal}\left({\xi}_i^W,{\tau}_i^W\right) $$

with \( {\xi}_i^V=1 \), \( {\tau}_i^V=2 \), \( {\xi}_i^W=0 \), and \( {\tau}_i^W=2 \). This specification reflects a 95% probability that the ratio of total technical (\( {\sum}_i{\left[{\sigma}_i^V\right]}^2 \)) to total biological variation (\( {\sum}_i{\left[{\sigma}_i^W\right]}^2 \)) is between 10−2 and 102 with an expected value of approximately 10.

In this work, we chose to distinguish two types of missing observations based on whether they were to be imputed or marginalized over during posterior sampling. To ensure each of the four replicate vessel time-series remains synchronized, we considered any sample point that is missing from one vessel but observed in at least one other vessel as a missing value to be imputed by augmenting HMCMC simulations with a corresponding vector \( {\eta}_k^{(r)} \). Conversely, we considered sample points that are present in none of the four replicate vessels as missing values that were marginalized over using the Kalman filter (see Additional file 16). We padded periods of daily sampling with missing values so that the entire dataset could be analyzed at an hourly base interval.

For this model, posterior inference was performed using the No-U-Turn Sampler (NUTS; a variant of HMCMC) provided in the Stan modeling language [83, 84] using 4 chains run in parallel each with 1000 transitions for warmup and adaptation and 1000 iterations collected as posterior samples. Preliminary results suggested that the time required for the NUTS sampler to converge to the typical set could be quite sensitive to entirely random parameter initializations. To address this computational limitation, the following parameters were manually specified: η was set by first adding a pseudo-count of 0.65 to each observed counts and then normalizing the counts of each sample to sum to 1, ΛV and ΛW were each initialized to the p × p dimensional identity matrix, and all other parameters were randomly initialized. To mitigate the potential bias introduced by fixing these parameters during initialization, approximate posterior samples from the model were first drawn using a variational algorithm [86] and then four randomly selecting posterior samples from this approximate posterior sample were used to initialize four parallel HMCMC chains. Convergence of the chains was determined both by manual inspection of sampler trace plots and through inspection of the split \( \widehat{R} \) statistic [87, 88]. All sampled parameters had an \( \widehat{R} \) value less than 1.01. Posterior intervals for all calculations derived from directly sampled quantities were calculated by preforming the necessary computations on each posterior sample independently and then summarizing the resulting distribution over calculated quantities.

Comparing the correlation structure of technical and biological variation

To quantitatively compare the correlation structure of technical and biological variation, we analyzed the probability that posterior samples of the correlation matrices corresponding to V and W came from the same distribution using a permutation scheme and a distance metric on the space of square symmetric positive semi-definite matricies. To isolate our inferences to only involve the correlation structure of V and W and not the magnitude of the variation, we transformed sampled covariance matrices into corresponding correlation matrices which we denote Vc and Wc. We took as a measure of distance between two correlation matrices S1 and S2 the Riemannian metric on the space of square symmetric positive definite matrices defined by

$$ {d}_R\left({S}_1,{S}_2\right)=\left\Vert \log \left({S}_1^{-\frac{1}{2}}{S}_2{S}_1^{-\frac{1}{2}}\right)\right\Vert $$

as described in [89] and calculated using the function distcov with option “Riemannian” in the R package shapes [90]. Using this distance metric, we calculated a distance matrix between 500 posterior samples of Vc and Wc each. Let D represent the resultant 1000 by 1000 distance matrix such that Dij represents the distance between correlation matrix i and correlation matrix j. Let l represent the vector of labels of the correlation matrices such that li ∈ {Vc, Wc} for i ∈ {1, …, 1000}. We define a statistic δ as the ratio of the within to between group distances in D as

$$ \delta =\frac{\sum_{l_i\in {W}^c,{l}_j\in {W}^c}{D}_{ij}+{\sum}_{l_i\in {V}^c,{l}_j\in {V}^c}{D}_{ij}}{2{\sum}_{l_i\in {W}^c,{l}_j\in {V}^c}{D}_{ij}}. $$

Thus, low values of δ indicate that most of the distance between the correlation matrices is attributable to intergroup differences in correlation structure, whereas larger values suggest that most of the pairwise distances come from differences within groups. We built a distribution of δ under a model in which the sampled correlation matrices all came from the same distribution by permuting the labels l and recomputing δ 1000 times. The probability that posterior samples of Vc and Wc came from the same distribution was calculated by calculating the probability of a test statistic more extreme than that which we observed under the created permutation distribution.

Computation of total technical and biological variation

Denoting the trace of a matrix as Tr(·) , total biological and total technical variation were calculated as Tr(W) and Tr(V) respectively. The proportion of variation attributable to biological sources was calculated as Tr(W)/(Tr(V) +Tr(W)). Based on the linear systems assumption underlying our model, we choose to calculate the total biological variation as a function of sampling interval L as L · Tr(W) (Fig. 3b). Technical variation does not vary with sampling interval and was therefore held constant as a function of sampling interval.

Changing representations of posterior state estimates and covariance matrices

While we chose the PhILR basis [22] for all posterior computations as well as initial analysis (Fig. 4 and Additional files 7 and 8), we also made use of the isometric properties of the ILR transform to represent our posterior state estimates and our posterior samples of covariance matrices W and V with respect to alternative coordinates. In what follows, we denote vectors or matrices represented with respect to something other than the PhILR coordinates with an associated superscript. We used the inverse PhILR transform to represent any vector x = (x1, …, xD − 1) in terms of raw relative abundances such that x= PhILR−1(x) [22]. Any other log-ratio quantities could then be calculated from x. Transforming a covariance matrix Σ between representations was done by making use of the following identities [37]

$$ {\Sigma}^{ILR}={\varPsi}^{ILR}\left({\varPsi}^{\hbox{'}}\varSigma \varPsi \right){\varPsi}^{(ILR)^{\hbox{'}}} $$
(7)
$$ {\Sigma}^{CLR}={\varPsi}^{\hbox{'}}\varSigma \varPsi $$
(8)

where Ψ denotes the contrast matrix of the PhILR transform [22] and ΨILR represents the contrast matrix of an arbitrary ILR transform. In addition, we made use of the variation matrix representation of a covariance matrix in which the covariance matrix is represented as a matrix describing the variance of all pairwise log-ratios [40]. Letting ρij represent the entry in the ith row and jth column of ΣCLR, we compute the variation matrix T element-wise as [37]

$$ {t}_{ij}={\rho}_{ii}+{\rho}_{jj}-2{\rho}_{ij}. $$
(9)

Hierarchical clustering of variation matrix

To develop a sparse representation of the principal directions of biological variation in our dataset, we make use of an algorithm for determining a sequence of orthonormal balances that maximize successively the explained variance in a dataset (principal balances) [91]. As TW, the variation matrix calculated from W is proportional to the Aitchison distance between the bacterial families in our dataset [91, 92], applying Ward clustering to the matrix TW results in an approximate solution to the problem of determining principal balances [91, 93]. For each posterior sample of W, we calculated TWusing Eqs. (8) and (9). We then calculated a sequential binary partition from each posterior sample of TW using Ward clustering [94]. To summarize this posterior sample of sequential binary partitions, we computed the majority-rule consensus tree and the frequency with which a given bipartition occurred in our posterior sample using the functions consensus and prop.part from the R package ape [95].

Exploring the relation between starting composition and biological variation

To investigate the relationship between starting community composition and biological variation in our replicate vessels, we made use of the CLR representation of both the community state at the first observed time-point (\( {\theta}_1^{(r)} \)) and the biological variation (W). The mean composition in the first observed sample (\( {\widehat{\theta}}_1=1/R\cdotp {\sum}_{r=1}^R{\theta}_1^{(r)} \)) represented in the CLR basis was used as a measurement of the starting community. As we were primarily interested in the relative degree to which bacterial families vary in our dataset, we normalized the diagonal elements of the CLR representation of the biological variation matrix W to sum to 1 forming a composition and then used the CLR transform of these normalized variances as our measure of the relative biological variation of each bacterial family. We represent the resulting CLR transformed relative biological variation of each bacterial family by the vector ω = (ω1, …, ωD). For each posterior sample, a univariate linear regression model given by the relation \( {\omega}_i\sim \beta {\widehat{\theta}}_{1i}+{\beta}_0 \) where β denotes the slope of fitted model and β0 represents an intercept was fit resulting in a posterior sample over β. Probability contours over the joint posterior distribution of each pair \( \left({\omega}_i,{\widehat{\theta}}_{1i}\right) \) was calculated using kernel density estimates computed by the R package ks [96].

We constructed a permutation distribution to investigate whether the negative relationship between \( {\widehat{\theta}}_1 \) and ω suggested by this posterior distribution of β could be trivially due to there being more low abundance families than high abundance families in our starting community or a feature of working with such CLR transformed variables. This permutation distribution was constructed by randomly permuting the labels of the vector \( {\widehat{\theta}}_1 \) and recomputing the posterior mean of β 1000 times. We computed the probability of our observed posterior mean of β under the constructed permutation distribution to estimate the probability, conditioned on our prior beliefs that the observed inverse relationship was simply due to the rank abundance distribution of our starting community or a feature of working with such CLR transformed variables. This probability is different than a traditional frequentist p value in that it is conditioned on our prior specifications and our observed data.

Sensitivity analyses

The sensitivity of our results to our prior specifications was assessed by rerunning our posterior inference and computations under perturbed priors. We identified two key quantities in our analysis, the regression slope β and the percentage of total variation attributable to biological sources Tr(W), and looked at the change of the associated posterior intervals under the perturbed priors. The results and specifications of the modified priors are shown in Additional files 6 and 13.

Calculation of fold changes from balance values

Changes in balances values are a measure of evidence information [39]. Given a composition x with D taxa, we can denote the balance that separates a group of r taxa from another group of s taxa as

$$ y=\sqrt{\frac{rs}{r+s}}\log \frac{g\left({x}^{+}\right)}{g\left({x}^{-}\right)} $$

where we denote the geometric mean of the elements of x that correspond to each of the two groups as g(x+) and g(x) respectively. In this form, the equivalent fold change between geometric means of the two groups can be calculated as

$$ \frac{g\left({x}^{+}\right)}{g\left({x}^{-}\right)}={e}^{{\left(\sqrt{\frac{rs}{r+s}}\right)}^{-1}y}. $$

Data simulation

To test our implementation and explore the behavior of the MALLARD model we used to analyze our artificial gut dataset, we simulated a toy microbial community time-series of three bacterial taxa (Additional file 18 panels A and B) based on the likelihood component of this model. We simulated a single vessel over 200 time-points with an extra 25 replicate samples taken on the final time-point of the series. We also modeled missing observations (n = 3), sparsity (23% of counts were zeros), and many low counts (53% of counts were less than or equal to 10). We created a random ILR basis denoted by the contrast matrix Ψtrue by using the functions named_rtree, phylo2sbp, and buildilrBasep from the R package philr [97]. With respect to Ψtrue, we simulated data in accordance with our likelihood model using the following “true” values,

$$ {W}^{true}=\left[\begin{array}{cc}0.05& 0.01\\ {}0.01& 0.05\end{array}\right] $$
$$ {V}^{true}=\left[\begin{array}{cc}0.2& -0.1\\ {}-0.1& 0.2\end{array}\right] $$
$$ {\theta}_0^{true}=\left[\begin{array}{c}1\\ {}-3\end{array}\right]. $$

After simulation, we removed samples from time-points 15, 16, and 20 to simulate missing observations. To demonstrate that our inferences were not sensitive to our choice of basis, we modeled the resulting dataset using a second random ILR basis Ψ. Our results in comparison to the true values are shown in Additional file 18 with respect to the basis Ψ.

Analysis of this simulated dataset showed that estimates for the unobserved compositions η and θ were closer to the true simulated values than a standard modeling approach of normalizing read counts to proportions (Additional file 18 panels C and D). This result is clearest in regimes rich in low and zero read counts, which we expected because of our Bayesian approach to modeling the read counting process. In addition, we found that the model correctly estimated distinct technical and biological variation patterns (Additional file 18 panels E and F). Thus, our simulations suggested that our model implementation could successfully decompose longitudinal microbiota data into component processes and characterize technical and biological variation.

Visualizations

All visualization were produced using the R programming language (v3.4.0) with the addition of the following packages: ggplot2 [98], ggtree for plotting phylogenetic trees [99], and compositions [100].