1 Introduction

Nonlinear reaction dynamics in natural sciences are governed by reaction mechanism variables, such as reaction rate constants, diffusion coefficients, and reaction pathways [1, 2]. Therefore, understanding the nonlinear dynamics of reaction and mechanism are important for scientific theory and ensuring precise simulations in Earth sciences. Many of the important reactions in natural sciences are heterogeneous, i.e., the reaction occurs at the interface between two or more phases, which are typically minerals and water [1, 3]. Understanding the mechanism of heterogeneous reactions is important for multiple disciplines, including environmental science, geochemistry, hydrogeology, oceanography, nuclear waste disposal, and soil science. Inversion is a common way of understanding the mechanism of heterogeneous reactions; however, the complexity and nonlinearity of heterogeneous reactions, which typically involve sequences of several reactions occurring in parallel, makes inversion difficult.

Many laboratory experiments have attempted to understand the mechanism of heterogeneous reactions. These experiments typically involve quantitative observations of either (1) the temporal evolution of the concentrations of aqueous species [4,5,6,7], or (2) the spatial distribution of minerals or concentrations of aqueous species at a single time [8,9,10,11]. The reaction rate constant in a reaction pathway can be determined inversely from the former by analyzing the temporal evolution of the solution chemistry and equilibration. However, it is impossible to determine the reaction rate and reaction pathways from the latter because the stability of a mineral cannot be determined without fluid composition data. Therefore, it is important to develop a versatile method that can extract nonlinear dynamics directly from a spatial dataset.

The number of model parameters to be estimated is an important factor for inversion analysis. In heterogeneous parallel reactions, many possible reaction pathways are present, and many reaction rate constants need to be estimated. However, in general, the extent of fitting increases with the number of model parameters used for fitting. This phenomenon is well known as overfitting, which is a very common concept in the field of machine learning [12,13,14]. When overfitting occurs, the estimated model parameters are substantially affected by noise, which can reduce the predictive ability of the model. Thus, to estimate reliable reaction rate constants by inversion, reaction rate constants of unimportant reaction pathways must be identified and excluded from redundant candidates of reaction pathways.

In this study, we propose a framework for exploring the reaction pathways of nonlinear and parallel heterogeneous reactions from the spatial distribution of minerals alone. We used the concept of Bayesian estimation from machine learning [12, 13]. The concept has been applied to various fields including physics [15,16,17,18,19,20], brain science [21], astronomy [22], and Earth sciences [23]. Notably, the Bayesian estimation has been applied to identify reaction pathways in the fields of biology and chemical engineering [24,25,26,27,28,29,30]. In these fields, the estimation of reaction rates from noisy observations is the technically important and challenging topic. Biochemical reaction networks are characterized by nonlinear systems that can be changed by spatio-temporal scale. These characteristics are similar to reactions in the Earth sciences where heterogeneous reactions between rock and solution are coupled with diffusive and/or advective transport of solution. Therefore, the Bayesian estimation can be an effective way to identify heterogeneous reaction pathways in the Earth sciences.

Under Bayesian estimation, a statistical model can be selected by evaluating the Bayesian free energy, and the important variables can be identified among redundant variables from a set of candidate models and given data [12]. Typically, huge computational costs are required to estimate the Bayesian free energy. Therefore, the Schwarz information criterion [31] that approximates Bayesian free energy has been used, but its usage is validated only if the statistical model is regular (i.e., the posterior distribution can be approximated by normal distribution). Recently, an alternative approach that gives an approximate value of the Bayesian free energy and can be used for both regular and singular statistical models has been proposed [32]. This approach has made it significantly easier to apply Bayes’ theorem for nonlinear model selection problems. Using the framework proposed in this study, we successfully extract important reaction pathways of heterogeneous reactions using only the observable spatial distribution of minerals after the reaction.

This paper is organized as follows. In Sect. 2, we introduce the method for estimating the reaction rate constants form the observed spatial mineral distribution. We first present a forward modeling approach for an observed spatial distribution of solids. Bayesian inference is employed to construct the posterior distribution, i.e., the conditional probability of the model parameters for an observed spatial distribution of phases. In Sect. 3, we validate the proposed estimation method by determining the reaction mechanism from a synthetic spatial distribution of solid phases. Section 4 presents the discussion and summary.

Fig. 1
figure 1

Conceptual model of parallel heterogeneous reactions in the presence of three minerals (\(M_a, M_b\), and \(M_c\)) in discretized time t and space x for a finite difference approach. At each grid point, a set of partial differential equations conserving the masses of the chemical components in the system is solved, considering diffusion and heterogeneous reactions.y Heterogeneous mineral-water reactions are included in the model by using reaction rates that depend on the solution composition at each x and t (\(C_{x,t}\)). Active reaction pathways are determined from the local variable \(C_{x,t}\). At the observable time \(t = T\), the type and amount of mineral present varies with x because the reaction pathways differ according to \(C_{x,t}\)

2 Method for estimating Model parameters

In this study, we analyze the spatio-temporal dynamics of a nonlinear heterogeneous reaction. To reflect prospected experimental constraints, we suppose a situation in which temporal changes are not observable and spatial changes at a single time are observable. Figure 1 shows a conceptual model of parallel heterogeneous reactions. In this conceptual model, three minerals (\(M_a, M_b\), and \(M_c\)) are present at the initial time \(t=0\). These minerals can react and either be consumed or produced by each other. For example, one reaction can be written in the following generalized form:

where \(A_{\mathrm {(aq)}}\) is an aqueous species in the solution, and \(k^{\prime }_{ab}\) and \(k^{\prime }_{ba}\) \(\mathrm {(s^{-1})}\) are the rate coefficients of the reaction pathway for \(M_b\) after \(M_a\) and \(M_a\) after \(M_b\), respectively. Other reactions can be written similarly; in this heterogeneous reaction network, the total number of reaction pathways L is given by \(L = N_m(N_m-1)\) where \(N_m\) is the total number of minerals. Notably, the value of \(k^{\prime }_l\) is non-zero when the reaction pathway l is active but zero when the reaction pathway l is inactive. This concept is used for identifying important reaction pathways, as described later. Whether the reaction pathways are active depends on the local variable of concentration of \(A_{\mathrm {(aq)}}\) (C (mol \(\mathrm {cm^{-3}}\)); Fig. 1). The concentration C varies in space \(x^{\prime }\) and time \(t^{\prime }\) according to the reaction and the diffusion of \(A_{\mathrm {(aq)}}\). The amount of mineral i (\(m^{(i)}\) (mol \(\mathrm {cm^{-3}}\))), varies with \(x^{\prime }\) and \(t^{\prime }\) because it is a function of C.

To reflect realistic experimental environments, \(C\) is set as an unobservable value. The observable values in this study are the spatial variation of the amount of minerals and the volume fraction of the solution (porosity) at the start (\(t^{\prime }=0\) in Fig. 1) and end of the reaction (\(t^{\prime }=T\) in Fig. 1). Note that discretized space and time step (xt) are used in Fig. 1 rather than continuous space and time \((x', t')\)

2.1 Forward modeling of mineral-fluid interactions using a reaction-diffusion model

The spatio-temporal evolution of \(C_{x^{\prime },t^{\prime }}\) can be modeled using the mass-conservation equation of aqueous species [1];

$$\begin{aligned} \begin{aligned} \frac{\partial \left( \varphi C\right) }{\partial t^{\prime }}&= D \frac{\partial }{\partial x^{\prime }}\left( \varphi ^{2}\frac{\partial C}{\partial x^{\prime }}\right) - \sum _{l=1}^L \frac{\partial \varXi ^l(C)}{\partial {t^{\prime }}}, \end{aligned} \end{aligned}$$
(1)

where \(\varphi \) \(\mathrm {(-)}\) is porosity, D \(\mathrm {(cm^{2}/s)}\) is the constant of diffusivity for aqueous species, and \(\varXi ^{l}(C)\ \mathrm {(mol \ cm^{-3})}\) is the loss or gain density of aqueous species according to the reaction pathway \(l\). The first term on the right side of Eq. (1) expresses the concentration change due to diffusion of the aqueous species, whereas the second term expresses the change due to heterogeneous reactions. The rate of the heterogeneous reaction in a reaction pathway \(l\), \(\partial \varXi ^l(C) / {\partial {t}}\) (mol \(\mathrm {cm^{-3} \ s^{-1})} \), can be written as a linear equation as a function of concentration, as follows:

$$\begin{aligned} \frac{\partial \varXi ^l(C)}{\partial {t^{\prime }}} = k^{\prime }_{l}(C^l_{(eq)}-C), \end{aligned}$$
(2)

where \(C^l_{(eq)} \ \mathrm {(mol \ cm^{-3})}\) and \(k^{\prime }_l\) \(\mathrm {(s^{-1})}\) are the constant of equilibrium concentration and the rate constant for a reaction pathway \(l\), respectively. From the rate of heterogeneous reactions, the rate of increase or decrease in the amount of mineral \(i\) can be written as

$$\begin{aligned} \frac{\partial m^{(i)}}{\partial t^{\prime }} = \varphi \sum _{l=1}^{L}v^l_i \frac{\partial \varXi ^l(C)}{\partial {t^{\prime }}}, \end{aligned}$$
(3)

where \(m^{(i)}\) \(\mathrm {(mol \ cm^{-3})}\) and \(v^{l}_i\) are the amount of mineral i and stoichiometric coefficient of mineral i relative to aqueous species in the reaction pathway l, respectively. \(\varphi \) can be calculated from one minus the sum of the volume fraction of all minerals, as follows:

$$\begin{aligned} \varphi = 1-\sum _{i=1}^{N_m}\overline{V}_{i} m^{(i)}, \end{aligned}$$
(4)

where \(\overline{V}_{i} (\mathrm {cm^3 \ mol^{-1}})\) is a constant representing the molar volume of mineral \(i\). When Eqs. (1)–(3) are solved, \(x^{\prime }\) and \(t^{\prime }\) need to be discretized. The discretization of Eq. (1) is described in the Appendix.

As a result of discretization, the concentration (\(C_{x,t}\)), amount of mineral \(i\) (\(m^{(i)}_{x,t}\)), and porosity (\(\varphi _{x,t}\)) for a discretized space \(x \ (\in \{1, \ldots , X\}\)) and time \(t \ (\in \{1, \ldots , T\}\)) are obtained through forward modeling by inputting the following unknown parameter vector \(\mathbf {\Theta }^{\prime }\):

$$\begin{aligned} \mathbf {\Theta }^{\prime } = \{k^{\prime }_1, k^{\prime }_{2}, \ldots , k^{\prime }_{L}\}. \end{aligned}$$
(5)

For a given parameter set \(\mathbf {\Theta }^{\prime }\), the theoretical amount of all observation series (i.e., minerals and porosity) at the observable target time \(T\) with dimensions of \(J = N_{m}+1\) is

$$\begin{aligned} \begin{aligned} \varvec{y}_{x}&= \{ y_{x}^{(1)},\ldots ,y_{x}^{(J-1)}, y_{x}^{(J)} \}\\&=\{ {m}_{x,T}^{(1)}, \ldots , {m}_{x,T}^{(N_{m})},{\varphi }_{x,T} \}. \end{aligned} \end{aligned}$$
(6)

The output for observation series \(j\), \(y^{\text {ex}(j)}_{x}\), is the sum of the response function for the corresponding input \({y}^{(j)}_{x}\) and the observation noise \(\epsilon ^{(j)}_x\); that is,

$$\begin{aligned} y^{\text {ex}(j)}_{x} = y^{(j)}_{x} + \epsilon _{x}^{(j)}. \end{aligned}$$
(7)

At this point, we assume that the observation noise follows a Gaussian distribution with a mean of zero and a standard deviation of \(\sigma _{j}\):

$$\begin{aligned} p(y_{x}^{\text {ex}(j)}|y_{x}^{(j)}) = \frac{1}{\sqrt{2\pi \sigma _{j}^{2}}} \exp \left( -\frac{(y_{x}^{\text {ex}(j)}-y_{x}^{(j)})^{2}}{2 \sigma ^{2}_{j}}\right) , \end{aligned}$$
(8)

for \(x \in X_{obs}\), where \(X_{obs}\) is a set of observable spatial points. Thus, in the forward modeling, the conditional probability of the observed spatial distribution of the mineral and porosity using the model parameters \(\mathbf {\Theta ^{\prime }}\) is written as

$$\begin{aligned} \begin{aligned} p(\varvec{y}_{x}^{\text {ex}}|\varvec{y}_{x})&= p(\varvec{y}_{x}^\text {ex}|\varvec{\varTheta ^{\prime }})\\&= \prod _{j=1}^{J}\frac{1}{\sqrt{2\pi \sigma _{j}^{2}}} \exp \left( -\frac{(y_{x}^{\text {ex}(j)}-y_{x}^{(j)})^{2}}{2 \sigma ^{2}_{j}}\right) , \end{aligned} \end{aligned}$$
(9)

where \(\varvec{y}_{x}^{\text {ex}}= \{ y_{x}^{\text {ex}(1)},\ldots ,y_{x}^{\text {ex}(J)}\}\) is the set of observed series at position \(x\). This formula can be used to obtain the probabilistic estimation of the mineral distribution after reaction as the conditional probability for the given model parameters. This can then be compared with experimental data.

Fig. 2
figure 2

Schematic of the Bayesian modeling procedure. The model parameter (\(\varvec{\varTheta ^{\prime }} = \{k^{\prime }_1, \ldots , k^{\prime }_L \} \)) is generated from the indicator vector \(\varvec{c}\). The observed data set is generated from the model parameters. To identify the active reaction pathways, \(p(\varvec{c}|\varvec{Y_{X}})\) is calculated using Bayes’ theorem

2.2 Exhaustive search method for exploring reaction pathways

This section describes the method for achieving the first goal of this study, i.e., to identify the active reaction pathways. After the active reaction pathways are identified, the value of the rate constants are estimated (goal 2) in Section II C.

We can determine whether a reaction pathway l is active or inactive using the \(k^{\prime }_{l}\) value of a reaction pathway l. Thus, \(\varvec{\varTheta ^{\prime }}\) can be formulated as

$$\begin{aligned} \begin{aligned} \mathbf {\Theta }^{\prime }&= \varvec{c} \circ \varvec{\varTheta } \\&= \{c_1k_{1}, c_2k_{2}, \ldots , c_Lk_{L}\}, \end{aligned} \end{aligned}$$
(10)

where \(k_l\) is the intrinsic rate constant of the reaction pathways, the symbol \(\circ \) represents the Hadamard product, and \(\varvec{\varTheta } = \{k_1, \ldots , k_L \}\). \(\varvec{c}\) is an \(L\) dimensional binary vector, defined as

$$\begin{aligned} \varvec{c}=\left( c_{1}, c_{2}, \ldots , c_{L}\right) \in \{0,1\}^{L}. \end{aligned}$$
(11)

Each variable \(c_l\) is 0 or 1; \(c_l = 1\) if the lth variable belongs to the combination and is used for inversion. In contrast, \(c_l= 0\) if it does not . The term \(\varvec{c}\) is used to represent the indicators (Fig. 2).

This formulation is key for clarifying one aspect of the program setting; i.e., identifying the active reaction pathways. The best \(\varvec{c}\) for modeling an objective variable is determined by minimizing the Bayesian free energy \(\mathcal {F}\) using the exhaustive search (ES) method, which is the simplest variable selection method [33, 34]. In the ES method, all combinations of used and unused variables are exhaustively searched, which requires estimation of \(2^{L}\) combinations of variable. With the ES method, the \( \mathcal {F} (\varvec{c}) \) is calculated for all combinations of explanatory variables; the combination that minimizes the \(\mathcal {F}(\varvec{c})\) is determined as the optimal combination.

Using Bayes’ theorem, the posterior probability can be expressed as proportional to the likelihood function and the prior probability, as follows:

$$\begin{aligned} p(\varvec{c}|\varvec{Y}_{X}) = \frac{p(\varvec{Y}_{X} |\varvec{c})p(\varvec{c})}{p(\varvec{Y}_{X})} \propto p(\varvec{Y}_X|\varvec{c}). \end{aligned}$$
(12)

Here, \(\varvec{Y}_{X} = \{ \varvec{y}_{1}^{\text {ex}}, \ldots , \varvec{y}_{X}^{\text {ex}} \}\) and \(X\) is the number of spatial data points in the single observation series. We assume a continuous uniform prior probability \(p(\varvec{c})\), where the posterior distribution is proportional to a marginalized likelihood function defined as follows:

$$\begin{aligned} p(\varvec{Y}_X |\varvec{c})=\int p(\varvec{Y}_X |\varvec{c}, \varvec{\varTheta }) p(\varvec{\varTheta } | \varvec{c}) \mathrm {d} \varvec{\varTheta }. \end{aligned}$$
(13)

In addition, we assume that previous quantitative measurements of the minerals and porosity do not affect the present measurement; thus,

$$\begin{aligned} \begin{aligned} p(\varvec{Y}_{X}|\varvec{c},\mathbf {\Theta })&= \prod _{x=1}^{X}p(\varvec{y}_{x}^{\text {ex}}|\varvec{c},\mathbf {\Theta }) \\&=\prod _{x=1}^{X}\prod _{j=1}^{J}\frac{1}{\sqrt{2\pi \sigma _{j}^{2}}} \exp \left( -\frac{(y_{x}^{\text {ex}(j)}-y_{x}^{(j)})^{2}}{2 \sigma ^{2}_{j}} \right) . \end{aligned} \end{aligned}$$
(14)

Because reaction rate constants must be positive and may vary by orders of magnitude, as for \(p(\varvec{\varTheta }|\varvec{c})\), we assume that the case of \(c_l=1\), \(p(k_l|c_l=1)\) has a continuous uniform distribution in logarithmic space, as follows:

$$\begin{aligned} \begin{aligned} p(k_l|c_l = 1) = \left\{ \begin{array}{ll}{\frac{1}{k_l \left( \ln (k^{\mathrm {(max)}}_l) - \ln (k^{\mathrm {(min)}}_l) \right) }} &{} { \text{ for } x \in [k^{\mathrm {(min)}}_l, k^{\mathrm {(max)}}_l]} \\ {0} &{} { \text{ otherwise } }\end{array}\right. \\ \end{aligned} \end{aligned}$$
(15)

where \(k^{\mathrm {(max)}}_l\) and \(k^{\mathrm {(min)}}_l\) are the maximum and minimum values of the rate coefficient for the reaction pathway \(l\) within the range of searched values. In the case of \(c_l = 0\), \(p(k_l|c_l=0)\) is assumed to be the delta function, as follows:

$$\begin{aligned} p(k_l|c_l = 0) = \delta \left( k_l \right) \ \left( l=1,\ldots , L \right) . \end{aligned}$$
(16)

The negative logarithm of the marginalized likelihood is termed the Bayesian free energy \(\mathcal {F}\), which is defined as

$$\begin{aligned} \mathcal {F} = -\ln p(\varvec{Y}_X|\varvec{c}). \end{aligned}$$
(17)

The minimization of \(\mathcal {F}\) is identical to the posterior probability maximization. However, it is difficult to analytically calculate the integral over the parameter \(\varvec{\varTheta }\) in the marginal likelihood. A well-established solution is thermodynamic integration [16] that numerically calculates \(\mathcal {F}\) using the Markov chain Monte Carlo (MCMC) method [35]; however, this method generates huge computational costs because the expectations over several distributions with different pseudo-temperatures must be calculated. In this study, we numerically calculate the WBIC for the Bayesian model selection of reaction pathways. The WBIC value gives an approximate value of \(\mathcal {F}\) [32]. This approach is expected to substantially reduce the computational costs of numerical calculation. The WBIC is defined as

$$\begin{aligned}&\mathrm {WBIC}=\frac{\int nL_{n} p\left( \varvec{Y}_X |\varvec{c}, \mathbf {\Theta } \right) ^{\beta } p(\mathbf {\Theta }| \varvec{c}) \mathrm {d}\mathbf {\Theta } }{\int p \left( \varvec{Y}_X |\varvec{c}, \mathbf {\Theta } \right) ^{\beta } p(\mathbf {\Theta } | \varvec{c}) \mathrm {d}\mathbf {\Theta }}, \mathrm {where}\nonumber \\&\quad \beta =\frac{1}{\ln (n)}, \end{aligned}$$
(18)

where n is the total number of data points (\(n=XJ\)) and \(\beta \) is the inverse pseudo-temperature (\(\beta > 0\)). \(L_{n}\) is a negative logarithmic likelihood function, defined as

$$\begin{aligned} L_{n} =-\frac{1}{n} \ln p(\varvec{Y}_X|\varvec{c,\varTheta }). \end{aligned}$$
(19)

As defined in Eq. (18), WBIC is identical to the average \(nL_n\) over the posterior distribution with \(\beta = 1/\ln (n)\), whose \(\beta \) is different from the standard Bayesian estimation of the posterior (\(\beta =1\)). The MCMC method [36] is employed to obtain WBIC values. In the MCMC simulations, the candidate is generated using a Gaussian proposal density (\(\mathcal {N}(0,0.01)\)). The same proposal is used for all models. The candidate is accepted with the Metropolis-type probability from the transition from \(\mathbf {\Theta }\) to \(\mathbf {\Theta ^{*}}\) as follows:

$$\begin{aligned} w(\mathbf {\Theta ^{*}}|\mathbf {\Theta }) = \min \{1, \exp [-\beta (E(\varvec{\varTheta ^*})-E(\varvec{\varTheta }))]\}, \end{aligned}$$
(20)

where E is an energy function,

$$\begin{aligned} E(\varvec{\varTheta })= nL_n - \frac{1}{\beta }\ln p(\varvec{\varTheta |c}). \end{aligned}$$
(21)

Using these equations, WBIC values for \(2^{7}\) combinations of \(\varvec{c}\) were calculated using the MCMC method with the setting of \(\beta =\) 1/\(\ln (n)\). The vector of active reaction pathways, i.e., the reaction pathways with non-zero rate constants, can be given by the indicator vector that minimizes the WBIC value.

$$\begin{aligned} \varvec{\hat{c}} = \mathop \mathrm{arg~min}\limits _{\varvec{c}} \mathrm {WBIC}(\varvec{c}). \end{aligned}$$
(22)

2.3 Estimation of model parameters

After \(\varvec{\hat{c}}\) is identified, the parameter in \(\varvec{\varTheta }\) can be estimated using Bayes’ theorem, as follows:

$$\begin{aligned} p(\varvec{\varTheta }|\varvec{Y}_X, \varvec{\hat{c}}) \propto p(\varvec{Y}_X|\varvec{\varTheta , \hat{c}})p(\varvec{\varTheta }|\varvec{\hat{c}}). \end{aligned}$$
(23)

The likelihood function \(p(\varvec{Y}_X|\varvec{\varTheta , \hat{c}})\) is then given by

$$\begin{aligned} \begin{aligned} p(\varvec{Y}_X|\varvec{\varTheta , \hat{c}})&=\prod _{x=1}^{X}p(\varvec{y}_{x}^{\text {ex}}|\mathbf {\Theta }, \varvec{\hat{c}}) \\&=\prod _{x=1}^{X}\prod _{j=1}^{J}\frac{1}{\sqrt{2\pi \sigma _{j}^{2}}} \exp \left( -\frac{(y_{x}^{\text {ex}(j)}-y_{x}^{(j)})^{2}}{2 \sigma ^{2}_{j}}\right) . \\ \end{aligned} \end{aligned}$$
(24)

Thus, the posterior distribution can be expressed as

$$\begin{aligned} p(\varvec{\varTheta }|\varvec{Y}_{X}, \varvec{\hat{c}}) \propto \exp \left[ -\frac{1}{2}\sum _{j=1}^J\frac{1}{\sigma _j^2}\sum _{x=1}^X(y_{x}^{\text {ex}(j)}-y_{x}^{(j)})^{2}\right] p(\varvec{\varTheta }|\varvec{\hat{c}}), \end{aligned}$$
(25)

which is obtained by the MCMC method.

Fig. 3
figure 3

Schematic illustration of the reactive transport model considered in this study [37]. At \(x\) = 0, \(SiO_{2(\text {aq})}\) is transported by diffusion at a rate of D through the porous media of powdered olivine, and secondary minerals (talc, serpentine, and brucite) are formed. The gray bold line shows the overall reaction between the two minerals. \(k^{\prime }_{l}\) is the effective rate constant of the reaction pathway l

3 Validation

3.1 An example of parallel heterogeneous reactions

In this section, we validate the proposed method using simulated data. As an example of the potential applications of the proposed methodology, we consider coupled diffusion and reaction in the presence of olivine, quartz, and \(\text {H}_2\text {O}\) (Fig. 3). This is chosen for the following reasons: (1) The diffusional metasomatic zoning of talc and serpentine zones between quartz and olivine has been classically modeled [38,39,40]; (2) this heterogeneous reaction network is relatively simple, but involves the dissolution/precipitation processes of several minerals as well as element diffusion [8, 41].

Fig. 4
figure 4

Synthesized spatial distribution of minerals (\(m^{(i)}_{x,T}\), \(i=1,\ldots ,4\)), and porosity (\(\varphi _{x,T}\)). The black dotted line is the theoretical spatial distribution of minerals (\(m^{(i)}_{x,t}\)) with artificial parameters at a certain time step T. The pink circle shows the observed spatial distribution of minerals (\(y^{\mathrm {ex}(i)}_{x}\)) with added Gaussian noise

In this example, heterogeneous reactions between four types of minerals (\(m^{(i)}_{x,t}\), \(i=1,\ldots ,4\)) at 300 \(^\circ \)C and 10 MPa are considered (\(N_m = 4\)). At \(t=0\), the porous media of the reactant mineral, \(\text {Mg}_2\text {SiO}_{4}\) (olivine), are initially present (Fig. 3). The minerals predicted to be produced from the reactant are \(\text {Mg}_3\text {Si}_2\text {O}_{5}\text {(OH)}_{4}\) (serpentine), \(\text {Mg}_3\text {Si}_4\text {O}_{10}\text {(OH)}_{2}\) (talc), and \(\text {Mg(OH)}_{2}\) (brucite). At \(x=0\), the concentration of \(\text {SiO}_2\text {(aq)}\), \(C_{x,t}\), is externally buffered at a constant value. All reaction pathway candidates can be given as

Reaction 1 (R1):

Reaction 2 (R2):

Reaction 3 (R3):

Reaction 4 (R4):

Reaction 5 (R5):

Here, we assume that R1, R2, and R4 are irreversible reactions (Fig. 3). Based on the stoichiometric relationship in these reaction equations, the \(v^l_i\) for \(\text {Mg}_{2}\text {SiO}_\text {4}\) are given as \(\{v^1_1, \ldots , v^7_1\}\) = {3/5, 3, 0, 0, -1, 0, 0}. Similarly, for \(\text {Mg}_{3}\text {Si}_{4}\text {O}_{10}\text {(OH)}_{2}\), \(\{v^1_2, \ldots , v^7_2\}\) = {-2/5, 0, -1/2, -1/2, 0, 0, 0}; for \(\text {Mg(OH)}_{2}\), \(\{v^1_3, \ldots , v^7_3\}\) = {0, 0, 0, 0, 2, 1/2, 1/2}; and for \(\text {Mg}_{3}\text {Si}_{2}\text {O}_{5}\text {(OH)}_{4}\), \(\{v^1_4, \ldots , v^7_4\}\) = {0,-2, 1/2, 1/2, 0, -1/2, -1/2}. The model parameters (i.e., rate constants of each reaction pathway) can be expressed by \(\mathbf {\Theta }^{\prime }\) = \(\{k^{\prime }_{1}\), \(k^{\prime }_{2}\), ..., \(k^{\prime }_{7}\}\) with the dimension \(L=7\).

3.2 Validation dataset

The black dotted line in Fig. 4 (a–e) shows the mineral spatial distribution, which is obtained by \(\mathbf {\Theta ^{\prime }}\) = {\(10^{-3.301}\), \(10^{-3.301}\), 0, 0, \(10^{-3.301}\), 0, 0}; i.e., \(k^{\prime }_1, k^{\prime }_2,\) and \(k^{\prime }_5\) are non-zero. The equilibrium concentrations are set as \(\{C^{1}_{(eq)}\),..., \(C^{7}_{(eq)}\}\) = \(\{10^{-6.57},\) \(10^{-9.41}\), \(10^{-5.86}\), \(10^{-5.86}\), \(10^{-7.54}\), \(10^{-8.01}\), \(10^{-8.01}\}\) and \(D\) is set to \(10^{-4.2}\) [41]. The temperature-pressure variable \(\overline{V}_{i}\) at 300 \(^\circ \)C and 10 MPa was obtained from the petrological software Perple_X [42], which can determine physical properties of minerals thermodynamically. We assume that the mineral spatial distribution can be observed at steps of 0.03 cm and is represented by adding Gaussian noise. To impose the same level of Gaussian noise on each observable series, \(\sigma _j\) was assumed to be \(\sigma _j = \sigma ^{\prime } \times y^{(j)}_{\mathrm {max}} (j=1,\ldots ,5)\), whereas \(\sigma ^{\prime }\) is an noise magnitude, and \(y^{(j)}_{\mathrm {max}}\) is maximum value of a single observation series (\(y^{(j)}_{\mathrm {max}}\) = \(max(y^{(j)}_1,\ldots , y^{(j)}_X)\)). The \(\sigma ^{\prime }\) is set to \(10^{-1.5}\), which corresponds to 6.3% deviation, and thus \(\{\sigma _{1}, \ldots , \sigma _{5}\}\) = \(\{10^{-3.44}, 10^{-5.17}, 10^{-4.45}, 10^{-7.11}, 10^{0.20}\}\). After adding Gaussian noise, we set negative values of the synthesized spatial data to zero, because the amount of mineral cannot be negative. The circle in Fig. 4(a–e) shows the synthesized mineral spatial distribution. Each observed series contains 60 data points (\(X = 60\)) and there are five observation series (\(J=5\)); thus, the total number of data points is 300 (\(n = 300\)). The parameters for prior information \(k_l^{\mathrm {(min)}}\) and \(k_l^{\mathrm {(max)}}\) are set to \(10^{-8}\) and \(10^0\), respectively. The noisy dataset is then analyzed using the proposed method to determine its effectiveness for extracting and estimating the model parameters \(\mathbf {\Theta }\).

Fig. 5
figure 5

a \(2^7\)(= 128) combinations of used (\(c_l = 1\)) and unused (\(c_l = 0 \)) parameters \(k^{'}_{1}, \ldots , k^{'}_{7}\). b A magnified image of (a). c Number of kinetic parameters used. d A magnified image of (c). e WBIC values for each combination of used and unused parameters. f A magnified image of (e). In (ae), the horizontal axes show the ranking of WBIC values in increasing order. Among 128 combinations, 32 combinations have uniformly highest WBIC values, which are ranked at 97

3.3 Validation result

\(10^4\) samples were obtained using the MCMC method with the setting of \(\beta =1/\ln (n)\), 1000 burn-in time, and 10 thinning intervals, which were used to calculate WBIC values for \(2^L\) candidates of \(\varvec{c}\). As shown in a later section (Sect. 3.5), autocorrelation of chains is generally low in each MCMC run, suggesting that chains are well mixed. Figure 5 shows the calculated value of WBIC for \(2^{7}\) combinations of \(\varvec{c}\). Among \(2^7\) combinations, 96 models have different WBIC values, and WBIC increases as ranking increases from 1 to 96 (Fig. 5a–d). In contrast, another 32 models have the highest WBIC values, and these worst models are uniformly ranked at 97 (Fig. 5a–d). The ranking result shows that \(c_1\), \(c_2\), and \(c_5 = 1\) appeared frequently in combinations ranked at < 15 (Fig. 5a, b). In contrast, \(c_3\), \(c_4\), \(c_6\), and \(c_7 = 0\) frequently ranked at < 15 (Fig. 5a, b), indicating that \(k^{\prime }_{3}\), \(k^{\prime }_{4}\), \(k^{\prime }_6\), and \(k^{\prime }_{7}\) are comparably unimportant parameters. When all parameters are used (\(c_l = 1, l = 1, \ldots , 7\)), the WBIC value is \(-2324.7\) and ranked at 11 (Fig. 5a–d). WBIC is lowest (WBIC = \(-2327.4\)) when \(c_{1}\), \(c_{2}\), and \(c_{5} = 1\), which is the same combination used in the input observation series. This demonstrates that the proposed method can identify the active reaction pathways from redundant candidates of reaction pathways.

Fig. 6
figure 6

Result of MCMC sampling, showing MCMC chains for \(10^4\) iterations, posterior distributions, and autocorrelation (AC) of MCMC sampling for parameter a \(k^{\prime }_{1}\), b \(k^{\prime }_{2}\), and c \(k^{\prime }_{5}\). The red dotted line in the AC plot is the 95 % confidence interval

For the combinations of used model parameters that minimize WBIC values (i.e., \(k^{\prime }_{1}\), \(k^{\prime }_{2}\), and \(k^{\prime }_{5}\)), another MCMC run was conducted to estimate the model parameters. The first \(10^{3}\) trials of MCMC sampling were not used. The auto correlation (AC) between MCMC sampling and lag time showed steep decay with increasing lag length, suggesting that MCMC sampling is less correlated and independent (Fig. 6a–c). The posterior distribution of these parameters after \(10^{4}\) Monte Carlo steps is similar to the shape of a normal distribution with a single peak (Fig. 6). The mean values of the posterior distribution for parameters \(\log _{10}(k^{\prime }_{1})\), \(\log _{10}(k^{\prime }_{2})\), and \(\log _{10}(k^{\prime }_{5})\) are \(-3.299, -3.301\), and \(-3.291\), respectively (Fig. 6a–c), which are close to their true values.

Fig. 7
figure 7

Top 5 combinations of used (\(c_l=1\), white) and unused (\(c_l=0\), black) parameters among 10 trials of validation tests while changing the number of observation (n) and observation noise (\(\sigma ^\prime \)). The true model is marked using a red square

3.4 Dependence of model selection accuracy on spatial resolution and observation noise

Here we investigated how robustly the true model can be selected for different numbers of total observations and different levels of observation noise. With a set of n and \(\sigma ^\prime \), the validations are repeatedly conducted for 10 trials. Different datasets are used for each trial.

Figure 7a–i show the result of model selection for all ten trials at each set of n and \(\sigma ^\prime \). Combination of indicators ranked within 5th are shown. At \(n = 300\), true models are definitely selected as the best (rank 1) models four times when noise levels are low (\(\sigma ^\prime = 10^{-2.0}\); Fig. 7a). The number of times the true models were selected are decreased at high noise levels, and are five and three times among ten trials at \(\sigma ^\prime = 10^{-1.5}\) (Fig. 7b) and \(10^{-1.0}\) (Fig. 7c), respectively. Similarly, at \(n=100\), the number of times the true models were selected as the best model are 2, 3, and 2 times among 10 trials for \(\sigma ^\prime = 10^{-2.0}\) (Fig. 7d), \(10^{-1.5}\) (Fig. 7e), and \(10^{-1.0}\) (Fig. 7f), respectively; At \(n=30\), the number of times the true models were selected as the best model are 2, 1, and 0 times among 10 trials for \(\sigma ^\prime = 10^{-2.0}\) (Fig. 7g), \(10^{-1.5}\) (Fig. 7h), and \(10^{-1.0}\) (Fig. 7i), respectively.

We note that the true combination of the indicator (\(c_1\), \(c_2\), and \(c_5=1\)) appeared frequently in the top 5 models during 10 trials at any set of n and \(\sigma ^\prime \) (Fig. 7a–i). Moreover, the true model is generally selected in the top 5 models. For example, at \(n=300\), the true models are ranked in top 5 for 10, 8, and 7 times among 10 trials for \(\sigma ^\prime = 10^{-2.0}\) (Fig. 7a–c), \(10^{-1.5}\), and \(10^{-1.0}\) (Fig. 7a–c), respectively.

3.5 Dependence of parameter estimation accuracy on spatial resolution and observation noise

Here we investigated how robustly the non-zero variables (\(k^{\prime }_1\), \(k^{\prime }_2\), and \(k^{\prime }_5\)) can be estimated for different numbers of total observations and different levels of observation noise. We evaluated the discrepancy between estimated value and true value as the normalized root mean square error (NRMSE) by

$$\begin{aligned} NRMSE = \sqrt{\frac{1}{3}\sum _{l=1,2,5}\left( (k_l^{\prime \mathrm {(est)}}-k_l^{\prime \mathrm {(true)}})/k_l^{\prime \mathrm {(true)}}\right) ^2 }, \end{aligned}$$

where \(k_l^{\prime \mathrm {(est)}}\) and \(k_l^{\prime \mathrm {(true)}}\) are the posterior mean and true value of the rate constant for reaction pathway l, respectively. For each setting of the number of observations (n) and observation noise (\(\sigma ^{\prime }\)), we evaluated the NRMSE values for several trials because the NRMSE values can be affected by a random number. Then, we used one of the trials as a typical result because the calculated NRMSE results for the several trials were consistent.

Figure 8 shows that the calculated NRMSE on n and \(\sigma ^{\prime }\). We obtained NRMSE in the range of \(n=15-300\) (\(\log _{10}(n) = 1.17-2.48\)) and \(\sigma ^{\prime } = 10^{-2.0} - 10^{-1.0}\) (corresponding to \(2-20\%\) deviation relative to the maximum value of each observation series). Results presented in Fig. 8 suggest that the NRMSE increases with (1) decreasing total number of observation and (2) increasing noise intensity. The NRMSE shown in Fig. 6 (\(\log _{10}(n)\) = 2.48 \((n=300)\) and \(\sigma ^{\prime }\) = \(10^{-1.5}\)) was  \(10^{-1.7}\), which corresponds to 2% error on average. Even when the number of observation was limited (\(\log _{10}(n)\) = 1.30 \((n = 20)\)), rate constants were estimated with similar NRMSE (\(10^{-1.7}\)) for moderate noise levels (\(\sigma ^{\prime }\) = \(10^{-1.75}\)), whereas it was estimated with slightly higher NRMSE (\(10^{-1.2}\)) for intense noise levels (\(\sigma ^{\prime }\) = \(10^{-1.25}\)). For most regions except for \(\log _{10}(n)\) < 1.30 \((n < 20)\) and \(\sigma ^{\prime }>10^{-1.50}\), the NRMSE was lower than \(10^{-1.0}\), which corresponds to 10% estimation error on average (Fig. 8). The estimates were very accurate from the viewpoint of heterogeneous reaction in the Earth sciences, suggesting that our proposed method is effective against both observational sparseness and noise intensities and can be used for estimating kinetic parameters.

Fig. 8
figure 8

Dependence of discrepancy between true and estimated parameters in the total number of observations (\(n\)) and the magnitude of observation noise (\(\sigma ^{\prime }\)).

4 Discussion and summary

In this study, we developed a method for identifying active reaction pathways from redundant reaction pathway candidates using observed spatial data of the solid phase and Bayesian modeling. Verification using simulated data showed that the proposed method provides effective estimation for a given noisy dataset.

In general, it is difficult to determine the reaction pathways in nonlinear parallel heterogeneous reactions without measuring the solution chemistry. Even when the solution chemistry dataset is provided, the exploration and identification of reaction pathways involves multiple considerations due to the complexity and nonlinearity of the reactions. As demonstrated during the estimation of known kinetic parameters from the noisy spatial distribution of minerals, our method objectively identifies both the active reaction pathways and unimportant reaction pathways, instead of arbitrarily ignoring them. The proposed method can also be employed to evaluate the estimation accuracy for each unknown variable, which is important for scientific research.

In this study, the lowest WBIC value was not attained using all seven variables (–2324.7, corresponding to a ranking of 11; Fig. 5a). This suggests that overfitting can be avoided thorough Bayesian inversion, and the estimated model parameters are not affected by noise.

For variable combinations ranked at < 15, the indicator suggests that non-zero parameters \(k^{\prime }_{1}\), \(k^{\prime }_{2}\), and \(k^{\prime }_{5}\) always appeared in the combinations (Fig. 5a, b); however, the number of unimportant parameters (i.e., with a zero value) included in each combinations differed. For the model ranked first, no zero-value parameters were included. For the model ranked second, the indicator suggests that \(k^{\prime }_{1}\), \(k^{\prime }_{2}\), \(k^{\prime }_{5}\), and \(k^{\prime }_{6}\) were included, whereas \(k^{\prime }_{1}\), \(k^{\prime }_{2}\), \(k^{\prime }_{3}\), \(k^{\prime }_{5}\), and \(k^{\prime }_{6}\) were included in the model ranked third (Fig. 5a, b). Because unimportant parameters were included in the models ranked second and third, the WBIC values were larger than those for the parameter combination ranked first.

Large variability was observed in WBIC values, especially between rank 48 (WBIC = 138.2) and 49 (WBIC = 5665.7; Fig. 5e). The indicators suggest that the former used three variables: \(k^{\prime }_{2}\), \(k^{\prime }_{6}\), and \(k^{\prime }_{7}\) (Fig. 5a), whereas the latter used two variables: \(k^{\prime }_{1}\) and \(k^{\prime }_{2}\) (Fig. 5). When the variable combinations ranked 48 and 49 were compared, the number of variables was larger for the lower-ranked model. This suggests that the extent of fitting led to large differences in the WBIC values between the models ranked at 48 and 49.

It is important to note the characteristics of the 32 worst models identified by highest WBIC value (Fig. 5a–e). Among these worst models, all of the models without any parameters (i.e., all rate parameters set to zero) and with less than five parameter are similarly selected as worst models. This is because reactions that are necessary for another reaction to take place are restricted by these models. Common features in these worst models is that \(k^{\prime }_1\) and \(k^{\prime }_2\) are not used (Fig. 5a). Without these reaction pathways, the combination of variables become worst case, even when the important reaction \(k^{\prime }_5\) is used. These results suggests that reaction pathway \(k^{\prime }_1\) and \(k^{\prime }_2\) are necessary to drive another reaction, such as \(k^{\prime }_5\). We suggest that identifying the common features of variables in the worst models may also be helpful to understand the sequences of reactions.

Although it is difficult to analytically derive Bayesian free energy \(\mathcal {F}\) in general, there are several methods to calculate \(\mathcal {F}\) [16], such as thermodynamic integration [43], nested sampling [44], and the non-equilibrium Monte Carlo method [45]. In this study, we used an information criterion, WBIC, that may provide a computationally efficient approximation of the \(\mathcal {F}\). The \(\mathcal {F}\) can be approximated by a Bayesian information criteria (BIC) [31] only when the statistical model is regular, i.e., the posterior distribution can be approximated as the normal distribution [31, 32]. In the example described in this study, the shape of the posterior distributions obtained by MCMC method are similar to those of the normal distribution (Fig. 6). To investigate the effectiveness of BIC, the same validation numerical experiments were conducted. As a result, we found that the BIC also identified active reaction pathways in the case of the validation test shown in the present study. However, the usage of BIC cannot be validated without sampling because the shape of the posterior distribution cannot be constrained prior to sampling. In contrast, the WBIC, which is a generalized version of the BIC, can be used in both regular and singular statistical models and regardless of the shape of the posterior distribution [32]. Analyzing real heterogeneous reactions could require a solution for a singular model because heterogeneous reactions between minerals and fluids can be complex. Thus, WBIC as a generalized version of BIC would be the preferred information criterion for selecting models via Bayesian inference.

In this study, the noise variances \(\sigma _j\) for each observation series are treated as known constants. In the realistic situation, noise variance can be estimated as a hyperparameter by Bayesian estimation. However, such hyperparameter estimation is not necessary because approximate values for noise variance are known for the heterogeneous reaction between minerals and fluid.

In this study, spatial data points at a single time are used as observable data (n). As we showed in Fig. 7, the true model is likely difficult to be selected when \(n=30\) with a corresponding space interval of 0.3 cm (Fig. 7g–i). Because the spatial observation step can be  0.01 cm at minimum, n may not be less than 100, and an extreme situation in which only 30 observation data points can be used is unrealistic. However, even in the extreme situation, the parameters used (\(c_1, c_2,\) and \(c_5=1\)) in the true model frequently appeared in the top five models, suggesting that the true models can be interpreted by checking common indicators in the high ranked models.

The Bayesian approach has been applied to infer kinetic reaction pathways in the fields of chemical engineering and biochemistry [26, 29]. In these works, the Bayes factor, which quantifies the support for one model over another, was used to select models. The Bayes factor between two hypothesized models is obtained by calculating the difference of \(\mathcal {F}\) between the two models. Such a calculation of \(\mathcal {F}\) can be substituted by WBIC to reduce computational cost.

Although the effectiveness of our approach was demonstrated by analyzing a synthesized dataset, the limits of its applicability must be noted. It is easy to imagine that the exhaustive search method used in this study will become intractable for a large number of variables (L). Therefore, to reduce the computational load, a relaxation approach such as the least absolute shrinkage and selection operator (LASSO) method [21, 46] using an \(l_1\)-norm regularization term could be used when the ES method is computationally infeasible. However, exhaustive search (ES) is necessary for the strict selection of efficient variables [33, 34, 47]. This computational explosion problem associated with the ES method can be improved through the development of another relaxation method.

The proposed approach serves as a basic inversion framework for extracting active reaction pathways during parallel heterogeneous reactions. The applications of our methods to real data will be considered separately. The development of analytical methods for heterogeneous reactions and applications to a real dataset will provide a better understanding of important and dynamic Earth processes.