Abstract
Recent progress in Bayesian inference has allowed for accurate posterior estimations in complex situations with no idea about a likelihood function. Currently, Approximate Bayesian Computation (ABC) techniques have emerged as a widely used set of free-likelihood methods. Although there is a large number of different ABC-based approaches across the literature, all they have in common a hard dependence on free parameters selection, demanding for expensive tuning procedures such as grid search or cross-validation. Here, we introduce an Automatic Metric Learning-based ABC approach, termed AML-ABC. Namely, AML-ABC matches the simulation and observation spaces within an ABC-based framework. Attained results on a synthetic dataset and a real-world ecological system show that our approach is a competitive method compared to other non-automatic state-of-the-art ABC techniques.
Keywords
You have full access to this open access chapter, Download conference paper PDF
1 Introduction
Bayesian-based statistical inference tasks require the calculation of the likelihood function, which performs an important role as long as it states the probability of the observed data under a particular model. Therefore, the Bayes’ theorem leverages the inclusion of a priori knowledge about the studied phenomenon into the posterior distribution. Indeed, straightforward models gather an analytic expression for the likelihood function facilitating the evidence assessment; then, the posterior can be precisely computed. Notwithstanding, for complex models, to find an exact formula for the likelihood function is often intractable [13].
To deal with this intractability, free-likelihood techniques like Approximate Bayesian Computation (ABC) have emerged. ABC-based methods assess an auxiliary model with different parameter values drawn from a prior distribution to compute simulations that are compared to the observed data [13]. In particular, this comparison can be performed using statistics that summarize and characterize the information over large features and observations [5, 10]. However, the selection of proper and sufficient summary statistic could be difficult for complex models, demanding for alternative approaches that rely on kernel functions to embed and compare distributions into a Reproducing Kernel Hilbert Space (RKHS) [4, 9]. Recent advances in ABC-based inference have introduced kernel methods to accomplish more accurate posterior estimations. Authors in [6] developed a surrogate model as synthetic likelihood to define an adequate number of simulations within the ABC procedure via a Gaussian process-based framework. Mitrovic et al. [7] modeled the functional relationship between simulations and the optimal choice of summary statistics to encode the structure of a generative model using a kernel ridge regression for conditional distributions. Nonetheless, the techniques mentioned above require the estimation of different parameters related to the similarity computation among simulations to approximate the posterior. Then, expensive tuning procedures as grid search and cross-validation are carried out. Moreover, the user requires a vast knowledge concerning the ABC algorithm and the studied data to properly tune the free parameters, yielding to a high influence in the quality of the posterior approximation.
Here, we introduce an automatic version of an ABC algorithm to support Bayesian inference. Our approach, named AML-ABC, comprises a metric learning stage based on a Centered Kernel Alignment (CKA) to assess the matching between similarities defined over parameters and simulations [3]. Besides, a Mahalanobis distance is computed through CKA, and a graph representation is utilized to highlight local dependencies from both parameter and simulation spaces in ABC. Achieved results on synthetic and real-world inference problems demonstrate that our automatic extension of ABC infers competitive posteriors without requiring any manually fixing of free parameters.
The remainder of this paper is organized as follows: Sect. 2 describes the mathematical background. Section 3 describes the experimental set-up and the obtained result. Finally, the conclusions are outlined in Sect. 4.
2 Materials and Methods
ABC Fundamentals. In any Bayesian inference task, the central aim concerns the computation of a posterior \(p(\theta |y)\), using a prior distribution \(\zeta (\theta )\) and a likelihood function \(p(y|\theta )\), where \(y{{\,\mathrm{\in }\,}}\mathcal {Y}\) stands for the observed data and \(\theta {{\,\mathrm{\in }\,}}\Theta \) holds the model parameters. Nonetheless, when the likelihood is intractable, neither exact nor sampled posterior \(p(\theta |y)\propto p(y|\theta )\zeta (\theta )\) can be calculated. ABC approaches aim to facilitate such an inference via simulation of the likelihood through a generative model represented by a conditional probability \(p(x|\theta ),\) where \(x{{\,\mathrm{\in }\,}}\mathcal {Y}\) is a random variable standing for the simulated data [13]. Fundamentally, an ABC-based framework relies on the acceptance and rejection of simulated samples x using a distance function \(\mathrm{{d}}_{\mathcal {Y}}:\mathcal {Y}\times \mathcal {Y}\rightarrow \mathbb {R}^+.\) In turn, an approximate posterior can be estimated such that: \(\hat{p}(\theta |y;\epsilon )\propto \hat{p}(y|\theta ;\epsilon )\zeta (\theta ),\) where \(\hat{p}(y|\theta ;\epsilon ){{\,\mathrm{=}\,}}\int _{\mathcal {B}\left( y;\epsilon \right) }{p(x|\theta )}dx,\) \(\mathcal {B}\left( y;\epsilon \right) {{\,\mathrm{=}\,}}\{x:\mathrm{{d}}_{\mathcal {Y}}(x,y)<\epsilon \},\) and \(\epsilon {{\,\mathrm{\in }\,}}\mathbb {R}^+.\) Note that setting the value of \(\epsilon \) is a crucial stage for obtaining an accurate posterior.
On the other hand, most of the times it is difficult to apply a distance directly on \(\mathcal {Y}\) due to a large number of samples and features in real data. In such a case, some strategies use a mapping \(s{{\,\mathrm{=}\,}}\vartheta (x)\) before calculating the distance, where \(s{{\,\mathrm{\in }\,}}\mathcal {S}\) is a feature space and \(\vartheta :\mathcal {Y}\rightarrow \mathcal {S}\) [5]. However, using \(\vartheta (x)\) often leaks information for complex models. Then, some ABC-based approaches approximate \(\hat{p}(y|\theta ;\epsilon )\) as the convolution of the true likelihood \(p(y|\theta )\) and a kernel function \(\kappa :\mathcal {Y}\times \mathcal {Y}\rightarrow \mathbb {R},\) which imposes a constraint to the rejection of samples as the inner product \(\kappa (x,y){{\,\mathrm{=}\,}}\langle \phi (x),\phi (y)\rangle _{\mathcal {H}}\) in a Reproducing Kernel Hilbert Space (RKHS), \(\mathcal {H},\) where \(\phi :\mathcal {Y}\rightarrow \mathcal {H}\) [9]. In practice, given N samples \(\{x_n\backsim P_{X_n}\}^N_{n{{\,\mathrm{=}\,}}1}\) drawn from \(p(x|\theta _n),\) with \(\theta _n\backsim \zeta (\theta ),\) and the observation \(y\backsim P_Y,\) a weighted sample set \(\Psi {{\,\mathrm{=}\,}}\left\{ (\theta _n,w_n)\right\} ^N_{n{{\,\mathrm{=}\,}}1}\) is calculated by fixing:
where \(\kappa _G\) is a Gaussian kernel with bandwidth \(\epsilon .\) Eq. (1) is used to approximate \(p(\theta |y)\) via posterior expectation as: \(\hat{p}(\theta |y){{\,\mathrm{=}\,}}\sum \nolimits ^N_{n{{\,\mathrm{=}\,}}1}{w_n\kappa _G(\mathrm{{d}}_e(\theta ,\theta _n);\sigma _\theta )},\) where \(\mathrm{{d}}_e\) stands for the Euclidean distance and \(\sigma _\theta {{\,\mathrm{\in }\,}}\mathbb {R}^+\). Moreover, \(\mathrm{{d}}_{\mathcal {H}}(P_{X_n},P_Y)\) represents a distance over distributions.
Automatic ABC Using Metric Learning. To avoid the influence of the \(\epsilon \) value and the kernel parameters while computing the ABC-based posterior as in Eq. (1), we introduce an Automatic Metric Learning (AML) approach in the context of ABC, termed AML-ABC, for enhancing and automating the inference task. The idea behind AML-ABC is to include the information contained in the candidates \(\{\theta _n\}_{n=1}^N\) to improve the comparison stage carried out over simulations and observations. Let \(\varPsi = \{\theta _n,x_n\}_{n=1}^N\) be the set of N candidates \(\theta _n{{\,\mathrm{\in }\,}}\mathbb {R}^P\backsim \zeta (\theta )\) and their corresponding simulations \(x_n{{\,\mathrm{\in }\,}}\mathbb {R}^Q\backsim p(x|\theta )\). Further, let the kernel function \(\kappa _\theta :\Theta \times \Theta \rightarrow \mathbb {R}^+\) be a similarity measure between candidates in \(\Theta \), that define the kernel matrix \(\mathbf{K }_{{\varvec{\theta }}}{{\,\mathrm{\in }\,}}\mathbb {R}^{N\times N}\) holding elements:
where \(\Omega _{n'}\) is a set holding the M-nearest neighbors of \(\theta _{n'}\) in the sense of the distance \(\text {d}_{\Theta }:\Theta \times \Theta \rightarrow \mathbb {R}^+\). In this paper, to avoid large variations among components of \(\theta _n\) we rely on the Mahalanobis distance \(\text {d}_{\Theta }^2(\theta _n,\theta _{n'})=(\theta _n-\theta _{n'})^{\text {T}}{\varvec{\Sigma }}_{\Theta }^{-1}(\theta _n-\theta _{n'})\), where \({\varvec{\Sigma }}_{\Theta }{{\,\mathrm{\in }\,}}\mathbb {R}^{P\times P}\) is the sample covariance matrix of \(\{\theta _n\}_{n=1}^N\). Concerning the feature space \(\mathcal {S}\), we assess the similarity via the kernel \(\kappa _s:\mathcal {S}\times \mathcal {S}\rightarrow \mathbb {R}^+\), \(\kappa _s(\vartheta (x_n),\vartheta (x_{n'}))=\exp (-\text {d}_{\mathcal {S}}^2(\vartheta (x_n),\vartheta (x_{n'})))\), to build the matrix \(\mathbf{K }_{{\varvec{s}}}{{\,\mathrm{\in }\,}}\mathbb {R}^{N\times N},\) where \(\text {d}_{\mathcal {S}}^2:\mathcal {S}\times \mathcal {S}\rightarrow \mathbb {R}^+\) and \(\vartheta :\mathcal {Y}\rightarrow \mathcal {S}\) is a feature mapping. To perform the pairwise comparison between simulations in \(\mathcal {S}\) we use the Mahalanobis distance of the form [2]:
where \({\varvec{\Sigma }}_{\mathcal {S}}^{-1}={\varvec{A}}{\varvec{A}}^{\text {T}}\) stands for the inverse covariance matrix of \(\vartheta (x_n){{\,\mathrm{\in }\,}}\mathbb {R}^D\) and \({\varvec{A}}{{\,\mathrm{\in }\,}}\mathbb {R}^{D\times d}\). In this sense, we use the information concerning the similarity over candidates in \(\Theta \), represented via \(\mathbf{K }_{{\varvec{\theta }}}\), to state the notion of similarity over simulations and observation in \(\mathcal {S}\), represented via \(\mathbf{K }_{{\varvec{s}}}\). In particular, we use a CKA-based measure between the above kernel matrices as [3]:
where \(\bar{\mathbf{K }}\) stands for the centered kernel as \(\bar{\mathbf{K }} = \tilde{{\varvec{I}}}\mathbf{K }\tilde{{\varvec{I}}}\), being \(\tilde{{\varvec{I}}}= {\varvec{I}}-{\varvec{1}}^{\top }{\varvec{1}}/N\) the empirical centering matrix, \({\varvec{I}}\in \mathbb {R}^{N \times N}\) is the identity matrix, and \({\varvec{1}}\in \mathbb {R}^{N}\) is the all-ones vector. Moreover, The notation \(\langle \cdot ,\cdot \rangle _{\text {F}}\) represents the matrix-based Frobenius norm. In Eq. (4), \(\hat{\rho }(\cdot ,\cdot )\) is a data driven estimator that aims to quantify the similarity between the parameter space and the feature space. To find the projection matrix \({\varvec{A}}\), we consider the following optimization problem:
where the logarithm function is used for mathematical convenience. The optimization problem in Eq. (5) can be solved using a gradient descent-based approach [2]. Moreover, we form the weighted sample set \(\Psi {{\,\mathrm{=}\,}}\left\{ (\theta _n,w_n)\right\} ^N_{n{{\,\mathrm{=}\,}}1}\) by fixing \(w_n=\kappa _E(z,z_n)/{\sum \nolimits ^N_{n{{\,\mathrm{=}\,}}1}{\kappa _E(z,z_n)}}\), where \(\kappa _E:\mathbb {R}^d\times \mathbb {R}^d\rightarrow \mathbb {R}\) is a similarity kernel defined as:
where \(\Upsilon \) is a set holding the M-nearest neighbors of \(z=\vartheta (y)^{\top }{\varvec{\hat{A}}}\) in the sense of the Euclidean distance. Algorithm 1 summarizes the AML-ABC approach.
3 Experiments and Results
To test the AML-ABC performance, we consider two experiments following [9]: a toy problem concerning synthetic data from a mixture model and a Bayesian inference problem for a real ecological dynamic system. To accomplish an automatic inference, we find the number of M-nearest neighbors as the median of the optimal number of neighbours per point according to the Local Neighborhood Selection (LNS) algorithm introduced in [1]. Moreover, the K2-ABC method is selected as benchmark due to its nice performance over other methods [9].
For the toy problem, we analyze a mixture of uniform distributions of the form: \(p(x|{\varvec{\pi }}){{\,\mathrm{=}\,}}\sum _{c{{\,\mathrm{=}\,}}1}^{C} \pi _c{\mathscr {U}(c-1,c)},\) where \({\varvec{\pi }}{{\,\mathrm{=}\,}}\{\pi _c\}_{c{{\,\mathrm{=}\,}}1}^C\) are the mixing coefficients holding \(\sum _{c{{\,\mathrm{=}\,}}1}^{C}\pi _c{{\,\mathrm{=}\,}}1\), and C is the number of components. Here, the aim is to estimate the posterior \(p({\varvec{\pi }}|y)\) for \(C{{\,\mathrm{=}\,}}5\), given synthetic observations y drawn from the mixture with true parameters (target): \({\varvec{\pi ^*}}=\{0.25,0.04,0.33,0.04,0.34\}\). For concrete testing, we draw \(N=1000\) samples from a symmetric Dirichlet prior, \({\varvec{\pi }}\backsim \text {Dirichlet}({\varvec{1}}),\) and then used the mixture model to form the simulated data by drawing 400 observations for each prior candidate. Moreover, we employ a histogram with 10 bins as feature mapping in AML-ABC, and kernel widths \(\gamma =0.1,\) \(\epsilon =0.001\) in K2-ABC [9]. As quantitative assessment, we use the Euclidean distance \(\mathcal {E}=||{\varvec{\pi ^*}}-{\varvec{\hat{\pi }}}||_2\), where \({\varvec{\hat{\pi }}}\) is the expected value of the posterior using the weights \(\{w_n\}_{n=1}^N\) obtained by each method.
For the real dataset experiment, we considere the problem of inferring the dynamics of an adult blowfly population as introduced in [14]. In particular, the population dynamics are modelled by a discretised differential equation of the form: \(N_{t+1}=PN_{t-\tau }\exp (-N_{t-\tau }/N_0)e_t+N_t\exp (-\delta \epsilon _t)\), with \(N_{t+1}\) denoting the observation time at \(t+1\) which is determined by the time-lagged observations \(N_t\) and \(N_{t-\tau }\), where \(e_t\) and \(\epsilon _t\) stand for Gamma distributed noise \(e_t\backsim \mathcal {G}(1/\sigma _p^2,\sigma _p^2)\) and \(\epsilon _t\backsim \mathcal {G}(1/\sigma _d^2,\sigma _d^2)\). Here, the aim is to estimate the posterior of the parameters \({\varvec{\theta }}=\{P,N_0,\sigma _d,\sigma _p,\tau ,\delta \}\) given observed data concerning a time series of 180 observationsFootnote 1. We adopt Log-normal distributions for setting priors over \({\varvec{\theta }}\) [6]: \(\log P\backsim \mathcal {N}(2,2^2),\) \(\log N_0\backsim \mathcal {N}(6,1),\) \(\log \sigma _d\backsim \mathcal {N}(-0.5,1),\) \(\log \sigma _p\backsim \mathcal {N}(-0.5,1),\) \(\log \tau \backsim \mathcal {N}(2.7,1),\) \(\log \delta \backsim \mathcal {N}(-1,0.4^2)\). For AML-ABC we draw \(N=5000\) samples from the prior and then assess the model to form the simulated data by drawing 180 observations for each prior candidate. Besides, as feature mapping, we selected the 10 statistics used in [9]. Moreover, we use the Euclidean distance \(\mathscr {E}=||\vartheta (y)-\vartheta (x_n|{\varvec{\hat{\theta }}})||_2\) as quantitative assessment, where \(x_n|{\varvec{\hat{\theta }}}\) is a simulation from the model given the expected value of the posterior using each method. In particular, due to fluctuations produced by \(\epsilon _t\) and \(e_t\), we draw 100 simulation for each method and compute the median and standard deviation for \(\mathscr {E}\) [9].
Toy Problem Results. Since this is a controlled experiment with known parameters \({\varvec{\pi ^*}},\) we can find the best possible performance of our AML-ABC by running the inference stage in Algorithm 1 with \(\tilde{w}_n = \kappa _E({\varvec{\pi ^*}},{\varvec{\pi }}_n)\). We refer to this approach as Best. The previous setting is equivalent to think that the CKA between \(\mathbf{K }_{{\varvec{\theta }}}\) and \(\mathbf{K }_{{\varvec{s}}}\) is perfect (\(\mathbf{K }_{{\varvec{\theta }}}\) = \(\mathbf{K }_{{\varvec{s}}}\)). Figure 1 shows the Best performance along with K2-ABC and AML-ABC results over the uniform mixture problem. In Fig. 1(a), the expected value of the posterior computed for all methods is close to the target. In particular, we obtained \(\mathcal {E}_{Best}=0.030\), \(\mathcal {E}_{\text {K2-ABC}}=0.063\), and \(\mathcal {E}_{\text {AML-ABC}}=0.064\). These results show that our AML-ABC is a competitive estimator to K2-ABC with a significant advantage concerning the automatic selection of free parameters. In addition, to provide a better understanding of the AML-ABC effectiveness, in Fig. 1(b) we provide the weights for the 5 nearest neighbors (according to the LNS algorithm) used to compute the posteriors. As noted, the majority of the chosen simulations for AML-ABC match the selected candidates using the Best, although our approach never observes the target.
Real Dataset Results. Inferring the model parameters in this blowfly dataset is a very challenging task since the system dynamics can easily move from stable to chaotic regimes [6, 14]. This states an interesting scenario to test the performance and robustness of the AML-ABC. In Figs. 2(a) to (f), we provide the prior and the posterior approximation for each parameter fixing \(\sigma _\theta \) according to [12]. Notice how our proposal updates the beliefs about the model parameters leading to more concentrated posteriors. In the case of \(\log \sigma _p \), two modes reflect different intervals with probable values for driving the noise realization associated with egg production in the blowfly population. However, there is a predominant mode that states higher probabilities for this parameter. Moreover, Fig. 2(g) shows the closest and farthest simulation to the observed data from 100 realization used to compute \(\mathscr {E}\), showing a posterior in stable regime. Finally, Table 1 shows the performance of AML-ABC compared to different ABC-based methods tested on the blowfly dataset by authors in [9], where clearly the proposed method is a quite competitive approach to K2-ABC.
4 Conclusions
We propose an automatic enhancement of the well-known ABC algorithm devoted to Bayesian inference called AML-ABC. In particular, we include a Metric Learning approach based on a CKA methodology to quantify the matching between parameter and simulation spaces. Then, a Mahalanobis distance is learned through CKA and a graph representation is employed to reveal local relationships among parameter and simulation samples. Notably, our AML-ABC does not require the tuning of any free parameter. Besides, obtained results on a synthetic dataset and a real-world ecological system show the introduced AML-ABC is a competitive approach compared to other non-automatic state-of-the-art ABC methods. Future work includes the extension of AML-ABC for multi-dimensional problems and the inclusion of other dissimilarity measures, besides the Mahalanobis distance, to deal with complex and/or noisy data.
Notes
- 1.
Available on the supplementary materials of [14].
References
Álvarez-Meza, A., et al.: Global and local choice of the number of nearest neighbors in locally linear embedding. Pattern Recogn. Lett. 32(16), 2171–2177 (2011)
Álvarez-Meza, A., et al.: Kernel-based relevance analysis with enhanced interpretability for detection of brain activity patterns. Frontiers Neurosci. 11, 550 (2017)
Cortes, C., Mohri, M., Rostamizadeh, A.: Algorithms for learning kernels based on centered alignment. JMLR 13, 795–828 (2012)
González-Vanegas, W., Alvarez-Meza, A., Orozco-Gutierrez, Á.: Sparse hilbert embedding-based statistical inference of stochastic ecological systems. In: Mendoza, M., Velastín, S. (eds.) CIARP 2017. LNCS, vol. 10657, pp. 255–262. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75193-1_31
Joyce, P., Marjoram, P., et al.: Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Molec. 7(1), 26 (2008)
Meeds, E., Welling, M.: GPS-ABC: Gaussian process surrogate approximate Bayesian computation. arXiv preprint arXiv:1401.2838 (2014)
Mitrovic, J., Sejdinovic, D., Teh, Y.W.: DR-ABC: approximate Bayesian computation with kernel-based distribution regression. In: International Conference on Machine Learning, ICML 2016, vol. 48, pp. 1482–1491 (2016)
Nakagome, S., et al.: Kernel approximate Bayesian computation in population genetic inferences. Stat. Appl. Genet. Mol. Biol. 12(6), 667–678 (2013)
Park, M., Jitkrittum, W., Sejdinovic, D.: K2-ABC: approximate Bayesian computation with kernel embeddings. arXiv preprint arXiv:1502.02558 (2015)
Paul, F., Dennis, P.: Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate bayesian computation. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 74(3), 419–474 (2012)
Pigorsch, E.G.C.: Approximate Bayesian computation with indirect summary statistics. Technical report (2013)
Shimazaki, H., Shinomoto, S.: Kernel bandwidth optimization in spike rate estimation. J. Comput. Neurosci. 29(1–2), 171–182 (2010)
Turner, B.M., Van Zandt, T.: A tutorial on approximate Bayesian computation. J. Math. Psychol. 56(2), 69–85 (2012)
Wood, S.N.: Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466(7310), 1102–1104 (2010)
Acknowledgments
Research under grants provided by the project 1110-745-58696, funded by Colciencias, Colombia. Authors would like to thank the Master in Electrical Engineering program from Universidad Tecnológica de Pereira for partially funding this research. Moreover, author W. González-Vanegas was supported under the project E6-18-2, funded by Universidad Tecnológica de Pereira.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
González-Vanegas, W., Álvarez-Meza, A., Orozco-Gutiérrez, A. (2019). An Automatic Approximate Bayesian Computation Approach Using Metric Learning. In: Vera-Rodriguez, R., Fierrez, J., Morales, A. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2018. Lecture Notes in Computer Science(), vol 11401. Springer, Cham. https://doi.org/10.1007/978-3-030-13469-3_2
Download citation
DOI: https://doi.org/10.1007/978-3-030-13469-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13468-6
Online ISBN: 978-3-030-13469-3
eBook Packages: Computer ScienceComputer Science (R0)