Distributed Estimation of Mixture Models

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 126)

Abstract

This contribution deals with sequential distributed estimation of the global parameters of normal mixture models, namely the mixing probabilities and the component means and covariances. The network of cooperating agents is represented by a directed or undirected graph whose vertices take observations, incorporate them into their own statistical knowledge about the inferred parameters, and share the observations and the posterior knowledge with other vertices. The aim of proposing a computationally cheap online estimation algorithm naturally disqualifies the popular (sequential) Monte Carlo methods, due to their high computational burden, as well as expectation-maximization (EM) algorithms, which struggle in online settings and require data batching or stochastic approximations. Instead, we proceed with the quasi-Bayesian approach, which allows sequential analytical incorporation of the (shared) observations into normal inverse-Wishart conjugate priors. The posterior distributions are subsequently merged using the Kullback–Leibler optimal procedure.
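For concreteness, the following minimal sketch illustrates a single-agent quasi-Bayes adaptation step in the spirit of [11], written for a univariate mixture with the normal inverse-gamma prior used in the appendix rather than the multivariate normal inverse-Wishart of the paper; the distributed sharing and Kullback–Leibler merging steps are not reproduced, and all function and variable names are our own:

```python
import numpy as np
from scipy.stats import norm

# Hedged sketch of a quasi-Bayes adaptation step for a univariate normal
# mixture with normal inverse-gamma (NIG) conjugate priors, in the spirit
# of Smith & Makov [11]. The paper's distributed sharing and KL-optimal
# combination are not implemented here; the naming is ours.

def quasi_bayes_step(y, m, s, a, b, alpha):
    """One observation y; per-component NIG hyperparameters m, s, a, b
    (1-D arrays) and Dirichlet counts alpha for the mixing weights."""
    # Plug-in predictive densities from posterior point estimates (assumes a > 1).
    var_hat = b / (a - 1.0)
    w = alpha / alpha.sum()
    resp = w * norm.pdf(y, m, np.sqrt(var_hat))
    resp /= resp.sum()                      # component responsibilities

    # Responsibility-weighted conjugate updates (cf. Lemma 4, weight = resp).
    s_new = 1.0 / (1.0 / s + resp)
    m_new = s_new * (m / s + resp * y)
    a_new = a + 0.5 * resp
    b_new = b + 0.5 * (m**2 / s - m_new**2 / s_new + resp * y**2)
    return m_new, s_new, a_new, b_new, alpha + resp
```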

Notes

1. The terms “adaptation” and “combination” were introduced by [10]. We adopt them for our Bayesian counterparts.

References

  1. Dedecius, K., Sečkárová, V.: Dynamic diffusion estimation in exponential family models. IEEE Signal Process. Lett. 20(11), 1114–1117 (2013)

  2. Dedecius, K., Reichl, J., Djurić, P.M.: Sequential estimation of mixtures in diffusion networks. IEEE Signal Process. Lett. 22(2), 197–201 (2015)

  3. Gu, D.: Distributed EM algorithm for Gaussian mixtures in sensor networks. IEEE Trans. Neural Netw. 19(7), 1154–1166 (2008)

  4. Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, London (2006)

  5. Hlinka, O., Hlawatsch, F., Djurić, P.M.: Distributed particle filtering in agent networks: a survey, classification, and comparison. IEEE Signal Process. Mag. 30(1), 61–81 (2013)

  6. Kárný, M., Böhm, J., Guy, T.V., Jirsa, L., Nagy, I., Nedoma, P., Tesař, L.: Optimized Bayesian Dynamic Advising: Theory and Algorithms. Springer, London (2006)

  7. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

  8. Pereira, S.S., Lopez-Valcarce, R., Pages-Zamora, A.: A diffusion-based EM algorithm for distributed estimation in unreliable sensor networks. IEEE Signal Process. Lett. 20(6), 595–598 (2013)

  9. Raiffa, H., Schlaifer, R.: Applied Statistical Decision Theory. Harvard University Press, Cambridge (1961)

  10. Sayed, A.H.: Adaptive networks. Proc. IEEE 102(4), 460–497 (2014)

  11. Smith, A.F.M., Makov, U.E.: A quasi-Bayes sequential procedure for mixtures. J. R. Stat. Soc. Ser. B (Methodol.) 40(1), 106–112 (1978)

  12. Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)

  13. Weng, Y., Xiao, W., Xie, L.: Diffusion-based EM algorithm for distributed estimation of Gaussian mixtures in wireless sensor networks. Sensors 11(6), 6297–6316 (2011)

Acknowledgements

This work was supported by the Czech Science Foundation, postdoctoral grant no. 14-06678P. The authors thank the referees for their valuable comments.

Corresponding author

Correspondence to Kamil Dedecius.


Appendix

Below we give several useful definitions and lemmas regarding the Bayesian estimation of exponential family distributions with conjugate priors [9]. The proofs are trivial. Their application to the normal model and normal inverse-gamma prior used in Sect. 3.4 follows.

Definition 1 (Exponential family distributions and conjugate priors).

Any distribution of a random variable y parameterized by θ with the probability density function of the form

$$\displaystyle{ p(y\vert \theta ) = f(y)g(\theta )\exp \left \{\eta (\theta )^{\intercal }T(y)\right \}, }$$

where f, g, η, and T are known functions, is called an exponential family distribution. Here η ≡ η(θ) is its natural parameter and T(y) is the (dimension-preserving) sufficient statistic; the form is not unique.

Any prior distribution for θ is said to be conjugate to p(y | θ), if it can be written in the form

$$\displaystyle{ \pi (\theta \vert \xi,\nu ) = q(\xi,\nu )g(\theta )^{\nu }\exp \left \{\eta (\theta )^{\intercal }\xi \right \}, }$$

where q is a known function and the hyperparameters are ν ∈ ℝ₊ and ξ, the latter of the same shape as T(y).
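As a standard textbook illustration (not taken from the paper), the Poisson model p(y | λ) = λ^y e^{−λ}/y! fits this template with f(y) = 1/y!, g(λ) = e^{−λ}, η(λ) = log λ and T(y) = y; the corresponding conjugate prior

$$\displaystyle{ \pi (\lambda \vert \xi,\nu ) \propto e^{-\nu \lambda }\lambda ^{\xi } }$$

is a gamma density with shape ξ + 1 and rate ν.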

Lemma 1 (Bayesian update with conjugate priors).

Bayes’ theorem

$$\displaystyle{ \pi (\theta \vert \xi _{t},\nu _{t}) \propto p(y_{t}\vert \theta )\pi (\theta \vert \xi _{t-1},\nu _{t-1}) }$$

yields the posterior hyperparameters as follows:

$$\displaystyle{ \xi _{t} =\xi _{t-1} + T(y_{t})\qquad \text{and}\qquad \nu _{t} =\nu _{t-1} + 1. }$$
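A minimal computational sketch of this update (our own illustration; the helper names are hypothetical):

```python
import numpy as np

# Sketch of Lemma 1: the natural hyperparameter xi accumulates the
# sufficient statistics T(y), and the counter nu grows by one per observation.

def conjugate_update(xi, nu, y, T):
    """Incorporate a single observation y into the hyperparameters (xi, nu)."""
    return xi + T(y), nu + 1.0

# Example with the normal-model statistic T(y) = (y, y^2, 1) of Lemma 2.
T = lambda y: np.array([y, y**2, 1.0])
xi, nu = np.zeros(3), 0.0
for y in (1.2, -0.3, 0.8):
    xi, nu = conjugate_update(xi, nu, y, T)
```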

Lemma 2.

The normal model

$$\displaystyle{ p(y_{t}\vert \mu,\sigma ^{2}) = \frac{(\sigma ^{2})^{-\frac{1} {2} }} {\sqrt{2\pi }} \exp \left \{-\frac{1} {2\sigma ^{2}}(y_{t}-\mu )^{2}\right \} }$$

where μ and σ² are unknown, can be written in the exponential family form with

$$\displaystyle{ \eta = \left(\frac{\mu}{\sigma^{2}},\; \frac{-1}{2\sigma^{2}},\; \frac{-\mu^{2}}{2\sigma^{2}}\right)^{\intercal},\qquad T(y_{t}) = \left(y_{t},\,y_{t}^{2},\,1\right)^{\intercal},\qquad g(\theta) = \left(\sigma^{2}\right)^{-\frac{1}{2}}. }$$
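The factorization can be checked numerically; the following sketch (our own illustration, using NumPy/SciPy) verifies that f(y) g(θ) exp{η⊺T(y)} reproduces the normal density:

```python
import numpy as np
from scipy.stats import norm

# Numerical check of Lemma 2: the normal density equals
# f(y) * g(theta) * exp(eta . T(y)) with the quantities above.

mu, sigma2, y = 0.7, 2.5, 1.3
eta = np.array([mu / sigma2, -1.0 / (2 * sigma2), -mu**2 / (2 * sigma2)])
T = np.array([y, y**2, 1.0])
f_y = 1.0 / np.sqrt(2.0 * np.pi)   # the factor f(y) of Definition 1
g = sigma2 ** (-0.5)               # the factor g(theta)

assert np.isclose(norm.pdf(y, mu, np.sqrt(sigma2)), f_y * g * np.exp(eta @ T))
```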

Lemma 3.

The normal inverse-gamma prior distribution for μ and σ², with the (nonnatural) scalar hyperparameters m ∈ ℝ and positive s, a, b, having the density

$$\displaystyle{ p(\mu,\sigma^{2}\vert m,s,a,b) = \frac{b^{a}\left(\sigma^{2}\right)^{-\left(a+1+\frac{1}{2}\right)}}{\sqrt{2\pi s}\,\varGamma(a)} \exp\left\{-\frac{1}{\sigma^{2}}\left[b + \frac{1}{2s}(m-\mu)^{2}\right]\right\} }$$

can be written in the prior-conjugate form with

$$\displaystyle{ \xi _{t} = \left (\frac{m} {s}, \frac{m^{2}} {s} + 2b, \frac{1} {s}\right )^{\intercal }. }$$

Lemma 4.

The Bayesian update of the normal inverse-gamma prior following the previous lemma coincides with the well-known ‘ordinary’ update of the original hyperparameters,

$$\displaystyle{ \begin{array}{ll} s_{t}^{-1} & = s_{t-1}^{-1} + 1, \\ m_{t} & = s_{t}\left (\frac{m_{t-1}} {s_{t-1}} + y_{t}\right ),\end{array} \qquad \qquad \begin{array}{ll} a_{t}& = a_{t-1} + \frac{1} {2}, \\ b_{t} & = b_{t-1} + \frac{1} {2}\left (\frac{m_{t-1}^{2}} {s_{t-1}} -\frac{m_{t}^{2}} {s_{t}} + y_{t}^{2}\right ).\end{array} }$$
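The consistency of Lemmas 3 and 4 can be verified numerically: updating ξ by T(y_t) as in Lemma 1 and mapping back to (m, s, b) must reproduce the ordinary update. The following sketch (our own illustration with hypothetical helper names) does exactly that:

```python
import numpy as np

# Check that the natural-coordinate update (Lemma 1 with the mapping of
# Lemma 3) reproduces the ordinary hyperparameter update of Lemma 4.

def to_natural(m, s, b):
    return np.array([m / s, m**2 / s + 2.0 * b, 1.0 / s])

def from_natural(xi):
    s = 1.0 / xi[2]
    m = xi[0] * s
    b = 0.5 * (xi[1] - m**2 / s)
    return m, s, b

m, s, b, y = 0.2, 4.0, 1.5, 0.9

# Lemma 4: ordinary update (a is updated directly as a_t = a_{t-1} + 1/2).
s_new = 1.0 / (1.0 / s + 1.0)
m_new = s_new * (m / s + y)
b_new = b + 0.5 * (m**2 / s - m_new**2 / s_new + y**2)

# Lemma 1 + Lemma 3: update of the natural hyperparameter.
xi_new = to_natural(m, s, b) + np.array([y, y**2, 1.0])
assert np.allclose(from_natural(xi_new), (m_new, s_new, b_new))
```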

Definition 2 (Kullback–Leibler divergence).

Let f(x), g(x) be two probability density functions of a random variable x, f absolutely continuous with respect to g. The Kullback–Leibler divergence is the nonnegative functional

$$\displaystyle{ \mathrm{D}(f\vert \vert g) = \mathbb{E}_{f}\left [\log \frac{f(x)} {g(x)}\right ] =\int f(x)\log \frac{f(x)} {g(x)}dx, }$$
(3.12)

where the integration domain is the support of f. The Kullback–Leibler divergence is a premetric: it is zero if f = g almost everywhere, but it is not symmetric and does not satisfy the triangle inequality.
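For two univariate normal densities the divergence has a well-known closed form; the following sketch (our own illustration) compares it with direct numerical integration of (3.12):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Illustrative check of Definition 2 for f = N(0, 1) and g = N(1, 4):
# numerical integration of f*log(f/g) versus the known closed form.

mu1, s1 = 0.0, 1.0      # f: mean and standard deviation
mu2, s2 = 1.0, 2.0      # g: mean and standard deviation
f, g = norm(mu1, s1), norm(mu2, s2)

numeric, _ = quad(lambda x: f.pdf(x) * np.log(f.pdf(x) / g.pdf(x)), -10, 10)
closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5
assert np.isclose(numeric, closed, atol=1e-6)
```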

Copyright information

© 2015 Springer International Publishing Switzerland

Cite this paper

Dedecius, K., Reichl, J. (2015). Distributed Estimation of Mixture Models. In: Frühwirth-Schnatter, S., Bitto, A., Kastner, G., Posekany, A. (eds) Bayesian Statistics from Methods to Models and Applications. Springer Proceedings in Mathematics & Statistics, vol 126. Springer, Cham. https://doi.org/10.1007/978-3-319-16238-6_3
