MaxEntropy Pursuit Variational Inference

  • Evgenii Egorov
  • Kirill Neklydov
  • Ruslan Kostoev
  • Evgeny Burnaev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11554)

Abstract

One of the core problems in variational inference is the choice of the approximate posterior distribution. It is crucial to trade off efficient inference with simple families, such as mean-field models, against the accuracy of inference. We propose a variant of greedy approximation of the posterior distribution with tractable base learners. Using the maximum-entropy approach, we obtain a well-defined optimization problem. We demonstrate the ability of the method to capture complex multimodal posteriors in a continual learning setting for neural networks.
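The abstract does not spell out the objective, so the following is only a toy sketch of the general greedy idea, not the authors' maximum-entropy formulation. It assumes a one-dimensional bimodal target, fixed-width Gaussian base learners, and a plain grid search that minimizes KL(q || p) by numerical integration. The names `gauss`, `kl`, and `greedy_fit` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def gauss(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on the array x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Integration grid and an assumed bimodal target density p(x).
x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]
p = 0.5 * gauss(x, -2.0, 0.5) + 0.5 * gauss(x, 2.0, 0.5)

def kl(q):
    """KL(q || p) approximated by a Riemann sum on the grid."""
    return float(np.sum(q * (np.log(q) - np.log(p))) * dx)

def greedy_fit(n_components=2, sigma=0.5):
    """Greedily build a mixture q_t = (1 - a) * q_{t-1} + a * h_t.

    At each step the new component mean and its weight a are chosen
    by exhaustive search over small grids (an assumption made to keep
    the sketch deterministic and dependency-free).
    """
    means = np.linspace(-4.0, 4.0, 81)
    alphas = np.linspace(0.05, 0.95, 19)
    q, chosen, history = None, [], []
    for _ in range(n_components):
        best = None
        for mu in means:
            h = gauss(x, mu, sigma)
            # The first component takes all of the mass (a = 1).
            for a in ([1.0] if q is None else alphas):
                cand = h if q is None else (1.0 - a) * q + a * h
                score = kl(cand)
                if best is None or score < best[0]:
                    best = (score, float(mu), float(a), cand)
        score, mu, a, q = best
        chosen.append((mu, a))
        history.append(score)
    return chosen, history

chosen, history = greedy_fit()
for step, ((mu, a), score) in enumerate(zip(chosen, history), start=1):
    print(f"step {step}: mean={mu:+.2f}, weight={a:.2f}, KL={score:.4f}")
```

In this toy setting a single Gaussian locks onto one mode, and the second greedy step places a component on the other mode, which is the kind of multimodal coverage that a mean-field family cannot provide on its own.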

Keywords

Variational inference, Deep learning, Maximum entropy, Bayesian inference

Notes

Acknowledgements

The work was supported by the Russian Science Foundation under Grant 19-41-04109.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Evgenii Egorov (1)
  • Kirill Neklydov (2, 3)
  • Ruslan Kostoev (1)
  • Evgeny Burnaev (1)

  1. Skolkovo Institute of Science and Technology, Moscow, Russia
  2. National Research University Higher School of Economics, Moscow, Russia
  3. Samsung AI Center in Moscow, Moscow, Russia
