Comparing two multinomial samples using hierarchical Bayesian models

  • A. R. Masegosa
  • A. Torres
  • M. Morales
  • A. Salmerón
Regular Paper


Two-sample statistical tests are commonly used to decide whether two samples can be considered to be drawn from the same population. However, such tests run into problems with extremely large volumes of data: the power of the test becomes so high that the null hypothesis is rejected even when the differences found in the data are minimal. Furthermore, they may need to scan the whole sample each time they are applied, which is a serious limitation in, for instance, streaming data contexts. In this paper, we apply a class of Bayesian models that has been used successfully in streaming data contexts to the problem of comparing multinomial populations. The underlying tools are latent variable models with hierarchical power priors. We show how a relevant parameter of the model makes it possible to decide whether or not two populations are different.
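
The large-sample issue mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: it uses the classical Pearson chi-square homogeneity test as a stand-in for a generic two-sample test, and the probability vectors `p` and `q`, the sample sizes, and all function names are assumptions chosen for illustration. Expected counts are used in place of random draws so that the output is deterministic.

```python
import math


def chi2_homogeneity_stat(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x K table of counts."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_two_df(stat):
    """Chi-square survival function for 2 degrees of freedom.

    With K = 3 categories the test has (2 - 1) * (3 - 1) = 2 degrees of
    freedom, for which the survival function is exactly exp(-x / 2).
    """
    return math.exp(-stat / 2.0)


# Hypothetical multinomial populations that differ by 0.001 in two categories.
p = (0.500, 0.300, 0.200)
q = (0.501, 0.300, 0.199)

results = {}
for n in (10_000, 100_000_000):
    # Idealised samples: each count equals its expected value n * probability.
    counts_a = [n * p_i for p_i in p]
    counts_b = [n * q_i for q_i in q]
    stat = chi2_homogeneity_stat(counts_a, counts_b)
    results[n] = (stat, p_value_two_df(stat))
    print(f"n = {n:>11,d}  statistic = {stat:10.4f}  p-value = {results[n][1]:.3e}")
```

For a fixed difference between `p` and `q`, the statistic grows linearly with the sample size, so the same negligible difference goes from clearly non-significant at the smaller size to overwhelmingly significant at the larger one.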


Keywords: Hierarchical Bayesian models; Latent variable models; Multinomial population comparison



This work has been supported by the Spanish Ministry of Economy and Competitiveness through projects TIN2016-77902-C3-3-P and TIN2015-74368-JIN, and has received FEDER funds.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Center for the Development and Transfer of Mathematical Research to Industry (CDTIME) and Department of Mathematics, University of Almería, Almería, Spain
