Comparing two multinomial samples using hierarchical Bayesian models
Two-sample statistical tests are commonly used to decide whether two samples can be considered to be drawn from the same population. However, statistical tests face problems when confronted with extremely large volumes of data: the power of the test is then so high that it rejects the null hypothesis even when the differences found in the data are minimal. Furthermore, the fact that such tests may require exploring the whole sample each time they are applied is a serious limitation, for instance, in streaming data contexts. In this paper, we apply a class of Bayesian models that have been successfully used in streaming data contexts to the problem of comparing multinomial populations. The underlying tools are latent variable models with hierarchical power priors. We show how a relevant parameter of these models makes it possible to decide whether two populations are different or not.
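The large-sample argument in the abstract can be checked numerically. The Python sketch below is illustrative only and is not the authors' method. It first computes Pearson's chi-square test of homogeneity on idealized, noise-free counts for a fixed one-point shift in proportions (the proportions are made-up values), so only the sample size changes. Then, as a deliberately simplified stand-in for a power-prior model, it builds a Dirichlet power prior from the first sample and picks the exponent `rho` by grid-search maximum marginal likelihood of the second sample, under an assumed uniform Dirichlet base prior. The function names are hypothetical, and `rho` here is only an assumed analogue of the "relevant parameter" mentioned above, not necessarily the one the paper uses; the paper's hierarchical model is not reproduced.

```python
import math


def chi2_homogeneity_3cat(counts_a, counts_b):
    """Pearson chi-square test of homogeneity for two samples over 3 categories.

    The 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and the
    chi-square(2) survival function is exactly exp(-x / 2), so no SciPy is needed.
    """
    if len(counts_a) != 3 or len(counts_b) != 3:
        raise ValueError("this sketch handles exactly 3 categories")
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)


def log_marginal(counts, alpha):
    """Log Dirichlet-multinomial marginal likelihood of a count vector,
    dropping the multinomial coefficient (it does not depend on alpha)."""
    n, a0 = sum(counts), sum(alpha)
    out = math.lgamma(a0) - math.lgamma(a0 + n)
    for c, a in zip(counts, alpha):
        out += math.lgamma(a + c) - math.lgamma(a)
    return out


def power_prior(alpha_post, alpha_base, rho):
    """Dirichlet(alpha_post)^rho * Dirichlet(alpha_base)^(1 - rho), renormalized,
    is Dirichlet(rho * alpha_post + (1 - rho) * alpha_base)."""
    return [rho * p + (1.0 - rho) * b for p, b in zip(alpha_post, alpha_base)]


def estimate_rho(counts_1, counts_2, grid_size=101):
    """Hypothetical empirical-Bayes choice of rho: the grid value maximizing the
    marginal likelihood of sample 2 under the power prior built from sample 1."""
    alpha_base = [1.0] * len(counts_1)  # assumed uniform Dirichlet base prior
    alpha_post = [b + c for b, c in zip(alpha_base, counts_1)]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda r: log_marginal(
        counts_2, power_prior(alpha_post, alpha_base, r)))


if __name__ == "__main__":
    p = (0.30, 0.30, 0.40)  # illustrative population proportions
    q = (0.31, 0.30, 0.39)  # a one-percentage-point shift
    for n in (1_000, 10_000, 100_000, 1_000_000):
        a = [round(n * x) for x in p]  # idealized, noise-free counts
        b = [round(n * x) for x in q]
        stat, pval = chi2_homogeneity_3cat(a, b)
        print(f"n={n:>9,}  chi2={stat:10.3f}  p={pval:.3g}")
    print("rho, same proportions: ", estimate_rho([300, 300, 400], [300, 300, 400]))
    print("rho, clearly different:", estimate_rho([300, 300, 400], [400, 300, 300]))
```

The chi-square statistic grows linearly with the sample size for a fixed difference in proportions, so the p-value collapses even though the shift never changes. Because every function consumes only count vectors, new data can be folded in by adding counts without revisiting individual observations, which is the property that matters in streaming settings. With the simplified `rho` estimate one would expect values near 1 for samples with matching proportions and near 0 for clearly different ones; this grid-search version also becomes sensitive to small differences as the sample size grows, so it is not offered as a fix for the large-sample problem.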
Keywords: Hierarchical Bayesian models; Latent variable models; Multinomial population comparison
This work has been supported by the Spanish Ministry of Economy and Competitiveness through projects TIN2016-77902-C3-3-P and TIN2015-74368-JIN, and has received FEDER funds.