A Survey of Bayesian Statistical Approaches for Big Data

Part of the book series: Lecture Notes in Mathematics (LNM, volume 2259)

Abstract

The modern era is characterised as an era of information or Big Data. This has motivated a large literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available before the advent of Big Data. We survey published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.


    Google Scholar 

  171. M.A. Suchard, Q. Wang, C. Chan, J. Frelinger, A. Cron, M. West, Understanding GPU programming for statistical computation: studies in massively parallel massive mixtures. J. Comput. Graph. Stat. 19(2), 419–438 (2010)

    Article  MathSciNet  Google Scholar 

  172. Z. Sun, L. Sun, K. Strang, Big data analytics services for enhancing business intelligence. J. Comput. Inf. Syst. 58(2), 162–169 (2018)

    Google Scholar 

  173. S. Suthaharan, Big data classification: problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Perform. Eval. Rev. 41(4), 70–73 (2014)

    Article  Google Scholar 

  174. O. Sysoev, A. Grimvall, O. Burdakov, Bootstrap confidence intervals for large-scale multivariate monotonic regression problems. Commun. Stat. Simul. Comput. 45(3), 1025–1040 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  175. D. Talia, Clouds for scalable big data analytics. Computer 46(5), 98–101 (2013)

    Article  Google Scholar 

  176. Y. Tang, Z. Xu, Y. Zhuang, Bayesian network structure learning from big data: a reservoir sampling based ensemble method, in International Conference on Database Systems for Advanced Applications (Springer, Berlin, 2016), pp. 209–222

    Google Scholar 

  177. A. Tank, N. Foti, E. Fox, Streaming variational inference for Bayesian nonparametric mixture models, in Artificial Intelligence and Statistics (2015), pp. 968–976

    Google Scholar 

  178. Y.W. Teh, A.H. Thiery, S.J. Vollmer, Consistency and fluctuations for stochastic gradient Langevin dynamics. J. Mach. Learn. Res. 17(1), 193–225 (2016)

    MathSciNet  MATH  Google Scholar 

  179. D. Tran, R. Ranganath, D.M. Blei, The variational Gaussian process. Preprint, arXiv:151106499 (2015)

    Google Scholar 

  180. N. Tripuraneni, S. Gu, H. Ge, Z. Ghahramani, Particle Gibbs for infinite hidden Markov models, in Advances in Neural Information Processing Systems (2015), pp. 2395–2403

    Google Scholar 

  181. S. van der Pas, V. Rockova, Bayesian dyadic trees and histograms for regression, in Advances in Neural Information Processing Systems (2017), pp. 2089–2099

    Google Scholar 

  182. M. Viceconti, P. Hunter, R. Hose, Big data, big knowledge: big data for personalized healthcare. IEEE J. Biomed. Health Inform. 19(4), 1209–1215 (2015)

    Article  Google Scholar 

  183. A. Vyas, S. Ram, Comparative study of MapReduce frameworks in big data analytics. Int. J. Mod. Comput. Sci. 5(Special Issue), 5–13 (2017)

    Google Scholar 

  184. S.F. Wamba, S. Akter, A. Edwards, G. Chopin, D. Gnanzou, How “big data” can make big impact: findings from a systematic review and a longitudinal case study. Int. J. Prod. Econ. 165, 234–246 (2015)

    Article  Google Scholar 

  185. X.F. Wang, Fast clustering using adaptive density peak detection. Stat. Methods Med. Res. 26(6), 2800–2811 (2015)

    Article  MathSciNet  Google Scholar 

  186. L. Wang, D.B. Dunson, Fast Bayesian inference in Dirichlet process mixture models. J. Comput. Graph. Stat. 20(1), 196–216 (2011)

    Article  MathSciNet  Google Scholar 

  187. X. Wang, D.B. Dunson, Parallelizing MCMC via weierstrass sampler. Preprint, arXiv:13124605 (2013)

    Google Scholar 

  188. T. Wang, R.J. Samworth, High dimensional change point estimation via sparse projection. J. R. Stat. Soc. Ser. B (Stat Methodol.) 80(1), 57–83 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  189. C. Wang, J. Paisley, D. Blei, Online variational inference for the hierarchical Dirichlet process, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (2011), pp. 752–760

    Google Scholar 

  190. J. Wang, Y. Tang, M. Nguyen, I. Altintas, A scalable data science workflow approach for big data Bayesian network learning, in 2014 IEEE/ACM Int Symp. Big Data Comput. (IEEE, Piscataway, 2014), pp. 16–25

    Google Scholar 

  191. C. Wang, M.H. Chen, E. Schifano, J. Wu, J. Yan, Statistical methods and computing for big data. Stat. Interface 9(4), 399–414 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  192. C. Wang, M.H. Chen, J. Wu, J. Yan, Y. Zhang, E. Schifano, Online updating method with new variables for big data streams. Can. J. Stat. 46(1), 123–146 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  193. H.J. Watson, Tutorial: big data analytics: concepts, technologies, and applications. Commun. Assoc. Inf. Syst. 34, 65 (2014)

    Google Scholar 

  194. Y. Webb-Vargas, S. Chen, A. Fisher, A. Mejia, Y. Xu, C. Crainiceanu, B. Caffo, M.A. Lindquist, Big data and neuroimaging. Stat. Biosci. 9(2), 543–558 (2017)

    Article  Google Scholar 

  195. S. White, T. Kypraios, S.P. Preston, Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution. Stat. Comput. 25(2), 289–301 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  196. R. Wilkinson, Accelerating ABC methods using Gaussian processes, in Artificial Intelligence and Statistics (2014), pp. 1015–1023

    Google Scholar 

  197. S. Williamson, A. Dubey, E.P. Xing, Parallel Markov chain Monte Carlo for nonparametric mixture models, in Proceedings of the 30th International Conference on Machine Learning (ICML-13) (2013), pp. 98–106

    Google Scholar 

  198. A.F. Wise, D.W. Shaffer, Why theory matters more than ever in the age of big data. J. Learn. Anal. 2(2), 5–13 (2015)

    Article  Google Scholar 

  199. C. Wu, C.P. Robert, Average of recentered parallel MCMC for big data. Preprint, arXiv:170604780 (2017)

    Google Scholar 

  200. X.G. Xia, Small data, mid data, and big data versus algebra, analysis, and topology. IEEE Signal Process. Mag. 34(1), 48–51 (2017)

    Article  Google Scholar 

  201. C. Yang, Q. Huang, Z. Li, K. Liu, F. Hu, Big data and cloud computing: innovation opportunities and challenges. Int. J. Digit Earth 10(1), 13–53 (2017)

    Article  Google Scholar 

  202. C. Yoo, L. Ramirez, J. Liuzzi, Big data analysis using modern statistical and machine learning methods in medicine. Int. Neurourol. J. 18(2), 50 (2014)

    Article  Google Scholar 

  203. L. Yu, N. Lin, ADMM for penalized quantile regression in big data. Int. Stat. Rev. 85(3), 494–518 (2017)

    Article  MathSciNet  Google Scholar 

  204. T. Zhang, B. Yang, An exact approach to ridge regression for big data. Comput. Stat. 32, 1–20 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  205. X. Zhang, C. Liu, S. Nepal, C. Yang, W. Dou, J. Chen, A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud. J. Comput. Syst. Sci. 80(5), 1008–1020 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  206. Y. Zhang, T. Cao, S. Li, X. Tian, L. Yuan, H. Jia, A.V. Vasilakos, Parallel processing systems for big data: a survey. Proc. IEEE 104(11), 2114–2136 (2016)

    Article  Google Scholar 

  207. Z. Zhang, K.K.R. Choo, B.B. Gupta, The convergence of new computing paradigms and big data analytics methodologies for online social networks. J. Comput. Sci. 26, 453–455 (2018)

    Article  Google Scholar 

  208. L. Zhang, A. Datta, S. Banerjee, Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments. Stat. Anal. Data Min. 12(3), 197–209 (2019)

    Article  MathSciNet  Google Scholar 

  209. L. Zhou, S. Pan, J. Wang, A.V. Vasilakos, Machine learning on big data: Opportunities and challenges. Neurocomputing 237, 350–361 (2017)

    Article  Google Scholar 

  210. J. Zhu, J. Chen, W. Hu, B. Zhang, Big learning with Bayesian methods. Natl. Sci. Rev. 4(4), 627–651 (2017)

    Article  Google Scholar 

  211. G. Zoubin, Scaling the Indian Buffet process via submodular maximization, in International Conference on Machine Learning (2013), pp. 1013–1021

    Google Scholar 

Download references

Acknowledgements

This research was supported by an ARC Australian Laureate Fellowship for the project Bayesian Learning for Decision Making in the Big Data Era, under Grant no. FL150100150. The authors also acknowledge the support of the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).

Author information

Corresponding author

Correspondence to Farzana Jahan.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Jahan, F., Ullah, I., Mengersen, K.L. (2020). A Survey of Bayesian Statistical Approaches for Big Data. In: Mengersen, K., Pudlo, P., Robert, C. (eds) Case Studies in Applied Bayesian Data Science. Lecture Notes in Mathematics, vol 2259. Springer, Cham. https://doi.org/10.1007/978-3-030-42553-1_2
