Application of Bayesian Mixture Models to Satellite Images and Estimating the Risk of Fire-Ant Incursion in the Identified Geographical Cluster

Case Studies in Applied Bayesian Data Science

Part of the book series: Lecture Notes in Mathematics (LNM, volume 2259)

Abstract

Bayesian non-parametric mixture models have found great success in the statistical practice of identifying latent clusters in data. However, fitting such models can be computationally intensive, which limits their practical use for tall datasets such as Landsat imagery. To overcome this issue, we propose drawing multiple samples from the data using stratified random sampling, so that any sub-populations present in the data are adequately represented in each sample. The non-parametric model is then fitted to each sample independently to obtain posterior estimates. Label correspondence across the multiple estimates is achieved using the multivariate component densities of a chosen reference partition, after which the posterior estimates are pooled to form a consensus posterior inference. The labels for the pixels in the entire image are inferred using the conditional posterior distribution given the pooled estimates, thereby substantially reducing computational time and memory requirements.
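The steps described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the per-sample fit is a hard-assignment k-means with a fixed number of spherical components standing in for the Bayesian non-parametric fit, labels are matched to the reference by nearest component mean rather than by component densities, and pooling is a plain average. All function names, the synthetic "pixels", and the tile-based strata are hypothetical and chosen only for illustration.

```python
import numpy as np

def stratified_sample(strata, n_per_stratum, rng):
    """Row indices holding up to n_per_stratum random draws from each stratum."""
    picks = []
    for s in np.unique(strata):
        members = np.flatnonzero(strata == s)
        take = min(n_per_stratum, members.size)
        picks.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(picks)

def farthest_point_init(X, k, rng):
    """Start from a random point, then repeatedly add the point farthest from those chosen."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    return np.array(centres)

def fit_stand_in(X, k, rng, n_iter=25):
    """Stand-in fit (k-means), NOT a Bayesian non-parametric mixture fit."""
    means = farthest_point_init(X, k, rng)
    for _ in range(n_iter):
        z = ((X[:, None] - means[None]) ** 2).sum(axis=2).argmin(axis=1)
        means = np.array([X[z == j].mean(axis=0) if np.any(z == j) else means[j]
                          for j in range(k)])
    z = ((X[:, None] - means[None]) ** 2).sum(axis=2).argmin(axis=1)
    weights = np.bincount(z, minlength=k) / len(X)
    var = np.array([X[z == j].var(axis=0).mean() + 1e-6 if np.any(z == j) else 1.0
                    for j in range(k)])
    return weights, means, var

def align_to_reference(fit, ref_means):
    """Reorder components so that position j matches reference component j."""
    weights, means, var = fit
    d = ((ref_means[:, None] - means[None]) ** 2).sum(axis=2)
    order = d.argmin(axis=1)  # nearest fitted component for each reference component
    return weights[order], means[order], var[order]

def pool(fits):
    """Average the aligned weights, means and variances across samples."""
    return tuple(np.mean([f[i] for f in fits], axis=0) for i in range(3))

def classify(X, weights, means, var):
    """Label every row by its most probable component under the pooled estimates."""
    d2 = ((X[:, None] - means[None]) ** 2).sum(axis=2)
    log_post = np.log(weights + 1e-12) - 0.5 * X.shape[1] * np.log(var) - 0.5 * d2 / var
    return log_post.argmax(axis=1)

# Synthetic stand-in for image pixels: three well-separated groups in two bands.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
truth = rng.integers(3, size=30000)
pixels = true_means[truth] + rng.normal(scale=0.5, size=(30000, 2))
strata = np.arange(30000) // 3000  # hypothetical strata: 10 contiguous tiles

fits = []
for _ in range(5):  # five independent stratified samples of 1000 pixels each
    idx = stratified_sample(strata, 100, rng)
    fits.append(fit_stand_in(pixels[idx], 3, rng))

ref_means = fits[0][1]  # the first fit serves as the reference partition
aligned = [align_to_reference(f, ref_means) for f in fits]
weights, means, var = pool(aligned)
labels = classify(pixels, weights, means, var)
```

Only the sampled pixels are ever passed to the fitting step; the full image is touched once, in `classify`, which is where the saving in time and memory comes from. The nearest-mean matching assumes the components are well separated, so that each reference component has a distinct closest match in every fit.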

The method is tested on Landsat images from the Brisbane region in Australia, which were compiled as part of the national program for the eradication of the imported red fire-ant that was launched in September 2001 and continues to the present. The aim is to estimate the risk of fire-ant incursion in each of the identified geographical clusters so that the eradication program can focus on high-risk areas.

Acknowledgements

This research was supported by an ARC Australian Laureate Fellowship for the project Bayesian Learning for Decision Making in the Big Data Era, under Grant no. FL150100150. The authors also acknowledge the support of the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS) and of QUT’s High Performance Computing and Research Support (HPC) group.

Author information

Corresponding author

Correspondence to Insha Ullah.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ullah, I., Mengersen, K.L. (2020). Application of Bayesian Mixture Models to Satellite Images and Estimating the Risk of Fire-Ant Incursion in the Identified Geographical Cluster. In: Mengersen, K., Pudlo, P., Robert, C. (eds) Case Studies in Applied Bayesian Data Science. Lecture Notes in Mathematics, vol 2259. Springer, Cham. https://doi.org/10.1007/978-3-030-42553-1_17
