Random Forest-Based Prospectivity Modelling of Greenfield Terrains Using Sparse Deposit Data: An Example from the Tanami Region, Western Australia

  • Original Paper
  • Natural Resources Research

Abstract

Data-driven prospectivity modelling of greenfields terrains is challenging because very few known deposits are available, so the training data are overwhelmingly dominated by non-deposit samples. This imbalance can bias the estimates of model parameters. In the present study, which involves Random Forest (RF)-based gold prospectivity modelling of the Tanami region, a greenfields terrain in Western Australia, we apply the Synthetic Minority Over-sampling Technique (SMOTE) to modify the initial dataset and bring the deposit-to-non-deposit ratio closer to 50:50. An optimal threshold range is determined objectively using statistical measures such as sensitivity, specificity, kappa and per cent correctly classified. RF regression modelling with the modified dataset, in which the deposit-to-non-deposit sample ratio is close to 50:50, delineates 4.67% of the study area as high-prospectivity areas, compared with only 1.06% for the original dataset, implying that the original “sparse” dataset underestimates prospectivity.
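The two generic ingredients named in the abstract, synthetic minority over-sampling and threshold assessment with sensitivity, specificity, kappa and per cent correctly classified, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smote`, `threshold_metrics`), the toy coordinates, the scores and the threshold of 0.5 are all hypothetical, and the Random Forest step itself is omitted. The sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def threshold_metrics(scores, labels, threshold):
    """Assess a cut-off applied to continuous scores (hypothetical helper).

    Labels are 1 (deposit) or 0 (non-deposit); a score >= threshold is
    predicted as 1. Assumes both classes occur in `labels`.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    pcc = (tp + tn) / n  # proportion correctly classified
    # Agreement expected by chance, from the marginal totals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (pcc - chance) / (1 - chance) if chance != 1 else 0.0
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pcc": pcc,
        "kappa": kappa,
    }


# Toy data (made up for illustration): four "deposit" feature vectors.
deposits = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic_deposits = smote(deposits, n_new=6, k=2)

# Toy model scores and true labels, assessed at one candidate threshold.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
metrics = threshold_metrics(scores, labels, threshold=0.5)
```

Evaluating `threshold_metrics` over a sweep of candidate thresholds and looking for the range in which the four measures are jointly high is one way to read the "optimal threshold range" described above; the exact selection criterion used in the paper is not stated in the abstract.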

Acknowledgements

The authors would like to thank the two anonymous reviewers for their insightful comments and suggestions, which we believe have improved the overall technical quality of the paper. We also thank the editors of Natural Resources Research for suggesting edits to the manuscript.

Author information

Corresponding author

Correspondence to Alok Porwal.

About this article

Cite this article

Hariharan, S., Tirodkar, S., Porwal, A. et al. Random Forest-Based Prospectivity Modelling of Greenfield Terrains Using Sparse Deposit Data: An Example from the Tanami Region, Western Australia. Nat Resour Res 26, 489–507 (2017). https://doi.org/10.1007/s11053-017-9335-6

  • DOI: https://doi.org/10.1007/s11053-017-9335-6
