Random Forest-Based Prospectivity Modelling of Greenfield Terrains Using Sparse Deposit Data: An Example from the Tanami Region, Western Australia

Hariharan, Siddharth; Tirodkar, Siddhesh; Porwal, Alok; Bhattacharya, Avik; Joly, Aurore

doi:10.1007/s11053-017-9335-6

Original Paper
Published: 19 April 2017

Natural Resources Research, Volume 26, pages 489–507 (2017)

Siddharth Hariharan¹, Siddhesh Tirodkar², Alok Porwal¹,³, Avik Bhattacharya¹ & Aurore Joly⁴

Abstract

Data-driven prospectivity modelling of greenfields terrains is challenging because very few known deposits are available, so the training data are overwhelmingly dominated by non-deposit samples. This imbalance can bias the estimates of model parameters. In the present study, which involves Random Forest (RF)-based gold prospectivity modelling of the Tanami region, a greenfields terrain in Western Australia, we apply the Synthetic Minority Over-sampling Technique (SMOTE) to modify the initial dataset and bring the deposit-to-non-deposit ratio closer to 50:50. An optimal threshold range is determined objectively using statistical measures such as sensitivity, specificity, kappa and per cent correctly classified. RF regression modelling with the modified dataset, in which the deposit-to-non-deposit ratio is close to 50:50, delineates 4.67% of the study area as highly prospective, compared with only 1.06% for the original dataset, implying that the original “sparse” dataset underestimates prospectivity.
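The two data-handling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `threshold_stats`, the neighbour count `k`, the toy feature vectors and the example scores are all assumptions made for illustration, and the Random Forest step itself is omitted (the `scores` list stands in for RF regression outputs). The first function oversamples the minority (deposit) class by interpolating between a sample and one of its nearest minority neighbours, which is the core idea of the Synthetic Minority Over-sampling Technique. The second computes sensitivity, specificity, per cent correctly classified (PCC) and Cohen's kappa at a single cut-off.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority samples (SMOTE-style interpolation).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]  # exclude the base sample
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # fractional position along base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def threshold_stats(y_true, scores, threshold):
    """Sensitivity, specificity, PCC and Cohen's kappa at one cut-off.

    Assumes y_true contains both classes (1 = deposit, 0 = non-deposit).
    """
    pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    n = len(pairs)
    pcc = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (pcc - pe) / (1 - pe) if pe != 1 else 0.0
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pcc": pcc,
        "kappa": kappa,
    }


# Toy data only: four "deposit" feature vectors, oversampled to ten.
deposits = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.7, 0.95)]
extra = smote(deposits, n_new=6)
print(len(deposits) + len(extra))  # 10

# Hypothetical model scores for three deposits and five non-deposits.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
stats = threshold_stats(y_true, scores, threshold=0.5)
print({name: round(value, 3) for name, value in stats.items()})
# {'sensitivity': 0.667, 'specificity': 0.8, 'pcc': 0.75, 'kappa': 0.467}
```

Evaluating `threshold_stats` over a grid of cut-offs and inspecting where the four measures are jointly high is one way to arrive at a threshold range rather than a single value. How the measures are combined in the paper is not stated in the abstract.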

Acknowledgements

The authors would like to thank the two anonymous reviewers for their insightful comments and suggestions, which we believe have improved the overall technical quality of the paper. We also thank the editors of Natural Resources Research for suggesting edits to the manuscript.

Author information

Authors and Affiliations

Centre of Studies in Resources Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
Siddharth Hariharan, Alok Porwal & Avik Bhattacharya
Climate Studies, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
Siddhesh Tirodkar
Centre for Exploration Targeting, University of Western Australia, Crawley, WA, 6009, Australia
Alok Porwal
Aurora Australis Geoconsulting, 363 Hay Street, Subiaco, WA, 6008, Australia
Aurore Joly

Corresponding author

Correspondence to Alok Porwal.

About this article

Cite this article

Hariharan, S., Tirodkar, S., Porwal, A. et al. Random Forest-Based Prospectivity Modelling of Greenfield Terrains Using Sparse Deposit Data: An Example from the Tanami Region, Western Australia. Nat Resour Res 26, 489–507 (2017). https://doi.org/10.1007/s11053-017-9335-6

Received: 14 December 2016
Accepted: 01 April 2017
Published: 19 April 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11053-017-9335-6
