Natural Resources Research, Volume 26, Issue 4, pp 489–507

Random Forest-Based Prospectivity Modelling of Greenfield Terrains Using Sparse Deposit Data: An Example from the Tanami Region, Western Australia

  • Siddharth Hariharan
  • Siddhesh Tirodkar
  • Alok Porwal
  • Avik Bhattacharya
  • Aurore Joly
Original Paper


Data-driven prospectivity modelling of greenfields terrains is challenging because very few deposits are available and the training data are overwhelmingly dominated by non-deposit samples. This imbalance can bias the estimates of model parameters. In the present study, which involves Random Forest (RF)-based gold prospectivity modelling of the Tanami region, a greenfields terrain in Western Australia, we apply the Synthetic Minority Over-sampling Technique (SMOTE) to modify the initial dataset and bring the deposit-to-non-deposit ratio closer to 50:50. An optimal threshold range is determined objectively using statistical measures such as sensitivity, specificity, kappa and per cent correctly classified. RF regression modelling with the modified dataset, which has a deposit-to-non-deposit sample ratio close to 50:50, delineates 4.67% of the study area as high-prospectivity areas, compared with only 1.06% for the original dataset, implying that the original “sparse” dataset underestimates prospectivity.
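The two generic ingredients named in the abstract, minority over-sampling and threshold statistics, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `smote_like` and `threshold_stats`, the neighbour count `k`, and the toy data are all assumptions made for illustration. The first function interpolates between a minority (deposit) sample and one of its nearest minority neighbours, which is the core idea of the Synthetic Minority Over-sampling Technique. The second computes sensitivity, specificity, per cent correctly classified (PCC) and Cohen's kappa for one candidate threshold applied to continuous prospectivity scores.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (illustrative, SMOTE-style).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes `minority` is a list of at least two feature lists.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def threshold_stats(scores, labels, threshold):
    """Sensitivity, specificity, PCC and Cohen's kappa at one threshold.

    `labels` are 1 (deposit) or 0 (non-deposit); a score >= threshold is
    predicted as deposit. Assumes both classes occur in `labels`.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    pcc = (tp + tn) / n
    # Agreement expected by chance, from the confusion-matrix marginals.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (pcc - p_exp) / (1 - p_exp) if p_exp < 1 else float("nan")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pcc": pcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical toy data: four "deposit" samples in a 2-D feature space.
    deposits = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_like(deposits, n_new=3))

    # Hypothetical model scores and true labels.
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    for t in (0.25, 0.5, 0.75):
        print(t, threshold_stats(scores, labels, t))
```

Evaluating `threshold_stats` over a grid of thresholds and inspecting where the four measures are jointly high is one simple way to identify a threshold range; the specific selection criterion used in the paper is not stated in the abstract.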


Random forest; Mineral prospectivity; Threshold; Mapping; Modelling



The authors would like to thank the two anonymous reviewers for their insightful comments and suggestions, which we believe have improved the overall technical quality of the paper. We also thank the editors of Natural Resources Research for suggesting edits to the manuscript.



Copyright information

© International Association for Mathematical Geosciences 2017

Authors and Affiliations

  • Siddharth Hariharan (1)
  • Siddhesh Tirodkar (2)
  • Alok Porwal (1, 3)
  • Avik Bhattacharya (1)
  • Aurore Joly (4)
  1. Centre of Studies in Resources Engineering, Indian Institute of Technology Bombay, Mumbai, India
  2. Climate Studies, Indian Institute of Technology Bombay, Mumbai, India
  3. Centre for Exploration Targeting, University of Western Australia, Crawley, Australia
  4. Aurora Australis Geoconsulting, Subiaco, Australia
