Abstract
Recent studies have shown that novel genetic variation for resistance to pests and diseases can be detected in plant genetic resources originating from locations whose environmental profile is similar to that of the collection sites of a reference set of accessions with known resistance. This is the basis of the Focused Identification of Germplasm Strategy (FIGS). FIGS combines two steps: developing a priori information by quantifying the trait-environment relationship, and using this information to define a best-bet subset of accessions with a higher probability of containing new variation for the sought-after trait(s). The present study investigates how the a priori information is developed, using different modeling techniques including learning-based techniques, as a follow-up to previous work in which parametric approaches were used to quantify the relationship between stem rust resistance and climate variables. The results show that predictive power, derived from the accuracy parameters and cross-validation, varies depending on whether the models are based on linear or non-linear approaches. Predictive performance is higher for the learning-based techniques, indicating that the non-linear approaches, in particular support vector machines and neural networks, outperform both principal component logistic regression and generalized partial least squares. Overall, there are indications that resistance to stem rust is confined to certain environments or areas, whereas the susceptible types appear to be limited to other areas, with some degree of overlap between the two classes. The results also point to a number of issues to consider for improving the predictive performance of the models.
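The abstract compares models through accuracy parameters such as the area under the ROC curve (AUC). As a rough illustration of what that parameter measures, the sketch below computes AUC as the fraction of resistant/susceptible pairs that a model ranks correctly. The function name `auc`, the 1/0 label coding, and the two score vectors are hypothetical and are not taken from the study; this is a minimal sketch, not the authors' implementation.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    labels: 1 = resistant, 0 = susceptible (illustrative coding only).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one accession of each class")
    # Count resistant/susceptible pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of resistance for six accessions.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # one resistant accession misranked
model_b = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # classes fully separated

print(auc(model_a, labels))
print(auc(model_b, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, so a model whose scores overlap less between resistant and susceptible accessions scores higher. In a cross-validation setting, the same calculation would be applied to predictions on held-out accessions only.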
Abbreviations
- AUC: Area under the ROC curve
- GPLS: Generalized partial least squares
- GIS: Geographic information systems
- NN: Neural networks
- PCA: Principal component analysis
- PCLR: Principal component logistic regression
- PLS: Partial least squares
- RF: Random forest
- ROC: Receiver operating characteristics
- SVM: Support vector machine
Cite this article
Bari, A., Street, K., Mackay, M. et al. Focused identification of germplasm strategy (FIGS) detects wheat stem rust resistance linked to environmental variables. Genet Resour Crop Evol 59, 1465–1481 (2012). https://doi.org/10.1007/s10722-011-9775-5
DOI: https://doi.org/10.1007/s10722-011-9775-5