Abstract
Key message
Our simulation results clarify the areas of applicability of nine prediction methods and suggest the factors that affect their accuracy in predicting empirical traits.
Abstract
Whole-genome prediction is used to predict genetic value from genome-wide markers, and the choice of method is important for successful prediction. We compared nine methods using empirical data for eight phenological and morphological traits of Asian rice cultivars (Oryza sativa L.) and data simulated from real marker genotype data. The methods were genomic BLUP (GBLUP), reproducing kernel Hilbert spaces regression (RKHS), Lasso, elastic net, random forest (RForest), Bayesian lasso (Blasso), extended Bayesian lasso (EBlasso), weighted Bayesian shrinkage regression (wBSR), and the average of the other eight methods (Ave). The objectives were to evaluate the predictive ability of these methods in a cultivar population, to characterize them by exploring the area of applicability of each method using simulation, and to investigate the causes of their different accuracies for empirical traits. GBLUP was the most accurate method for one trait, RKHS and Ave for two traits each, and RForest for three traits. In the simulation, Blasso, EBlasso, and Ave showed stable performance across the simulated scenarios, whereas the other methods, except wBSR, had specific areas of applicability; wBSR performed poorly in most scenarios. For each method, the accuracy ranking for the empirical traits was largely consistent with that in one of the simulated scenarios, suggesting that the simulation conditions reflected the factors that affected the accuracy of the methods for the empirical traits. This study will be useful for genomic prediction not only in Asian rice, but also in populations of other crops with relatively small training sets and strong linkage disequilibrium structures.
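The idea of averaging the predictions of several methods (Ave) and scoring each method on held-out lines can be illustrated with a minimal sketch. It is not the authors' implementation: marker ridge regression stands in for GBLUP, Gaussian-kernel ridge regression stands in for RKHS, accuracy is assumed to be the Pearson correlation between observed and predicted values, and the data sizes, penalties, bandwidth, and 0/1/2 genotype coding are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on marker genotypes (assumed stand-in for GBLUP)."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Z_train = X_train - col_means  # center markers on the training set
    Z_test = X_test - col_means
    # Dual form: an n x n solve, convenient when markers outnumber lines.
    K = Z_train @ Z_train.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + Z_test @ Z_train.T @ alpha


def gaussian_kernel_predict(X_train, y_train, X_test, lam, bandwidth):
    """Kernel ridge regression with a Gaussian kernel (assumed stand-in for RKHS)."""
    def kernel(A, B):
        # Squared Euclidean distance between lines, scaled by marker count.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2) / A.shape[1]
        return np.exp(-bandwidth * d2)

    mu = y_train.mean()
    K = kernel(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + kernel(X_test, X_train) @ alpha


def average_prediction(predictions):
    """Ave-style ensemble: element-wise mean of the predictions of each method."""
    return np.mean(np.vstack(predictions), axis=0)


def accuracy(y_obs, y_pred):
    """Pearson correlation between observed and predicted values (assumed metric)."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_lines, n_markers = 100, 500  # hypothetical population and panel sizes
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    effects[:10] = rng.normal(size=10)  # ten simulated QTL, the rest are null
    y = X @ effects + rng.normal(scale=1.0, size=n_lines)

    test = np.arange(n_lines) < 20  # hold out the first 20 lines
    preds = {
        "ridge": ridge_predict(X[~test], y[~test], X[test], lam=50.0),
        "kernel": gaussian_kernel_predict(
            X[~test], y[~test], X[test], lam=0.5, bandwidth=1.0
        ),
    }
    preds["Ave"] = average_prediction(list(preds.values()))
    for name, pred in preds.items():
        print(name, round(accuracy(y[test], pred), 3))
```

Changing the number of simulated QTL or the noise scale in the demo gives a rough feel for how the relative accuracy of the methods can shift between scenarios, although such a toy comparison says nothing about the results reported here.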
Acknowledgments
This work was supported by a Grant-in-Aid for Japan Society for the Promotion of Science (JSPS) Fellows (26.10661), and also by the FIRST Program of JSPS. We would like to thank the technical staff at the National Agriculture and Food Research Organization Western Region Agricultural Research Center.
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Communicated by Jose Crossa.
Cite this article
Onogi, A., Ideta, O., Inoshita, Y. et al. Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.). Theor Appl Genet 128, 41–53 (2015). https://doi.org/10.1007/s00122-014-2411-y
DOI: https://doi.org/10.1007/s00122-014-2411-y