Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.)

  • Original Paper
  • Published in Theoretical and Applied Genetics

Abstract

Key message

Our simulation results clarify the areas of applicability of nine prediction methods and suggest the factors that affect their accuracy at predicting empirical traits.

Abstract

Whole-genome prediction is used to predict genetic value from genome-wide markers. The choice of method is important for successful prediction. We compared nine methods using empirical data for eight phenological and morphological traits of Asian rice cultivars (Oryza sativa L.) and data simulated from real marker genotype data. The methods were genomic BLUP (GBLUP), reproducing kernel Hilbert spaces regression (RKHS), Lasso, elastic net, random forest (RForest), Bayesian lasso (Blasso), extended Bayesian lasso (EBlasso), weighted Bayesian shrinkage regression (wBSR), and the average of all methods (Ave). The objectives were to evaluate the predictive ability of these methods in a cultivar population, to characterize them by exploring the area of applicability of each method using simulation, and to investigate the causes of their different accuracies for empirical traits. GBLUP was the most accurate for one trait, RKHS and Ave for two, and RForest for three traits. In the simulation, Blasso, EBlasso, and Ave showed stable performance across the simulated scenarios, whereas the other methods, except wBSR, had specific areas of applicability; wBSR performed poorly in most scenarios. For each method, the accuracy ranking for the empirical traits was largely consistent with that in one of the simulated scenarios, suggesting that the simulation conditions reflected the factors that affected the method accuracy for the empirical results. This study will be useful for genomic prediction not only in Asian rice, but also in populations from other crops with relatively small training sets and strong linkage disequilibrium structures.
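The idea behind the "Ave" method and the accuracy comparison can be illustrated with a minimal sketch. It assumes that accuracy is measured as the Pearson correlation between predicted and observed values, which is a common choice in genomic prediction but is not stated in the abstract. The method names `method_a` and `method_b` and all numbers are hypothetical toy values, chosen so that the two methods' errors cancel; they are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_predictions(preds_by_method):
    """Element-wise mean of the predictions of all methods (the 'Ave' idea)."""
    all_preds = list(preds_by_method.values())
    return [sum(vals) / len(vals) for vals in zip(*all_preds)]


# Hypothetical toy values for five individuals (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
preds_by_method = {
    "method_a": [1.5, 1.5, 3.5, 3.5, 5.5],
    "method_b": [0.5, 2.5, 2.5, 4.5, 4.5],
}

# Compute the ensemble average first, then store it alongside the methods.
ave = average_predictions(preds_by_method)
preds_by_method["Ave"] = ave

# Assumed accuracy measure: correlation of predictions with observed values.
accuracy = {name: pearson(p, observed) for name, p in preds_by_method.items()}

for name, r in accuracy.items():
    print(f"{name}: r = {r:.3f}")
```

In this constructed case the averaged prediction is more accurate than either single method because their errors have opposite signs. With real data the gain from averaging depends on how correlated the methods' errors are, and it is not guaranteed.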

Acknowledgments

This work was supported by a Grant-in-Aid for Japan Society for the Promotion of Science (JSPS) Fellows (26.10661), and also by the FIRST Program of JSPS. We would like to thank the technical staff at the National Agriculture and Food Research Organization Western Region Agricultural Research Center.

Conflict of interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Hiroyoshi Iwata.

Additional information

Communicated by Jose Crossa.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 101 KB)

Supplementary material 2 (PDF 3596 KB)

Cite this article

Onogi, A., Ideta, O., Inoshita, Y. et al. Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.). Theor Appl Genet 128, 41–53 (2015). https://doi.org/10.1007/s00122-014-2411-y

  • DOI: https://doi.org/10.1007/s00122-014-2411-y
