Efficient genetic value prediction using incomplete omics data
Covering a subset of individuals with a quantitative predictor, while imputing records for all others using pedigree or genomic data, could improve the precision of predictions while controlling for costs.
Predicting genetic values with high accuracy is pivotal for effective candidate selection in animal and plant breeding. Novel ‘omics’-based predictors have been shown to improve upon established genome-based predictions of important complex traits but require laborious and expensive assays. As a consequence, many datasets have full genetic marker coverage of all studied individuals but only incomplete coverage with other ‘omics’ data. In animal breeding, single-step prediction was introduced to efficiently combine pedigree information, collected on a large number of animals, with genomic information, collected on a smaller subset of animals, for breeding value estimation without bias. Using two maize datasets of inbred lines and hybrids, we show that the single-step framework facilitates imputing transcriptomic data, which improves predictions when the predictive ability of the transcriptomic data exceeds that of pedigree or genomic data. Our results suggest that covering only a subset of inbred lines with ‘omics’ predictors and imputing all others using pedigree or genomic data could enable breeders to improve trait predictions while keeping costs under control. Employing ‘omics’ predictors could particularly improve candidate selection in hybrid breeding because the success of forecasts is a strongly convex function of predictive ability.
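The single-step idea summarized above can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes the standard single-step construction from animal breeding, in which a relationship matrix `A` available for all individuals (pedigree- or marker-based) is blended with a kernel `W` that is only available for a subset (here imagined as a transcriptome-based kernel). The function name `single_step_kernel`, the toy dimensions and the simulated features are all hypothetical.

```python
import numpy as np

def single_step_kernel(A, W, idx_omics):
    """Blend a full relationship matrix A with a kernel W known only for a subset.

    A         : (n, n) relationship matrix covering all individuals
    W         : (m, m) kernel for the individuals with omics data, in idx_omics order
    idx_omics : indices (into A) of the m individuals with omics data
    """
    n = A.shape[0]
    idx2 = np.asarray(idx_omics)
    idx1 = np.setdiff1d(np.arange(n), idx2)  # individuals without omics data

    A11 = A[np.ix_(idx1, idx1)]
    A12 = A[np.ix_(idx1, idx2)]
    A22 = A[np.ix_(idx2, idx2)]

    # B = A12 A22^{-1}: regression of uncovered individuals on covered ones
    B = np.linalg.solve(A22, A12.T).T

    H = np.empty_like(A, dtype=float)
    H[np.ix_(idx1, idx1)] = A11 + B @ (W - A22) @ B.T
    H[np.ix_(idx1, idx2)] = B @ W
    H[np.ix_(idx2, idx1)] = W @ B.T
    H[np.ix_(idx2, idx2)] = W
    return H

# Toy data (assumption): 6 individuals, the last 3 have omics records.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 50))   # simulated marker-like features for everyone
T = rng.normal(size=(3, 40))   # simulated omics features for the subset
A = Z @ Z.T / Z.shape[1]
W = T @ T.T / T.shape[1]
H = single_step_kernel(A, W, [3, 4, 5])
```

In this sketch the block of `H` for covered individuals equals `W` exactly, while relationships involving uncovered individuals are propagated through `A`. If `W` is set equal to the corresponding block of `A`, the result reduces to `A` itself, which is a convenient sanity check.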
We thank T. A. Schrag from the University of Hohenheim for providing the phenotypic data and S. Scholten, A. Thiemann and F. Seifert from the University of Hamburg for providing the gene expression data for Experiment 2. Furthermore, we would like to thank the researchers and institutions who contributed to the development of the maize diversity panel and associated data from Experiment 1, in particular Jianbing Yan and Haijun Liu from Huazhong Agricultural University, Wuhan, China. We thank T. A. Schrag and W. Molenaar for valuable suggestions for improving the content of this manuscript. The authors acknowledge support by the state of Baden-Württemberg through bwHPC. Financial support for M. W. was provided by the Fiat Panis foundation, Ulm, Germany.
Author contribution statement
MW and CH conceived the study. AEM, RF and GT guided the structure of the research and checked the methodology and results for validity. MW and CH drafted the manuscript. MW and CH implemented the prediction models and developed software. MW analyzed the data. All authors interpreted the results, read and approved the final version of the manuscript.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.