# Estimating genomic breeding values and detecting QTL using univariate and bivariate models


## Abstract

### Background

Genomic selection is particularly beneficial for traits that are difficult or expensive to measure. Since multi-trait selection is an important tool for such traits, an important question is what added value multi-trait genomic selection provides.

### Methods

The simulated dataset, including a quantitative and a binary trait, was analyzed with univariate and bivariate applications of four linear models to predict breeding values for juvenile animals. Two models estimated variance components with REML using a numerator relationship matrix (A) or a SNP based relationship matrix (G). Two SNP based Bayesian models included one (BayesA) or two (BayesC) distributions for estimated SNP effects. The bivariate BayesC model sampled QTL probabilities for each SNP conditional on both traits. Genotypes were permuted 2,000 times against phenotypes and pedigree to obtain significance thresholds for posterior QTL probabilities. Genotypes were permuted rather than phenotypes to retain relationships between pedigree and phenotypes, such that polygenic effects could still be estimated.

### Results

Correlations between estimated breeding values (EBV) of different SNP based models, for juvenile animals, were greater than 0.93 (0.87) for the quantitative (binary) trait. Estimated genetic correlation was 0.71 (0.66) for model G (A). Accuracies of breeding values of SNP based models were for both traits highest for BayesC and lowest for G. Accuracies of breeding values of bivariate models were up to 0.08 higher than for univariate models.

The bivariate BayesC model detected 14 out of 32 QTL for the quantitative trait, and 8 out of 22 for the binary trait.

### Conclusions

Accuracy of EBV clearly improved for both traits using bivariate compared to univariate models. BayesC achieved highest accuracies of EBV and was also one of the methods that found most QTL. Permuting genotypes against phenotypes and pedigree in BayesC provided an effective way to derive significance thresholds for posterior QTL probabilities.

## Keywords

Genomic Selection, Bivariate Model, Estimate Variance Component, Estimate Breeding Value, Polygenic Effect

## Background

Genomic selection is particularly beneficial for traits that are difficult or expensive to measure [1]. Before genotypic information was used, one strategy to partly address this problem in breeding schemes was multi-trait selection [e.g. 2]. An important question is therefore what added value multi-trait genomic selection provides. VanRaden and Sullivan [3] showed some benefit using this approach in international dairy cattle evaluations. There are, however, no other reports so far on applications of multi-trait genomic selection. The objective of this study was to present methods to apply multi-trait genomic breeding value prediction, and to evaluate their performance and impact on accuracy of prediction compared to single trait applications. In addition, the ability of one model to detect QTL was investigated.

## Methods

### Estimation of breeding values

Simulated data of the 14th QTL-MAS workshop were analyzed with univariate and bivariate applications of four different models to predict breeding values for juvenile animals without phenotypes. A linear model was assumed for both the quantitative and the binary trait. Using a linear model for binary traits is expected to give breeding values that are highly related to those obtained from a threshold model when trait incidence is moderate [e.g. 4], which is the case here with an incidence of 0.30. The first two models used ASREML to estimate variance components:

$$y_{ij} = \mu_j + \mathit{animal}_{ij} + e_{ij}$$

where $y_{ij}$ is the phenotypic record of animal $i$, $\mu_j$ is the overall mean for trait $j$, $\mathit{animal}_{ij}$ is the random polygenic effect of animal $i$ for trait $j$, and $e_{ij}$ is a random residual for animal $i$. Model A used a numerator relationship matrix for polygenic effects, while model G used a SNP based genomic relationship matrix. For model G, matrix $\mathbf{G}$ was calculated following VanRaden [5] from a matrix $\mathbf{Z}$ that contained marker genotypes for all animals across loci, being -1 and 1 for either homozygote and 0 for the heterozygote genotype, corrected for allele frequency per locus in the current population.
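As an illustration of how a genomic relationship matrix can be built from genotypes coded -1, 0 and 1, the sketch below follows the first method described by VanRaden [5], in which centred genotypes are cross-multiplied and divided by $2\sum_k p_k(1-p_k)$. Whether this exact scaling was used in the present analysis is an assumption; the function name and the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build G from genotype codes (-1, 0, 1); rows are animals, columns are loci.

    Assumption: scaling by 2 * sum(p * (1 - p)), as in VanRaden's first method.
    """
    n_animals = len(genotypes)
    n_loci = len(genotypes[0])
    # Frequency of the allele counted by code +1 (code + 1 = number of copies).
    freqs = [
        sum(row[k] + 1 for row in genotypes) / (2.0 * n_animals)
        for k in range(n_loci)
    ]
    # Correct each locus for allele frequency: subtract 2 * (p - 0.5).
    z = [
        [row[k] - 2.0 * (freqs[k] - 0.5) for k in range(n_loci)]
        for row in genotypes
    ]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all loci are monomorphic; G is undefined")
    return [
        [sum(z[a][k] * z[b][k] for k in range(n_loci)) / scale
         for b in range(n_animals)]
        for a in range(n_animals)
    ]


# Toy example: four animals, three loci (hypothetical data).
demo_genotypes = [[1, 0, -1], [-1, 0, 1], [0, 1, 0], [0, -1, 0]]
demo_g = genomic_relationship_matrix(demo_genotypes)
```

Because each locus is centred on its own mean, every row of the resulting matrix sums to zero, which is a quick sanity check on an implementation.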

The other two models, BayesA and BayesC, were SNP based Bayesian models that included SNP effects, where $\mathit{SNP}_{ijkl}$ is a random effect for allele $l$ on trait $j$ at locus $k$ of animal $i$. The two models differ in that one (BayesA) or two (BayesC) distributions for SNP effects are considered.

SNP effects, denoted as $\mathit{SNP}_{ijkl}$, were estimated in BayesA and BayesC as $q_{ijkl} \times v_{\cdot jk}$ [6], where $q_{ijkl}$ is the effect size of allele $l$ at locus $k$ and $v_{\cdot jk}$ is the direction vector for locus $k$ that scales the effect at locus $k$ for trait $j$. In the original implementation [6], the variance of the direction vector $v_{\cdot jk}$, denoted as $\mathbf{V}$, is sampled for each trait $j$ separately, without considering covariances between traits across loci. Here, in both BayesA and BayesC, covariances between traits across loci are considered in $\mathbf{V}$.

### QTL mapping

BayesC, also known as Bayesian stochastic search variable selection (BSSVS) [7], involved sampling presence of a QTL at each SNP position from a Bernoulli distribution. The probability of this distribution depends on $P(\mathbf{v}_j \mid \mathbf{0}, \mathbf{V})$, the probability of sampling $\mathbf{v}_j$ from $N(\mathbf{0}, \mathbf{V})$, and on $\mathrm{Pr}_j$, the prior probability of presence of a QTL at SNP position $j$. $\mathrm{Pr}_j$ was calculated per locus as 50 divided by the total number of SNPs, reflecting that 50 QTL were expected. Posterior QTL probabilities were calculated as the proportion of cycles after burn-in in which a locus was placed in the distribution with large effects and was therefore sampled from $N(\mathbf{0}, \mathbf{V})$. For more details on prior distributions and fully conditional distributions, see Meuwissen and Goddard [6].
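The two quantities described above that do not depend on the sampler itself, the prior QTL probability per locus and the posterior QTL probability per SNP, can be sketched as follows. This is a minimal illustration that assumes the sampler has stored one 0/1 indicator per SNP per cycle; the function names and the toy chain are hypothetical, and the Gibbs sampler is not reproduced.

```python
def prior_qtl_probability(n_snps, expected_qtl=50):
    """Prior probability of a QTL at any one SNP: expected QTL / number of SNPs."""
    return expected_qtl / n_snps


def posterior_qtl_probabilities(indicator_samples, burn_in):
    """Proportion of post burn-in cycles in which each SNP was in the
    large-effect distribution.

    indicator_samples: one list per cycle, holding a 0/1 indicator per SNP
    (assumed storage format, not taken from the source).
    """
    kept = indicator_samples[burn_in:]
    if not kept:
        raise ValueError("no cycles left after burn-in")
    n_snps = len(kept[0])
    return [sum(cycle[s] for cycle in kept) / len(kept) for s in range(n_snps)]


# Toy chain: five cycles, two SNPs, first cycle discarded as burn-in.
toy_chain = [[1, 1], [0, 1], [1, 0], [1, 1], [0, 1]]
toy_posterior = posterior_qtl_probabilities(toy_chain, burn_in=1)
```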

To obtain significance thresholds for posterior QTL probabilities for the bivariate BayesC model, genotypes were permuted 2,000 times against phenotypes and pedigree.
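A sketch of this permutation scheme is given below. Whole genotype records are shuffled among animals while phenotypes and pedigree stay in place, the analysis is rerun on each permuted dataset, and a threshold is read from the resulting null distribution. The text does not state how the threshold was taken from the 2,000 replicates, so using the 95th percentile of the per-replicate maximum posterior probability is an assumption, and `run_analysis` is a hypothetical callable standing in for the bivariate BayesC run.

```python
import random


def permute_genotypes(genotypes, rng):
    """Reassign whole genotype records to animals at random.

    Phenotypes and pedigree are untouched, so their mutual relationship
    is retained; only the link with genotypes is broken.
    """
    shuffled = list(genotypes)
    rng.shuffle(shuffled)
    return shuffled


def permutation_threshold(genotypes, run_analysis, n_permutations=2000,
                          alpha=0.05, seed=1):
    """Empirical threshold for posterior QTL probabilities.

    run_analysis: hypothetical callable returning one posterior QTL
    probability per SNP for a given genotype assignment.
    Assumption: the threshold is the (1 - alpha) quantile of the
    per-replicate maximum over SNPs.
    """
    rng = random.Random(seed)
    null_maxima = []
    for _ in range(n_permutations):
        permuted = permute_genotypes(genotypes, rng)
        null_maxima.append(max(run_analysis(permuted)))
    null_maxima.sort()
    index = min(len(null_maxima) - 1, int((1.0 - alpha) * len(null_maxima)))
    return null_maxima[index]
```

Taking the maximum over SNPs in each replicate gives a genome-wide threshold; a per-SNP threshold would instead collect the null values of each SNP separately.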

## Results

### Variance components

Estimated heritabilities (h²) and genetic correlations (r).

| Model | h² quantitative | s.e. | h² binary | s.e. | r | s.e. |
|---|---|---|---|---|---|---|
| A | 0.53 | 0.06 | 0.22 | 0.04 | 0.66 | 0.09 |
| G | 0.46 | 0.03 | 0.29 | 0.03 | 0.71 | 0.06 |

### Breeding values

Correlations between predicted breeding values of juvenile animals.

| | Model | Uni A | Uni G | Uni BayesA | Uni BayesC | Biv A | Biv G | Biv BayesA | Biv BayesC |
|---|---|---|---|---|---|---|---|---|---|
| Uni | A | | 0.60 | 0.67 | 0.63 | 0.99 | 0.62 | 0.61 | 0.58 |
| Uni | G | 0.60 | | 0.98 | 0.94 | 0.60 | 0.99 | 0.99 | 0.94 |
| Uni | BayesA | 0.62 | 1.00 | | 0.98 | 0.66 | 0.98 | 0.99 | 0.96 |
| Uni | BayesC | 0.56 | 0.95 | 0.96 | | 0.63 | 0.94 | 0.96 | 0.98 |
| Biv | A | 0.93 | 0.62 | 0.64 | 0.60 | | 0.63 | 0.61 | 0.58 |
| Biv | G | 0.60 | 0.95 | 0.95 | 0.94 | 0.64 | | 0.99 | 0.95 |
| Biv | BayesA | 0.58 | 0.94 | 0.95 | 0.96 | 0.63 | 0.99 | | 0.98 |
| Biv | BayesC | 0.50 | 0.88 | 0.88 | 0.95 | 0.57 | 0.94 | 0.96 | |

Accuracies and regressions of true on estimated breeding values for juvenile animals.

| Model | Accuracy quant. (Uni.) | Accuracy quant. (Biv.) | Accuracy binary (Uni.) | Accuracy binary (Biv.) | Regression quant. (Uni.) | Regression quant. (Biv.) | Regression binary (Uni.) | Regression binary (Biv.) |
|---|---|---|---|---|---|---|---|---|
| A | 0.39 | 0.39 | 0.47 | 0.52 | 0.84 | 0.84 | 0.71 | 0.75 |
| G | 0.61 | 0.62 | 0.72 | 0.79 | 0.96 | 0.96 | 0.83 | 0.88 |
| BayesA | 0.63 | 0.64 | 0.73 | 0.81 | 0.96 | 0.96 | 0.84 | 0.91 |
| BayesC | 0.66 | 0.67 | 0.79 | 0.85 | 0.93 | 0.93 | 0.91 | 0.95 |
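The two statistics in this table can be computed as sketched below, assuming that accuracy is the Pearson correlation between true and estimated breeding values (the usual definition, which the text does not spell out) and that the regression coefficient is the slope of true on estimated breeding values. The function name and the toy values are hypothetical.

```python
def accuracy_and_regression(true_bv, estimated_bv):
    """Return (accuracy, slope) for paired true and estimated breeding values.

    accuracy: Pearson correlation between true and estimated values (assumed).
    slope: regression coefficient of true on estimated values; a value
    below 1 means the estimates are more spread out than the true values.
    """
    n = len(true_bv)
    mean_true = sum(true_bv) / n
    mean_est = sum(estimated_bv) / n
    cov = sum((t - mean_true) * (e - mean_est)
              for t, e in zip(true_bv, estimated_bv))
    var_true = sum((t - mean_true) ** 2 for t in true_bv)
    var_est = sum((e - mean_est) ** 2 for e in estimated_bv)
    accuracy = cov / (var_true * var_est) ** 0.5
    slope = cov / var_est
    return accuracy, slope


# Toy example: estimates are exactly twice the true values, so the
# correlation is perfect while the slope of true on estimated is one half.
toy_accuracy, toy_slope = accuracy_and_regression([1, 2, 3, 4], [2, 4, 6, 8])
```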

### QTL detection

Detection of QTL was considered for univariate and bivariate BayesC models, while significance thresholds were only derived for the bivariate BayesC model. Therefore, only detected QTL from the bivariate BayesC model were used in the comparison of QTL detection methods.

For the quantitative trait, 14 out of 32 QTL were detected, while for the binary trait 8 out of 22 were detected [8]. SNP that were declared significant together explained 35.0% and 22.6% of the genetic variance of the quantitative and binary trait, respectively. Polygenic effects explained only 4.3% and 1.1% of the genetic variance. This indicates that most of the genetic variance (i.e. 60.7% and 76.2% for the quantitative and binary trait, respectively) in the bivariate BayesC model was explained by effects of SNP that were not declared significant.

## Discussion

This study aimed to present methods to apply multi-trait genomic breeding value prediction, to evaluate impact on accuracy of prediction compared to single trait genomic breeding value prediction, and to detect QTL with one of the models. Results clearly indicated that accuracy of EBV increased when model complexity increased to allow better modeling of the genetic architecture. First, accuracy increased going from model A, to SNP based models with increasing flexibility to model SNP effects (in the order: G, BayesA, BayesC). Second, accuracy of EBV for both traits increased more for all SNP based models when using bivariate instead of univariate applications, compared to model A. This confirms results of a simulation study for dairy cattle showing that model G yields higher accuracies when using data of multiple countries compared to one country [3]. Third, considering that few QTL had relatively large effects, it was expected that the model best able to give more weight to loci with large effect – BayesC – fits the data best. Results are fully in agreement with this expectation. This suggests that SNP based models were better able to capture pleiotropic effects of QTL.

The model that achieved highest EBV accuracy, i.e. BayesC, was also one of the presented models that detected most QTL. The model that is best able to detect the position of QTL, however, is not always the model that is best able to predict total genetic merit of animals [9]. Permuting genotypes against phenotypes and pedigree in model BayesC provided an effective way to derive significance thresholds for posterior QTL probabilities. Note that SNP genotypes after the permutation no longer followed Mendelian inheritance. Lack of Mendelian inheritance probably results in fewer associations, since SNPs are less likely to capture pedigree effects, and therefore in a lower threshold. In the present data, however, polygenic effects only captured a very small fraction of the variance, indicating that the applied permutation strategy will have had a minor impact on significance thresholds.

## Conclusions

The EBV accuracy clearly improved for both traits for all bivariate models compared to their univariate counterparts. BayesC achieved highest EBV accuracies and was also one of the methods presented at the workshop that found most QTL.

## Notes

### Acknowledgements

MPLC and RFV were funded by the EU RobustMilk project that is financially supported by the European Commission under the Seventh Research Framework Programme, Grant Agreement KBBE-211708, and HM by the EU SABRE project that is financially supported by the European Commission under the Sixth Research Framework Programme, contract No. FOOD-CT-2006-016250. The content of this paper is the sole responsibility of the authors, and it does not necessarily represent the views of the Commission or its services.

This article has been published as part of *BMC Proceedings* Volume 5 Supplement 3, 2011: Proceedings of the 14th QTL-MAS Workshop. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S3.

## References

- 1. Haley CS, Visscher PM: Strategies to utilize marker-quantitative trait loci associations. J Dairy Sci. 1998, 81 (Suppl. 2): 85-97. 10.3168/jds.S0022-0302(98)70157-2.
- 2. Apiolaza LA: Very early selection for solid wood quality: screening for early winners. Ann Forest Sci. 2009, 66 (6):
- 3. VanRaden P, Sullivan P: International genomic evaluation methods for dairy cattle. Genet Sel Evol. 2010, 42 (1): 7. 10.1186/1297-9686-42-7.
- 4. Weller JI, Misztal I, Gianola D: Genetic analysis of dystocia and calf mortality in Israeli Holsteins by threshold and linear models. 1988, 71 (9): 2491-2501.
- 5. VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91 (11): 4414-4423. 10.3168/jds.2007-0980.
- 6. Meuwissen THE, Goddard ME: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol. 2004, 36 (3): 261-279. 10.1186/1297-9686-36-3-261.
- 7. Verbyla KL, Hayes BJ, Bowman PJ, Goddard ME: Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle. Genet Res. 2009, 91 (5): 307-311. 10.1017/S0016672309990243.
- 8. Mucha S, Pszczoła M, Strabel T, Wolc A, Paczyńska P, Szydlowski M: Comparison of analyses of the QTLMAS XIV common dataset. II: QTL analysis. BMC Proceedings. 2011, 5 (Suppl 3):
- 9. Calus MPL, Meuwissen THE, Windig JJ, Knol EF, Schrooten C, Vereijken ALJ, Veerkamp RF: Effects of the number of markers per haplotype and clustering of haplotypes on the accuracy of QTL mapping and prediction of genomic breeding values. Genet Sel Evol. 2009, 41: 11. 10.1186/1297-9686-41-11.

## Copyright information

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.