Introduction

Quantitative traits show continuous variation in segregating populations. It has long been assumed that quantitative traits are controlled by multiple genetic factors, each having a small effect on the expression of the trait; this is known as the multiple-factor hypothesis (East 1916). However, the hypothesis remained largely untested for most of the last century, because it was impossible to unravel the genetic basis of quantitative traits at the whole-genome level using classical genetic methods. Recent advances in genome research, including a number of molecular-marker techniques and the availability of high-density molecular linkage maps, together with developments in analytical methods (Lander and Botstein 1989; Zeng 1994), have made it possible to analyze the genetic basis of quantitative traits at the level of single loci.

Main effects and epistatic effects of QTLs are important genetic components of quantitative traits. Epistasis refers to the phenotypic effects of interactions among alleles at multiple loci. Our current understanding of biochemical and physiological genetics, as well as of the regulation of gene expression, strongly suggests that interactions among gene products are ubiquitous. There has also been substantial interest in epistasis in classical quantitative genetics, where it is defined as the deviation from additivity of the effects of alleles at different loci (Cockerham 1954). Epistasis, in particular the additive-by-additive interaction between loci that control a quantitative trait, is of great interest to geneticists (Eta-Ndu and Openshaw 1999; Gao and Zhu 2007; Kang et al. 2009; Zuo and Kang 2010; Binh et al. 2011; He et al. 2011). Strong interactions between QTLs have also been observed in maize (Lukens and Doebley 1999) and soybean (Lark et al. 1995). Recent genetic analyses using molecular markers in several plant species have clearly shown that, in addition to single-locus QTLs, epistatic interactions play an important role in the genetic basis of quantitative traits (Lark et al. 1995; Maughan et al. 1996; Li et al. 1997; Yu et al. 1997; Poelwijk et al. 2011; Krajewski et al. 2012; Bocianowski 2012c, 2013, 2014).

A previous paper (Bocianowski 2012a) described a simulation study comparing two methods of estimating the additive-by-additive (epistasis) effect: one using only phenotypic data, and the other additionally taking genotypic marker data into account. The main assumption there was that the QTLs were located at marker positions. The shortcoming of the analyses of epistatic interactions published in that study (Bocianowski 2012a) is that the calculations were based directly on markers, which are located at some distance from the QTLs involved in the epistatic interactions. The estimated effects are therefore biased to an extent that depends on the distances between the marker loci and the QTLs. In this paper, a method based on interval mapping is developed for mapping QTLs with additive and/or digenic epistatic effects. The analysis is based on a mixed linear model approach and combines the QTL main effects and all digenic interactions possible with a two-locus data set in a single model (Wang et al. 1999).

A mixed model is a statistical model containing both fixed effects and random effects, i.e. mixed effects. Such models are useful in a wide variety of disciplines in the biomedical, agricultural, physical, biological and social sciences (Parisseaux and Bernardo 2004; Yu et al. 2005b; Arbelbide et al. 2006; Aulchenko et al. 2007; Yang et al. 2007). They are particularly useful in settings where repeated measurements are made on the same statistical units, or where measurements are made on clusters of related statistical units. Because of their ability to handle missing values, mixed-effects models are often preferred over more traditional approaches such as repeated-measures analysis of variance. Mixed models can account for relationships among inbreds and for unbalanced data, and can incorporate marker data (Parisseaux and Bernardo 2004). A mixed-model procedure represents an in silico approach to gene mapping because it exploits phenotypic and genomic databases that are already available (Grupe et al. 2001).

The aim of the study reported in this paper was to compare two methods of estimating the parameters associated with additive and additive-by-additive interaction gene action: the genotypic method, which is based on marker observations, and the phenotypic method, traditionally used in quantitative genetics, which is based only on phenotypic observations. A mixed linear model approach was used to detect QTLs with main effects and QTLs involved in digenic interactions. The comparison was performed by means of a Monte Carlo simulation study.

Materials and methods

Plant material

For simplicity, we consider a biparental population of homozygous (doubled haploid or recombinant inbred) lines derived from a cross between two homozygous lines. If n significantly different lines are observed in the experiment, we obtain an n-vector of phenotypic mean observations \(y = \left[ y_{1}\; y_{2}\; \ldots \; y_{n} \right]'\) and q n-vectors of marker genotype observations \(M_{l}\), l = 1, 2,…, q. The i-th element (i = 1, 2,…, n) of vector \(M_{l}\) is equal to −1 or 1, depending on which parent's genotype is exhibited by the i-th line.
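To make the data structure concrete, the following Python sketch simulates the ±1 marker matrix for one linkage group of a doubled-haploid population. It is a minimal illustration under stated assumptions: adjacent markers are separated by a recombination fraction r, there is no crossover interference, and in a DH line a crossover simply switches the parental origin of the next marker. For recombinant inbred lines the per-interval switching rate would need to be adjusted. The function name `simulate_dh_markers` is hypothetical.

```python
import numpy as np


def simulate_dh_markers(n_lines: int, n_markers: int, r: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Return an (n_lines x n_markers) matrix of marker codes in {-1, 1}.

    Column l plays the role of the vector M_l: 1 and -1 label the two
    parental genotypes (which parent is "1" is an arbitrary convention).
    Assumes a doubled-haploid population and no crossover interference.
    """
    M = np.empty((n_lines, n_markers), dtype=int)
    # First marker: each parental genotype with probability 1/2.
    M[:, 0] = rng.choice([-1, 1], size=n_lines)
    for l in range(1, n_markers):
        # A crossover in the interval (probability r) switches parental origin.
        crossover = rng.random(n_lines) < r
        M[:, l] = np.where(crossover, -M[:, l - 1], M[:, l - 1])
    return M


rng = np.random.default_rng(1)
M = simulate_dh_markers(n_lines=200, n_markers=21, r=0.1, rng=rng)
print(M.shape)  # (200, 21)
```

Each column of `M` corresponds to one vector \(M_{l}\); stacking several such blocks would mimic a genome with several linkage groups.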

Genetic models

In the first step of the selection, a mixed linear model for the simultaneous search for two interacting QTLs (\(Q_{i}\) between flanking markers \(M_{i}\) and \(M_{i+}\), and \(Q_{j}\) between flanking markers \(M_{j}\) and \(M_{j+}\)) can be expressed as follows (Wang et al. 1999):

$$y_{k} = \mu + a_{i} x_{{G_{ik} }} + a_{j} x_{{G_{jk} }} + aa_{ij} x_{{GG_{ijk} }} + \sum\limits_{f} {u_{{M_{fk} }} e_{{M_{f} }} } + \sum\limits_{l} {u_{{MM_{lk} }} e_{{MM_{l} }} } + \varepsilon_{k} ,$$
(1)

where \(y_{k}\) denotes the phenotypic value of a quantitative trait measured on the k-th individual (k = 1, 2,…, n); μ denotes the population mean; \(a_{i}\) and \(a_{j}\) denote the additive effects (fixed) of the two putative QTLs (\(Q_{i}\) and \(Q_{j}\)), respectively; \(aa_{ij}\) denotes the additive × additive epistatic effect (fixed) between \(Q_{i}\) and \(Q_{j}\); \(x_{{G_{ik} }}\), \(x_{{G_{jk} }}\) and \(x_{{GG_{ijk} }}\) denote coefficients of QTL effects derived according to the observed genotypes of the markers (\(M_{i}\), \(M_{i+}\) and \(M_{j}\), \(M_{j+}\)) and the test position (Wang et al. 1999); \(e_{{M_{f} }} \sim N\,\left( {0, \, \sigma_{\text{M}}^{2} } \right)\) denotes the random effect of marker f with indicator coefficient \(u_{{M_{fk} }}\) (−1 for \(m_{f} m_{f}\) and 1 for \(M_{f} M_{f}\)); \(e_{{MM_{l} }} \sim N\,\left( {0, \, \sigma_{\text{MM}}^{2} } \right)\) denotes the random effect of the l-th marker interaction (between marker \(K_{l}\) and marker \(L_{l}\)) with indicator coefficient \(u_{{MM_{lk} }}\) (−1 for \(M_{K} M_{K} m_{L} m_{L}\) or \(m_{K} m_{K} M_{L} M_{L}\), and 1 for \(M_{K} M_{K} M_{L} M_{L}\) or \(m_{K} m_{K} m_{L} m_{L}\)); and \(\varepsilon_{k} \sim N\left( {0, \, \sigma_{\varepsilon }^{2} } \right)\) denotes the random residual effect.

The inclusion of \(e_{{M_{f} }}\) and \(e_{{MM_{l} }}\) in the model is intended to absorb the additive and epistatic effects of background QTLs (additional segregating QTLs other than the loci examined) for controlling the noise caused by these background QTLs.
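The coefficients \(x_{G_{ik}}\) and \(x_{GG_{ijk}}\) depend on the flanking-marker genotypes and the test position. One common way to obtain such coefficients, sketched below, is the conditional expectation of the ±1 QTL code given its two flanking markers (as in Haley–Knott regression). This is an assumption for illustration, not necessarily the exact construction of Wang et al. (1999): it assumes a doubled-haploid population, no crossover interference, equal prior probabilities for the two QTL genotypes and, for the epistatic coefficient, two QTLs in different marker intervals, so that the product of the two expectations is valid. All function names are hypothetical.

```python
import numpy as np


def p_marker_given_qtl(m, q, r):
    """P(marker code m | QTL code q) in a DH line; r = recombination fraction."""
    return np.where(m == q, 1.0 - r, r)


def qtl_coefficient(m_left, m_right, r1, r2):
    """Expected +/-1 QTL code given flanking markers (DH, no interference).

    r1: recombination fraction between the left marker and the test position.
    r2: recombination fraction between the test position and the right marker.
    Bayes' rule with prior probability 1/2 for each QTL genotype.
    """
    like_pos = p_marker_given_qtl(m_left, 1, r1) * p_marker_given_qtl(m_right, 1, r2)
    like_neg = p_marker_given_qtl(m_left, -1, r1) * p_marker_given_qtl(m_right, -1, r2)
    p_pos = like_pos / (like_pos + like_neg)
    return 2.0 * p_pos - 1.0  # E[x] = (+1) P(Q = +1) + (-1) P(Q = -1)


def epistatic_coefficient(x_gi, x_gj):
    """x_GG as the product of the two expectations.

    Valid only when the two QTLs lie in different marker intervals, so that
    they are conditionally independent given the markers.
    """
    return x_gi * x_gj


m_left = np.array([1, 1, -1, -1])
m_right = np.array([1, -1, 1, -1])
# r1 = r2 = 0.05: a test position midway in a short interval (illustrative).
x = qtl_coefficient(m_left, m_right, r1=0.05, r2=0.05)
```

Concordant flanking markers give coefficients close to ±1, while discordant flanking markers give 0 when the test position is exactly midway (r1 = r2), reflecting complete uncertainty about the QTL genotype.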

Markers and marker pairs selected in the first step were subjected to backward stepwise selection in the second stage. The final model is as follows:

$$y_{k} = \mu + \sum\limits_{t = 1}^{p} {a_{t} x_{{G_{tk} }} } + \sum\limits_{t = 1}^{p - 1} {\sum\limits_{\begin{subarray}{l} t' = t + 1 \\ t' \ne t \end{subarray} }^{p} {aa_{tt'} x_{{GG_{tt'k} }} } } + \sum\limits_{g = 1}^{h} {\sum\limits_{f} {u_{{M_{fk} }} e_{{M_{f} }} } } + \sum\limits_{g' = 1}^{h'} {\sum\limits_{l} {u_{{MM_{lk} }} e_{{MM_{l} }} } } + \varepsilon_{k} ,$$
(2)

where p, g, g′, h and h′ ∈ {1,…, q}. When the number of markers is large, the markers for model (1) can be selected independently within each linkage group; the markers chosen in this way can then be pooled and subjected to the second step of selection. Model (2) can be written in the matrix form of the mixed linear model:

$$y = 1\mu + X\beta + Z\gamma + U_{M} e_{M} + U_{MM} e_{MM} + e_{\varepsilon } ,$$
(3)

where y denotes the n-dimensional vector of phenotypic values, 1 denotes the n-dimensional vector of ones, μ denotes the general mean, X denotes the (n × p)-dimensional matrix whose columns are markers, β denotes the p-dimensional vector of unknown fixed effects of the form \(\beta' = \left[ a_{1}\; a_{2}\; \ldots \; a_{p} \right]\), Z denotes a matrix whose columns are products of pairs of columns of matrix X, γ denotes the vector of unknown fixed effects of the form \(\gamma' = \left[ aa_{1,2}\; aa_{1,3}\; \ldots \; aa_{p-1,p} \right]\), \(e_{M} \sim N\left( 0, \sigma_{M}^{2} R_{M} \right)\) denotes a random vector of marker effects, \(e_{MM} \sim N\left( 0, \sigma_{MM}^{2} R_{MM} \right)\) denotes a random vector of interaction effects, \(e_{\varepsilon} \sim N\left( 0, \sigma_{\varepsilon}^{2} I \right)\) denotes the n-dimensional vector of random errors such that \(E(e_{i}) = 0\), \(\mathrm{Var}(e_{i}) = \sigma_{\varepsilon}^{2}\) and \(\mathrm{Cov}(e_{i}, e_{j}) = 0\) for i ≠ j, i, j = 1, 2,…, n; \(U_{M}\) and \(U_{MM}\) denote known incidence matrices; and \(R_{M}\) and \(R_{MM}\) denote known symmetric matrices of incidence coefficients that can be obtained from the linkage relationships between the main-effect markers and between the pairs of interacting markers, respectively (Wang et al. 1999). The distribution of y is:

$$y\sim N\;\left( {Gb,V} \right) ,$$
(4)

where:

$$G = \left[ {\begin{array}{*{20}c} 1& {\text{X}} & {\text{Z}} \\ \end{array} } \right] ,$$
(5)
$$b' = \left[ {\begin{array}{*{20}c} \mu & {\beta '} & {\gamma '} \\ \end{array} } \right] ,$$
(6)
$$V = \sigma_{M}^{2} U_{M} R_{M} U_{M}^{'} + \sigma_{MM}^{2} U_{MM} R_{MM} U_{MM}^{'} + \sigma_{\varepsilon }^{2} I .$$
(7)
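The covariance matrix V can be assembled directly with NumPy. In the sketch below the incidence matrices are random ±1 matrices and \(R_{M}\), \(R_{MM}\) are identity matrices, i.e. the background markers are assumed to be unlinked; both choices are hypothetical and only serve to show the matrix algebra, with the residual term taken as \(\sigma_{\varepsilon}^{2} I\). The function name `build_v` is hypothetical.

```python
import numpy as np


def build_v(U_M, R_M, U_MM, R_MM, s2_M, s2_MM, s2_e):
    """Covariance of y:  s2_M U_M R_M U_M' + s2_MM U_MM R_MM U_MM' + s2_e I."""
    n = U_M.shape[0]
    return (s2_M * U_M @ R_M @ U_M.T
            + s2_MM * U_MM @ R_MM @ U_MM.T
            + s2_e * np.eye(n))


rng = np.random.default_rng(0)
n, n_markers, n_pairs = 8, 3, 2
U_M = rng.choice([-1.0, 1.0], size=(n, n_markers))   # hypothetical incidence
U_MM = rng.choice([-1.0, 1.0], size=(n, n_pairs))    # hypothetical incidence
R_M, R_MM = np.eye(n_markers), np.eye(n_pairs)       # assumes unlinked markers
V = build_v(U_M, R_M, U_MM, R_MM, s2_M=0.5, s2_MM=0.2, s2_e=5.0)
```

Because the two marker terms are positive semidefinite, every eigenvalue of V is at least \(\sigma_{\varepsilon}^{2}\), so V is invertible whenever the residual variance is positive.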

The likelihood function (L) for the fixed effects b and the variance components in model (3) is:

$$L\;(b,V) = (2\pi )^{{ - \frac{n}{2}}} \left| V \right|^{{ - \frac{1}{2}}} \exp \left[ { - \frac{1}{2}\left( {y - Gb} \right)'V^{ - 1} \left( {y - Gb} \right)} \right] ,$$
(8)
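In numerical work the likelihood is usually evaluated on the log scale. The sketch below (hypothetical function name `log_likelihood`) uses a log-determinant and a linear solve instead of forming \(V^{-1}\) explicitly; this is a standard numerical choice, not something prescribed by the method.

```python
import numpy as np


def log_likelihood(y, G, b, V):
    """log L(b, V) for y ~ N(Gb, V)."""
    n = y.shape[0]
    resid = y - G @ b
    sign, logdet = np.linalg.slogdet(V)
    if sign <= 0:
        raise ValueError("V must be positive definite")
    quad = resid @ np.linalg.solve(V, resid)  # (y - Gb)' V^{-1} (y - Gb)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


y = np.array([1.0, 2.0, 0.5])
G = np.ones((3, 1))          # intercept-only design, for illustration
b = np.array([1.0])
V = 2.0 * np.eye(3)
print(log_likelihood(y, G, b, V))
```

With \(V = \sigma^{2} I\) the value reduces to the sum of univariate normal log-densities, which gives a convenient check of the implementation.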

When the variance components of the model are known and G has full column rank, the generalized least-squares (GLS) estimate of b is given by (Searle 1982)

$$\hat{b} = \left( {G'V^{ - 1} G} \right)^{ - 1} G'V^{ - 1} y.$$
(9)
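The estimator for b can be computed without an explicit inverse of V. In this minimal sketch the design (two ±1 QTL codes in a replicated 2 × 2 layout, their product as the epistatic column, and a diagonal V) is hypothetical and chosen so that G = [1 X Z] has full column rank; the function name `gls_estimate` is also hypothetical.

```python
import numpy as np


def gls_estimate(y, G, V):
    """b_hat = (G' V^{-1} G)^{-1} G' V^{-1} y for known V and full-rank G."""
    Vinv_G = np.linalg.solve(V, G)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(G.T @ Vinv_G, G.T @ Vinv_y)


x1 = np.tile([1.0, 1.0, -1.0, -1.0], 3)
x2 = np.tile([1.0, -1.0, 1.0, -1.0], 3)
G = np.column_stack([np.ones(12), x1, x2, x1 * x2])  # [1  X  Z]
b_true = np.array([100.0, 2.0, 3.0, 1.5])            # [mu, a_1, a_2, aa_12]
V = np.diag(np.linspace(1.0, 3.0, 12))
y = G @ b_true                                       # noise-free, for checking
b_hat = gls_estimate(y, G, V)
```

On noise-free data the estimator recovers b exactly, and with V = I it reduces to ordinary least squares.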

The total additive effect of the genes influencing the trait (\(a_{g}\)) is defined as the sum of the individual QTL effects. The total additive-by-additive epistatic effect of the genes influencing the trait (\(aa_{g}\)) is defined as the sum of the individual QTL-pair effects. The coefficient of determination (\(R^{2}\)) was used to measure how well model (2) fitted the data; in this study it is the proportion of the phenotypic variance explained by all QTLs with additive effects and all QTL pairs with epistatic effects.
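Given \(\hat{b}\) ordered as \([\mu\; \beta'\; \gamma']\), the totals \(a_{g}\) and \(aa_{g}\) are simple sums of sub-vectors. The sketch below also computes \(R^{2}\) as \(1 - SS_{res}/SS_{tot}\) from the fixed-effect fit; using this particular formula for the explained proportion is an assumption. The vector `b_hat` is set by hand for illustration (in practice it would come from the GLS estimate), and the function name is hypothetical.

```python
import numpy as np


def totals_and_r2(b_hat, p, y, G):
    """Return (a_g, aa_g, R^2).

    Assumes b_hat = [mu, a_1..a_p, aa_12..aa_(p-1)p] and that G = [1 X Z]
    has matching columns.
    """
    a_g = float(np.sum(b_hat[1:1 + p]))    # total additive effect
    aa_g = float(np.sum(b_hat[1 + p:]))    # total additive-by-additive effect
    resid = y - G @ b_hat
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return a_g, aa_g, 1.0 - ss_res / ss_tot


x1 = np.tile([1.0, 1.0, -1.0, -1.0], 3)
x2 = np.tile([1.0, -1.0, 1.0, -1.0], 3)
G = np.column_stack([np.ones(12), x1, x2, x1 * x2])
b_hat = np.array([100.0, 2.0, 3.0, 1.5])
rng = np.random.default_rng(3)
y = G @ b_hat + rng.normal(0.0, 1.0, size=12)
a_g, aa_g, r2 = totals_and_r2(b_hat, p=2, y=y, G=G)
```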

Phenotypic estimation

Estimation of the additive gene effect and of the additive-by-additive interaction effect of homozygous loci (epistasis) on the basis of phenotypic observations y requires identification of groups of extreme lines, i.e. lines with the minimal and maximal expression of the observed trait (Choo and Reinbergs 1982). The group of minimal lines consists of the lines that contain, theoretically, only alleles reducing the value of the trait. Analogously, the group of maximal lines contains the lines that have only alleles increasing the trait value. In this paper we identify the groups of extreme lines using the quantile method (Bocianowski et al. 1999), in which lines with mean values smaller (larger) than the 0.03 (0.97) quantile of the empirical distribution of means are taken as minimal (maximal) lines. The choice of the 0.03 and 0.97 quantiles follows from a previous study (Bocianowski et al. 1999). The total additive effect \(a_{p}\) of all genes controlling the trait and the total additive-by-additive interaction effect \(aa_{p}\) may be estimated by the formulas (Bocianowski and Krajewski 2009; Bocianowski 2012b)

$${\hat{a}_{p}} = {\frac{1}{2}\left( {\overline{L}_{\hbox{max} } - \overline{L}_{\hbox{min} } } \right)}$$
(10)

and

$$\widehat{aa}_{p} = \frac{1}{2}\left( {\overline{L}_{\hbox{max} } + \overline{L}_{\hbox{min} } } \right) - \overline{L} ,$$
(11)

where \(\widehat{aa}_{p}\) denotes the total additive-by-additive interaction effect estimated on the basis of phenotypic observations only, \(\overline{L}_{\hbox{min} }\) and \(\overline{L}_{\hbox{max} }\) denote the means of the groups of minimal and maximal lines, respectively, and \(\bar{L}\) denotes the mean of all lines.
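The phenotypic estimators, combined with the quantile method for selecting extreme lines, translate into a few lines of NumPy. This sketch assumes NumPy's default (linear) quantile interpolation and strict inequalities for group membership; other quantile definitions could assign boundary lines differently. The function name `phenotypic_estimates` is hypothetical.

```python
import numpy as np


def phenotypic_estimates(line_means, lower=0.03, upper=0.97):
    """Return (a_p, aa_p) using quantile-defined groups of extreme lines."""
    y = np.asarray(line_means, dtype=float)
    q_lo, q_hi = np.quantile(y, [lower, upper])
    min_group = y[y < q_lo]   # minimal lines: below the 0.03 quantile
    max_group = y[y > q_hi]   # maximal lines: above the 0.97 quantile
    if min_group.size == 0 or max_group.size == 0:
        raise ValueError("too few lines to form both extreme groups")
    L_min, L_max = min_group.mean(), max_group.mean()
    a_p = 0.5 * (L_max - L_min)                 # total additive effect
    aa_p = 0.5 * (L_max + L_min) - y.mean()     # total additive-by-additive effect
    return a_p, aa_p


# Hypothetical, evenly spaced line means 0..199: the extreme groups are
# lines 0..5 and 194..199, symmetric about the overall mean of 99.5.
a_p, aa_p = phenotypic_estimates(np.arange(200))
```

In this symmetric example the midpoint of the extreme groups coincides with the overall mean, so \(\widehat{aa}_{p} = 0\) and \(\hat{a}_{p} = 97\).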

Simulation studies

In the Monte Carlo simulation studies comparing the “phenotypic” (\(a_{p}\) and \(aa_{p}\)) and “genotypic” (\(a_{g}\) and \(aa_{g}\)) estimates of the additive and additive-by-additive interaction QTL effects, the following variants of the assumed parameters were adopted. The number of QTLs affecting the trait was 5, each with an additive effect of 2 (a = 10). The true value of the total epistatic interaction effect was set to 5 (aa = 5) and the overall mean of the trait to 100. A total of 200 homozygous lines and 210 markers were analyzed. The markers were located in ten linkage groups (LGs), each containing 21 markers. All distances between adjacent markers were equal (10 centiMorgans, cM). Distances between markers were converted to recombination fractions as r = d/100, where d denotes the distance between markers in cM. The number of QTL–QTL pairs with additive-by-additive epistatic effects affecting the trait was assumed to be 1, 2, 5 or 10. The QTLs were (i) distributed over the whole genome (each QTL in a different LG), or (ii) located in one LG. Each QTL was located midway between two adjacent markers (5 cM from each). The effects of particular QTL pairs were assumed to be (i) equal for all pairs, or (ii) unequal, with one QTL–QTL pair effect much larger than the others (for two pairs: 4 and 1; for five pairs: 2, 1, 1, 0.5 and 0.5; for ten pairs: 1.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4 and 0.4). The error variance was equal to 5, 10 or 15. For each combination of parameters, 5,000 data sets, each containing a vector of phenotypic observations and vectors of marker genotype observations, were generated. For each data set, the additive effect estimates \({\hat{a}}_{jp}\) and \({\hat{a}}_{jg}\), as well as the additive-by-additive interaction effect estimates \(\widehat{aa}_{jp}\) and \(\widehat{aa}_{jg}\), j = 1, 2,…, 5000, were calculated by the methods presented above. In addition, the coefficients of determination \(R^{2}_{j}\) were estimated.
Then, the mean values of the parameter estimates, \(\overline{\hat{a}}_{p}\), \(\overline{\hat{a}}_{g}\), \(\overline{\widehat{aa}}_{p}\) and \(\overline{\widehat{aa}}_{g}\), were calculated for each series, together with the mean squared errors. The mean value of \(R^{2}\) was also calculated. All statistical analyses were conducted with the statistical software package GenStat 15th edition.
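The summary step (mean estimate and mean squared error over replicate data sets) can be illustrated with a deliberately simplified simulation. This is not the design used in the study: it assumes five unlinked QTLs coded ±1 with effects of 2 each, a single epistatic pair (the first two QTLs) with aa = 5, error variance 5, no markers, and only the phenotypic estimator; 500 replicates are used instead of 5,000 to keep it fast. All names are hypothetical.

```python
import numpy as np


def phenotypic_estimates(y, lower=0.03, upper=0.97):
    """(a_p, aa_p) with quantile-defined groups of extreme lines."""
    q_lo, q_hi = np.quantile(y, [lower, upper])
    L_min, L_max = y[y < q_lo].mean(), y[y > q_hi].mean()
    return 0.5 * (L_max - L_min), 0.5 * (L_max + L_min) - y.mean()


def simulate_lines(rng, n_lines=200, mu=100.0, effects=(2.0,) * 5,
                   aa=5.0, err_var=5.0):
    """Line means: mu + sum_t a_t x_t + aa x_1 x_2 + error (unlinked QTLs)."""
    X = rng.choice([-1.0, 1.0], size=(n_lines, len(effects)))
    genetic = mu + X @ np.asarray(effects) + aa * X[:, 0] * X[:, 1]
    return genetic + rng.normal(0.0, np.sqrt(err_var), size=n_lines)


def mean_and_mse(estimates, true_value):
    """Mean of the estimates and their mean squared error about true_value."""
    est = np.asarray(estimates, dtype=float)
    return est.mean(), np.mean((est - true_value) ** 2)


rng = np.random.default_rng(2024)
reps = np.array([phenotypic_estimates(simulate_lines(rng)) for _ in range(500)])
a_mean, a_mse = mean_and_mse(reps[:, 0], true_value=10.0)
aa_mean, aa_mse = mean_and_mse(reps[:, 1], true_value=5.0)
print(f"a_p: mean {a_mean:.2f}, MSE {a_mse:.2f}")
print(f"aa_p: mean {aa_mean:.2f}, MSE {aa_mse:.2f}")
```

The MSE combines the squared bias and the variance of the estimator across replicates, so it penalizes both systematic over-estimation and instability.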

Results

Tables 1, 2 and 3 show the results of simulations comparing the estimates of the additive and epistatic effects obtained by the genotypic and phenotypic methods, for error variances equal to 5, 10 and 15, respectively. The phenotypic estimate of the additive effect was smaller than the true value of 10 only for 10 QTL–QTL pairs. The genotypic estimates of additive effects were larger than 10 for 1, 2 and 5 pairs (except for five QTL–QTL pairs with the error variance equal to 5). Both the phenotypic and the genotypic estimates of additive effects were larger when the QTL–QTL pairs were located in one linkage group. The differences between the phenotypic and genotypic estimates of additive effects were always positive and were smallest when ten QTL–QTL pairs were assumed. Generally, the differences between phenotypic and genotypic estimates were larger when the QTLs were located in many LGs.

Table 1 Phenotypic and genotypic estimates of the total additive effect and the total additive–additive interaction effect obtained in the simulation study (error variance equal to 5)
Table 2 Phenotypic and genotypic estimates of the total additive effect and the total additive–additive interaction effect obtained in the simulation study (error variance equal to 10)
Table 3 Phenotypic and genotypic estimates of the total additive effect and the total additive–additive interaction effect obtained in the simulation study (error variance equal to 15)

The phenotypic estimate of the additive-by-additive effect was always larger than the true value of 5 (except for ten QTL–QTL pairs with unequal effects in many linkage groups). The genotypic estimates of additive-by-additive epistatic effects were larger than 5 for 1 and 2 pairs, as well as for 5 pairs when the error variance was equal to 10 or 15 (except when five QTL–QTL pairs with unequal effects were located in many linkage groups). Both the phenotypic and the genotypic estimates of additive-by-additive epistatic effects were larger when the QTL–QTL pairs were located in one linkage group and when the QTL–QTL interaction effects were equal. The differences between the phenotypic and genotypic estimates of additive-by-additive effects were always positive (with the single exception of two QTL–QTL pairs with unequal effects located in many linkage groups) and were largest when ten QTL–QTL pairs were assumed.

In general, a decrease in the estimates was accompanied by an increase in their mean squared errors (Tables 1, 2, 3), because larger estimates of both the additive and the additive-by-additive effects are more biased than smaller ones. The variance explained by the QTLs with additive and epistatic effects ranged from 81 to 97 % (Tables 1, 2, 3).

Discussion

There are several different strategies for mapping quantitative trait loci (Kearsey and Farquhar 1998), e.g., single-marker locus analysis (Liu 1998); simple interval mapping (Lander and Botstein 1989); composite interval mapping (Zeng 1993, 1994; Krajewski et al. 2012); marker regression (Kearsey and Hyne 1994; Wu and Li 1994; Bocianowski and Krajewski 2009); Bayesian methods (Sillanpää and Arjas 1998); and multiple interval mapping (Kao et al. 1999; Zeng et al. 1999). The latter methods have been shown to yield higher power of QTL detection than interval mapping and single-marker locus analysis (Liu 1998; Piepho 2000). In this article we have demonstrated how to use a mixed model for the analysis of the main effects and epistatic effects of QTLs. This study illustrates the ability of the analysis to assess the genetic components underlying quantitative traits, and demonstrates the relative importance of the various components of the genetic basis of yield traits in the target population. Understanding the genetic architecture of complex traits, especially QTL-by-QTL interactions, is a major challenge in the post-genomic era (Yang et al. 2007).

In the present study, a full-QTL model is proposed for modeling the genetic architecture of complex traits, which integrates the effects of multiple QTLs and epistasis into one mapping system. The most important results of the simulation study show a certain stability of the properties of both estimation methods across different types of genetic material. The lack of influence of the error variance on the estimation of additive as well as additive-by-additive gene action effects by both methods, and on the conclusions of the comparison between the methods, suggests that our conclusions may apply to different plant species. Moreover, the lack of influence of the number of linkage groups containing QTLs suggests that these methods can be used with different genetic maps. In contrast, the number of QTL–QTL interaction effects influenced the additive-by-additive gene action effects estimated by both methods and the conclusions of the comparison between the methods. The method presented in this paper may be the preferred method for estimating major and interacting QTLs for quantitative traits in biparental segregating populations, because it provides results closer to the true values of the total additive and total epistatic effects than previous methods based on a fixed model (Bocianowski 2012b, c). The coefficients of determination of the proposed model are larger than those obtained with other methods, such as multiple interval mapping (Kao et al. 1999) and penalized maximum likelihood (Zhang and Xu 2005).

The development of mixed linear model approaches and their application in quantitative genetics will create enormous challenges for quantitative geneticists in dealing with complicated genetic problems (Xu and Yi 2000). Applications of mixed models to association mapping and other genetic analyses in maize, wheat, Arabidopsis and potato panels demonstrate that mixed models yield fewer false positives and higher power than previous methods, including genomic control, structured association and principal component analysis (Yu et al. 2005a; Boer et al. 2007; Malosetti et al. 2007; Zhao et al. 2007; Zhang et al. 2008). Different mixed models have been proposed for mapping QTLs in complex pedigrees. Crepieux et al. (2004) proposed an identity-by-descent QTL mapping method using plant breeding data for self-pollinated crops. Crepieux et al. (2005) used this method to identify one QTL for kernel hardness and two QTLs for dough strength from data available in a wheat breeding program. The random-model approach estimates a variance component associated with the QTL and identifies the marker interval that most likely contains the QTL. This approach allows a better evaluation of the overall breeding value of an inbred and the identification of genomic regions associated with the trait (Arbelbide and Bernardo 2006).

A direct implication of epistasis, especially of the involvement of QTLs in epistatic interactions, is that the effects of single-locus QTLs depend largely on the genotypes at other loci, and, as can be seen from this analysis, the effect of a QTL can sometimes be negated by the genotype at a second locus. Thus, any attempt to use QTLs in breeding programs has to take such epistatic effects into account. Epistatic effects have been considered important for complex traits by several researchers (Ma et al. 2005, 2007; Rebetzke et al. 2007; Krajewski et al. 2012). Determining the contribution of epistasis is important for understanding the genetic basis of complex traits, and genetic models for QTL mapping that assume no epistasis can lead to biased estimates of QTL parameters (Bocianowski 2013). A large number of epistatic effects have recently been detected in rice (Oryza sativa L.) using polymorphic markers across the whole genome (Hua et al. 2002; Mei et al. 2003, 2005). Epistatic effects have been found to be important in the expression of dough rheological properties in a wheat DH population (Ma et al. 2005).

Alternatives to the mixed-model approach are (1) Bayesian approaches (Meuwissen et al. 2001; Xu 2003; Ter Braak et al. 2005), (2) penalized regression (Boer et al. 2002; Zhang and Xu 2005) and (3) the use of regularization paths (e.g., Hastie et al. 2001). However, the results obtained with the method presented in this paper show unbiased prediction of the estimated parameters. We detected QTLs with additive and epistatic effects for a quantitative trait using a population of homozygous lines. The results showed that both additive and epistatic effects were important components of the genetic basis of the quantitative trait. Together, the QTLs with additive and epistatic effects explained more than 80 % of the phenotypic variation. The information obtained in this study will be useful for manipulating QTLs in plant breeding through marker-assisted selection.