
1 Introduction

Breeding output is a small-probability event. Only through large-scale investment can it be turned into an inevitable one, so that breeding output, which is otherwise accidental, becomes predictable and designable. For lack of suitable methods and supporting tools, domestic breeding teams cannot manage and control large-scale breeding materials and data, and they find it difficult to accumulate rules for selecting lines and making combinations. As a result, combinations are commonly made by experience, and maize breeding output in China remains highly uncertain [1]. Molecular breeding is considered an important way to improve breeding efficiency, but because data, methods and tools are lacking, molecular design breeding is still largely at the conceptual stage, without an operational technical process or breeding practice. Against this background, this paper explored a design breeding method that spans the integrated management of molecular marker data and field-test phenotypic data of maize varieties, comprehensive evaluation and assisted screening of maize inbred lines, heterosis analysis of inbred lines based on molecular marker and phenotypic differences, and simulated combination of maize hybrids with performance prediction. The paper also developed a software system for hybrid combination that supports this technical route, providing methods and tools to improve the scale and information management level of maize breeding in China.

2 Materials and Methods

2.1 Research Technical Route

After standardizing the phenotypic test data of the inbred lines, the method calculated each line's comprehensive (parent) value, yield and disease-resistance characteristics, and excellence and defect degrees, used them to evaluate and screen the inbred lines, and calculated the phenotypic distance between lines. Similarly, from the inbred lines' SSR molecular marker data [37], it calculated the number of effective loci and the heterozygosity index of each line, used them to evaluate and screen the lines, and calculated the genetic distance. The phenotypic distance and genetic distance were then combined into a heterosis rate index for the selected parent pairs, and seed-production indices of the selected parents, such as the coordination of flowering time and of plant height, were combined into the orthogonal (direct-cross) and reciprocal-cross values. Finally, the comprehensive value of the parents, the heterosis rate, the reciprocal value, and the yield and resistance characteristics were combined into a hybridization group model, which predicted the comprehensive value, excellence degree, defect degree, and yield and resistance characteristics of each hybrid combination; the combinations meeting the breeding requirements were then screened. (The technical route is shown in Fig. 1.)

Fig. 1. The technology roadmap of maize hybrid breeding group based on inbred line SSR and field test data

2.2 The Experimental Data

In this paper, the field identification data and DNA fingerprint data of the inbred lines were obtained from Beijing Kings Nower Seed S&T Co., Ltd. The 2009 field identification data for 482 inbred lines from six trial sites (JunXian, Gongzhuling, Tieling, Dandong, Shunyi and Zhumadian) were selected and loaded into the database (Table 1); 179 of these lines were also genotyped with SSR molecular markers (Table 2).

Table 1. Standardized field identification data of the 482 inbred lines (the recorded indices of the inbred lines)
Table 2. The SSR primers used to genotype the 179 inbred lines

SSR molecular markers are mainly used in maize breeding to identify new germplasm and to assign inbred lines to heterotic groups [2, 9, 10]. A molecular marker directly reflects genetic polymorphism at the DNA level. The earliest molecular markers were RFLPs; RAPD, AFLP, ISSR and SSR markers, all based on PCR, followed, and SNP markers, based on single-base variation, have been developed in recent years. Simple sequence repeat (SSR) markers are widely used because they are simple, fast, highly repeatable, highly polymorphic and codominant. Xue et al. showed that SSR and SNP markers differ only slightly in data integrity, ability to discriminate among species, and locus stability. Another study showed that both RAPD and SSR markers are suitable for studying the genetic diversity of maize germplasm, but that SSR markers are preferable.

For domestic seed companies, SSR detection costs less than gene sequencing technology and is affordable [11, 13, 15, 17]; Beijing Kings Nower Seed S&T Co., Ltd. therefore set up an SSR-based molecular detection department in 2008, which provided the data for this study. In this paper, the SSR marker data and phenotypic test data were used to estimate the genetic differences between hybrid parents, and the development value of each hybrid combination was assessed in terms of genetic difference, combining ability, major defects and advantageous traits; the results were applied to making hybrid combinations.

2.3 Standardization and Evaluation of Inbred Lines Field Identification Data

Standardizing the field identification data of inbred lines means transforming the field data into relative values, using the trial-site (pilot) mean as the reference frame. Inbred line performance in different environments is usually described by either absolute or relative values. Absolute values are obtained by observing the indices directly and are expressed in an independent unit system that does not depend on the particular observations; if the recorded data carry no unit, the absolute values are meaningless. Relative values take as reference the ordering of the recorded data or statistics of the whole population. Because they are information-rich and easy to understand, relative values are the main basis for evaluating inbred lines. Zhe [8] proposed five methods for converting variety test data into relative values, including the 3s and 2s conversion values that take the group mean as the reference value, the standard value, and the NDVI-like value that takes the sample mean as the reference value, and used the "standard deviation ratio" to compare their effects; the results indicated that the 3s and 2s conversion values referenced to the group mean were closer to the genetic characteristics of the varieties. This paper adopted that method to convert the numerical identification data of the inbred lines into relative values.
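To illustrate the idea of using the site (pilot) mean as the reference frame, the sketch below converts raw observations of one index into relative values within each trial site. It is a hypothetical stand-in: the exact 3s/2s conversion of [8] is not spelled out in the text, so a plain per-site z-score (deviation from the site mean divided by the site standard deviation) is used instead. The `(site, line_id, value)` record layout and the function name are assumptions for illustration.

```python
from statistics import mean, stdev


def standardize_by_site(records):
    """Convert raw observations of one index into per-site relative values.

    records: iterable of (site, line_id, value) tuples (hypothetical layout).
    Returns {(site, line_id): relative_value}. The relative value is a plain
    z-score against the site mean, a stand-in for the 3s/2s conversion of [8].
    """
    by_site = {}
    for site, line_id, value in records:
        by_site.setdefault(site, []).append((line_id, value))

    relative = {}
    for site, rows in by_site.items():
        values = [value for _, value in rows]
        site_mean = mean(values)
        # With a single observation there is no spread; map it to 0.
        site_sd = stdev(values) if len(values) > 1 else 0.0
        for line_id, value in rows:
            relative[(site, line_id)] = (value - site_mean) / site_sd if site_sd > 0 else 0.0
    return relative


# Two sites on very different scales yield the same relative values.
records = [
    ("A", "L1", 1.0), ("A", "L2", 2.0), ("A", "L3", 3.0),
    ("B", "L1", 10.0), ("B", "L2", 20.0), ("B", "L3", 30.0),
]
rel = standardize_by_site(records)
print(rel[("A", "L1")], rel[("B", "L1")])  # -1.0 -1.0
```

Because each site is referenced to its own mean, a line's relative value no longer depends on how productive that particular site was.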

After the identification data were standardized, the method calculated each inbred line's values for yield, seed production, ear characteristics, economic value, disease resistance, insect resistance, lodging resistance, growth period, and plant and panicle characteristics, together with its comprehensive value, and used them to calculate the phenotypic distance between the selected inbred lines.

The comprehensive value (HD) of an inbred line summarizes its field observation values as a weighted mean, calculated by the following formula:

$$ HD = \frac{{\sum {v_{k} *w_{k} } }}{{\sum {w_{k} } }} $$
(1)

In formula (1), \( v_{k} \) is the standardized value of the kth field observation index and \( w_{k} \) is the weight of that index.
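Formula (1) is a weighted mean of standardized index values. A minimal sketch follows; the function name and the example indices and weights are hypothetical.

```python
def comprehensive_value(standard_values, weights):
    """Formula (1): HD = sum(v_k * w_k) / sum(w_k)."""
    if len(standard_values) != len(weights):
        raise ValueError("each index needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(standard_values, weights)) / total_weight


# Hypothetical line with three standardized indices (for example yield,
# lodging resistance and disease resistance) weighted 3:1:2.
hd = comprehensive_value([1.0, -0.5, 2.0], [3, 1, 2])  # 6.5 / 6
```

Dividing by the weight sum keeps HD on the same scale as the standardized values, so lines scored with differently sized index sets remain comparable.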

To evaluate the strengths and weaknesses of the inbred lines and hybrid combinations, this paper designed two indices, the defect degree and the excellence degree, based on the observed phenotypic traits.

The defect degree and excellence degree summarize how often the field observation indices of an inbred line (or hybrid combination) are classed as "very good", "good", "bad" or "poor". Each field identification index is first assigned to one of these classes, and the class counts are then combined as a weighted average, with weights in the ratio 3:2:2:3 for "very good", "good", "bad" and "poor" respectively. A hybrid combination is classed as "very good" on an index if its value exceeds the mean of all combinations on that index plus 2 times the standard deviation, and as "poor" if its value falls below that mean minus 2 times the standard deviation. The standard-deviation multiples for all four classes can be set in the software. The inbred lines were then screened using the excellence and defect degrees, the yield and resistance characteristics, and the comprehensive value.
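The classification and weighting above can be sketched as follows. Several details are assumptions not stated in the text: index values are taken as already oriented so that larger is better, the "good"/"bad" thresholds default to 1 standard deviation (only the 2 standard-deviation thresholds for "very good" and "poor" come from the text), the population standard deviation is used, and each degree is the weighted class count divided by the number of indices. All names are hypothetical.

```python
from statistics import mean, pstdev

# Standard-deviation multiples per class. The 2.0 values follow the text;
# the 1.0 values for "good" and "bad" are hypothetical defaults.
MULTIPLES = {"very_good": 2.0, "good": 1.0, "bad": 1.0, "poor": 2.0}
# Class weights in the 3:2:2:3 ratio given in the text.
CLASS_WEIGHTS = {"very_good": 3, "good": 2, "bad": 2, "poor": 3}


def classify(value, mu, sd, k=MULTIPLES):
    """Assign one index value to a class, or None if it is unremarkable."""
    if value > mu + k["very_good"] * sd:
        return "very_good"
    if value > mu + k["good"] * sd:
        return "good"
    if value < mu - k["poor"] * sd:
        return "poor"
    if value < mu - k["bad"] * sd:
        return "bad"
    return None


def excellence_and_defect(table, k=MULTIPLES):
    """table: {line_id: [index values]}, all oriented so larger is better.

    Returns {line_id: (excellence_degree, defect_degree)}.
    """
    n_indices = len(next(iter(table.values())))
    # Mean and population standard deviation of each index over all lines.
    stats = []
    for idx in range(n_indices):
        column = [values[idx] for values in table.values()]
        stats.append((mean(column), pstdev(column)))

    result = {}
    for line_id, values in table.items():
        excellence = defect = 0
        for idx, value in enumerate(values):
            cls = classify(value, *stats[idx], k)
            if cls in ("very_good", "good"):
                excellence += CLASS_WEIGHTS[cls]
            elif cls in ("bad", "poor"):
                defect += CLASS_WEIGHTS[cls]
        result[line_id] = (excellence / n_indices, defect / n_indices)
    return result


# Ten hypothetical lines, two indices: L9 stands out on index 0, L0 fails index 1.
table = {f"L{i}": [0.0, 0.0] for i in range(10)}
table["L9"][0] = 10.0
table["L0"][1] = -10.0
degrees = excellence_and_defect(table)
```

Keeping the multiples in a dictionary mirrors the text's point that the thresholds are configurable rather than fixed.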

2.4 Treatment and Evaluation of Inbred SSR Data

From the SSR marker data, the number of effective sites, the number of heterozygous (mixed) bands and the heterozygous ratio can be calculated for each inbred line. Of the n loci in the selected SSR panel, those that returned a value are effective sites. The measured SSR data of an inbred line should in principle be homozygous, that is, each locus should show a single value, but in practice some loci may not be; the number of loci among the n that show more than one value is the number of heterozygous bands, and the heterozygous ratio is the number of heterozygous bands divided by n. These three indicators are used to judge the purity of the inbred lines, so that lines with higher purity and more effective sites are selected for hybrid combination; the genetic distance between the selected lines is then calculated.
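The three purity indicators can be sketched as below, under a hypothetical genotype layout: one entry per locus in the panel, `None` for a failed locus, otherwise a tuple of detected band sizes. As in the text, the heterozygous ratio divides by n, the total number of loci in the panel.

```python
def ssr_metrics(genotype):
    """Effective sites, heterozygous bands and heterozygous ratio of one line.

    genotype: one entry per SSR locus; None (or an empty tuple) marks a failed
    locus, otherwise a tuple of the band sizes detected at that locus.
    """
    n = len(genotype)
    effective = [bands for bands in genotype if bands]
    # A locus counts as heterozygous when it shows more than one distinct band.
    heterozygous = sum(1 for bands in effective if len(set(bands)) > 1)
    return {
        "effective_sites": len(effective),
        "heterozygous_bands": heterozygous,
        "heterozygous_ratio": heterozygous / n if n else 0.0,
    }


# Five hypothetical loci: one failed, one heterozygous (160/170).
metrics = ssr_metrics([(152,), (160, 170), None, (201, 201), (188,)])
```

Repeated identical band sizes, as at the fourth locus, are treated as homozygous because only distinct values are counted.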

2.5 Calculation of Parent Heterosis Rate

Additive gene effects are the main genetic effects underlying quantitative traits in maize. Both the molecular marker data and the phenotypic values from field tests approximate these additive effects; each carries some information about them, and the two sources may complement each other. Combining them to predict the overall heterosis of single-cross F1 hybrids is therefore expected to give higher selection efficiency. Specifically, phenotypic data are used to calculate the phenotypic distance, which reflects the phenotypic differences between the hybrid parents, and DNA fingerprint data are used to calculate the molecular marker genetic distance, which reflects their genotypic differences. The two distances do not overlap completely and describe the genetic differences between the parents from different angles. This paper used the parent heterosis rate (MD) to summarize the genetic difference between the parents, expressed as a linear combination of the phenotypic distance and the SSR molecular marker genetic distance:

$$ MD = r1*PD + r2*GD $$
(2)

where \( r1 + r2 = 1 \), PD is the phenotypic distance between the parents, GD is their molecular marker genetic distance, and r1 and r2 are the weights of the phenotypic distance and the genetic distance respectively.

2.5.1 Algorithm of Molecular Marker Genetic Distance

Genetic distance is an index that measures the overall genetic difference between varieties across a set of characteristics. Breeding targets involve more than one trait, so to reflect the genetic differences between parental varieties more fully, multiple traits must be considered together, which extends the concept of genetic distance: the multidimensional geometric distance over multiple traits, computed by multivariate statistical analysis, is called the genetic distance. There are two main ways to calculate the genetic distance from DNA fingerprint data:

$$ GD = 1 - \frac{m}{m + n} $$
(3)

where m is the number of bands shared by the two varieties and n is the number of bands that differ between them.

$$ GD = 1 - \frac{{2N_{ij} }}{{N_{i} + N_{j} }} $$
(4)

where \( N_{ij} \) is the number of bands shared by varieties (lines) i and j, and \( N_{i} \) and \( N_{j} \) are the numbers of bands of i and j respectively. The two formulas differ as follows:

  1. (1)

    When at least one of the two lines has invalid detection sites, formula (4) gives a more conservative result than formula (3), i.e., it tends to judge that the two lines differ genetically. For example, for the DNA fingerprint data shown in Table 2, with invalid sites excluded from the calculation, formula (3) gives GD = 0, suggesting that the genetic backgrounds of inbred lines "A01" and "B01" are almost identical, whereas formula (4) gives GD = 0.67; formula (4) is clearly the more conservative of the two.

  2. (2)

    Formula (4) is also applicable when the parents' fingerprint data were not obtained with the same primer combinations. If different primer combinations were used, the number of bands that differ between the varieties (lines), i.e., n in formula (3), cannot be determined.

Considering these two points, this paper used formula (4) to calculate the molecular marker genetic distance.
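The contrast between formulas (3) and (4) can be reproduced on a small hypothetical fingerprint. The sketch assumes each line is a dictionary mapping a locus name to the set of band sizes detected there, with `None` for an invalid site; the toy data are invented to mirror the 0 versus 0.67 pattern described in point (1), not taken from Table 2.

```python
def gd_formula3(a, b):
    """Formula (3): GD = 1 - m / (m + n), counted only on sites valid in both lines."""
    m = n = 0
    for locus in a.keys() & b.keys():
        bands_a, bands_b = a[locus], b[locus]
        if not bands_a or not bands_b:
            continue  # an invalid site on either side is excluded
        m += len(bands_a & bands_b)  # bands present in both lines
        n += len(bands_a ^ bands_b)  # bands present in only one line
    if m + n == 0:
        raise ValueError("no site is valid in both lines")
    return 1 - m / (m + n)


def gd_formula4(a, b):
    """Formula (4): GD = 1 - 2 * N_ij / (N_i + N_j)."""
    n_i = sum(len(bands) for bands in a.values() if bands)
    n_j = sum(len(bands) for bands in b.values() if bands)
    n_ij = sum(len(a[locus] & b[locus])
               for locus in a.keys() & b.keys()
               if a[locus] and b[locus])
    if n_i + n_j == 0:
        raise ValueError("neither line has any bands")
    return 1 - 2 * n_ij / (n_i + n_j)


# Hypothetical lines that agree on the only site valid in both.
a01 = {"s1": {150}, "s2": {200}, "s3": {300}, "s4": None, "s5": None}
b01 = {"s1": {150}, "s2": None, "s3": None, "s4": {400}, "s5": {500}}
gd3 = gd_formula3(a01, b01)  # 0.0: no difference on the shared valid site
gd4 = gd_formula4(a01, b01)  # 1 - 2/6, about 0.67
```

Formula (4) counts every band a line carries, including bands at sites where the other line failed, which is why it leans toward reporting a difference; it also never needs to know which bands "differ", which is what makes it usable across different primer sets.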

2.5.2 Calculation of the Phenotypic Distance

The conventional calculation of phenotypic genetic distance applies principal component analysis to the quantitative traits and then computes a Euclidean distance on the resulting composite indices. Principal component analysis is a multivariate statistical method that extracts a few composite indices as the dimensions for the distance analysis and weights them by their contribution rates. Its purpose is dimension reduction, to simplify the problem; the weights are merely a by-product of the calculation. Its disadvantages are:

  1. (1)

    Computation time grows with the number of inbred lines: the more lines, the larger the calculation, so the method is not suited to real-time calculation in software.

  2. (2)

    It considers only the genetic differences in quantitative traits and ignores the genetic differences between varieties (lines) in qualitative traits.

  3. (3)

    The composite indices extracted by principal component analysis have no clear meaning.

To calculate a phenotypic genetic distance that covers both quantitative and qualitative traits and is easy to implement in software, and in keeping with the data standardization described above, this paper adopted the following formula:

$$ PD_{ij}^{2} = \frac{{\sum {\left( {V_{ik} - V_{jk} } \right)^{2} *w_{k} } }}{{n*\sum {w_{k} } }} $$
(5)

where \( PD_{ij} \) is the phenotypic distance between inbred lines i and j, \( V_{ik} \) and \( V_{jk} \) are the standardized values of lines i and j on index k, \( w_{k} \) is the weight of the kth index, and n is the number of effective indices. Qualitative traits relevant to breeding are converted to numerical values and included in the formula, so they also contribute to the phenotypic difference between inbred lines. Because the index weights can be configured directly, the data need not undergo an orthogonal transformation to compute principal component scores; this lowers the computational cost and speeds up the calculation.
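Formula (5) can be sketched directly. The handling of missing values is an assumption: only indices observed for both lines are treated as effective, and both n and the weight sum are taken over those indices. The function name is hypothetical.

```python
import math


def phenotypic_distance(v_i, v_j, weights):
    """Formula (5): PD_ij = sqrt(sum((V_ik - V_jk)^2 * w_k) / (n * sum(w_k))).

    v_i, v_j: standardized index values of lines i and j (None = missing).
    Indices observed for both lines are the effective ones (an assumption).
    """
    effective = [(a, b, w) for a, b, w in zip(v_i, v_j, weights)
                 if a is not None and b is not None]
    n = len(effective)
    if n == 0:
        raise ValueError("no index observed for both lines")
    total_weight = sum(w for _, _, w in effective)
    if total_weight == 0:
        raise ValueError("weights of the effective indices sum to zero")
    pd_squared = sum((a - b) ** 2 * w for a, b, w in effective) / (n * total_weight)
    return math.sqrt(pd_squared)


# The third index is missing for line i, so it drops out with its weight.
pd = phenotypic_distance([1.0, 0.0, None], [-1.0, 0.0, 2.0], [1.0, 1.0, 5.0])
```

Unlike the principal-component route, this needs one pass over the indices per pair, which fits the real-time use the text describes.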

2.6 Calculation of Reciprocal Value

The reciprocal value mainly reflects how well the parents match on special traits. For these traits the parents should not simply be as similar as possible; a certain difference between them helps hybrid seed production and maximizes heterosis. Examples are the difference between plant height and ear position of the parents and the anthesis-silking interval (ASI). For each pair, this paper calculated the orthogonal (direct-cross, i × j) value and the reciprocal-cross (j × i) value separately and took the larger of the two as the reciprocal value of the cross. The orthogonal value (and, with i and j exchanged, the reciprocal-cross value) is calculated as follows:

$$ V_{ij} = HD + r_{1} *(t_{i} - t_{j} ) + r_{2} *(h_{i} - h_{j} ) $$
(6)

where HD is the comprehensive value of the inbred lines, \( V_{ij} \) is the orthogonal value for parents i and j, \( t_{i} \) is the standardized anthesis value of parent i, \( t_{j} \) is the standardized silking value of parent j, \( h_{i} \) is the plant height of parent i, \( h_{j} \) is the ear position height of parent j, and \( r_{1} \) and \( r_{2} \) are the weights of the anthesis-silking interval and of the plant height-ear position difference respectively. The reciprocal value is calculated as follows:

$$ SV = Max(V_{ij} ,V_{ji} ) $$
(7)
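Formulas (6) and (7) can be sketched as follows, reading \( t_{i} \) as the anthesis value and \( h_{i} \) as the plant height of parent i, and \( t_{j} \) as the silking value and \( h_{j} \) as the ear position height of parent j, as the plant height-ear position difference implies. The dictionary keys and function names are hypothetical, and the same HD is assumed for both cross directions.

```python
def cross_value(hd, parent_i, parent_j, r1, r2):
    """Formula (6) for the cross i x j.

    Parents are dicts with hypothetical keys "anthesis", "silking",
    "plant_height" and "ear_height" (standardized values).
    """
    return (hd
            + r1 * (parent_i["anthesis"] - parent_j["silking"])
            + r2 * (parent_i["plant_height"] - parent_j["ear_height"]))


def reciprocal_value(hd, parent_i, parent_j, r1, r2):
    """Formula (7): SV = max(V_ij, V_ji)."""
    return max(cross_value(hd, parent_i, parent_j, r1, r2),
               cross_value(hd, parent_j, parent_i, r1, r2))


p_i = {"anthesis": 1.0, "silking": 0.5, "plant_height": 2.0, "ear_height": 0.5}
p_j = {"anthesis": 0.0, "silking": -1.0, "plant_height": 1.0, "ear_height": -1.0}
v_ij = cross_value(1.0, p_i, p_j, r1=0.5, r2=0.25)       # 2.75
v_ji = cross_value(1.0, p_j, p_i, r1=0.5, r2=0.25)       # 0.875
sv = reciprocal_value(1.0, p_i, p_j, r1=0.5, r2=0.25)    # 2.75
```

The weights 0.5 and 0.25 are chosen only to keep the arithmetic exact; Section 3.3 reports 0.05 for flowering and 0.1 for height difference.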

2.7 Hybrid Group Model and Evaluation and Screening of the Combination

The hybrid combination value (HV) is a comprehensive index describing the development value of a hybrid combination. This paper expressed it in terms of the parents' heterosis rate (MD), the mean comprehensive value of the parents (\( \overline{HD} \)) and the reciprocal value (SV), where \( \overline{HD} \) = (comprehensive value of the male parent + comprehensive value of the female parent)/2. The hybrid combination value is calculated as follows, where r is a weight:

$$ HV = MD*\overline{HD} + r * SV $$
(8)
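Formulas (2) and (8) combine into a few lines. The sketch enforces \( r1 + r2 = 1 \) by deriving r2 from r1; the function names and the example numbers are hypothetical, with GD weighted more heavily than PD in the spirit of Section 3.3.

```python
def parent_heterosis(pd, gd, r1):
    """Formula (2): MD = r1*PD + r2*GD, with r2 = 1 - r1."""
    if not 0.0 <= r1 <= 1.0:
        raise ValueError("r1 must lie in [0, 1]")
    r2 = 1.0 - r1
    return r1 * pd + r2 * gd


def hybrid_value(md, hd_female, hd_male, sv, r):
    """Formula (8): HV = MD * mean(HD of both parents) + r * SV."""
    hd_mean = (hd_female + hd_male) / 2
    return md * hd_mean + r * sv


md = parent_heterosis(pd=0.5, gd=0.75, r1=0.25)              # 0.6875
hv = hybrid_value(md, hd_female=1.0, hd_male=3.0, sv=2.0, r=0.5)  # 2.375
```

Because MD multiplies the parents' mean comprehensive value, a large genetic distance only pays off when both parents are themselves good, while SV enters additively as a seed-production bonus.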

3 Realization of the System and Case Analysis

3.1 The Screening Results of the SSR Fingerprint Information

Screening can be carried out only after the SSR fingerprints have been loaded into the database and the characteristic values calculated. In the example in this paper, an inbred line was removed if its heterozygous ratio was >= 0.1, its number of effective sites was <= 18, or its number of mixed bands was >= 2. These conditions removed 94 inbred lines (the screening-condition interface is shown in Fig. 2).

Fig. 2. The SSR fingerprint screening conditions of the inbred lines
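The removal rule of Section 3.1 is a disjunction of three thresholds. A minimal sketch, assuming per-line metrics are stored in dictionaries with the hypothetical keys shown and that "mixed bands" are the heterozygous bands of Section 2.4:

```python
def passes_ssr_screen(metrics, het_ratio_cutoff=0.1,
                      effective_site_cutoff=18, mixed_band_cutoff=2):
    """Return True if a line survives the Section 3.1 SSR screen.

    A line is removed when heterozygous ratio >= 0.1, OR effective sites
    <= 18, OR mixed (heterozygous) bands >= 2.
    """
    removed = (metrics["heterozygous_ratio"] >= het_ratio_cutoff
               or metrics["effective_sites"] <= effective_site_cutoff
               or metrics["heterozygous_bands"] >= mixed_band_cutoff)
    return not removed


def screen_lines(line_metrics, **cutoffs):
    """Return the ids of lines that pass the screen, in input order."""
    return [line_id for line_id, m in line_metrics.items()
            if passes_ssr_screen(m, **cutoffs)]


# Hypothetical metrics: only P1 passes; P2-P4 each fail one condition.
line_metrics = {
    "P1": {"heterozygous_ratio": 0.05, "effective_sites": 20, "heterozygous_bands": 1},
    "P2": {"heterozygous_ratio": 0.10, "effective_sites": 25, "heterozygous_bands": 1},
    "P3": {"heterozygous_ratio": 0.00, "effective_sites": 18, "heterozygous_bands": 0},
    "P4": {"heterozygous_ratio": 0.05, "effective_sites": 22, "heterozygous_bands": 2},
}
kept = screen_lines(line_metrics)
```

Exposing the cutoffs as keyword arguments matches the configurable screening interface shown in Fig. 2.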

3.2 The Screening Results of the Parental Phenotypic Traits

The screening of the parental phenotypic traits consists of three steps:

  1. (1)

    Using the standards for the four classes "very good", "good", "bad" and "poor" that underlie the defect and excellence degrees, the class of each inbred line on each phenotypic trait was determined; the defect degree and excellence degree of each line were then calculated and ranked, and the top 50 % of lines passed to the next step.

  2. (2)

    This study aimed to produce hybrid combinations resistant to Ralstonia solanacearum, so in this step only the inbred lines rated "very good" for resistance to Ralstonia solanacearum passed to the next step.

  3. (3)

    Using formula (1), the comprehensive value of each remaining inbred line was calculated and ranked, and the top 50 % entered the final hybridization group.

After screening on both the SSR molecular marker data and the phenotypic test data, 78 inbred lines remained for combination, and these 78 lines were then crossed with one another.
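The three screening steps can be chained as below. Assumptions not stated in the text: step 1 ranks by excellence degree minus defect degree, "top 50 %" rounds up, and each line is a dictionary with the hypothetical keys shown.

```python
import math


def phenotypic_screen(lines):
    """Three-step phenotypic screen of Section 3.2 (hypothetical record layout).

    lines: list of dicts with keys "id", "excellence", "defect",
    "resistance_class" and "hd". Returns the ids that survive, best first.
    """
    # Step 1: rank by excellence minus defect (an assumed combined score)
    # and keep the top half, rounding up.
    ranked = sorted(lines, key=lambda r: r["excellence"] - r["defect"], reverse=True)
    step1 = ranked[: math.ceil(len(ranked) / 2)]

    # Step 2: keep lines rated "very good" for the target disease resistance.
    step2 = [r for r in step1 if r["resistance_class"] == "very_good"]

    # Step 3: rank by comprehensive value HD (formula (1)) and keep the top half.
    ranked_hd = sorted(step2, key=lambda r: r["hd"], reverse=True)
    return [r["id"] for r in ranked_hd[: math.ceil(len(ranked_hd) / 2)]]


candidates = [
    {"id": "A", "excellence": 2, "defect": 0, "resistance_class": "very_good", "hd": 1.0},
    {"id": "B", "excellence": 1, "defect": 0, "resistance_class": "very_good", "hd": 2.0},
    {"id": "C", "excellence": 0, "defect": 1, "resistance_class": "very_good", "hd": 3.0},
    {"id": "D", "excellence": 0, "defect": 2, "resistance_class": "good", "hd": 4.0},
]
survivors = phenotypic_screen(candidates)
```

Note that C and D, despite the highest HD, are dropped in step 1; the order of the steps matters, and the sketch follows the order given in the text.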

3.3 The Results of Hybridization Group and Combination Screening

The SSR and phenotypic screening yielded 78 inbred lines. For each hybrid combination, the heterosis rate was calculated with formula (2), the orthogonal or reciprocal value with formula (6), and the comprehensive value with formula (8); the hybrid combinations were then sorted by comprehensive value (the sorting interface is shown in Fig. 3 below).

Fig. 3. The interface of hybrid combination order
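The pairing and sorting step can be sketched with `itertools.combinations`. The scoring callable is left abstract (it would evaluate formula (8) for a pair); enumerating unordered pairs assumes the score is symmetric, which holds if PD and GD are symmetric and SV already takes the better of the two cross directions. The function name and the toy scores are hypothetical.

```python
from itertools import combinations


def rank_combinations(line_ids, score, top_k=None):
    """Score every unordered pair of lines and sort by score, best first.

    score(a, b) is any callable returning the hybrid combination value HV.
    """
    scored = [((a, b), score(a, b)) for a, b in combinations(line_ids, 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


values = {"A": 1.0, "B": 2.0, "C": 4.0}
ranking = rank_combinations(["A", "B", "C"], lambda a, b: values[a] * values[b])
```

With the 78 lines retained in Section 3.2, this enumerates 78 × 77 / 2 = 3003 candidate pairs before any combination screening is applied.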

Because the field identification data of the inbred lines in this case covered only one year (2009) with relatively few test sites, they were less reliable than the DNA fingerprint data, so the molecular marker genetic distance was given the larger weight. For the reciprocal-cross (row-column, column-row) parameters, this paper considered the height difference more important than the anthesis-silking interval, and assigned weights of 0.05 to flowering and 0.1 to height difference. Applying the hybrid combination screening conditions finally yielded 37 hybrid combinations (partial results are shown in Table 3).

Table 3. The results of hybrid combination

4 Discussion

This paper proposed a technique and method for making maize hybrid combinations, whose main idea was as follows:

Based on the SSR detection results and field test results of the current or previous years, inbred lines with high homozygosity were first screened using the SSR results, and lines with good overall traits were then selected as the parents of the combinations. Next, the genetic difference between each pair of parents was calculated from the SSR results and the phenotypic difference from the field test results, and the two were combined into the parents' heterosis rate, which expresses their specific combining ability. Finally, a hybridization group model was built from the comprehensive traits, the specific combining ability and the orthogonal/reciprocal cross values; it was used to calculate the comprehensive traits, advantages and disadvantages of the hybrids formed by crossing the parents with each other, and the resulting hybrid combinations were screened for the next round of field breeding.

Applying this method, the paper simulated 37 hybrid combinations targeting resistance to Ralstonia solanacearum, but the combinations were based on only one year's performance data for the parents. Whether these combinations actually have good resistance to the bacterial disease and good overall performance still needs to be verified by multi-year, multi-site tests of the hybrids.

A software system implementing this method was developed. It supports the management of large volumes of SSR marker data and field test data and the calculation of many indices, formulas and models, but its efficiency still needs improvement.

This work was also a molecular design breeding practice carried out by Beijing Kings Nower Seed S&T Co., Ltd. Compared with the traditional combination process, it differed in several key technical respects:

First, it did not consider the selection process of the inbred lines themselves; instead it selected the parents directly and simulated combinations from the molecular and phenotypic characteristics of the inbred lines. Second, it did not use the genetic background, pedigree or other information of the inbred lines; in particular, the heterosis rate was calculated only from genetic distance and phenotypic distance, without analyzing the heterotic group of each inbred line. Parents could therefore show a high heterosis rate while belonging to the same heterotic group, which would not match their actual combining ability. Third, the method relies on phenotypic information obtained through multi-year, multi-site field identification of the inbred lines, but domestic breeding institutions do not yet treat multi-site identification of inbred lines as routine breeding work; they mainly infer the likely performance of the parents from testcrosses. Fourth, inbred line SSR data classify lines well [12, 14, 16], but because this method did not assign the inbred lines to heterotic groups, it was difficult to improve testcross matching by choosing tester lines, which may affect the simulated combinations. This is the focus of the next step of the work.

5 Conclusions

This paper made the following new attempts:

Firstly, methodologically, it attempted for the first time to use SSR fingerprint data together with phenotypic data, so that more information about the inbred lines is considered in producing good maize hybrids.

Secondly, it attempted to obtain phenotypic information on the inbred lines through multi-year, multi-site field identification, whereas domestic breeding institutions currently infer the likely performance of the parents mainly from testcrosses.

Thirdly, the paper explored a design breeding method that spans the integrated management of molecular marker data and field-test phenotypic data of maize varieties, comprehensive evaluation and assisted screening of maize inbred lines, heterosis analysis of inbred lines based on molecular marker and phenotypic differences, and simulated combination of maize hybrids with performance prediction. It also developed a software system for hybrid combination that supports this technical route, providing methods and tools to improve the scale and information management level of maize breeding in China. Using the software, 37 hybrid combinations targeting resistance to Ralstonia solanacearum were obtained from the molecular and phenotypic data of 179 inbred lines. The method and software provide preliminary technical support for carrying out and improving molecular design breeding in China.