1 Introduction

Nitrogen strongly affects crop photosynthesis, growth and development, and yield and quality formation, and it is among the mineral elements that crops demand and that are applied in the largest amounts [1]. Nitrogen deficiency seriously reduces crop yield and quality, whereas excessive nitrogen application pollutes the environment. Therefore, rapid, nondestructive and accurate assessment of crop nitrogen content is important for monitoring crop growth, improving nitrogen use efficiency and developing precision agriculture [2].

Winter wheat is one of the main grain crops in China, and assessing its nitrogen nutrition supports growth diagnosis and field management. Traditionally, crop nitrogen was diagnosed by laboratory chemical analysis, which usually requires destructive sampling and suffers from poor timeliness and strong subjectivity. Hyperspectral remote sensing has gained extensive attention in crop nitrogen diagnosis because of its rich information content, high spectral resolution and continuous wavebands [3]. Many researchers have studied the estimation of crop nitrogen content. Nguyen et al. [4] used partial least squares regression to estimate the nitrogen content of rice and concluded that estimating crop nitrogen content from spectral reflectance was feasible. Later, estimation accuracy was improved by screening sensitive bands and constructing vegetation indices [1, 5, 6]. Other researchers have used artificial neural networks to retrieve crop nitrogen content [7]. Although previous work on monitoring crop nitrogen content has been extensive and fruitful, each inversion method has its own characteristics. Linear regression models are simple, easy to construct and easy to visualize, but they also have shortcomings: many parameters must be fitted, and an optimal solution may not be found for high-dimensional data. Machine learning algorithms, developed on the basis of statistical learning theory, are powerful tools for data analysis and mining, and they can overcome these shortcomings of linear regression models.

In view of this, this study selects a variety of spectral indices and establishes three leaf nitrogen content (LNC) estimation models based on linear regression, multiple stepwise regression and the random forest algorithm. To identify an accurate and robust model for remote sensing of winter wheat LNC, the predictive ability and relative advantages of the three inversion models are compared in terms of predictive accuracy, stability and applicability, using independent data sets.

2 Materials and Methods

2.1 Experimental Design

The winter wheat field experiments were carried out in 2014/2015 and 2015/2016 at the National Experimental Station for Precision Agriculture (116.44°E, 40.18°N) in northeast Beijing, China. This site is characterized by a semi-humid continental monsoon climate, with mean annual temperature and average elevation of 13 °C and 36 m, respectively.

The 2014/2015 field experiment combined different varieties, nitrogen levels and water treatments. The winter wheat varieties were Jing 9843 and Zhongmai 175. The nitrogen levels were 0 kg/hm2, 195 kg/hm2, 390 kg/hm2 and 585 kg/hm2, applied as urea. The water treatments were rainfed, normal irrigation (675 m3/hm2) and 1.5 times the normal irrigation amount. A total of 16 experimental treatments were designed and each treatment was replicated 3 times, giving 48 experimental plots, each with an area of 48 m2. Other field management followed local practice.

The 2015/2016 field experiment involved three treatment factors: variety, test area and nitrogen level. The winter wheat varieties were Lunxuan 167 and Jingdong 18. There were two test areas, north and south. In the north area, the nitrogen treatments were 39 kg/hm2, 195 kg/hm2, 390 kg/hm2 and 585 kg/hm2. In the south area, the nitrogen treatments were 39 kg/hm2, 390 kg/hm2 (base fertilizer), 195 kg/hm2 (base fertilizer) + 195 kg/hm2 (topdressing), and 195 kg/hm2 (base fertilizer) + 390 kg/hm2 (topdressing). The nitrogen fertilizer was urea. A total of 16 experimental treatments were designed and each treatment was replicated 3 times, giving 48 experimental plots, each with an area of 135 m2. Other field management followed local practice.

2.2 Data Collection

In this study, canopy reflectance and LNC of winter wheat were measured in the field for all treatments at three critical growth stages (flag leaf, flowering and filling) in both 2015 and 2016.

2.2.1 Canopy Spectrometry Collection

The winter wheat canopy reflectance was measured with an ASD FieldSpec FR2500 spectrometer (USA), which covers wavelengths from 350 nm to 2500 nm with a spectral resampling interval of 1 nm. Measurements were taken under clear skies between 10:00 and 14:00 Beijing time. During observation, the probe pointed vertically downward from about 1.0 m above the ground, with a 25° field of view. In each experimental plot, 10 records were collected and their average was taken as the canopy spectral reflectance of the plot. A standard white reference panel correction was carried out immediately before and after each measurement.

2.2.2 LNC Determination

In each experimental plot, 20 representative winter wheat plants were sampled, sealed in bags immediately and brought back to the laboratory. The stems and leaves were first separated, and the leaves were heated at 105 °C for about 30 min to deactivate enzymes. They were then dried to constant weight at 75 °C. After grinding, the leaf nitrogen content of winter wheat was determined by the micro-Kjeldahl method.

2.3 Spectral Index Selection

A spectral index is a spectral parameter obtained by combining different spectral bands in a certain algebraic form [8]. Because it can reduce background interference on canopy spectral characteristics, it is more sensitive than a single band [9, 10]. Based on the results of previous studies, 16 spectral indices related to LNC were selected (Table 1).

Table 1. Spectral indices related to LNC in this study.
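As an illustration of how such indices are formed from band reflectances, the following minimal sketch computes the four indices that are later retained for modeling. It is not the authors' code. The NDSI and RSI forms are the usual normalized-difference and ratio definitions, and the mNDVI705 and mSR705 formulas are assumed from their common definitions in the literature, because Table 1 is not reproduced here. The reflectance values are hypothetical.

```python
from typing import Mapping


def ndsi(r_i: float, r_j: float) -> float:
    """Normalized difference spectral index of two band reflectances."""
    return (r_i - r_j) / (r_i + r_j)


def rsi(r_i: float, r_j: float) -> float:
    """Ratio spectral index of two band reflectances."""
    return r_i / r_j


def mndvi705(spectrum: Mapping[int, float]) -> float:
    """Modified red-edge NDVI (assumed form: (R750-R705)/(R750+R705-2*R445))."""
    r750, r705, r445 = spectrum[750], spectrum[705], spectrum[445]
    return (r750 - r705) / (r750 + r705 - 2.0 * r445)


def msr705(spectrum: Mapping[int, float]) -> float:
    """Modified red-edge simple ratio (assumed form: (R750-R445)/(R705-R445))."""
    r750, r705, r445 = spectrum[750], spectrum[705], spectrum[445]
    return (r750 - r445) / (r705 - r445)


# Hypothetical canopy reflectance, keyed by wavelength in nm (illustration only).
spectrum = {445: 0.03, 506: 0.04, 592: 0.06, 594: 0.06, 705: 0.15, 750: 0.45}

indices = {
    "NDSI(R594, R506)": ndsi(spectrum[594], spectrum[506]),
    "RSI(R592, R506)": rsi(spectrum[592], spectrum[506]),
    "mNDVI705": mndvi705(spectrum),
    "mSR705": msr705(spectrum),
}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Keying the spectrum by wavelength keeps the index formulas readable; with a 1 nm resampled spectrum, the same lookups would apply directly.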

2.4 Data Analysis

In this study, three LNC inversion models, namely linear regression (LR), multiple stepwise regression (MSR) and random forest regression (RFR), were constructed with 70% of the 2014/2015 samples using SPSS 19.0 and MATLAB, and validated with the remaining 30%. The robustness of the three models was then tested with independent 2015/2016 data sets covering different varieties, growth stages and test areas. The random forest algorithm is a statistical machine learning algorithm proposed by Breiman [30] in 2001. It uses bootstrap resampling to draw multiple training sets from the original samples and grows a single decision tree from each of them. These relatively independent trees are then combined into a forest, and the final prediction is obtained by aggregating the outputs of all trees (by voting for classification, or by averaging for regression). In this study, random forest regression was implemented in MATLAB with 1000 decision trees and 3 candidate variables at each split.
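The procedure described above can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions, not the MATLAB code used in the study: the function names, the `min_leaf` stopping rule, the random seed and the synthetic data are all hypothetical, and the demonstration grows only 50 trees to keep the run short (the study used 1000).

```python
import random
from statistics import mean


def build_tree(rows, targets, n_try, min_leaf, rng):
    """Grow one regression tree; every split tests n_try randomly chosen features."""
    if len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)  # leaf node: mean of the targets that reached it
    best = None  # (sum of squared errors, feature index, threshold)
    n_features = len(rows[0])
    for f in rng.sample(range(n_features), min(n_try, n_features)):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return mean(targets)  # no admissible split was found
    _, f, t = best
    go_left = [r[f] <= t for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_y = [y for y, g in zip(targets, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_y = [y for y, g in zip(targets, go_left) if not g]
    return (f, t,
            build_tree(left_rows, left_y, n_try, min_leaf, rng),
            build_tree(right_rows, right_y, n_try, min_leaf, rng))


def predict_tree(node, row):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, targets, n_trees=1000, n_try=3, min_leaf=5, seed=0):
    """Fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], n_try, min_leaf, rng))
    return forest


def predict_forest(forest, row):
    """Regression forests average the predictions of the individual trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic stand-in for four spectral indices and LNC (illustration only).
rng = random.Random(42)
rows = [[rng.random() for _ in range(4)] for _ in range(60)]
targets = [2.0 + 1.5 * r[0] - 1.0 * r[3] + rng.gauss(0, 0.05) for r in rows]

order = list(range(len(rows)))
rng.shuffle(order)
cut = round(0.7 * len(order))  # 70% for modeling, 30% for validation
train, test = order[:cut], order[cut:]

forest = fit_forest([rows[i] for i in train], [targets[i] for i in train], n_trees=50)
predictions = [predict_forest(forest, rows[i]) for i in test]
```

Because every leaf stores the mean of training targets, a forest prediction always lies within the range of the training LNC values, which is one reason tree ensembles cannot extrapolate beyond the sampled conditions.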

2.5 Accuracy Evaluation

The coefficient of determination (R2), root mean square error (RMSE) and relative prediction deviation (RPD) were used as indicators of the predictive performance of the models. In general, the closer R2 is to 1 and the smaller the RMSE, the better the predictive performance of the model. For RPD, the predictive ability of the model is excellent when RPD ≥ 2; the model can be used for rough evaluation of samples when 1.4 < RPD < 2; otherwise, the model fails to predict the samples [31].
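For reference, the three indicators can be computed as in the sketch below. The exact formulas are not stated in the text, so two common conventions are assumed: R2 is taken as 1 minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the measured values divided by the RMSE. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    m = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - m) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return sqrt(mean((y - p) ** 2 for y, p in zip(measured, predicted)))


def rpd(measured, predicted):
    """Assumed definition: standard deviation of the measured values over RMSE."""
    return stdev(measured) / rmse(measured, predicted)


def rate_rpd(value):
    """Apply the RPD thresholds quoted in the text."""
    if value >= 2.0:
        return "excellent"
    if value > 1.4:
        return "rough evaluation"
    return "fails to predict"


# Hypothetical measured and predicted LNC values (illustration only).
measured = [2.1, 2.8, 3.4, 3.9, 4.5]
predicted = [2.3, 2.7, 3.5, 3.6, 4.4]
print(r_squared(measured, predicted), rmse(measured, predicted),
      rate_rpd(rpd(measured, predicted)))
```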

3 Results and Analysis

3.1 Correlation Between Spectral Index and LNC

Table 2 lists the correlation coefficients between the spectral indices and LNC. Overall, the spectral indices used in this study were strongly related to LNC: |r| was above 0.70 for all of them, and all correlations were significant at the 0.01 level. Among them, mNDVI705 had the highest correlation with LNC (r = 0.835) and NDVI the lowest (r = 0.736). In addition, the correlation between LNC and the NDSI and RSI formed from any two bands in the range of 400–1000 nm was analyzed, as shown in Fig. 1. The results showed that the spectral indices composed of the 594 nm, 592 nm and 506 nm bands were more sensitive to LNC. NDSI(R594, R506) and RSI(R592, R506) both had highly significant negative correlations with LNC, with correlation coefficients of −0.907 and −0.911, respectively.

Table 2. Correlation coefficients between spectral index and LNC.
Fig. 1. The R2 between spectral indices and LNC.

In statistical terms, a correlation is regarded as high when |r| ≥ 0.8, as moderate when 0.5 ≤ |r| < 0.8, and as low otherwise [32]. As can be seen from Table 2, 8 spectral indices were highly related to LNC, namely NDVI705, mSR705, mNDVI705, VOG1, CIrededge, GNDVI, NDSI(R594, R506) and RSI(R592, R506), and 8 spectral indices were moderately related to LNC, namely NDVI, SR705, MSR, GMI1, GMI2, RI-half, RSI(D740, D522) and RSI(R815, R704).
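The two-band screening and the grading of correlation strength can be sketched as follows. This is only an illustration of the idea, assuming a Pearson correlation and an exhaustive search over unordered band pairs. The function names and the small data set are hypothetical, and a real search over 400–1000 nm at 1 nm resolution would involve far more band pairs.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength(r):
    """Grade |r| with the thresholds quoted in the text."""
    if abs(r) >= 0.8:
        return "high"
    if abs(r) >= 0.5:
        return "moderate"
    return "low"


def best_ndsi_pair(spectra, lnc, bands):
    """Return (band_i, band_j, r) for the NDSI with the largest |r| against LNC.

    spectra holds one {wavelength: reflectance} mapping per plot.
    """
    best = None
    for i, j in combinations(bands, 2):
        values = [(s[i] - s[j]) / (s[i] + s[j]) for s in spectra]
        r = pearson_r(values, lnc)
        if best is None or abs(r) > abs(best[2]):
            best = (i, j, r)
    return best


# Hypothetical reflectance and LNC for five plots (illustration only).
spectra = [
    {506: 0.040, 594: 0.080, 750: 0.40},
    {506: 0.041, 594: 0.074, 750: 0.47},
    {506: 0.039, 594: 0.066, 750: 0.42},
    {506: 0.040, 594: 0.061, 750: 0.45},
    {506: 0.042, 594: 0.055, 750: 0.41},
]
lnc = [2.0, 2.5, 3.0, 3.5, 4.0]

band_i, band_j, r = best_ndsi_pair(spectra, lnc, [506, 594, 750])
print(band_i, band_j, round(r, 3), strength(r))
```

Swapping the two bands of an NDSI only flips the sign of r, so searching unordered pairs and comparing |r| is sufficient.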

3.2 Selective Preference of Spectral Index

On the basis of the correlation analysis above, the 8 spectral indices highly correlated with LNC were selected for further analysis. As shown in Table 3, the winter wheat LNC inversion models based on each of the 8 spectral indices were all extremely significant, and RSI(R592, R506) had the largest coefficient of determination (R2 = 0.831). R2 and the F value were used as the criteria for selecting the better spectral indices: the closer R2 is to 1, the higher the accuracy of the model, and the greater the F value, the more significant the regression relationship. On this basis, 4 spectral indices were selected: mNDVI705, mSR705, NDSI(R594, R506) and RSI(R592, R506).

Table 3. Correlation analysis between spectral index and LNC.
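The two selection criteria are closely linked. For an ordinary least-squares regression, the F statistic can be written in terms of R2, as in the standard relation below. The symbols n (number of modeling samples) and p (number of predictors) are introduced here for illustration and do not appear in the original text.

```latex
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)},
\qquad\text{so for a single spectral index } (p = 1):\quad
F = \frac{(n - 2)\, R^2}{1 - R^2}
```

For a fixed number of samples, F therefore increases monotonically with R2, so single-index models fitted to the same samples are ranked identically by either criterion.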

3.3 Construction and Verification of LNC Estimation Model for Winter Wheat

LNC estimation models were constructed by three methods. (1) With LNC as the dependent variable, RSI(R592, R506), the spectral index with the highest R2 and F values, was chosen as the independent variable, and a linear regression (LR) model was constructed. (2) The 4 better spectral indices (i.e., mNDVI705, mSR705, NDSI(R594, R506) and RSI(R592, R506)) were selected as independent variables, and an LNC remote sensing estimation model was constructed by multiple stepwise regression (MSR). (3) The same 4 spectral indices were selected as independent variables, and an LNC remote sensing estimation model was constructed by the random forest algorithm (RFR).
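Method (1) amounts to an ordinary least-squares fit with a single predictor. A minimal sketch is given below. It is not the SPSS procedure used in the study, and the function name and the RSI and LNC values are hypothetical. The negative slope in the example merely mirrors the negative correlation between RSI(R592, R506) and LNC reported above.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical RSI(R592, R506) values and measured LNC (illustration only).
rsi_values = [1.9, 1.7, 1.6, 1.4, 1.3]
lnc_values = [2.2, 2.9, 3.1, 3.8, 4.3]

slope, intercept = fit_line(rsi_values, lnc_values)
estimate = slope * 1.5 + intercept  # LNC estimated for a new RSI value of 1.5
print(f"LNC = {slope:.3f} * RSI + {intercept:.3f}; estimate at RSI = 1.5: {estimate:.2f}")
```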

The LNC estimation models constructed by the three modeling methods using the 2014/2015 data are shown in Table 4. The R2 of all three models exceeded 0.8, and the RMSE ranged from 0.276 to 0.288. These results show that all three models can be used for rapid, nondestructive and accurate monitoring of winter wheat LNC. In comparison, the multiple stepwise regression (MSR) model was slightly more accurate than the linear regression (LR) model, and the random forest regression (RFR) model was the most accurate, with R2 and RMSE of 0.962 and 0.276, respectively. A possible explanation is that the information contained in a single spectral index saturates to varying degrees, which increases the error of crop nitrogen content estimation. A multiple regression model can take in more band information related to LNC, which improves both the estimation accuracy and the stability of the model. The random forest algorithm is a statistics-based multivariate regression method with strong noise tolerance, high efficiency on large data sets and resistance to overfitting, so it is well suited to the LNC inversion problem.

Table 4. Comparison of LNC estimation models constructed by three methods.

To compare the predictive ability of the three estimation models, each model was used to predict the 48 samples of the independent data set, and the results are shown in Table 4 and Fig. 2. The validation R2 of all three models exceeded 0.8 and the RPD of all three models was greater than 2, indicating that the three models predict LNC well. However, the RFR model had some advantage over the other two models in estimating winter wheat LNC, with validation R2 and RMSE of 0.898 and 0.401, respectively. One possible reason is that the RFR model uses out-of-bag (OOB) data to obtain an unbiased estimate of the error during training. In summary, the three regression models differed little in estimation performance, but the RFR model performed slightly better. The robustness of the three regression models was further tested with independent data sets of different varieties, growth stages and test areas from 2015/2016, in order to select a winter wheat LNC estimation model with high precision, good stability and strong applicability.

Fig. 2. Relationship between measured value and predicted value based on 3 methods.

3.4 Robustness Test of LNC Estimation Model for Winter Wheat

For the different varieties (Table 5), all three models estimated LNC less accurately for the loose-type variety Lunxuan 167 than for the compact-type variety Jingdong 18, while the RFR model achieved the highest inversion accuracy for both plant types. As can be seen from Table 6, the inversion results of the three models differed among growth stages, but the RFR model was more accurate than the LR and MSR models at all three growth stages. The LNC inversion results for the different test areas (Table 7) showed that all three models were more accurate in the north area than in the south area, and that the RFR model was more accurate than the other two models in both test areas. Based on the above analysis, the RFR model was not only robust for estimating winter wheat LNC under different conditions in different years, but also more accurate than the LR and MSR models. This indicates that the RFR model constructed in this study has high precision and good applicability, and is a preferred model for estimating winter wheat LNC.

Table 5. LNC estimation results of winter wheat for different cultivars.
Table 6. LNC estimation results of winter wheat for different growth stages.
Table 7. LNC estimation results of winter wheat for different test areas.
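A robustness test of this kind evaluates the same fitted model separately on each subgroup of the independent data. The sketch below shows the grouping step for RMSE only, under the assumption that each record carries a group label (for example the variety, growth stage or test area) together with a measured and a predicted LNC value. The function name and the records are hypothetical and do not reproduce Tables 5–7.

```python
from collections import defaultdict
from math import sqrt


def rmse_by_group(records):
    """RMSE per group for records of (group, measured, predicted)."""
    squared_errors = defaultdict(list)
    for group, measured, predicted in records:
        squared_errors[group].append((measured - predicted) ** 2)
    return {g: sqrt(sum(v) / len(v)) for g, v in squared_errors.items()}


# Hypothetical records labelled by variety (illustration only).
records = [
    ("Lunxuan 167", 3.0, 3.5),
    ("Lunxuan 167", 2.0, 1.5),
    ("Jingdong 18", 4.0, 4.0),
    ("Jingdong 18", 3.2, 3.2),
]
for group, value in rmse_by_group(records).items():
    print(f"{group}: RMSE = {value:.3f}")
```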

4 Conclusions

In this study, the correlation between spectral indices and LNC was analyzed based on canopy spectral data and LNC of winter wheat from different years (i.e., 2014/2015 and 2015/2016), different treatments (i.e., variety, fertilizer and water supply) and different growth stages (i.e., jointing stage, flowering stage and filling stage). LNC estimation models of winter wheat were then constructed by linear regression, stepwise regression and random forest regression, and the accuracy and stability of the models were verified. The results were as follows:

(1) Four spectral indices (i.e., NDSI(R594, R506), RSI(R592, R506), mSR705 and mNDVI705) correlated well with the LNC of winter wheat, and RSI(R592, R506) had the highest correlation with LNC (r = −0.911). The R2 of all three estimation models was above 0.8, indicating that the three models can be used for rapid, nondestructive and accurate LNC monitoring of winter wheat. Among them, the RFR model performed best: R2 and RMSE were 0.962 and 0.276 in the modeling set, and 0.898 and 0.401 in the validation set, respectively.

(2) The applicability of the three models was further tested with the 2015/2016 data. The RFR model was more accurate than the other two models for every set of experimental samples, and its LNC estimates were robust under different conditions in different years, so it can be regarded as the preferred model for estimating winter wheat LNC.