1 Introduction

As a basic meteorological variable, precipitation is of great importance to a variety of fields such as water resources management, agriculture, and ecological environment assessment (Wei et al. 2005; Kisi and Cimen 2011; Shi et al. 2016; Liu and Chui 2018). In the past several decades, global warming has greatly influenced the water circulation and hydrological processes (Garbrecht et al. 2004; IPCC 2013), and thus, has attracted the attentions of numerous researchers (Groisman et al. 2005; Westra et al. 2013; Cao and Pan 2014). There are difficulties in the accurate prediction of precipitation because of the complexity of physical processes (Chow et al. 1988; Kulligowski and Barros 1998), especially for long-term prediction. As a result, many efforts have been made to develop appropriate methods to predict precipitation, which can be classified into the following types, e.g., dynamical methods (Claußnitzer and Névir 2009; Landman et al. 2014), statistical methods (Barnston and Smith 1996; Lee and Ouarda 2010; Kim et al. 2017; Chardon et al. 2018), soft computing methods (Silverman and Dracup 2000; Partal and Cigizoglu 2009; Ortiz-García et al. 2014), and numerical weather prediction methods (Richardson 2005; Park et al. 2008). Overall, the above-mentioned methods have been widely used and can be efficient in precipitation prediction; however, most of them rely on utilizing other climatic indices (e.g., El Niño-Southern Oscillation, ENSO) and climatic variables (e.g., sea surface temperature, SST). Moreover, precipitation can be influenced by a variety of related meteorological variables, e.g., air pressure, air temperature and wind speed (Singh 1988; Benestad 2013). However, due to the variable weather conditions and complex terrain orography, there is still limited amount of available meteorological data over the regions such as the Three-River Headwaters Region (TRHR), which may lead to great uncertainties in precipitation prediction (Xue et al. 2017). To this end, it would be valuable to develop a new approach to precipitation prediction for the designated regions (e.g., the TRHR), independent of other climatic indices and variables.

The TRHR is a plateau mountainous region in the western China. It is particularly sensitive to climate change, bringing serious disturbances to local ecosystem (Immerzeel et al. 2010; Tong et al. 2014; Shi et al. 2016, 2017; Xi et al. 2018). With reference to precipitation, several studies have reported an increasing trend in the annual precipitation (Liang et al. 2013; Tong et al. 2014; Shi et al. 2016) and precipitation extremes (Cao and Pan 2014) over this region. Moreover, it is worth noting that, for regions with large elevation variations (e.g., the TRHR), precipitation can be characterized by significant spatial variation, and the relationship between precipitation and elevation can be expressed by various functional forms (Naoum and Tsanis 2004; Chu 2012). Since precipitation is one of the basic component of hydrological cycle, to make accurate long-term prediction of precipitation for the TRHR has both scientific and practical significances (Xue et al. 2017). Therefore, for the designated regions such as the TRHR, it is important to develop an appropriate approach to precipitation prediction, especially for long-term prediction at the monthly scale.

To fill these gaps, the objective of this study is to develop a recursive approach to long-term prediction of monthly precipitation using genetic programming (GP), without the utilization of other climatic indices and variables. Compared to previous studies on the topic of precipitation prediction, the significances of this study can be summarized as follows: first, no other climatic indices and variables are introduced to predict precipitation. Second, a statistical method is proposed to compute the preliminary estimations of annual and monthly precipitation, and then GP is adopted as an optimization method to improve the preliminary estimations. Third, the prediction results of the monthly precipitation over the TRHR till 2050 are obtained, which cannot be found in existing literature. Overall, the proposed approach and the obtained results will enhance our understanding and facilitate future studies regarding the long-term prediction of precipitation in such regions. Following this Introduction, this paper has been divided into three further sections as follows: Section 2 introduces the data and methods, Section 3 presents the results and related discussion, and Section 4 draws several conclusions.

2 Data and Methods

2.1 Study Area and Research Data

The TRHR, well-known as the sources of the Yangtze River, the Yellow River and the Lantsang River, is located in the Qinghai Province in the western China (89°45′ - 102°23′ E, 31°39′ - 36°12’ N) (see Fig. 1). With an area of 0.3 million km2, it accounts for 43% of the total area of the Qinghai Province (Cao and Pan 2014). The elevation of this region varies between 2000 m and 6600 m, and the mean value is over 4000 m. This region lies in the temperate zone, mainly dominated by a plateau monsoon climate (Liu and Yin 2002; Duan et al. 2013). Normally, the annual precipitation ranges from 262 mm to 773 mm in this region (Yi et al. 2013), and more than 80% of the annual precipitation occurs in the wet season from May to October (Liang et al. 2013; Shi et al. 2016).

Fig. 1
figure 1

The locations of meteorological stations over the TRHR

There are 37 meteorological stations available inside or around the TRHR. The daily observed data can be downloaded for free on the official website of China Meteorological Administration (2016). As most of these stations were built in the late 1950s, only the data from 1961 are selected in this study to ensure that the lengths of the datasets from different stations are consistent. For the designated station, missing data in a certain year are interpolated using the data of the neighboring stations in the same year; however, stations with severe lack of data for more than twenty years are excluded directly (Shi et al. 2017). After this elimination, there are 29 meteorological stations left with complete daily observations from 1961 to 2014. Among them, 17 stations are located inside the TRHR (the blue points in Fig. 1) and the others are located outside the TRHR (the red points in Fig. 1 and the stations in italic format in Table 1). For each station, the annual and monthly observed precipitation can be obtained from the daily observed data. Moreover, the mean annual and monthly precipitation over the TRHR can be computed using the Thiessen polygon method (Thiessen and Alter 1911; Brassel and Reif 1979). In this study, the observations during 1961–2000 were used for calibration of the proposed approach, while the remaining data during 2001–2014 were used for validation.

Table 1 General information of the meteorological stations over the TRHR

2.2 Statistical Method for Computing the Preliminary Estimations of Precipitation

Both the observed precipitation and the preliminary estimations of precipitation are the essential inputs of the recursive approach. In this study, a statistical method is proposed to compute the preliminary estimations of precipitation, firstly at the annual scale and then at the monthly scale.

2.2.1 Annual Precipitation

Known from our previous studies (Shi et al. 2016, 2017), the annual precipitation over the TRHR has experienced a significant increasing trend during 1961–2014. Thus, it is supposed that, for the designated year, the preliminary estimation of annual precipitation, Pann, est, can be expressed as follows:

$$ {P}_{\mathrm{ann},\mathrm{est}}={P}_{\mathrm{ann},\mathrm{trend}}+{P}_{\mathrm{ann},\operatorname{var}} $$
(1)

where Pann, trend and Pann, var denote the tendency and variation parts of the preliminary estimation of annual precipitation, respectively. The tendency part can be easily obtained using the linear regression method based on the observed annual precipitation during the study period:

$$ {P}_{\mathrm{ann},\mathrm{trend}}=a\cdot Y+b $$
(2)

where Y denotes the year number (e.g., 1961, 1962, …, 2014 in this study), and a and b are the coefficients of the regression equation.

For the variation part of the preliminary estimation of annual precipitation, the following method is proposed. First, for each year during the study period, the difference between the observed annual precipitation and the tendency part (Pann, trend) is calculated, and a set of differences (Sdiff) can be obtained. Second, assuming that Sdiff obeys the normal distribution (see Subsection 3.1 for details), the variation part (Pann, var) for each year during the study period can be calculated through generating random numbers from the normal distribution function with mean and standard deviation of Sdiff. Then, together with the tendency part (Pann, trend), the preliminary estimation of annual precipitation for each year (Pann, est) can be computed as the sum of Pann, trend and Pann, var (see Eq. (1)).

2.2.2 Monthly Precipitation

In order to compute the preliminary estimation of monthly precipitation for each month of a year (Pmon, est, i, i = 1, 2, ..., 12), the following method is proposed. First, the mean annual precipitation (Pann, mean) and the mean monthly precipitation for each month of a year during the study period (Pmon, mean, i, i = 1, 2, ..., 12) are calculated with the observed precipitation. Second, the percentage of the mean monthly precipitation for each month of a year (PCTmon, mean, i, i = 1, 2, ..., 12) is computed as Pmon, mean, i/Pann, mean. Third, using the preliminary estimation of annual precipitation for each year (Pann, est) which has been obtained in the previous subsection, the preliminary estimation of monthly precipitation for each month of a year (Pmon, est, i) can then be computed as the product of PCTmon, mean, i and Pann, est. The mathematical formula to calculate Pmon, est, i can be expressed as follows:

$$ {P}_{\mathrm{mon},\mathrm{est},\mathrm{i}}=\frac{P_{\mathrm{mon},\mathrm{mean},\mathrm{i}}}{P_{\mathrm{ann},\mathrm{mean}}}\cdot {P}_{\mathrm{ann},\mathrm{est}},i=1,2,...,12 $$
(3)

2.3 Optimization Method for Improving the Preliminary Estimations

In this study, GP is adopted as an error updating scheme to improve the preliminary estimations of precipitation, which are obtained from the proposed statistical method. The details are introduced in the following.

2.3.1 Brief Introduction of GP

GP, which was firstly proposed by Cramer (1985) and later greatly expanded by Koza (1992, 1994), is a type of evolutionary algorithm, a subset of machine learning. Normally, GP implements an algorithm that uses random crossover, mutation, a fitness function, and multiple generations of evolution to resolve a user-defined task. GP is applicable to automatically discovering a functional relationship among features in data (i.e., symbolic regression, instead of traditional numerical regression). GP has been successfully used in various fields with complex optimization and search problems (e.g., Nordin and Banzhaf 1997; Muttil and Lee 2005; Wedge et al. 2009; Xu et al. 2016), including hydrology (e.g., Babovic and Keijzer 2000; Liong et al. 2002; Chadalawada et al. 2017).

Generally, GP gives each solution in a tree structure, with an operator function (e.g., four rules of arithmetic, trigonometric function, exponential function, logarithmic function, and so on) in every tree node and an operand (e.g., variable and number) in every terminal node, necessitating the evaluation of mathematical and logical expressions (Fig. 2). Crossover and mutation are the two major processes of producing new individuals in GP. In the crossover, two individuals (i.e., parents) are selected and sub-tree crossover randomly selects a crossover tree node in each parent tree (Fig. 2a). Then, two new individuals (i.e., children) are produced by exchanging the two selected crossover tree nodes, as illustrated in Fig. 2a. By contrast, in the mutation, each tree node is randomly considered and exchanged by another operator function with a certain probability (Fig. 2b). Therefore, it does not usually change the tree structure but the information content in the parse tree, which may serve to free the search from the possibility of being trapped in local optima (Khu et al. 2001; Fallah-Mehdipour et al. 2012).

Fig. 2
figure 2

(a) Crossover and (b) mutation in GP

In this study, Genetic Symbolic Regression (GSR), which is a special application of GP in the field of symbolic regression (Khu et al. 2001), will be used as the optimization method for improving the preliminary estimations of precipitation. In symbolic regression, both the most suitable functional form and the coefficients should be determined. This is quite different from traditional numerical regression, in which the functional form is pre-determined and only the coefficients should be determined. Thus, based on the observed precipitation and the preliminary estimations of precipitation, GSR will be conducted to explore a mathematical expression in symbolic form in this study.

2.3.2 Application of GP in the Recursive Approach

It is worth noting that an important application of GP is real-time prediction (e.g., Khu et al. 2001; Muttil and Lee 2005; Gaur and Deo 2008; Fallah-Mehdipour et al. 2012; Xu et al. 2016). In this study, such real-time prediction indicates that the observed monthly precipitation of the current year can be used as the input to predict the monthly precipitation of the next year. As a result, once the observed monthly precipitation in the initial year and the preliminary estimations of monthly precipitation in the initial year and the next year are available, GP can be applied to improve the preliminary estimations of monthly precipitation in the next year, and thus, the more accurate predictions of monthly precipitation in the next year can be obtained.

Mathematically, for the designated month, the observed monthly precipitation in the year t, Pmon, obs, t, can be expressed as follows:

$$ {P}_{\mathrm{mon},\mathrm{obs},\mathrm{t}}={P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}+{\varepsilon}_{\mathrm{t}} $$
(4)

where Pmon, est, t is the preliminary estimation of monthly precipitation in the year t, and εt is the estimation error of monthly precipitation in the year t. This study assumes that the estimation error of monthly precipitation in a year will only be affected by the preliminary estimation of monthly precipitation and the estimation error in the previous year. Therefore, GSR is used to establish the functional relationship between the estimation error in the year t_next (εt _ next) and other variables, including the preliminary estimation of monthly precipitation in the year t_next (Pmon, est, t _ next), the preliminary estimation of monthly precipitation in the year t (Pmon, est, t), and the estimation error in the year t (εt). As the form of functional relationship generated from GSR can be various, this study uses a universal symbol, i.e., f(), to denote the functional relationship. The mathematical formula to calculate εt _ next can be expressed as follows:

$$ {\varepsilon}_{\mathrm{t}\_\mathrm{next}}=f\left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}},{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}},{\varepsilon}_{\mathrm{t}}\right) $$
(5)

Then, the improved estimation of monthly precipitation in the year t_next, Pmon, imp, t _ next, can be obtained:

$$ {P}_{\mathrm{mon},\mathrm{imp},\mathrm{t}\_\mathrm{next}}={P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}+{\varepsilon}_{\mathrm{t}\_\mathrm{next}} $$
(6)

Moreover, it is worth noting that two types of εt values may be used in Eq. (5). The first type is the actual estimation error in the case that the observed monthly precipitation is available; in this study, this type of εt values is used for calibration and validation (see Subsection 3.2 for details). However, for future precipitation prediction, the observed monthly precipitation is unavailable; in that case, the improved estimation of monthly precipitation can be regarded as the substitution of the observed monthly precipitation. As a result, the second type is the estimation error between the improved estimation of monthly precipitation and the preliminary estimation of monthly precipitation; this type of εt values is only used for prediction (see Subsection 3.3 for details).

Then, the procedures of the recursive approach to long-term prediction of monthly precipitation using GP can be summarized as follows:

  1. 1.

    For each year during the study period, the statistical method proposed in Subsection 2.2 is applied to compute the preliminary estimations of monthly precipitation for each month of that year (i.e., Pmon, est, i, i = 1, 2, ..., 12);

  2. 2.

    For the designated month in the year t, the estimation error between Pmon, est, t and Pmon, obs, t (or Pmon, imp, t), i.e., εt, is computed using Eq. (4);

  3. 3.

    For the corresponding month in the year t_next, GSR is used to derive the functional relationship of the estimation error (i.e., εt _ next) with the preliminary estimation of monthly precipitation in the year t_next (Pmon, est, t _ next), the preliminary estimation of monthly precipitation in the year t (Pmon, est, t), and the estimation error in the year t (i.e., εt), as shown in Eq. (5);

  4. 4.

    Finally, the improved estimation of monthly precipitation in the year t_next (i.e., Pmon, imp, t _ next) is calculated as the sum of Pmon, est, t _ next and εt _ next, as shown in Eq. (6).

2.4 Assessment Criterion

To evaluate the performances of the prediction results, the NSCE (Nash-Sutcliffe Coefficient of Efficiency) (Nash and Sutcliffe 1970) is selected as assessment criterion, and the relevant equation is given as follows:

$$ NSCE=1-\frac{\sum \limits_{j=1}^N{\left({X}_{j,\mathrm{obs}}-{X}_{j,\mathrm{pre}}\right)}^2}{\sum \limits_{j=1}^N{\left({X}_{j,\mathrm{obs}}-{\overline{X}}_{\mathrm{obs}}\right)}^2} $$
(7)

where Xi,obs and Xi,pre are the j-th observation and prediction, respectively; \( {\overline{X}}_{\mathrm{obs}} \) is the mean value of the observations; and N is the sample size. The NSCE can measure the goodness of fit, and its value will approach 1.0 if the prediction results are close to the observations.

In addition, to further evaluate the performances of the prediction results, residual analysis is implemented (Shi et al. 2014), and the equation for computing the standardized residual is given as follows:

$$ S{R}_j={e}_j/s $$
(8)

where SRj is the j-th standardized residual; ej is the j-th residual; s is the standard deviation of the residuals. ei and s can be computed as follows:

$$ {e}_j={X}_{j,\mathrm{obs}}-{X}_{j,\mathrm{pre}} $$
(9)
$$ s=\sqrt{\frac{1}{N-2}\sum \limits_{j=1}^N{e}_j^2} $$
(10)

3 Results and Discussion

3.1 Preliminary Estimations of Precipitation

Based on the observed precipitation, the mean annual precipitation (Pann, mean) was 423.0 mm over the TRHR during 1961–2014, and the annual precipitation showed a significant increasing trend with the change rate of 7.11 mm/decade at the significance level of p < 0.1. The maximum annual precipitation (i.e., 514.8 mm) occurred in 1989, while the minimum one (i.e., 362.9 mm) occurred in 1969 (Fig. 3a). Moreover, the annual precipitation showed a decreasing trend with the change rate of −2.6 mm/decade during 1961–2002 but a significant increasing trend with the change rate of 19.3 mm/decade during 2003–2014 (Shi et al. 2016). Therefore, it was infeasible to calibrate and validate the statistical method for computing the preliminary estimations of precipitation; instead, to present the overall trend during 1961–2014 which would be regarded as the trend during 2015–2050, all the data during 1961–2014 were used to make the preliminary estimations of precipitation be close to the observed precipitation as much as possible. Following the method proposed in Subsection 2.2.1, the preliminary estimations of annual precipitation were computed. Based on the linear regression method, the tendency part of annual precipitation (i.e., Pann, trend) could be preliminarily estimated as follows:

$$ {P}_{\mathrm{ann},\mathrm{trend}}=0.711\cdot Y-990.13 $$
(11)
Fig. 3
figure 3

(a) The annual precipitation over the TRHR during 1961–2014 and the linear trend; and (b) The differences between the observed annual precipitation and the tendency part of the annual precipitation over the TRHR during 1961–2014

Then, for each year during 1961–2014, the difference between the observed annual precipitation and the tendency part was calculated, and the result (i.e., Sdiff) is shown in Fig. 3b. It is observed that the differences varied between −61.32 mm and 90.74 mm, with the mean of 0.0085 mm and the standard deviation of 36.69 mm. Moreover, the distribution of Sdiff was checked by Kolmogorov-Smirnov test at the significance level of p < 0.05, which confirmed that Sdiff obeyed the normal distribution. Thus, the variation part of annual precipitation (i.e., Pann, var) for each year during 1961–2014 could be preliminarily estimated, and the preliminary estimation of annual precipitation (Pann, est) for each year during 1961–2014 was computed as the sum of Pann, trend and Pann, var.

It is worth noting that random numbers were generated for Pann, var, which might bring uncertainty in estimation; thus, one set of Pann, var would not be enough to represent the performance of the statistical method for computing the preliminary estimations of precipitation. In this study, ten sets of Pann, var were generated, and thus ten sets of Pann, est were obtained. For each year during 1961–2014, there were ten Pann, est values, among which, the mean, maximum and minimum Pann, est values were selected, respectively. The left part of Fig. 4 shows the preliminary estimations of annual precipitation during 1961–2014, which would be used for calibration and validation of the proposed recursive approach in the next subsection. The observed annual precipitation during 1961–2014 (marked by “+”) could basically be covered by the range between the maximum and minimum Pann, est values (i.e., the solid lines), and the mean Pann, est values (marked by “o”) generally showed the increasing trend.

Fig. 4
figure 4

The preliminary estimations of annual precipitation during 1961–2050. Note: the period of 1961–2014 is used for calibration and validation, and the period of 2015–2050 is used for prediction

In this study, the mean Pann, est values were used to compute the preliminary estimation of monthly precipitation for each month of a year (Pmon, est, i, i = 1, 2, ..., 12). According to the method proposed in Subsection 2.2.2, the percentage of the mean monthly precipitation for each month of a year (PCTmon, mean, i, i = 1, 2, ..., 12) should be firstly computed. Based on the observed precipitation, the mean monthly precipitation for each month of a year (Pmon, mean, i, i = 1, 2, ..., 12) during 1961–2014 was obtained (Table 2). It is observed that the maximum value of the mean monthly precipitation (i.e., 92.7 mm) occurred in July, followed by the mean monthly precipitation in June (i.e., 81.3 mm) and August (i.e., 78.4 mm), and the minimum value (i.e., 1.9 mm) occurred in December. Because the mean annual precipitation (Pann, mean) was 423.0 mm during 1961–2014, the percentage of the mean monthly precipitation for each month of a year (PCTmon, mean, i, i = 1, 2, ..., 12) could be computed (Table 2). Then, for each year during 1961–2014, the preliminary estimation of monthly precipitation for each month (Pmon, est, i, i = 1, 2, ..., 12) was computed using Eq. (3).

Table 2 The mean monthly precipitation and the percentage accounted for by the mean monthly precipitation for each month of a year during 1961–2014

Figure 5 shows the comparison of the preliminary estimations of monthly precipitation (marked by “□”) against the observations for each month of a year during 1961–2014. It is observed that the preliminary estimations were basically close to the observations and uniformly distributed on both sides of the red dash line, and the NSCE value was relatively high (i.e., 0.9195). However, there were more underestimated preliminary estimations in the case of large observations. The result of residual analysis (marked by “□” in Fig. 6) showed that the standardized residuals of the preliminary estimations were generally within ±2 when the preliminary estimations were smaller than 60 mm, but a number of points were beyond the range of ±2 when the preliminary estimations were larger than 60 mm (Fig. 6). This could also be represented by the significant deviations of the points from the red dash line in Fig. 5. It is worth noting that the preliminary estimations larger than 60 mm mainly appeared in the wet season, especially from June to September. Therefore, the preliminary estimations of monthly precipitation should be further improved, especially for those in the wet season.

Fig. 5
figure 5

Comparison of the preliminary/improved estimations of monthly precipitation against the observations for each month of a year during 1961–2014

Fig. 6
figure 6

Residual analysis based on the preliminary/improved estimations of monthly precipitation for each month of a year during 1961–2014

3.2 Improved Estimations of Precipitation

In this study, GP was considered as the optimization method for improving the preliminary estimations of monthly precipitation, and the main GP relevant parameters included size of population (100), number of generation (100), tournament size (6), and operator functions (times, minus, plus, sqrt, square). As mentioned above, the observations and preliminary estimations during 1961–2000 were used for calibration of the proposed method, while the remaining data during 2001–2014 were used for validation. Before optimization, the NSCE values derived from the observations and preliminary estimations (i.e., Case I) were 0.9190 and 0.9205 for the two periods of 1961–2000 and 2001–2014, respectively (Table 3). Following the procedures in Subsection 2.3.2, the optimization results were obtained and presented as follows.

Table 3 The NSCE values of the three cases

3.2.1 Calibration

In this study, GP was applied to derive the functional relationship as given in Eq. (5), and the improved estimations of monthly precipitation were calculated using Eq. (6). It is worth noting that two cases of the improved estimations of monthly precipitation (i.e., Case II and Case III) were considered during the calibration period of 1961–2000. Case II denoted that the improved estimations, which were computed considering all the preliminary estimations as a whole, were compared against the observations. Case III denoted that the improved estimations, which were computed considering the preliminary estimations with the standardized residuals beyond ±2 separately, were compared against the observations. It is observed from Table 3 that the NSCE value derived from Case II (i.e., 0.9222) was only a little higher than that derived from Case I (i.e., 0.9190), while the NSCE value derived from Case III (i.e., 0.9520) was much higher than those derived from Case I and Case II. As a result, this study regarded the improved estimations derived from Case III as the better ones and the relevant functional relationships derived from GP were expressed as follows:

$$ {\displaystyle \begin{array}{c}{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}\le 60\ \mathrm{mm}:\\ {}{P}_{\mathrm{mon},\mathrm{imp},\mathrm{t}\_\mathrm{next}}=1.023\cdot {P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-1.023\cdot \mid {P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\mid \\ {}-0.00004854\cdot {P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}^2\cdot \left[{\varepsilon}_{\mathrm{t}}+\left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)\right]\\ {}+0.00141\cdot {\varepsilon}_{\mathrm{t}}\cdot {P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\cdot \mid {P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\mid +0.0001584\\ {}{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}>60\ \mathrm{mm}:\\ {}{P}_{\mathrm{mon},\mathrm{imp},\mathrm{t}\_\mathrm{next}}={P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}+\Big[12.6166\cdot {\varepsilon}_{\mathrm{t}}+71.9757\cdot \left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)\\ {}-0.3897\cdot {\varepsilon}_{\mathrm{t}}^2-10.4948\cdot {\left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)}^2\\ {}-3.9947\cdot {\varepsilon}_{\mathrm{t}}\cdot \left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)-51.3250\Big]\\ {}/\Big[1-0.3686\cdot {\varepsilon}_{\mathrm{t}}-1.9015\cdot \left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)\\ {}+0.04282\cdot {\varepsilon}_{\mathrm{t}}^2+1.1741\cdot {\left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)}^2\\ {}+0.4571\cdot {\varepsilon}_{\mathrm{t}}\cdot \left({P}_{\mathrm{mon},\mathrm{est},\mathrm{t}\_\mathrm{next}}-{P}_{\mathrm{mon},\mathrm{est},\mathrm{t}}\right)\Big]\end{array}} $$
(12)

Figure 5 shows the comparison of the improved estimations of monthly precipitation derived from Case III (marked by blue “×”) against the observations for each month of a year during the calibration period of 1961–2000. The improved estimations were closer to the observations than the preliminary estimations (marked by “□”), especially for those larger than 60 mm. The result of residual analysis on the improved estimations (marked by blue “×” in Fig. 6) showed that most of the standardized residuals were within ±2 even if the improved estimations were larger than 60 mm (Fig. 6). Moreover, it is worth noting that there were more improved estimations than preliminary estimations which were distributed between 90 mm and 120 mm, especially for those around 120 mm. This indicated that the larger observations could be better estimated by the improved estimations obtained from the proposed method.

3.2.2 Validation

Based on Eq. (12), the improved estimations of monthly precipitation for each month of a year during the period of 2001–2014 could be computed, and then, the validation of the proposed method was conducted. Figure 5 also shows the comparison of the improved estimations of monthly precipitation (marked by orange “·”) against the observations for each month of a year during the validation period of 2001–2014. The result indicated that the improved estimations were closer to the observations than the preliminary estimations (marked by “□”), and the distribution of the improved estimations during the validation period was similar to that during the calibration period. Moreover, Fig. 6 shows that most of the standardized residuals of the improved estimations during the validation period (marked by orange “·”) were within ±2, similar to those during the calibration period. Table 3 lists the NSCE values of the three cases (i.e., Case I, Case II and Case III), and the results derived from Case II (i.e., 0.9243) was only a little higher than that derived from Case I (i.e., 0.9205), while the NSCE value derived from Case III (i.e., 0.9517) was much higher than those derived from Case I and Case II.

To further evaluate the performance of the proposed recursive approach in precipitation prediction, additional validation was conducted as follows. For a designated year during the validation period of 2001–2014, the observations in the previous year were substituted by the improved estimations of monthly precipitation in the previous year, and then, the improved estimations of monthly precipitation in the designated year were computed for Case II and Case III, respectively. The relevant NSCE values were 0.9237 for Case II and 0.9524 for Case III, respectively. It indicated that, even when the observations were unavailable and the improved estimations were used as the substitutions of the observations for prediction, the performance of the proposed recursive approach could be as good as that using the observations directly. Therefore, the recursive approach proposed in this study could be adopted for the long-term prediction of monthly precipitation over the TRHR (see the next subsection for details). Nevertheless, the estimations of monthly precipitation could be further updated if the observations became available for subsequent predictions (Khu et al. 2001).

3.3 Long-Term Prediction of Monthly Precipitation

After calibration and validation, the proposed recursive approach has been proved to be applicable to long-term prediction of monthly precipitation over the TRHR. In this study, the predictions of monthly precipitation over the TRHR till 2050 were computed based on this recursive approach.

First, the preliminary estimations of annual precipitation were computed. For each year during 2015–2050, the tendency part (Pann, trend) and the variation part (Pann, var) of annual precipitation were estimated, and the preliminary estimation of annual precipitation (Pann, est) was computed as the sum of Pann, trend and Pann, var. Similar to that in Subsection 3.1, ten sets of Pann, var (as well as Pann, est) were generated, and the mean, maximum and minimum Pann, est values for each year during 2015–2050 were selected, respectively. The right part of Fig. 4 shows the relevant results.

Second, the mean Pann, est values were used to compute the preliminary estimations of monthly precipitation for each month of a year during 2015–2050 (Pmon, est, i, i = 1, 2, ..., 12), following the method proposed in Subsection 2.2.2.

Third, the improved estimations of monthly precipitation were calculated using Eq. (12). Since the observations during 2015–2050 were unavailable, the improved estimations of monthly precipitation in the previous year were regarded as the substitution of the observations in the previous year when applying GP.

Figure 7a shows the predictions (both the preliminary estimations and the improved estimations) of monthly precipitation for each month of a year during 2015–2050. It is observed that the series of the improved estimations had the more significant variation than the series of the preliminary estimations. In addition, some months in the wet season (e.g., July 2029 and July 2041) would have much higher values of precipitation than other months. Figure 7b shows the predictions of annual precipitation derived from the improved estimations for each year during 2015–2050. It is observed that the annual precipitation would present an overall increasing trend with significant inter-annual variation. This is reasonable because the same variation of the annual precipitation was found over the TRHR during 1961–2014 (Shi et al. 2016) (see Fig. 3a).

Fig. 7
figure 7

Predictions of (a) monthly precipitation for each month of a year during 2015–2050 and (b) annual precipitation for each year during 2015–2050

3.4 Discussion

It is worth noting that, in this study, ten sets of Pann, est were generated to address the problem that one set of Pann, est would not be enough to represent the performance of the statistical method; therefore, the impact of the number of sets should be discussed. In this section, comparative analysis is conducted between two results from ten sets and twenty sets of Pann, est, respectively. For the result from ten sets of Pann, est, the variation ranges of the mean, maximum and minimum values were 381~469 mm, 406~549 mm, and 320~426 mm, respectively; the averages of the mean, maximum and minimum values were 425 mm, 472 mm, and 375 mm, respectively. For the result from twenty sets of Pann, est, the variation ranges of the mean, maximum and minimum values were 383~468 mm, 445~567 mm, and 294~413 mm, respectively; the averages of the mean, maximum and minimum values were 424 mm, 490 mm, and 358 mm, respectively. It could be concluded that the maximum values would generally increase with larger number of sets, while the minimum values would generally decrease with larger number of sets. In contrast, the number of sets would not significantly influence the mean values. Since the mean values were used to compute the preliminary estimations of monthly precipitation in this study, the impact of the number of sets could be neglected.

Applying the proposed recursive approach for predicting monthly precipitation, we also need fully aware of the following four limitations. First, to some extent, the proposed recursive approach is region-dependent. The study area is located in a semi-arid region, and this approach is applicable. For different regions (e.g., humid regions), the variation characteristics of precipitation may be different, and thus the equations (e.g., Eqs. (11) and (12)) should be reestablished based on the observed data and the estimations. Second, this study adopted the mean Pann, est values for calculating the preliminary estimations of monthly precipitation, and thus, the results were in an average sense. If the maximum Pann, est, the minimum Pann, est, or any one of the ten sets of Pann, est values were used, the results might be different. Therefore, how to determine the most appropriate one is an important issue that is worth investigating in future works. Third, due to the quite different trends of annual precipitation between the two periods of 1961–2002 and 2003–2014, the statistical method for computing the preliminary estimations of precipitation was not calibrated and validated in this study, making the proposed recursive approach not a complete prediction scheme. However, for other regions with a consistent trend of annual precipitation, the proposed recursive approach could be further improved. Fourth, when predicting monthly precipitation over the TRHR during 2015–2050, the variation characteristics of precipitation were assumed to be the same as those during 1961–2014. Therefore, whether the variation characteristics of precipitation would change in the future is another issue that needs further study.

Nevertheless, with the awareness of the above limitations, the recursive approach proposed in this study can provide a new avenue of long-term prediction of monthly precipitation independent of other climatic indices and variables, which would be valuable for hydrology research around the world.

4 Conclusions

This study proposes a recursive approach to long-term prediction of monthly precipitation using GP, taking the TRHR as the study area. The major contributions of this study can be summarized as follows:

First, a statistical method for computing the preliminary estimations of monthly precipitation was proposed. For the TRHR during 1961–2014, the preliminary estimations were basically close to the observations except for those large values appeared in the wet season, which should be further improved.

Second, an optimization method with application of GP was proposed to improve the preliminary estimations. For the TRHR during 1961–2014, the improved estimations of monthly precipitation showed much better performance than the preliminary estimations.

Third, based on the proposed recursive approach, the monthly precipitation for each month of a year during 2015–2050 was predicted. The results indicated that there would be some very wet months in the near future. From a realistic perspective, more precipitation, especially for short-duration, high-intensity rains in the wet season, might lead to an increased risk of floods; therefore, it would be important to formulate related policies for reducing the losses caused by possible floods.

Overall, the proposed methods and the related conclusions can potentially benefit the utilization of some hydrological models at the monthly scale (Wang et al. 2009; Gosling and Arnell 2011), evaluating climate change (Wang et al. 2009), and exploring water resources sustainability (Barros et al. 2011; Chen et al. 2016; Li et al. 2018).