1 Introduction

One of the most important problems in modern ecology is identifying the type of population dynamics from existing time series [1, 2]. Solving this problem provides a scientific basis for forecasting, for selecting optimal methods of population management, and for several other important ecological problems. The type of population dynamics can be determined either with various biological tests [1, 3,4,5] or with mathematical models of ecosystem dynamics [6,7,8,9,10,11,12,13]. Note that the first approach can use qualitatively different kinds of information, whereas the second can use only the existing time series and may run into serious problems when the time series are rather short or when several correlated time series are involved [14, 15].

In most cases the second approach is preferred because it yields quantitative estimates of the characteristics of population dynamics. At the same time, it does not allow the available information to be used in full: only a few existing time series can be used effectively [6, 16].

For various mass species of forest insects (in particular, the green oak leaf roller, Tortrix viridana L. [17,18,19,20,21]), long time series are available, and such datasets can be analyzed with ecological models that have several (observed and unobserved) variables [1, 2, 22,23,24]. However, before applying complex multi-component models of population or ecosystem dynamics, we must be sure that simpler models cannot describe the population dynamics adequately [25].

In this publication, a generalized discrete logistic model [26, 27] was fitted to a known time series [17] of green oak leaf roller (Tortrix viridana L.) population dynamics. Model parameters were estimated with two different statistical approaches: ordinary least squares (OLS) [28, 29] and the method of extreme points (MEP) [6, 7, 14, 30]. The calculations show that the OLS estimates lie in the “non-biological zone” of the model parameter space, which makes it impossible to determine the type of population dynamics or to produce a realistic forecast.

The search for MEP estimates of model parameters was restricted to the “biological zone” of the model parameter space. Several points with extreme properties were found; qualitatively, all of them correspond to the same dynamic regime: the population size does not show cyclic fluctuations with a cycle length of less than 1500 years, and the autocorrelation function decreases rapidly. This means that constructing good forecasts may be a serious problem.

2 Model Description

The literature offers a rather small number of mathematical models of isolated population dynamics that have a rich set of dynamic regimes and can serve as “building blocks” for complicated multi-component models of ecosystem dynamics [26, 27, 31,32,33,34,35,36]. The discrete logistic model belongs to this group [6]:

$$ x_{k+1}=\begin{cases} a x_k\left(b-x_k\right), & x_k\le b,\\ 0, & x_k>b. \end{cases} \tag{2.1} $$

In (2.1), \( x_k \) is the population size (or population density) at time moment k (time is discrete). Parameter b is the maximum population size: (2.1) assumes that if the population size exceeds b at some time moment, the size of the next generation is zero. The product ab is the maximum birth rate, where the birth rate is defined as the ratio of the sizes of two consecutive generations, \( x_{k+1}/x_k = a(b-x_k) \), which is largest as \( x_k \to 0 \). Parameters a and b and the initial population size \( x_0 \) are non-negative: a, b, \( x_0 \) ≥ 0.

If ab ≤ 4, trajectories of model (2.1) are non-negative and bounded for 0 ≤ \( x_0 \) ≤ b, because the maximum of \( a x (b - x) \) on [0, b] equals \( ab^2/4 \), which does not exceed b exactly when ab ≤ 4. If ab > 4 and 0 < \( x_0 \) < b, a trajectory can escape from the domain {x : x < b} into {x : x > b}. In this situation the population dynamics regime is practically impossible to identify: the model trajectory goes to extinction, which does not correspond to reality. In other words, the domain {(a, b) : ab > 4} of the model parameter space can be defined as the “non-biological zone.”
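A minimal Python sketch of model (2.1) illustrates the two zones: with ab ≤ 4 a trajectory started in [0, b] stays there, while with ab > 4 a single step can overshoot b, after which the next generation is zero. The parameter values below are arbitrary illustrations, not estimates from this paper, and the function names are hypothetical.

```python
def logistic_step(x: float, a: float, b: float) -> float:
    """One step of the discrete logistic model (2.1)."""
    if x > b:
        return 0.0  # sizes above b leave no next generation
    return a * x * (b - x)


def trajectory(a: float, b: float, x0: float, n_steps: int) -> list[float]:
    """Return [x_0, x_1, ..., x_{n_steps}] generated by model (2.1)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(logistic_step(xs[-1], a, b))
    return xs


# Biological zone: ab = 3.9 <= 4, so max a*x*(b - x) = a*b**2/4 = 38.025 < b.
bio = trajectory(a=0.1, b=39.0, x0=5.0, n_steps=200)
print(min(bio) >= 0.0 and max(bio) <= 39.0)  # True

# Non-biological zone: ab = 6 > 4; x_1 = 0.2*15*15 = 45 > b, so x_2 = 0.
nonbio = trajectory(a=0.2, b=30.0, x0=15.0, n_steps=5)
print(nonbio[2:])  # extinction: every later value is 0.0
```

The bound check follows directly from the vertex of the parabola \( a x (b - x) \) at \( x = b/2 \).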

3 Statistical Criteria Used

In the traditional approach to model parameter estimation, the model is assumed to approximate the empirical dataset adequately if and only if the deviations (between model/theoretical values and the corresponding empirical values) are values of independent normally distributed random variables [28, 29]. It is also assumed that the hypothesis that the mean deviation equals zero cannot be rejected at the selected significance level (which is equivalent to assuming that there are no systematic measurement errors).

To analyze the properties of a set of deviations, the Kolmogorov–Smirnov, Lilliefors, and Shapiro–Wilk tests [37,38,39], among others, are used to check whether the deviations follow the Normal distribution. To check for serial correlation in a sequence of deviations, the Durbin–Watson test, the “jumps up–jumps down” test, and some other tests can be used [28, 29, 40,41,42].

Model parameters can be estimated, for example, by minimizing the following loss function (ordinary least squares, OLS):

$$ Q\left(a,b,x_0\right)=\sum_{k=0}^{N}\left(x_k^{\ast}-g\left(a,b,x_0,k\right)\right)^2\to \min_{a,b,x_0}. \tag{3.1} $$

In (3.1), \( \left\{x_k^{\ast}\right\} \), k = 0, … , N, is the existing sample (population sizes from year to year), and N + 1 is the sample size; \( g(a, b, x_0, k) \) is the trajectory of model (2.1) obtained for fixed values of parameters a and b and initial population size \( x_0 \), with \( g(a, b, x_0, 0) = x_0 \).
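The loss (3.1) can be sketched in Python as below. The helper names are hypothetical, and the minimization itself is left to any global optimizer, since the text does not say which numerical method was used to find the OLS estimates.

```python
def model_path(a: float, b: float, x0: float, n: int) -> list[float]:
    """g(a, b, x0, k) for k = 0, ..., n under model (2.1)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(a * x * (b - x) if x <= b else 0.0)
    return xs


def ols_loss(a: float, b: float, x0: float, series: list[float]) -> float:
    """Q(a, b, x0) from (3.1): sum of squared deviations x*_k - g(a, b, x0, k)."""
    g = model_path(a, b, x0, len(series) - 1)
    return sum((obs - fit) ** 2 for obs, fit in zip(series, g))


# A synthetic series generated by the model itself has Q = 0 at the true parameters.
synthetic = model_path(0.1, 39.0, 5.0, 20)
print(ols_loss(0.1, 39.0, 5.0, synthetic))        # 0.0
print(ols_loss(0.1, 39.0, 6.0, synthetic) > 0.0)  # True
```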

Other types and modifications of loss function (3.1) can be found in the literature. For example, to give small values of the sample a stronger influence on the final result (the model parameter estimates), we can use a weighted version of (3.1):

$$ Q\left(a,b,{x}_0\right)=\sum \limits_{k=0}^N{w}_k{\left({x}_k^{\ast }-g\left(a,b,{x}_0,k\right)\right)}^2\to \underset{a,b,{x}_0}{\min }. $$

Here the weights \( w_k \) are non-negative for all k, \( w_k \ge 0 \), and \( w_0 + \dots + w_N = 1 \). However, there are currently no criteria for choosing the weights \( w_k \); the only available recommendation is to assign larger weights to smaller deviations. Note also that the weights may depend on the magnitude of the deviations.

Another way to modify loss function (3.1) is the following:

$$ Q\left(a,b,{x}_0\right)=\sum \limits_{k=0}^N{\left|{x}_k^{\ast }-g\left(a,b,{x}_0,k\right)\right|}^{\gamma}\to \underset{a,b,{x}_0}{\min }. $$

Here γ is a positive number, not necessarily equal to 2.
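Both modifications differ from (3.1) only in how the deviations are aggregated, so a sketch can share one deviation routine. The inverse-size weighting in the usage lines is a hypothetical choice that emphasizes small sample values; as noted above, no criterion for choosing the weights is available.

```python
def deviations(a, b, x0, series):
    """e_k = x*_k - g(a, b, x0, k) under model (2.1)."""
    g = [x0]
    for _ in range(len(series) - 1):
        x = g[-1]
        g.append(a * x * (b - x) if x <= b else 0.0)
    return [obs - fit for obs, fit in zip(series, g)]


def weighted_loss(a, b, x0, series, weights):
    """Weighted variant: sum_k w_k * e_k**2 with w_k >= 0 and sum_k w_k = 1."""
    if len(weights) != len(series):
        raise ValueError("one weight per observation is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * e * e for w, e in zip(weights, deviations(a, b, x0, series)))


def power_loss(a, b, x0, series, gamma):
    """|e_k|**gamma variant: gamma = 2 gives (3.1), gamma = 1 least absolute deviations."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sum(abs(e) ** gamma for e in deviations(a, b, x0, series))


series = [1.0, 2.0, 3.0]
# Hypothetical weights: inversely proportional to (1 + observed size), then normalized.
raw = [1.0 / (1.0 + x) for x in series]
w = [r / sum(raw) for r in raw]
print(weighted_loss(0.1, 39.0, 5.0, series, w))
print(power_loss(0.1, 39.0, 5.0, series, gamma=1.0))  # |-4| + |-15| + |-34.4|, about 53.4
```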

We can conclude that there are currently no criteria for selecting loss functions of type (3.1); the choice reflects only our ideas about how to obtain the best estimates of model parameters. The values that give the global minimum of the loss function Q are assumed to be the best estimates of model parameters. If, for these estimates, one of the statistical criteria gives a negative result (in particular, the null hypothesis that the deviations are normally distributed is rejected, serial correlation is detected in the sequence of residuals, etc.), this gives grounds to conclude that the model cannot fit the time series adequately. It is important to stress again that the final conclusion about whether the model is suitable for fitting the time series is based on a single point of the model parameter space. Moreover, this conclusion relies on estimates of the model parameters rather than on their true values. Using a loss function to find parameter estimates is one of the basic limitations of OLS, and it becomes a more serious problem when several correlated time series must be used [11,12,13,14, 43,44,45].

The method of extreme points (MEP) [7, 16] does not use any loss function. To obtain MEP estimates of model parameters, we first construct a feasible set Ω of points in the parameter space {(a, b, \( x_0 \)) : a, b, \( x_0 \) ≥ 0}. Set Ω is constructed as follows. First, we choose the statistical criteria used to check the properties of the deviations between theoretical/model values and the corresponding values of the time series, and we fix a significance level for all criteria. Then we find the points of the parameter space whose sets of deviations pass all the chosen criteria. If the feasible set is empty, Ω = ∅, this gives grounds to conclude that the model is not suitable for fitting the time series. If Ω is not empty, we choose points with extreme properties to approximate the time series (for example, points with the maximum p-value for one or another statistical criterion).
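A minimal sketch of constructing Ω by pure stochastic search, assuming the chosen tests are bundled into a caller-supplied predicate `accept(deviations)` that returns True only when every criterion passes at the fixed significance level. The names `mep_feasible_set`, `pick_extreme`, and `toy_accept` are hypothetical, the box order (x0, a, b) is an assumption, and the toy predicate is a placeholder, not the battery of tests used in this paper.

```python
import random
from typing import Callable, Sequence

Point = tuple[float, float, float]  # (x0, a, b)


def model_path(a: float, b: float, x0: float, n: int) -> list[float]:
    """Trajectory x_0, ..., x_n of model (2.1)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(a * x * (b - x) if x <= b else 0.0)
    return xs


def mep_feasible_set(series: Sequence[float],
                     accept: Callable[[list[float]], bool],
                     bounds: Sequence[tuple[float, float]],
                     n_samples: int,
                     seed: int = 0) -> list[Point]:
    """Sample (x0, a, b) uniformly in the box `bounds` and keep the points
    whose deviations pass every criterion wrapped in `accept`."""
    rng = random.Random(seed)
    omega: list[Point] = []
    for _ in range(n_samples):
        x0, a, b = (rng.uniform(lo, hi) for lo, hi in bounds)
        g = model_path(a, b, x0, len(series) - 1)
        dev = [obs - fit for obs, fit in zip(series, g)]
        if accept(dev):
            omega.append((x0, a, b))
    return omega


def pick_extreme(omega: list[Point], score: Callable[[Point], float]) -> Point:
    """Return the point of Omega with the largest score (for example a p-value)."""
    if not omega:
        raise ValueError("Omega is empty: the model is rejected for this series")
    return max(omega, key=score)


# Placeholder criterion (NOT the paper's tests): |mean deviation| < 2.
def toy_accept(dev: list[float]) -> bool:
    return abs(sum(dev) / len(dev)) < 2.0


series = model_path(0.11, 36.0, 26.0, 25)
omega = mep_feasible_set(series, toy_accept, [(0, 100), (0, 5), (0, 130)], 5000, seed=1)
print(len(omega), "feasible points")
```

Separating the search from the predicate mirrors the method: the test battery and significance level are fixed first, and only then are points screened.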

In this paper, a 5% significance level was fixed for all criteria. For every selected point of the model parameter space, the corresponding set of deviations was checked for symmetry with respect to the origin using tests of homogeneity of two samples: the Kolmogorov–Smirnov test, Mann–Whitney U-test, Lehmann–Rosenblatt test, and Wald–Wolfowitz test. Monotonic behavior of the branches of the density function was checked with the Spearman rank correlation coefficient [37, 46]. To check for serial correlation in sequences of deviations, the Swed–Eisenhart test and the “jumps up–jumps down” test were used [29, 40,41,42]. Note that the feasible set Ω can be regarded as a confidence set: for all points of Ω the statistical criteria give the required results, so none of them gives grounds to conclude that the model is invalid for fitting the time series.
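The symmetry check reduces to a two-sample homogeneity problem: the positive deviations and the absolute values of the negative deviations should come from the same distribution. The sketch below shows that reduction with two of the named statistics (Mann–Whitney U and the two-sample Kolmogorov–Smirnov D); p-values would still come from tables or a statistics library, and dropping exact zeros is an assumption of this sketch.

```python
from bisect import bisect_right


def split_by_sign(dev):
    """Positive deviations and absolute values of negative ones (zeros dropped)."""
    pos = [e for e in dev if e > 0]
    neg_abs = [-e for e in dev if e < 0]
    return pos, neg_abs


def mann_whitney_u(x, y):
    """U statistic: number of pairs with x_i > y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def ks_two_sample(x, y):
    """Two-sample Kolmogorov–Smirnov statistic D = max |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:  # the supremum is attained at a sample point
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


dev = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0]  # perfectly symmetric toy deviations
pos, neg = split_by_sign(dev)
print(mann_whitney_u(pos, neg), ks_two_sample(pos, neg))  # 4.5 0.0
```

For a symmetric sample U sits at its null mean mn/2 and D is zero; the two branches are compared as if they were independent samples.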

4 OLS Estimations of Model Parameters

For model (2.1), minimizing loss function (3.1) gave the following estimates of parameters a, b, and \( x_0 \): \( x_0 \) ≈ 0.086465, a ≈ 0.090102, and b ≈ 54.236778, with Q(a, b, \( x_0 \)) ≈ 1215.035. This point lies in the “non-biological zone” (ab ≈ 4.887 > 4), in the region where the origin is a globally stable equilibrium: at time step k = 30 the population size is \( x_{30} \) = 65.12392 > b, and from the next step on the population size equals zero (which does not correspond to reality).
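The reported OLS point can be checked directly against the zone boundary, and the escape step can be searched for by iterating (2.1). The helper `first_escape` is hypothetical; because (2.1) is sensitive to initial conditions when ab > 4, iterating the rounded estimates printed above is not guaranteed to reproduce k = 30 exactly.

```python
def first_escape(a, b, x0, max_steps):
    """Smallest k with x_k > b (so x_{k+1} = 0), or None if none up to max_steps."""
    x = x0
    for k in range(max_steps + 1):
        if x > b:
            return k
        x = a * x * (b - x)
    return None


a, b, x0 = 0.090102, 54.236778, 0.086465  # OLS estimates reported in the text
print(round(a * b, 4))              # 4.8868 > 4: non-biological zone
print(first_escape(a, b, x0, 200))  # the text reports x_30 = 65.12392 > b
```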

Analysis of the deviations shows that, at the 5% significance level, the hypothesis that their mean equals zero cannot be rejected. At the same time, the p-values of the normality tests are p < 0.1 (Kolmogorov–Smirnov test), p < 0.01 (Lilliefors test), and p = 0.0116 (Shapiro–Wilk test). Thus, at the 1% significance level, the hypothesis that the deviations are normally distributed must be rejected (by the Lilliefors test; the Shapiro–Wilk test rejects it at the 5% level).

Testing the symmetry of the distribution of deviations gave the following p-values: p = 0.584645 (Wald–Wolfowitz test) and p = 0.033261 (Mann–Whitney U-test). Hence, at the 5% significance level, the hypothesis of symmetry must be rejected (by the Mann–Whitney U-test), and consequently the OLS estimate does not belong to the feasible set Ω.

Thus, with the OLS estimates, model (2.1) cannot approximate the time series adequately. The model predicts population extinction at the 31st time step, the distribution of deviations is not Normal (the corresponding hypothesis must be rejected at the 1% significance level), and so on. In this situation there is no need to apply other statistical criteria: whatever their results, the final conclusion is the same. Model (2.1) with these estimates can be used neither to fit the empirical dataset nor to describe green oak leaf roller population dynamics.

5 MEP Estimations of Model Parameters

Fig. 1 shows the projection of 120,000 points of the feasible set Ω onto the (a, b) plane. All points were found by a purely stochastic search in the domain [0, 100] × [0, 5] × [0, 130]. As Fig. 1 shows, a large number of points of Ω lie within the “biological zone,” where ab ≤ 4 and \( x_0 \) ≤ b. The highest concentration of points (Fig. 1) is observed near the curve ab = 4. This indicates that, with high probability, green oak leaf roller population dynamics correspond to a cyclic regime with a long period.

Fig. 1. Projection of 120,000 points of feasible set Ω onto the (a, b) plane
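The zone condition used for the search (ab ≤ 4 and \( x_0 \) ≤ b, with non-negative parameters) is easy to apply to the parameter points reported in Sections 4 and 5. The short check below uses only those printed values; the function name is hypothetical.

```python
def in_biological_zone(x0: float, a: float, b: float) -> bool:
    """True if ab <= 4 and 0 <= x0 <= b (all parameters non-negative)."""
    return a >= 0 and b >= 0 and 0 <= x0 <= b and a * b <= 4


reported = {  # (x0, a, b) as printed in the text
    "OLS": (0.086465, 0.090102, 54.236778),
    "max p (KS, Lehmann–Rosenblatt)": (26.16194, 0.11327, 35.31373),
    "max U (Mann–Whitney)": (0.529402, 0.299619, 13.234814),
    "max Spearman rho": (0.573105, 0.217602, 18.18987),
}
for name, (x0, a, b) in reported.items():
    print(f"{name}: ab = {a * b:.6f}, biological = {in_biological_zone(x0, a, b)}")
```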

Among the points of Ω, the minimum value of the Kolmogorov–Smirnov statistic (d = 0.25064) was observed at \( x_0 \) = 26.16194, a = 0.11327, b = 35.31373 (ab = 3.999978). For these estimates, the Lehmann–Rosenblatt statistic also reached its minimum (0.019231). For t = 0.26, the Kolmogorov distribution function K(t) is close to zero [37], so the p-value 1 − K(t) is close to one and the null hypothesis that the distribution of deviations is symmetric cannot be rejected. For the Lehmann–Rosenblatt test, the p-value equals 0.997. Thus, there are no grounds to reject the null hypothesis of symmetry. A similar result was obtained for the Mann–Whitney test: U = 60 with a critical value of 45 for a sample size of 26.
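The limiting Kolmogorov distribution K(t) quoted above can be evaluated from its standard series representations. How t is obtained from d and the sample sizes is not stated in the text, so this sketch only evaluates K at the quoted t = 0.26; switching between the two series at t = 1 is a numerical convenience, not part of the method.

```python
import math


def kolmogorov_cdf(t: float) -> float:
    """Limiting Kolmogorov distribution function K(t)."""
    if t <= 0:
        return 0.0
    if t < 1.0:
        # Jacobi-theta form, converges quickly for small t.
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * t * t))
                for k in range(1, 50))
        return math.sqrt(2 * math.pi) / t * s
    # Alternating series 1 - 2 * sum (-1)^(k-1) exp(-2 k^2 t^2), fast for t >= 1.
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, 50))
    return 1.0 - 2.0 * s


print(kolmogorov_cdf(0.26))             # about 1e-7: K(0.26) is close to zero
print(1 - kolmogorov_cdf(0.26))         # p-value close to one
print(round(kolmogorov_cdf(1.358), 3))  # 0.95, the classical 5% critical point
```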

The monotonic behavior of the branches of the density function of deviations can be checked in two ways. If the sample is large enough, the property can be checked separately for the deviations \( \left\{e_k^{+}\right\} \) and \( \left\{-e_k^{-}\right\} \), where

$$ {e}_k={x}_k^{\ast }-g\left(a,b,{x}_0,k\right). $$

Here \( e_k^{+} \) denotes a positive deviation \( e_k \), and \( e_k^{-} \) a negative one. If the sample is small, the property can be checked for the combined set \( \left\{e_k^{+}\right\}\cup \left\{-e_k^{-}\right\} \). Consider the case when \( \left\{e_k^{+}\right\} \), k = 1, … , m, is a sufficiently large sample, and let \( \left\{e_k^{+\ast}\right\} \) be the sample of ordered positive deviations:

$$ {e}_1^{+\ast}\le {e}_2^{+\ast}\le \dots \le {e}_m^{+\ast }. $$

A monotonically decreasing density means that larger values in the sample should occur with smaller probabilities. Accordingly, the lengths of the intervals

$$ \left[0,{e}_1^{+\ast}\right],\kern0.5em \left[{e}_1^{+\ast },{e}_2^{+\ast}\right],\kern0.5em \dots, \kern0.5em \left[{e}_{m-1}^{+\ast },{e}_m^{+\ast}\right], $$

should, ideally, have the same order. Rank 1 corresponds to the shortest interval \( \left[0,e_1^{+\ast}\right] \), while the largest rank m corresponds to the longest interval \( \left[e_{m-1}^{+\ast},e_m^{+\ast}\right] \). This ideal case must be compared with the real situation determined by the sample \( \left\{e_k^{+\ast}\right\} \). For this purpose, we calculate the Spearman rank correlation coefficient ρ (and/or the Kendall correlation coefficient τ) between the interval order and the interval lengths and test the null hypothesis \( H_0: \rho = 0 \) against the alternative \( H_1: \rho > 0 \). At the selected significance level, the null hypothesis must be rejected. Note that the result is stronger when the null hypothesis can be rejected at a smaller significance level.
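A sketch of this spacing check, assuming the input is the sample of positive deviations (the same function applies to \( -e_k^{-} \)). It computes only ρ; a one-sided p-value for \( H_1: \rho > 0 \) would come from tables or a statistics library (for instance, SciPy's `scipy.stats.spearmanr` accepts an `alternative` argument in recent versions). All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(rx, ry))
    vx = sum((u - mx) ** 2 for u in rx)
    vy = sum((v - my) ** 2 for v in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(vx * vy)


def spacing_rho(branch):
    """rho between interval order 1..m and lengths of [0, e1*], [e1*, e2*], ..."""
    e = sorted(branch)
    lengths = [e[0]] + [e[i] - e[i - 1] for i in range(1, len(e))]
    return spearman_rho(list(range(1, len(e) + 1)), lengths)


print(spacing_rho([1.0, 3.0, 6.0, 10.0]))  # 1.0: gaps grow, consistent with a decreasing density
print(spacing_rho([4.0, 7.0, 9.0, 10.0]))  # -1.0: gaps shrink
```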

For these parameters, the Spearman rank correlation coefficient has p-value = 0.02052, and the Kendall correlation coefficient τ has p-value = 0.02325. Thus, the null hypotheses must be rejected for both coefficients at the 3% significance level.

Analysis of the autocorrelation function r(k) shows that for 0 < k ≤ 15000 all its values lie in the closed interval [−0.02, 0.02]. This allows us to conclude that if the observed process is cyclic, the cycle length exceeds 1500 years. Moreover, a rapid decrease of this function from r(0) = 1 followed by fluctuations within narrow limits near zero is typical of processes that “forget their history” very quickly (for example, purely stochastic processes). Fig. 2 shows the time series and the trajectory of model (2.1) obtained for these parameters.

Fig. 2. Time series of fluctuations of the green oak leaf roller (solid line) and trajectory of discrete logistic model (2.1) (broken line) obtained for the parameters at which the Kolmogorov–Smirnov and Lehmann–Rosenblatt tests reach their maximum p-values
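A sketch of the sample autocorrelation function r(k) applied to a model trajectory, here for the point with the minimum Kolmogorov–Smirnov statistic reported above. The trajectory length, dropped transient, and lag range are illustrative assumptions; the direct O(n·lag) sum is too slow for lags up to 15000 in pure Python, where an FFT-based estimate would be the usual choice.

```python
def model_path(a, b, x0, n):
    """Trajectory x_0, ..., x_n of model (2.1)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(a * x * (b - x) if x <= b else 0.0)
    return xs


def autocorrelation(xs, max_lag):
    """Sample autocorrelation r(k) for k = 0, ..., max_lag (so r(0) = 1)."""
    n = len(xs)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must lie in [0, len(xs) - 1]")
    mean = sum(xs) / n
    c = [x - mean for x in xs]
    c0 = sum(v * v for v in c)
    if c0 == 0:
        raise ValueError("constant series: r(k) is undefined")
    return [sum(c[i] * c[i + k] for i in range(n - k)) / c0 for k in range(max_lag + 1)]


# Trajectory for x0 = 26.16194, a = 0.11327, b = 35.31373, transient dropped.
path = model_path(0.11327, 35.31373, 26.16194, 3000)[500:]
r = autocorrelation(path, 50)
print(max(abs(v) for v in r[1:]))  # spread of r(k) for 0 < k <= 50
```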

Among the points of the feasible set Ω (Fig. 1), the maximum value of the Mann–Whitney statistic, U = 119, was observed for \( x_0 \) = 0.529402, a = 0.299619, b = 13.234814 (this point also belongs to the “biological zone” of the parameter space, ab = 3.9654). For these parameters, p-value = 0.1838 (Kolmogorov–Smirnov test) and p-value = 0.06028 (Lehmann–Rosenblatt test). Thus, even at the 6% significance level, the null hypothesis of symmetry cannot be rejected, although the Lehmann–Rosenblatt p-value is very close to this critical threshold.

The Spearman rank correlation coefficient is ρ = 0.6178 (p-value = 0.0004933), and the Kendall correlation coefficient is τ = 0.4277 (p-value = 0.0009166). Taking this into account, we accept the hypothesis that the branches of the density function are monotonic.

Analysis of the autocorrelation function r(k) shows that for 8 < k ≤ 15000 all its values lie in the closed interval [−0.08074, 0.0685]. As in the previous case, if the observed process is cyclic, the cycle length must exceed 1500 years. Fig. 3 shows the time series and the trajectory of model (2.1) obtained for these parameters.

Fig. 3. Time series of fluctuations of the green oak leaf roller (solid line) and trajectory of discrete logistic model (2.1) (broken line) obtained for the parameters at which the Mann–Whitney U-test reaches its maximum p-value

For set Ω (Fig. 1), the maximum value of the Spearman rank correlation coefficient, ρ = 0.888547, is observed for the following estimates of model parameters: \( x_0 \) = 0.573105, a = 0.217602, b = 18.18987 (this point belongs to the “biological zone” of the parameter space, ab = 3.958159). For these parameters, the p-value is 0.1226 for the Kolmogorov–Smirnov test, 0.20171 for the Lehmann–Rosenblatt test, 0.218743 for the Wald–Wolfowitz test, and 0.293265 for the Mann–Whitney U-test. Thus, at the 12% significance level, the hypothesis that the distribution of deviations is symmetric cannot be rejected.

The Spearman rank correlation coefficient is ρ = 0.888547 (p-value = \( 8.367 \cdot 10^{-7} \)), and the Kendall correlation coefficient is τ = 0.7169231 (p-value = \( 5.988 \cdot 10^{-9} \)). Given these results for the deviations, we accept the hypothesis that the branches of the density function are monotonic.

Analysis of the autocorrelation function r(k) shows that for 11 < k ≤ 15000 all its values lie in the closed interval [−0.038, 0.035]. As in the previous cases, if the observed process is cyclic, the cycle length must exceed 1500 years.

6 Conclusion

The analysis of fluctuations of the green oak leaf roller population [17] with the generalized discrete logistic model showed that the parameter estimates obtained by ordinary least squares belong to the “non-biological zone.” Using this approach alone, we would conclude that the model cannot approximate the time series adequately. The deviations between theoretical/model values and empirical data do not satisfy several common requirements that must hold when the model corresponds well to the dataset. For example, the hypothesis that the deviations are normally distributed must be rejected at the 1% significance level. Moreover, with the estimated parameters the model predicts population extinction by 1993, which does not correspond to reality.

The approach to parameter estimation based on the method of extreme points (MEP) yielded several points of the model parameter space that are most suitable for fitting. All of them lie in the “biological zone,” and the deviations between the model trajectories and the real dataset satisfy the chosen set of statistical criteria. In other words, the analysis of deviations gives no grounds to conclude that the model is unsuitable for fitting the time series.

It is interesting to note that all dynamic regimes observed for the MEP estimates correspond, at a qualitative level, to the same population size behavior: it is not a cyclic regime with a cycle length of 1500 years or less. Moreover, in all cases the autocorrelation function (calculated for model trajectories) decreases rapidly and then fluctuates slightly near zero. This behavior is typical of processes that “forget their history” very quickly.