Selecting the best probability distribution for at-site flood frequency analysis; a study of Torne River
Abstract
At-site flood frequency analysis is a direct method for estimating flood frequency at a particular site. Appropriate selection of a probability distribution and a parameter estimation method is important for at-site flood frequency analysis. The generalized extreme value, three-parameter log-normal, generalized logistic, Pearson type-III and Gumbel distributions are considered to describe the annual maximum stream flow at five gauging sites of the Torne River in Sweden. The parameters of the distributions are estimated with the maximum likelihood estimation (MLE) and L-moments (LM) methods. The performance of these distributions is assessed with goodness-of-fit tests and accuracy measures. At most sites, the best-fitting distribution is obtained with the LM estimation method. Finally, the most suitable distribution at each site is used to predict the maximum flood magnitude for different return periods.
Keywords
Flood frequency analysis, L-moments, Maximum likelihood estimation
1 Introduction
Table 1 Summary of Torne River gauging sites
Station name | Station no. | Latitude | Longitude | Catchment area (km\(^2\)) | Period for time series |
---|---|---|---|---|---|
Kukkolankoski Övre | 16722 | 65.98 | 24.06 | 33,929.60 | 1911–2018 |
Pajala Pumphus | 2012 | 67.21 | 23.40 | 11,038.10 | 1969–2018 |
Abisko | 2357 | 68.19 | 19.99 | 3345.50 | 1984–2018 |
Junosuando | 04 | 67.43 | 22.55 | 4348.00 | 1967–2018 |
Övre Abiskojokk | 957 | 68.36 | 18.78 | 566.30 | 1985–2018 |
To describe the flood frequency at a particular site, the choice of an appropriate probability distribution and parameter estimation method is of great importance. The probability distributions used in this study are the generalized extreme value (GEV) distribution, Pearson type-III (P3) distribution, generalized logistic (GLO) distribution, Gumbel (GUM) distribution and three-parameter log-normal (LN3) distribution. These distributions are recommended for at-site flood frequency analysis in various countries [2, 4, 24]. Furthermore, they are the distributions most commonly found in the hydrological literature for at-site and regional flood frequency analysis.
The most commonly used methods for parameter estimation in flood frequency analysis are the maximum likelihood estimation (MLE) method, the method of moments (MOM), the L-moments (LM) method and the probability weighted moments (PWM) method. The MLE method is an efficient and widely used method for estimation of parameters. Recently, the LM method has gained more attention in the hydrological literature for estimating the parameters of probability distributions. In this study, we use the LM and MLE methods to estimate the parameters of the candidate probability distributions.
The methods usually used for selection of the best distribution are goodness-of-fit (GOF) tests (e.g. Anderson–Darling and Cramér–von Mises), accuracy measures (e.g. root mean square error and root mean squared percentage error), goodness-of-fit indices (e.g. AIC and BIC) and graphical methods (e.g. Q–Q plot and L-moment ratio diagram). In the hydrological literature, researchers have used different methods to find the best probability distribution. To identify the best-suited distribution at each site of the Torne River, we use GOF tests and accuracy measures. The GOF tests are used to test whether the data come from a specific distribution. The accuracy measures provide a term-by-term comparison of the deviation between the hypothetical distribution and the empirical distribution of the data. The accuracy measures and goodness-of-fit tests used in this study are described in Sect. 3.
Estimating flood magnitudes for high return periods is of great interest in flood frequency analysis, and such estimates are always subject to uncertainty, which arises from many sources. Uncertainties in water resources management can be divided into data uncertainties, structural uncertainties and model/parameter uncertainties, see e.g. [13, 14]. Furthermore, flood estimates for return periods much longer than the available record are uncertain, particularly with respect to the type of probability density function (PDF) and its parameters. This is especially true in the right tail of the PDF, the region of interest for flooding. In addition, there is uncertainty in the measurements. See [15] for an in-depth discussion of epistemic (reducible) uncertainty and natural (irreducible) uncertainty. In this study, we quantify the uncertainty of a given quantile estimate for a specific fitted distribution by using the parametric bootstrap method.
In this paper, flood frequency estimation using statistical distributions is addressed for gauged catchments for which a relatively long hydrological time series is available. The choice of an appropriate probability distribution and associated parameter estimation method is vital for at-site flood frequency analysis. The core objective of this study is to find the best-fit distribution among the candidate probability distributions, with a particular method of estimation (MLE or LM), for annual maximum peak flow data at each site of the Torne River by using goodness-of-fit (GOF) tests and accuracy measures. We also examine whether there is an overall best distribution and fitting method for these five sites of the Torne River. Finally, we estimate the flood quantiles for return periods of 5, 10, 25, 50, 100, 200 and 500 years, with the corresponding non-exceedance probabilities, at each site of the river using the best-fit probability distribution. To address the uncertainty of the flood estimates, we estimate the standard error of the estimated quantiles and construct 95% confidence intervals for the flood quantiles using the parametric bootstrap method. This is the first study of at-site flood frequency analysis for the Torne River.
Table 2 Probability density and quantile functions of the probability distributions
Distributions | Probability density function \(f\left( y \right)\) | Quantile function y(F) |
---|---|---|
GEV | \(\frac{1}{\alpha }{\left[ {1 - \kappa \left( {\frac{{y - \mu }}{\alpha }} \right) } \right] ^{\frac{1}{\kappa } - 1}}\exp \left\{ { - {{\left[ {1 - \kappa \left( {\frac{{y - \mu }}{\alpha }} \right) } \right] }^{\frac{1}{\kappa }}}} \right\}\) | \(\mu + \frac{\alpha }{\kappa }\left[ {1 - {{\left( { - \log F} \right) }^\kappa }} \right]\) |
P3 | \(\frac{1}{{{\beta ^\alpha }\varGamma (\alpha )}}{\left( {y - \mu } \right) ^{\alpha - 1}}\exp \left\{ { - \frac{{\left( {y - \mu } \right) }}{\beta }} \right\}\) | Explicit analytical form is not available |
GLO | \(\frac{1}{\alpha }{\left[ {1 - \kappa \left( {\frac{{y - \mu }}{\alpha }} \right) } \right] ^{\frac{1}{\kappa } - 1}}{\left[ {1 + {{\left\{ {1 - \kappa \left( {\frac{{y - \mu }}{\alpha }} \right) } \right\} }^{1/\kappa }}} \right] ^{ - 2}}\) | \(\mu + \frac{\alpha }{\kappa }\left[ {1 - {{\left\{ {\left( {1 - F} \right) /F} \right\} }^\kappa }} \right]\) |
LN3 | \(\frac{1}{{\alpha \sqrt{2\pi } }}\exp \left[ { - \log \left\{ {1 - \frac{{\kappa (y - \mu )}}{\alpha }} \right\} - \frac{1}{2}{{\left[ { - \frac{1}{\kappa }\log \left\{ {1 - \frac{{\kappa (y - \mu )}}{\alpha }} \right\} } \right] }^2}} \right]\) | Explicit analytical form is not available |
GUM | \(\frac{1}{\alpha }\exp \left[ { - \frac{{y - \mu }}{\alpha } - \exp \left( { - \frac{{y - \mu }}{\alpha }} \right) } \right]\) | \(\mu - \alpha \log \left( { - \log F} \right)\) |
2 Study area and data
The Torne River forms part of the border between northern Sweden and Finland, with a total catchment area of 40,157 km\(^2\), of which 60% lies within Sweden and the remainder in Finland. The Muonio River, the largest tributary of the Torne River, joins shortly after Pajala Pumphus. Another tributary, the Lainio River (259.74 km long), joins the Torne River shortly after Junosuando. In springtime, the water flow rises above its average level and causes floods that damage waterfront constructions and buildings [6]. The Torne River is therefore frequently affected by flooding [Swedish Meteorological and Hydrological Institute (SMHI)]. The annual maximum flow data of five gauging sites of the Torne River (Swedish: Torneälven) are considered in this study. The data were collected from SMHI (www.smhi.se). The length of the data series varies from 34 to 108 years. The characteristics of the Torne River gauging sites are summarized in Table 1.
3 Methodology
3.1 Candidate probability distributions
Table 3 Results of the assumption tests at the five gauging sites of Torne River
Station name | n | r | P value (r) | Mann–Kendall test statistic | Mann–Kendall P value | Wald–Wolfowitz P value |
---|---|---|---|---|---|---|
Kukkolankoski Övre | 108 | − 0.08 | 0.23 | 1.85 | 0.06 | 0.86 |
Pajala Pumphus | 50 | − 0.11 | 0.25 | 0.32 | 0.75 | 0.94 |
Abisko | 35 | 0.03 | 0.80 | 0.21 | 0.83 | 0.17 |
Junosuando | 52 | − 0.09 | 0.34 | 1.41 | 0.16 | 0.95 |
Övre Abiskojokk | 34 | − 0.06 | 0.66 | − 0.62 | 0.53 | 0.33 |
3.2 Maximum likelihood estimation (MLE) method
The MLE method estimates the parameters by maximizing the log-likelihood function of a probability distribution. Suppose we have n independent and identically distributed observations \({y_1},\,{y_2}, \ldots ,\,{y_n}\). Each \(y_i\) has a pdf given by \(f(y_i;\varvec{\mu })\). Here, \(\varvec{\mu } = ({\mu _1},\;{\mu _2},\ldots ,{\mu _k})\) is a vector of unknown parameters to be estimated. Then, the log-likelihood function is defined as \(l\varvec{\left( \mu \right) } = \sum \nolimits _{i = 1}^n {\log f\left( {{y_i};\varvec{\mu } } \right) }\). The maximum likelihood estimate of \(\varvec{\mu }\) is the value of the parameter vector that maximizes \(l\varvec{\left( \mu \right) }\) for the given data Y. We use numerical optimization to search for the \(\varvec{\mu }\) that gives the maximum value of \(l\varvec{\left( \mu \right) }\). Many numerical optimization methods, e.g. the Newton–Raphson method, the Nelder–Mead method and differential evolution, are found in the literature. We have used the simplex method proposed by Nelder and Mead [19].
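As an illustration of the procedure described above, the following is a minimal sketch of maximum likelihood fitting with the Nelder–Mead simplex method. It assumes the Gumbel distribution (the simplest of the candidate distributions) and uses `scipy.optimize.minimize`; the function names `gumbel_neg_loglik` and `fit_gumbel_mle`, the moment-based starting values and the synthetic sample are choices made for this example, not details taken from the study.

```python
import numpy as np
from scipy.optimize import minimize

EULER_GAMMA = 0.5772156649015329


def gumbel_neg_loglik(params, y):
    """Negative log-likelihood of the Gumbel pdf (1/alpha) exp[-z - exp(-z)]."""
    mu, alpha = params
    if alpha <= 0:
        return np.inf  # keep the simplex inside the valid parameter region
    z = (y - mu) / alpha
    return len(y) * np.log(alpha) + np.sum(z + np.exp(-z))


def fit_gumbel_mle(y):
    """Maximize the log-likelihood by minimizing its negative with Nelder-Mead."""
    y = np.asarray(y, dtype=float)
    # Starting values from the method of moments (an assumption of this sketch)
    alpha0 = np.sqrt(6.0) * y.std(ddof=1) / np.pi
    mu0 = y.mean() - EULER_GAMMA * alpha0
    res = minimize(gumbel_neg_loglik, x0=[mu0, alpha0], args=(y,),
                   method="Nelder-Mead")
    return res.x


# Synthetic data only; these are not the Torne River observations
rng = np.random.default_rng(0)
sample = rng.gumbel(loc=2000.0, scale=450.0, size=2000)
mu_hat, alpha_hat = fit_gumbel_mle(sample)
print(mu_hat, alpha_hat)
```

The same pattern would apply to the three-parameter distributions by replacing the negative log-likelihood function, with additional care for the parameter-dependent support of the GEV, GLO, LN3 and P3 densities.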
3.3 Theory of L-moments (LM)
3.4 Standard error of estimated parameters
Using the parameters estimated with the MLE and LM methods at each gauging site, we draw 1000 samples from each fitted probability distribution, each of size equal to the length of the observed data series.
For each simulated sample, we obtain the MLE and LM estimates for the parameters of the distribution.
For each gauging site, the standard errors are obtained by taking the standard deviation of these 1000 MLE and LM estimates of the parameters of each distribution.
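The three steps above can be sketched as follows. This is a simplified illustration that assumes the Gumbel distribution and L-moment estimation only, using the standard relations \(\alpha = \lambda_2 / \log 2\) and \(\mu = \lambda_1 - \gamma \alpha\) with Euler's constant \(\gamma\); the function names `gumbel_lmom_fit` and `bootstrap_se` are hypothetical, and the input values are the rounded LM estimates printed for station 16722 in Table 4.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def gumbel_lmom_fit(y):
    """L-moment estimates (mu, alpha) of the Gumbel distribution."""
    x = np.sort(np.asarray(y, dtype=float))
    n = x.size
    b0 = x.mean()
    # Unbiased sample probability weighted moment b1
    b1 = np.sum(np.arange(n) * x) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0          # first two sample L-moments
    alpha = l2 / np.log(2.0)
    mu = l1 - EULER_GAMMA * alpha
    return mu, alpha


def bootstrap_se(mu, alpha, n, n_boot=1000, seed=0):
    """Parametric bootstrap standard errors of (mu, alpha)."""
    rng = np.random.default_rng(seed)
    est = np.empty((n_boot, 2))
    for b in range(n_boot):
        # Step 1: simulate a sample of the observed length from the fitted model
        sim = rng.gumbel(loc=mu, scale=alpha, size=n)
        # Step 2: re-estimate the parameters on the simulated sample
        est[b] = gumbel_lmom_fit(sim)
    # Step 3: standard deviation of the bootstrap estimates
    return est.std(axis=0, ddof=1)


se_mu, se_alpha = bootstrap_se(mu=1959.44, alpha=403.29, n=108)
print(se_mu, se_alpha)
```

Because the random seed and the rounding of the inputs differ from the original computation, the output should be of the same order as, but not identical to, the standard errors reported in Table 4.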
Table 4 Estimated parameters with MLE and LM methods (standard errors in parentheses)
Parameter estimation method: MLE
Station | GEV \({\hat{\mu }}\) | GEV \({\hat{\alpha }}\) | GEV \({\hat{\kappa }}\) | P3 \({\hat{\mu }}\) | P3 \({\hat{\alpha }}\) | P3 \({\hat{\kappa }}\) | GUM \({\hat{\mu }}\) | GUM \({\hat{\alpha }}\) | GLO \({\hat{\mu }}\) | GLO \({\hat{\alpha }}\) | GLO \({\hat{\kappa }}\) | LN3 \({\hat{\mu }}\) | LN3 \({\hat{\alpha }}\) | LN3 \({\hat{\kappa }}\) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
16722 | 1993.73 (48.38) | 450.20 (34.34) | 0.16 (0.07) | 297.25 (763.60) | 128.76 (58.50) | 14.72 (12.50) | 1955.47 (43.91) | 433.50 (33.52) | 2149.33 (49.03) | 280.04 (22.95) | − 0.11 (0.07) | 2152.08 (51.38) | 483.32 (33.78) | − 0.17 (0.09) |
2012 | 768.35 (34.29) | 223.53 (24.67) | 0.43 (0.08) | 1663.77 (516.20) | − 52.30 (36.45) | 15.99 (20.76) | 718.05 (36.57) | 240.19 (25.24) | 839.62 (29.98) | 117.07 (13.92) | 0.08 (0.10) | 844.26 (31.44) | 204.85 (21.03) | 0.16 (0.12) |
2357 | 207.26 (9.80) | 52.41 (6.86) | 0.24 (0.11) | − 285.00 (869.20) | 5.69 (9.86) | 90.04 (308.12) | 200.73 (9.01) | 50.87 (6.84) | 225.42 (9.49) | 30.92 (4.35) | − 0.04 (0.12) | 225.60 (9.93) | 53.77 (6.45) | − 0.06 (0.15) |
4 | 281.69 (11.84) | 74.13 (8.10) | 0.02 (0.11) | 113.75 (51.00) | 40.43 (15.39) | 5.18 (3.10) | 280.81 (10.49) | 73.61 (8.00) | 307.58 (11.97) | 48.63 (6.16) | − 0.20 (0.08) | 308.92 (12.80) | 85.91 (9.24) | − 0.32 (0.12) |
957 | 108.80 (5.18) | 26.47 (3.77) | 0.08 (0.14) | 61.85 (27.44) | 17.66 (13.39) | 3.42 (4.08) | 107.71 (4.56) | 25.83 (3.47) | 119.34 (5.56) | 17.35 (2.54) | − 0.13 (0.14) | 118.11 (5.76) | 29.65 (3.86) | − 0.28 (0.19) |

Parameter estimation method: LM
Station | GEV \({\hat{\mu }}\) | GEV \({\hat{\alpha }}\) | GEV \({\hat{\kappa }}\) | P3 \({\hat{\mu }}\) | P3 \({\hat{\alpha }}\) | P3 \({\hat{\kappa }}\) | GUM \({\hat{\mu }}\) | GUM \({\hat{\alpha }}\) | GLO \({\hat{\mu }}\) | GLO \({\hat{\alpha }}\) | GLO \({\hat{\kappa }}\) | LN3 \({\hat{\mu }}\) | LN3 \({\hat{\alpha }}\) | LN3 \({\hat{\kappa }}\) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
16722 | 1990.07 (46.62) | 456.59 (36.53) | 0.15 (0.07) | 11.28 (994.14) | 114.04 (57.03) | 19.12 (19.50) | 1959.44 (40.39) | 403.29 (36.53) | 2157.98 (46.90) | 276.98 (22.79) | − 0.07 (0.05) | 2154.46 (48.38) | 490.66 (35.96) | − 0.15 (0.09) |
2012 | 764.68 (33.65) | 220.31 (23.58) | 0.40 (0.11) | 1964.77 (781.80) | − 39.13 (42.25) | 29.07 (22.81) | 728.59 (26.28) | 170.98 (22.34) | 839.06 (28.85) | 117.80 (13.84) | 0.06 (0.08) | 840.27 (32.31) | 208.73 (22.61) | 0.12 (0.13) |
2357 | 206.33 (10.49) | 53.49 (6.96) | 0.22 (0.13) | − 305.49 (270.70) | 5.80 (15.19) | 91.88 (23.56) | 201.21 (8.01) | 45.18 (7.06) | 225.54 (9.51) | 31.26 (4.46) | − 0.03 (0.09) | 225.36 (10.37) | 55.39 (6.62) | − 0.07 (0.16) |
4 | 279.83 (10.94) | 73.16 (9.12) | − 0.01 (0.10) | 148.25 (89.07) | 50.90 (21.98) | 3.43 (8.62) | 280.25 (10.75) | 74.02 (9.07) | 308.19 (12.16) | 48.68 (6.18) | − 0.18 (0.08) | 306.66 (13.10) | 85.97 (9.70) | − 0.37 (0.14) |
957 | 109.40 (5.50) | 28.50 (4.18) | 0.14 (0.13) | − 3.65 (103.66) | 7.84 (7.83) | 16.06 (20.99) | 107.62 (4.55) | 25.38 (4.10) | 119.92 (5.26) | 17.40 (2.54) | − 0.08 (0.10) | 119.68 (5.90) | 30.82 (3.93) | − 0.17 (0.17) |
Table 5 Descriptive statistics (cubic metre per second)
Station | n | Mean | Median | S | CV | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|
16722 | 108 | 2192.22 | 2140.00 | 493.03 | 0.22 | 0.37 | − 0.19 |
2012 | 50 | 827.28 | 837.53 | 211.16 | 0.26 | − 0.54 | 0.90 |
2357 | 35 | 227.29 | 223.53 | 54.71 | 0.24 | 0.17 | − 0.07 |
04 | 52 | 322.98 | 296.50 | 93.32 | 0.29 | 0.93 | 0.95 |
957 | 34 | 122.27 | 120.67 | 31.39 | 0.26 | 0.73 | 1.24 |
Table 6 Rank score of distribution in both GOF tests and accuracy measures
Station | Method | Distribution | AD | RMSE | MAE | RMSEP | MAEP | \(R^2\) | Total rank |
---|---|---|---|---|---|---|---|---|---|
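Kukkolankoski Övre (16722) | MLE | GEV | 7 | 5 | 7 | 7 | 7 | 5 | 38 |
Kukkolankoski Övre (16722) | MLE | P3 | 6 | 7 | 6 | 6 | 5 | 7 | 37 |
Kukkolankoski Övre (16722) | MLE | GLO | 4 | 3 | 4 | 4 | 4 | 3 | 22 |
Kukkolankoski Övre (16722) | MLE | LN3 | 5 | 6 | 5 | 5 | 6 | 6 | 33 |
Kukkolankoski Övre (16722) | MLE | GUM | 2 | 1 | 2 | 2 | 2 | 1 | 10 |
Kukkolankoski Övre (16722) | LM | GEV | 10 | 10 | 10 | 10 | 10 | 10 | 60 |
Kukkolankoski Övre (16722) | LM | P3 | 9 | 9 | 9 | 9 | 9 | 9 | 54 |
Kukkolankoski Övre (16722) | LM | GLO | 3 | 4 | 3 | 3 | 3 | 4 | 20 |
Kukkolankoski Övre (16722) | LM | LN3 | 8 | 8 | 8 | 8 | 8 | 8 | 48 |
Kukkolankoski Övre (16722) | LM | GUM | 1 | 2 | 1 | 1 | 1 | 2 | 8 |
Pajala Pumphus (2012) | MLE | GEV | 3 | 4 | 7 | 4 | 4 | 4 | 26 |
Pajala Pumphus (2012) | MLE | P3 | 6 | 5 | 5 | 5 | 5 | 5 | 31 |
Pajala Pumphus (2012) | MLE | GLO | 4 | 8 | 3 | 8 | 8 | 8 | 39 |
Pajala Pumphus (2012) | MLE | LN3 | 7 | 6 | 6 | 6 | 6 | 6 | 37 |
Pajala Pumphus (2012) | MLE | GUM | 1 | 1 | 1 | 2 | 1 | 1 | 7 |
Pajala Pumphus (2012) | LM | GEV | 5 | 3 | 8 | 3 | 3 | 3 | 25 |
Pajala Pumphus (2012) | LM | P3 | 9 | 9 | 9 | 9 | 9 | 9 | 54 |
Pajala Pumphus (2012) | LM | GLO | 8 | 7 | 4 | 7 | 7 | 7 | 40 |
Pajala Pumphus (2012) | LM | LN3 | 10 | 10 | 10 | 10 | 10 | 10 | 60 |
Pajala Pumphus (2012) | LM | GUM | 2 | 2 | 2 | 1 | 2 | 2 | 11 |
Abisko (2357) | MLE | GEV | 3 | 3 | 3 | 3 | 4 | 3 | 19 |
Abisko (2357) | MLE | P3 | 8 | 5 | 6 | 4 | 6 | 5 | 34 |
Abisko (2357) | MLE | GLO | 4 | 6 | 4 | 9 | 3 | 6 | 32 |
Abisko (2357) | MLE | LN3 | 7 | 4 | 5 | 5 | 5 | 4 | 30 |
Abisko (2357) | MLE | GUM | 2 | 1 | 2 | 2 | 2 | 1 | 10 |
Abisko (2357) | LM | GEV | 6 | 7 | 8 | 6 | 8 | 7 | 42 |
Abisko (2357) | LM | P3 | 9 | 9 | 9 | 7 | 9 | 9 | 52 |
Abisko (2357) | LM | GLO | 5 | 8 | 7 | 10 | 7 | 8 | 45 |
Abisko (2357) | LM | LN3 | 10 | 10 | 10 | 8 | 10 | 10 | 58 |
Abisko (2357) | LM | GUM | 1 | 2 | 1 | 1 | 1 | 2 | 8 |
Junosuando (4) | MLE | GEV | 5 | 1 | 1 | 2 | 2 | 1 | 12 |
Junosuando (4) | MLE | P3 | 2 | 1 | 1 | 2 | 2 | 1 | 9 |
Junosuando (4) | MLE | GLO | 10 | 10 | 6 | 9 | 7 | 10 | 52 |
Junosuando (4) | MLE | LN3 | 4 | 3 | 4 | 5 | 5 | 3 | 24 |
Junosuando (4) | MLE | GUM | 8 | 6 | 9 | 6 | 8 | 6 | 43 |
Junosuando (4) | LM | GEV | 6 | 9 | 8 | 7 | 6 | 9 | 45 |
Junosuando (4) | LM | P3 | 1 | 4 | 5 | 1 | 1 | 4 | 16 |
Junosuando (4) | LM | GLO | 9 | 5 | 3 | 10 | 9 | 5 | 41 |
Junosuando (4) | LM | LN3 | 3 | 7 | 7 | 4 | 4 | 7 | 32 |
Junosuando (4) | LM | GUM | 7 | 8 | 10 | 8 | 10 | 8 | 51 |
Övre Abiskojokk (957) | MLE | GEV | 6 | 5 | 7 | 7 | 7 | 5 | 37 |
Övre Abiskojokk (957) | MLE | P3 | 1 | 7 | 1 | 4 | 1 | 7 | 21 |
Övre Abiskojokk (957) | MLE | GLO | 5 | 10 | 4 | 3 | 4 | 10 | 36 |
Övre Abiskojokk (957) | MLE | LN3 | 4 | 6 | 5 | 6 | 6 | 6 | 33 |
Övre Abiskojokk (957) | MLE | GUM | 3 | 9 | 3 | 5 | 3 | 9 | 32 |
Övre Abiskojokk (957) | LM | GEV | 8 | 1 | 8 | 9 | 8 | 1 | 35 |
Övre Abiskojokk (957) | LM | P3 | 9 | 2 | 10 | 10 | 10 | 2 | 43 |
Övre Abiskojokk (957) | LM | GLO | 7 | 4 | 6 | 1 | 5 | 4 | 27 |
Övre Abiskojokk (957) | LM | LN3 | 10 | 3 | 9 | 8 | 9 | 3 | 42 |
Övre Abiskojokk (957) | LM | GUM | 2 | 8 | 2 | 2 | 2 | 8 | 24 |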
Table 7 Quantile estimates of flood with 95% confidence interval at five gauging sites of Torne River (T: return period in years; F: non-exceedance probability)
Station | Distribution | Method | Statistic | T = 5 (F = 0.80) | T = 10 (F = 0.90) | T = 25 (F = 0.96) | T = 50 (F = 0.98) | T = 100 (F = 0.99) | T = 200 (F = 0.995) | T = 500 (F = 0.998) |
---|---|---|---|---|---|---|---|---|---|---|
16722 | GEV | LM | Lower | 2475.06 | 2703.84 | 2929.24 | 3056.12 | 3157.22 | 3236.94 | 3324.16 |
16722 | GEV | LM | Fit | 2601.25 | 2857.74 | 3142.00 | 3327.52 | 3492.74 | 3640.52 | 3812.68 |
16722 | GEV | LM | Upper | 2728.22 | 3011.45 | 3358.67 | 3612.61 | 3867.51 | 4120.59 | 4461.08 |
16722 | GEV | LM | \(\sigma _{\mathrm{s}}\) | 64.33 | 78.28 | 109.83 | 142.87 | 181.92 | 225.49 | 288.13 |
2012 | LN3 | LM | Lower | 943.17 | 1016.74 | 1080.04 | 1114.79 | 1141.18 | 1164.31 | 1188.27 |
2012 | LN3 | LM | Fit | 1007.08 | 1087.59 | 1168.75 | 1218.72 | 1262.09 | 1300.52 | 1345.53 |
2012 | LN3 | LM | Upper | 1067.68 | 1155.55 | 1258.97 | 1332.30 | 1402.33 | 1470.83 | 1558.80 |
2012 | LN3 | LM | \(\sigma _{\mathrm{s}}\) | 31.69 | 35.55 | 45.80 | 55.86 | 66.96 | 78.66 | 94.65 |
2357 | LN3 | LM | Lower | 250.78 | 272.75 | 293.06 | 304.31 | 313.38 | 321.13 | 330.25 |
2357 | LN3 | LM | Fit | 273.37 | 299.61 | 328.49 | 347.65 | 365.24 | 381.63 | 401.88 |
2357 | LN3 | LM | Upper | 296.48 | 327.91 | 368.54 | 400.46 | 433.38 | 466.35 | 511.51 |
2357 | LN3 | LM | \(\sigma _{\mathrm{s}}\) | 11.66 | 14.15 | 19.52 | 24.73 | 30.66 | 37.15 | 46.49 |
04 | GLO | MLE | Lower | 350.20 | 392.09 | 446.76 | 486.07 | 529.94 | 577.23 | 641.49 |
04 | GLO | MLE | Fit | 385.52 | 442.47 | 525.23 | 596.78 | 678.28 | 771.63 | 916.74 |
04 | GLO | MLE | Upper | 424.56 | 499.39 | 629.33 | 754.07 | 920.40 | 1132.33 | 1491.74 |
04 | GLO | MLE | \(\sigma _{\mathrm{s}}\) | 18.70 | 27.11 | 46.15 | 68.45 | 99.73 | 142.66 | 223.51 |
957 | P3 | LM | Lower | 133.74 | 146.98 | 159.11 | 166.19 | 171.67 | 176.38 | 181.46 |
957 | P3 | LM | Fit | 147.67 | 163.85 | 182.29 | 194.88 | 206.65 | 217.81 | 231.83 |
957 | P3 | LM | Upper | 162.08 | 183.06 | 210.76 | 231.51 | 251.71 | 272.35 | 299.50 |
957 | P3 | LM | \(\sigma _{\mathrm{s}}\) | 7.19 | 9.28 | 13.09 | 16.44 | 20.02 | 23.77 | 28.89 |
3.5 Goodness-of-fit (GOF) tests
where \(F({y_i})\,\) represents the cumulative distribution function (CDF) of the specified distribution.
3.6 Accuracy measure method
3.7 Quantile estimation
4 Result and discussion
The basic statistics of the five gauging sites are summarized in Table 5. All data in the table are in cubic metre per second. The data at all sites are skewed, which justifies modelling them with non-normal distributions. In flood frequency analysis, the basic statistical assumptions are independence, randomness and stationarity of the data series (see e.g. [8, 11]). The independence and randomness of the data series at a given site are tested by using the lag-1 correlation coefficient (r) and the Wald–Wolfowitz (WW) test, respectively. To check the stationarity of the data series, the Mann–Kendall (MK) test has been applied. The results of the assumption checks are summarized in Table 3. The results in Table 3 indicate that the data series at each gauging site of the Torne River are suitable for flood frequency analysis and probability density estimation.
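Two of the checks named above can be sketched in a few lines. The code below is a minimal illustration, assuming the usual normal approximation of the Mann–Kendall statistic without a correction for ties; the function names `lag1_autocorr` and `mann_kendall` are hypothetical, the input series is synthetic, and the Wald–Wolfowitz test is not reproduced here.

```python
import math
import numpy as np


def lag1_autocorr(y):
    """Lag-1 serial correlation coefficient r of a series."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))


def mann_kendall(y):
    """Mann-Kendall trend test (no tie correction); returns (Z, two-sided p)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s = 0.0
    for i in range(n - 1):
        # Count later values above minus later values below y[i]
        s += float(np.sum(np.sign(y[i + 1:] - y[i])))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1.0) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1.0) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Synthetic annual maxima without trend; not the Torne River data
rng = np.random.default_rng(42)
series = rng.gumbel(loc=200.0, scale=50.0, size=50)
print(lag1_autocorr(series), mann_kendall(series))
```

The P values listed for the Mann–Kendall test in Table 3 are consistent with this normal approximation; for example, a test statistic of 1.85 corresponds to a two-sided P value of about 0.06.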
The estimated parameters for each distribution at each gauging site, obtained with the MLE and LM methods, are reported in Table 4 along with their standard errors (SE). To identify the best distribution at each site, we use GOF tests and accuracy measures. Each combination of distribution and parameter estimation method is ranked for each GOF test and accuracy measure in Table 6. Each combination is assigned a rank score between 1 and 10, with 10 for the best-fitted and 1 for the worst-fitted distribution. The rank score scheme is based on the relative magnitude of the accuracy measures and the AD test P value. The distribution with the lowest RMSE, lowest MAE, lowest RMSEP, lowest MAEP or the highest \(R^2\) receives the highest rank score of 10. In the AD test, the distribution with the highest P value receives the highest rank score of 10. The best distribution and estimation method at each site is identified based on the total rank score over the GOF tests and accuracy measures. The total rank scores in Table 6 indicate that GLO with the MLE method is best for Junosuando. For the sites Pajala Pumphus and Abisko, the LN3 distribution with the LM method performs better than the other distributions. The GEV and P3 distributions with the LM method are the best-fit distributions for the gauging sites Kukkolankoski Övre and Övre Abiskojokk, respectively.
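The rank score scheme can be illustrated with a short sketch. The candidate labels and the RMSE and AD P values below are invented for the example, only three candidates and two criteria are used instead of ten and six, and the helper `rank_scores` (including its arbitrary handling of ties by input order) is an assumption of this sketch rather than the authors' implementation.

```python
import numpy as np


def rank_scores(values, higher_is_better=False):
    """Return scores 1..m, with m for the best candidate and 1 for the worst."""
    v = np.asarray(values, dtype=float)
    # Sort from worst to best so that position in the ordering is the score
    order = np.argsort(v if higher_is_better else -v, kind="stable")
    scores = np.empty(len(v), dtype=int)
    scores[order] = np.arange(1, len(v) + 1)
    return scores


# Hypothetical candidates and criteria (illustrative numbers only)
candidates = ["GEV-MLE", "GUM-MLE", "GEV-LM"]
rmse = [12.0, 20.0, 10.0]          # lower is better
ad_p_value = [0.80, 0.30, 0.90]    # higher is better

total = rank_scores(rmse) + rank_scores(ad_p_value, higher_is_better=True)
best = candidates[int(np.argmax(total))]
print(dict(zip(candidates, total.tolist())), best)
```

With ten candidates (five distributions times two estimation methods) and six criteria, the same summation gives total scores between 6 and 60, which matches the range of the "Total rank" column in Table 6.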
In this study, no single distribution emerged as the best distribution for all gauging sites. This was also the case in [1, 5]. Overall, the LM estimation method performed better for identifying the suitable distribution (see also [1]). The only site where the best-suited distribution is obtained with the MLE method is the gauging site with the highest CV and skewness, see Tables 5 and 6. It seems that the sites with extreme average annual maximum flood and catchment area (either very large or very small) favour the LM method of estimation, see Tables 1, 5 and 6. In terms of landscape setting, the gauging sites at extreme positions (closest to and farthest from the Gulf of Bothnia) favour the LM estimation method. The sample size of the time series does not seem to be an important factor in favour of a particular distribution or estimation method in this study.
One major objective of flood frequency analysis is to estimate the quantiles in the extreme upper tail of the best-fitted distribution at each gauging site. The quantile estimates for the return periods 5, 10, 25, 50, 100, 200 and 500 years are calculated by using the quantile function and parameter values of the best-fitted distributions. The quantile estimates \({y_T}\) with non-exceedance probability F for the best-fitted distributions are given in Table 7. The estimates of uncertainty (\(\sigma _{\mathrm{s}}\)) in the quantile estimates and the 95% confidence intervals of the flood quantiles for the different return periods are also presented in Table 7. The SE indicates that longer return periods carry more uncertainty around the flood quantile estimates.
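The quantile calculation can be sketched for the GEV distribution, whose quantile function is given in closed form in the table of probability density and quantile functions. The sketch assumes the usual relation \(F = 1 - 1/T\) between the non-exceedance probability and the return period T of annual maxima, and it uses the rounded LM estimates printed for station 16722 in Table 4; the function name `gev_quantile` is hypothetical.

```python
import numpy as np

RETURN_PERIODS = np.array([5, 10, 25, 50, 100, 200, 500], dtype=float)


def gev_quantile(F, mu, alpha, kappa):
    """GEV quantile y(F) = mu + (alpha / kappa) * [1 - (-log F)^kappa], kappa != 0."""
    F = np.asarray(F, dtype=float)
    return mu + alpha / kappa * (1.0 - (-np.log(F)) ** kappa)


# Non-exceedance probability for each return period
F = 1.0 - 1.0 / RETURN_PERIODS

# Rounded LM estimates for station 16722 as printed in Table 4
q = gev_quantile(F, mu=1990.07, alpha=456.59, kappa=0.15)

for T, prob, y_T in zip(RETURN_PERIODS, F, q):
    print(f"T = {T:5.0f} years, F = {prob:.3f}, y_T = {y_T:8.1f}")
```

With these rounded inputs the sketch gives roughly 2603 cubic metres per second for T = 5 years, close to the fitted value of 2601.25 in Table 7; the small differences across return periods are what one would expect from the two-decimal rounding of \({\hat{\kappa }}\).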
5 Conclusion
In this study, the annual maximum stream flow series of five gauging sites of the Torne River are examined. Flood frequency analysis is performed by using the GEV, P3, GUM, GLO and LN3 distributions. The MLE and LM parameter estimation techniques are used to estimate the parameters of the distributions. The study investigates the selection of the best-fit probability distribution and estimation method for at-site flood frequency analysis of the Torne River. The best-fit frequency distribution is identified at each gauging site based on the highest total rank score in goodness-of-fit tests and accuracy measures.
The results indicate that the GLO distribution with the MLE method for gauging site Junosuando and the LN3 distribution with the LM method for Pajala Pumphus and Abisko perform better than the other distributions in this study. The GEV and P3 distributions with the LM method are the most suitable distributions at Kukkolankoski Övre and Övre Abiskojokk, respectively. At most gauging sites, the best-suited distribution is obtained with the LM estimation method.
The results of this study for flood frequency analysis of the Torne River can be used in flood studies, water resource planning and the design of hydraulic structures within the same basin and in similar catchments. The best-fitted distributions identified in this study could also be considered as candidate distributions for regional flood frequency analysis of the Torne River basin or for at-site flood frequency analysis on other rivers in Sweden.
Notes
Acknowledgements
Open access funding provided by Stockholm University.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
References
- 1. Ahmad I, Fawad M, Mahmood I (2015) At-site flood frequency analysis of annual maximum stream flows in Pakistan using robust estimation methods. Pol J Environ Stud 24(6):2345–2353
- 2. Castellarin A, Kohnová S, Gaál L, Fleig A, Salinas JL, Toumazis A, Kjeldsen TR, Macdonald N (2012) Review of applied-statistical methods for flood-frequency analysis in Europe. Technical report, (NERC) Centre for Ecology & Hydrology
- 3. Cicioni G, Giuliano G, Spaziani FM (1973) Best fitting of probability functions to a set of data for flood studies. In: Floods and droughts
- 4. Cunnane C (1989) Statistical distributions for flood frequency analysis. Operational Hydrology Report (WMO)
- 5. Drissia TK, Jothiprakash V, Anitha AB (2019) Flood frequency analysis using L moments: a comparison between at-site and regional approach. Water Resour Manag 33(3):1013–1037
- 6. Elfvendahl S, Liljaniemi P, Salonen N (2006) The River Torne international watershed: common Finnish and Swedish typology, reference conditions and a suggested harmonised monitoring program: results from the TRIWA project. County Administrative Board of Norrbotten [Länsstyrelsen i Norrbottens län]
- 7. Haddad K, Rahman A (2011) Selection of the best fit flood frequency distribution and parameter estimation procedure: a case study for Tasmania in Australia. Stoch Environ Res Risk Assess 25(3):415–428
- 8. Hamed K, Rao AR (1999) Flood frequency analysis. CRC Press, Boca Raton
- 9. Hosking JRM (1986) The theory of probability weighted moments. IBM Research Rep RC12210, IBM, Yorktown Heights, NY
- 10. Hosking JRM (1990) L-moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Ser B (Methodol) 52(1):105–124
- 11. Kite GW (2019) Frequency and risk analyses in hydrology. Water Resour Publications, LLC. https://books.google.se/books?id=b9OKxAEACAAJ
- 12. Laio F (2004) Cramer–von Mises and Anderson–Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resour Res 40(9):W09308
- 13. Leandro J, Leitão JP, de Lima JLMP (2013) Quantifying the uncertainty in the soil conservation service flood hydrographs: a case study in the Azores Islands. J Flood Risk Manag 6(3):279–288
- 14. Leandro J, Gander A, Beg MNA, Bhola P, Konnerth I, Willems W, Carvalho R, Disse M (2019) Forecasting upper and lower uncertainty bands of river flood discharges with high predictive skill. J Hydrol 576:749–763
- 15. Merz B, Thieken AH (2005) Separating natural and epistemic uncertainty in flood frequency analysis. J Hydrol 309(1–4):114–132
- 16. Meylan P, Favre AC, Musy A (2012) Predictive hydrology: a frequency analysis approach. CRC Press, Boca Raton
- 17. Mkhandi SH, Kachroo RK, Gunasekara TAG (2000) Flood frequency analysis of Southern Africa: II. Identification of regional distributions. Hydrol Sci J 45(3):449–464
- 18. Młyński D, Wałęga A, Stachura T, Kaczor G (2019) A new empirical approach to calculating flood frequency in ungauged catchments: a case study of the upper Vistula basin, Poland. Water 11(3):601
- 19. Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
- 20. Önöz B, Bayazit M (1995) Best-fit distributions of largest available flood samples. J Hydrol 167(1–4):195–208
- 21. Opere AO, Mkhandi S, Willems P (2006) At site flood frequency analysis for the Nile Equatorial basins. Phys Chem Earth Parts A/B/C 31(15–16):919–927
- 22. Rahman AS, Rahman A, Zaman MA, Haddad K, Ahsan A, Imteaz M (2013) A study on selection of probability distributions for at-site flood frequency analysis in Australia. Nat Hazards 69(3):1803–1813
- 23. Saf B (2009) Regional flood frequency analysis using L-moments for the West Mediterranean region of Turkey. Water Resour Manag 23(3):531–551
- 24. Sevruk B, Geiger H (1981) Selection of distribution types for extremes of precipitation (No. 551.577). Secretariat of the World Meteorological Organization
- 25. The Swedish Meteorological and Hydrological Institute (2019) Hydrologiska observationer. Data files retrieved from SMHI hydrological observations. https://vattenwebb.smhi.se/station/. Accessed 20 Mar 2019
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.