
SN Applied Sciences, 1:1629

Selecting the best probability distribution for at-site flood frequency analysis; a study of Torne River

  • Mahmood Ul Hassan
  • Omar Hayat
  • Zahra Noreen
Open Access
Research Article
Part of the following topical collections:
  1. Earth and Environmental Sciences (general)

Abstract

At-site flood frequency analysis is a direct method of estimating flood frequency at a particular site. The appropriate selection of a probability distribution and a parameter estimation method is important for at-site flood frequency analysis. Generalized extreme value, three-parameter log-normal, generalized logistic, Pearson type-III and Gumbel distributions have been considered to describe the annual maximum stream flow at five gauging sites of the Torne River in Sweden. To estimate the parameters of the distributions, the maximum likelihood estimation (MLE) and L-moments (LM) methods are used. The performance of these distributions is assessed based on goodness-of-fit tests and accuracy measures. At most sites, the best-fitted distribution is obtained with the LM estimation method. Finally, the most suitable distribution at each site is used to predict the maximum flood magnitude for different return periods.

Keywords

Flood frequency analysis L-moments Maximum likelihood estimation 

1 Introduction

Floods are natural hazards that cause extreme damage throughout the world. The main causes of floods are extreme rainfall, ice and snow melt, dam breakage and insufficient capacity of the river channel to convey the excess water. Floods destroy infrastructure, damage the environment and agricultural land, and cause loss of life and economic losses. Many frequency distribution models have been developed for flood frequency estimation, but none of them is accepted as a universal distribution for describing the flood frequency at any gauging site. The selection of a suitable distribution usually depends on the characteristics of the data available at a particular site. Flood magnitude at a particular site needs to be estimated for various purposes, including the construction of hydraulic structures (barrages, canals, bridges, dams, embankments, reservoirs and spillways), insurance studies, and the planning of flood management and rescue operations. A number of methods for flood magnitude estimation are available in the literature, but at-site flood frequency analysis remains the most direct method of estimating flood frequency at a particular site.
Table 1

Summary of Torne River gauging sites

| Station name | Station no. | Latitude | Longitude | Catchment area (km\(^2\)) | Period of time series |
|---|---|---|---|---|---|
| Kukkolankoski Övre | 16722 | 65.98 | 24.06 | 33,929.60 | 1911–2018 |
| Pajala Pumphus | 2012 | 67.21 | 23.40 | 11,038.10 | 1969–2018 |
| Abisko | 2357 | 68.19 | 19.99 | 3345.50 | 1984–2018 |
| Junosuando | 04 | 67.43 | 22.55 | 4348.00 | 1967–2018 |
| Övre Abiskojokk | 957 | 68.36 | 18.78 | 566.30 | 1985–2018 |

To describe the flood frequency at a particular site, the choice of an appropriate probability distribution and parameter estimation method is of immense importance. The probability distributions used in this study are the generalized extreme value (GEV), Pearson type-III (P3), generalized logistic (GLO), Gumbel (GUM) and three-parameter log-normal (LN3) distributions. These distributions are recommended for at-site flood frequency analysis in various countries [2, 4, 24]. Furthermore, they are the distributions most commonly found in the hydrological literature for at-site and regional flood frequency analysis.

Cicioni et al. [3] conducted at-site flood frequency analysis in Italy using 107 stations. They identified LN3 and GEV as the best-fitting distributions based on the Kolmogorov–Smirnov (KS), Anderson–Darling (AD) and Cramér–von Mises (CVM) goodness-of-fit tests. Saf [23] found the GLO to be the most suitable distribution for the Upper-West Mediterranean subregion in Turkey. Mkhandi et al. [17] used annual maximum flood data from 407 stations in 11 countries of Southern Africa to conduct a regional frequency analysis. They identified the Pearson type-III (P3) distribution with the probability weighted moments (PWM) method and the log-Pearson type-III (LP3) distribution with the method of moments (MOM) as the most suitable distributions for the regions. Młyński et al. [18] identified the log-normal distribution as the most suitable for the upper Vistula Basin region (Poland). Many earlier studies have compared various probability distributions with different parameter estimation methods for at-site flood frequency analysis, see e.g. [1, 5, 7, 22] (Fig. 1).
Fig. 1 Locations of gauging stations used in this study

The most commonly used methods for parameter estimation in flood frequency analysis are the maximum likelihood estimation (MLE) method, the method of moments (MOM), the L-moments (LM) method and the probability weighted moments (PWM) method. The MLE method is efficient and is the most widely used method for parameter estimation. Recently, the LM method has gained more attention in the hydrological literature for estimating the parameters of probability distributions. In this study, we use the LM and MLE methods to estimate the parameters of the candidate probability distributions.

The methods usually used to select the best distribution are goodness-of-fit (GOF) tests (e.g. Anderson–Darling and Cramér–von Mises), accuracy measures (e.g. root mean square error and root mean squared percentage error), goodness-of-fit indices (e.g. AIC and BIC) and graphical methods (e.g. Q–Q plot and L-moment ratio diagram). In the hydrological literature, researchers have used different methods to find the best probability distribution. To identify the best-suited distribution at each site of the Torne River, we use a GOF test and accuracy measures. GOF tests are used to test whether the data come from a specific distribution. Accuracy measures provide a term-by-term comparison of the deviation between the hypothetical distribution and the empirical distribution of the data. The accuracy measures and the goodness-of-fit test used in this study are described in Sect. 3.

Estimating floods for high return periods is of great interest in flood frequency analysis, and such estimates are always associated with uncertainty, which arises from many sources. Uncertainties in water resources management can be divided into data uncertainties, structural uncertainties and model/parameter uncertainties, see e.g. [13, 14]. Furthermore, there is uncertainty in estimating floods for return periods much longer than the available record, particularly in the type of probability density function (PDF) and its parameters. This is particularly true in the right tail of the PDF, the region of interest for flooding. In addition, there is uncertainty in the measurements. See [15] for an in-depth discussion of epistemic (reducible) uncertainty and natural (irreducible) uncertainty. In this study, we quantify the uncertainty of a given quantile estimate for a specific fitted distribution by using the parametric bootstrap method.

This paper addresses flood frequency estimation with statistical distributions for gauged catchments for which a relatively long hydrological time series is available. The choice of an appropriate probability distribution and associated parameter estimation method is vital for at-site flood frequency analysis. The core objective of this study is to find the best-fit distribution among the candidate probability distributions, with a particular method of estimation (MLE or LM), for annual maximum peak flow data at each site of the Torne River by using goodness-of-fit (GOF) tests and accuracy measures. We also examine whether there is an overall best distribution and fitting method for these five sites of the Torne River. Finally, we estimate the flood quantiles for return periods of 5, 10, 25, 50, 100, 200 and 500 years, with the corresponding non-exceedance probabilities, at each site of the river using the best-fit probability distribution. To address the uncertainty of the flood estimates, we estimate the standard error of the estimated quantiles and construct 95% confidence intervals for the flood quantiles using the parametric bootstrap method. This is the first study of at-site flood frequency analysis of the Torne River.

This paper is organized as follows: Sect. 2 describes the study area and available data for the analysis. Section 3 deals with the model description, parameter estimation method and model comparison methods. Section 4 provides the results and discussion of the application of L-moments and maximum likelihood estimation method based flood frequency analysis of five gauging sites on the Torne River. Finally, Sect. 5 concludes the article.
Table 2

Probability density and quantile functions of the probability distributions

| Distribution | Probability density function \(f\left( y \right)\) | Quantile function \(y(F)\) |
|---|---|---|
| GEV | \(\frac{1}{\alpha }\left[ 1 - \kappa \left( \frac{y - \mu }{\alpha } \right) \right] ^{\frac{1}{\kappa } - 1}\exp \left\{ - \left[ 1 - \kappa \left( \frac{y - \mu }{\alpha } \right) \right] ^{\frac{1}{\kappa }} \right\}\) | \(\mu + \frac{\alpha }{\kappa }\left[ 1 - \left( - \log F \right) ^{\kappa } \right]\) |
| P3 | \(\frac{1}{\beta ^{\alpha }\varGamma (\alpha )}\left( y - \mu \right) ^{\alpha - 1}\exp \left\{ - \frac{y - \mu }{\beta } \right\}\) | Explicit analytical form is not available |
| GLO | \(\frac{1}{\alpha }\left[ 1 - \kappa \left( \frac{y - \mu }{\alpha } \right) \right] ^{\frac{1}{\kappa } - 1}\left[ 1 + \left\{ 1 - \kappa \left( \frac{y - \mu }{\alpha } \right) \right\} ^{1/\kappa } \right] ^{-2}\) | \(\mu + \frac{\alpha }{\kappa }\left[ 1 - \left\{ \left( 1 - F \right) /F \right\} ^{\kappa } \right]\) |
| LN3 | \(\frac{1}{\alpha \sqrt{2\pi }}\exp \left[ - \log \left\{ 1 - \frac{\kappa (y - \mu )}{\alpha } \right\} - \frac{1}{2}\left[ - \frac{1}{\kappa }\log \left\{ 1 - \frac{\kappa (y - \mu )}{\alpha } \right\} \right] ^2 \right]\) | Explicit analytical form is not available |
| GUM | \(\frac{1}{\alpha }\exp \left[ - \frac{y - \mu }{\alpha } - \exp \left( - \frac{y - \mu }{\alpha } \right) \right]\) | \(\mu - \alpha \log \left( - \log F \right)\) |

where \(\mu\), \(\alpha\) and \(\kappa\) are the location, scale and shape parameters of the distribution

2 Study area and data

The Torne River forms a border between northern Sweden and Finland. Its total catchment area is 40,157 km\(^2\), of which 60% lies within Sweden and the remainder in Finland. The Muonio River, the largest tributary of the Torne River, joins it shortly after Pajala Pumphus. Another tributary, the Lainio River (259.74 km long), joins the Torne River shortly after Junosuando. In spring, the water flow rises above its average level and turns into floods that damage waterfront constructions and buildings [6]. The Torne River is therefore frequently affected by flooding [Swedish Meteorological and Hydrological Institute (SMHI)]. Annual maximum flow data from five gauging sites of the Torne River (Swedish: Torneälven) are considered in this study. The data were collected from SMHI (www.smhi.se). The length of the data series varies from 34 to 108 years. The characteristics of the Torne River gauging sites are summarized in Table 1.

3 Methodology

3.1 Candidate probability distributions

To describe the flood frequency at a particular site, the selection of an appropriate probability distribution is always important. We consider the generalized extreme value (GEV), Pearson type-III (P3), generalized logistic (GLO), Gumbel (GUM) and three-parameter log-normal (LN3) distributions for the analysis of flood frequency at five gauging sites of the Torne River. The probability density function (pdf) and quantile function y(F) of these distributions are summarized in Table 2. These distributions are common in the literature and are recommended for flood frequency analysis in many countries (see e.g. [2, 4, 21, 22]). The parameter estimation methods (MLE and LM) are explained in the following subsections.
Table 3

Assumptions results of five gauging sites of Torne River

| Station name | n | r | P value (r) | Mann–Kendall test statistic | Mann–Kendall P value | Wald–Wolfowitz P value |
|---|---|---|---|---|---|---|
| Kukkolankoski Övre | 108 | −0.08 | 0.23 | 1.85 | 0.06 | 0.86 |
| Pajala Pumphus | 50 | −0.11 | 0.25 | 0.32 | 0.75 | 0.94 |
| Abisko | 35 | 0.03 | 0.80 | 0.21 | 0.83 | 0.17 |
| Junosuando | 52 | −0.09 | 0.34 | 1.41 | 0.16 | 0.95 |
| Övre Abiskojokk | 34 | −0.06 | 0.66 | −0.62 | 0.53 | 0.33 |

where n represents the sample size of time series and r indicates the Pearson correlation coefficient

3.2 Maximum likelihood estimation (MLE) method

The MLE method estimates the parameters by maximizing the log-likelihood function of a probability distribution. Suppose we have n independent and identically distributed observations \({y_1},\,{y_2}, \ldots ,\,{y_n}\). Each \(y_i\) has a pdf given by \(f(y_i;\varvec{\mu })\). Here, \(\varvec{\mu } = ({\mu _1},\;{\mu _2},\ldots ,{\mu _k})\) is a vector of unknown parameters to be estimated. Then, the log-likelihood function is defined as \(l\varvec{\left( \mu \right) } = \sum \nolimits _{i = 1}^n {\log f\left( {{y_i};\varvec{\mu } } \right) }\). The maximum likelihood estimate of \(\varvec{\mu }\) is the value of the parameter vector that maximizes \(l\varvec{\left( \mu \right) }\) for the given data. We use numerical optimization to search for the \(\varvec{\mu }\) that gives the maximum value of \(l\varvec{\left( \mu \right) }\). Many numerical optimization methods are found in the literature, e.g. the Newton–Raphson method, the Nelder–Mead method and differential evolution. We use the method proposed by Nelder and Mead [19].
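As a minimal sketch of this procedure, the following Python code fits the Gumbel distribution of Table 2 by minimizing the negative log-likelihood with the Nelder–Mead simplex method. The use of SciPy, the function names, the moment-based starting values and the synthetic sample are assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from scipy.optimize import minimize


def gumbel_negloglik(params, y):
    """Negative log-likelihood of the Gumbel pdf given in Table 2."""
    mu, alpha = params
    if alpha <= 0:
        return np.inf  # the scale parameter must be positive
    z = (y - mu) / alpha
    # log f(y) = -log(alpha) - z - exp(-z)
    return len(y) * np.log(alpha) + np.sum(z + np.exp(-z))


def fit_gumbel_mle(y):
    """Hypothetical helper: MLE of (mu, alpha) via Nelder-Mead."""
    y = np.asarray(y, dtype=float)
    # Assumed starting values from the method of moments
    alpha0 = np.sqrt(6.0) * y.std(ddof=1) / np.pi
    mu0 = y.mean() - 0.5772156649 * alpha0
    res = minimize(gumbel_negloglik, x0=[mu0, alpha0], args=(y,),
                   method="Nelder-Mead")
    return res.x


# Synthetic data (not the Torne River series)
rng = np.random.default_rng(1)
sample = rng.gumbel(loc=2000.0, scale=450.0, size=2000)
mu_hat, alpha_hat = fit_gumbel_mle(sample)
print(mu_hat, alpha_hat)
```

The same pattern would apply to the three-parameter distributions by swapping in the corresponding pdf from Table 2, with extra care for the support constraint that depends on the shape parameter.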

3.3 Theory of L-moments (LM)

L-moments, introduced by Hosking [9, 10], are linear functions of probability weighted moments (PWMs). They are an alternative to conventional moments and are computed from linear combinations of order statistics. L-moments can be defined for any random variable Y whose mean exists [10]. The rth-order PWM (\(\beta _r\)) is defined as
$$\begin{aligned} {\beta _r}=\int \limits _0^1 {y(F)\,{F^r}}\,\hbox {d}F, \quad r = 0,1,2, \ldots \end{aligned}$$
where F(y) is the cumulative distribution function and y(F) is the quantile function of the distribution. The first four L-moments, expressed as linear combinations of PWMs, are
$$\begin{aligned} \begin{array}{l} {\lambda _1} = {\beta _0}\\ {\lambda _2} = 2{\beta _1} - {\beta _0}\\ {\lambda _3} = 6{\beta _2} - 6{\beta _1} + {\beta _0}\\ {\lambda _4} = 20{\beta _3} - 30{\beta _2} + 12{\beta _1} - {\beta _0} \end{array} \end{aligned}$$
The first L-moment (\(\lambda _1\)) is a measure of location (mean), while the second L-moment (\(\lambda _2\)) measures dispersion. The L-moment ratios defined by Hosking [10] are given below
$$\begin{aligned} \begin{array}{l} {\mathrm{{L-Coefficient}}}\,{\mathrm{{of}}}\,{\mathrm{{variation}}}\,({\tau _2}) = \frac{{{{{\lambda }}_2}}}{{{{{\lambda }}_1}}}\\ {\mathrm{{L-Skewness }}}({\tau _3}) = \frac{{{{{\lambda }}_3}}}{{{{{\lambda }}_2}}}\\ {\mathrm{{L-Kurtosis}}}\,({\tau _4}) = \frac{{{{{\lambda }}_4}}}{{{{{\lambda }}_2}}} \end{array} \end{aligned}$$
The unbiased sample estimators \(b_r\) of the first four PWMs \(\beta _r\) can be computed for any distribution as follows
$$\begin{aligned} \begin{array}{l} {b_0} = {n^{ - 1}}\sum \limits _{j = 1}^n {{y_{j:n}}} \\ {b_1} = {n^{ - 1}}\sum \limits _{j = 2}^n {\frac{{(j - 1)}}{{(n - 1)}}{y_{j:n}}} \\ {b_2} = {n^{ - 1}}\sum \limits _{j = 3}^n {\frac{{(j - 1)(j - 2)}}{{(n - 1)(n - 2)}}{y_{j:n}}} \\ {b_3} = {n^{ - 1}}\sum \limits _{j = 4}^n {\frac{{(j - 1)(j - 2)(j - 3)}}{{(n - 1)(n - 2)(n - 3)}}{y_{j:n}}} \end{array} \end{aligned}$$
where \(y_{1:n} \le y_{2:n} \le \cdots \le y_{n:n}\) is the sample arranged in ascending order. The L-moments estimates of the parameters are obtained by equating the sample L-moments to the corresponding L-moments of the distribution.
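The estimators above can be sketched in a few lines of Python. The function names are hypothetical, and the Gumbel example relies on the standard L-moment relations for that distribution (\(\lambda _1 = \mu + \gamma \alpha\), \(\lambda _2 = \alpha \log 2\), with \(\gamma\) Euler's constant), which are not written out in this paper.

```python
import numpy as np


def sample_pwms(y):
    """Unbiased sample PWMs b0..b3 computed from the ordered sample."""
    x = np.sort(np.asarray(y, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    # Terms with j <= r are zero, so summing from j = 1 matches the formulas
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    return b0, b1, b2, b3


def sample_lmoments(y):
    """Return (l1, l2, t3, t4): first two L-moments, L-skewness, L-kurtosis."""
    b0, b1, b2, b3 = sample_pwms(y)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


def fit_gumbel_lm(y):
    """LM estimates for the Gumbel distribution (standard relations, assumed)."""
    l1, l2, _, _ = sample_lmoments(y)
    alpha = l2 / np.log(2.0)
    mu = l1 - 0.5772156649 * alpha
    return mu, alpha


print(sample_lmoments([1, 2, 3, 4, 5]))
```

For a perfectly symmetric sample such as the one printed here, the L-skewness is zero up to rounding, which is a quick sanity check on the implementation.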

3.4 Standard error of estimated parameters

The standard errors (SE) of the estimated parameters measure the reliability of the estimates and the performance of the estimation technique. In this study, we obtain the SE of the estimated parameters by Monte Carlo simulation. The procedure is as follows:
  • Using the parameters estimated with the MLE and LM methods at each gauging site, we draw 1000 samples from each fitted probability distribution, each of size equal to the length of the data series.

  • For each simulated sample, we obtain the MLE and LM estimates of the parameters of the distribution.

  • For each gauging site, the standard errors are obtained as the standard deviation of these 1000 MLE and LM estimates of the parameters of each distribution.
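The three steps can be sketched as follows for the Gumbel distribution with LM estimation. This is an illustrative sketch only: the function names are hypothetical, the closed-form Gumbel LM estimator uses the standard relations \(\lambda _1 = \mu + \gamma \alpha\) and \(\lambda _2 = \alpha \log 2\), and the random seed is arbitrary. The input values are the LM Gumbel estimates for station 16722 reported in Table 4.

```python
import numpy as np


def fit_gumbel_lm(y):
    """Gumbel LM estimates from the first two sample L-moments."""
    x = np.sort(np.asarray(y, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    alpha = (2.0 * b1 - b0) / np.log(2.0)
    mu = b0 - 0.5772156649 * alpha
    return np.array([mu, alpha])


def monte_carlo_se(params, n, fit, n_sim=1000, seed=0):
    """Standard errors of (mu, alpha) by simulation from the fitted model."""
    rng = np.random.default_rng(seed)
    mu, alpha = params
    estimates = np.empty((n_sim, 2))
    for s in range(n_sim):
        # Step 1: draw a sample of the same length as the data series
        sim = rng.gumbel(loc=mu, scale=alpha, size=n)
        # Step 2: re-estimate the parameters on the simulated sample
        estimates[s] = fit(sim)
    # Step 3: standard deviation of the re-estimated parameters
    return estimates.std(axis=0, ddof=1)


se_mu, se_alpha = monte_carlo_se((1959.44, 403.29), n=108, fit=fit_gumbel_lm)
print(se_mu, se_alpha)
```

Because the result depends on the random draws, it will not reproduce the tabulated standard errors exactly, but it should be of the same order of magnitude.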

Table 4

Estimated parameters with MLE and LM methods

| Method | Station | Distribution | \({\hat{\mu }}\) (SE) | \({\hat{\alpha }}\) (SE) | \({\hat{\kappa }}\) (SE) |
|---|---|---|---|---|---|
| MLE | 16722 | GEV | 1993.73 (48.38) | 450.20 (34.34) | 0.16 (0.07) |
| MLE | 16722 | P3 | 297.25 (763.60) | 128.76 (58.50) | 14.72 (12.50) |
| MLE | 16722 | GUM | 1955.47 (43.91) | 433.50 (33.52) | n/a |
| MLE | 16722 | GLO | 2149.33 (49.03) | 280.04 (22.95) | −0.11 (0.07) |
| MLE | 16722 | LN3 | 2152.08 (51.38) | 483.32 (33.78) | −0.17 (0.09) |
| MLE | 2012 | GEV | 768.35 (34.29) | 223.53 (24.67) | 0.43 (0.08) |
| MLE | 2012 | P3 | 1663.77 (516.20) | −52.30 (36.45) | 15.99 (20.76) |
| MLE | 2012 | GUM | 718.05 (36.57) | 240.19 (25.24) | n/a |
| MLE | 2012 | GLO | 839.62 (29.98) | 117.07 (13.92) | 0.08 (0.10) |
| MLE | 2012 | LN3 | 844.26 (31.44) | 204.85 (21.03) | 0.16 (0.12) |
| MLE | 2357 | GEV | 207.26 (9.80) | 52.41 (6.86) | 0.24 (0.11) |
| MLE | 2357 | P3 | −285.00 (869.20) | 5.69 (9.86) | 90.04 (308.12) |
| MLE | 2357 | GUM | 200.73 (9.01) | 50.87 (6.84) | n/a |
| MLE | 2357 | GLO | 225.42 (9.49) | 30.92 (4.35) | −0.04 (0.12) |
| MLE | 2357 | LN3 | 225.60 (9.93) | 53.77 (6.45) | −0.06 (0.15) |
| MLE | 4 | GEV | 281.69 (11.84) | 74.13 (8.10) | 0.02 (0.11) |
| MLE | 4 | P3 | 113.75 (51.00) | 40.43 (15.39) | 5.18 (3.10) |
| MLE | 4 | GUM | 280.81 (10.49) | 73.61 (8.00) | n/a |
| MLE | 4 | GLO | 307.58 (11.97) | 48.63 (6.16) | −0.20 (0.08) |
| MLE | 4 | LN3 | 308.92 (12.80) | 85.91 (9.24) | −0.32 (0.12) |
| MLE | 957 | GEV | 108.80 (5.18) | 26.47 (3.77) | 0.08 (0.14) |
| MLE | 957 | P3 | 61.85 (27.44) | 17.66 (13.39) | 3.42 (4.08) |
| MLE | 957 | GUM | 107.71 (4.56) | 25.83 (3.47) | n/a |
| MLE | 957 | GLO | 119.34 (5.56) | 17.35 (2.54) | −0.13 (0.14) |
| MLE | 957 | LN3 | 118.11 (5.76) | 29.65 (3.86) | −0.28 (0.19) |
| LM | 16722 | GEV | 1990.07 (46.62) | 456.59 (36.53) | 0.15 (0.07) |
| LM | 16722 | P3 | 11.28 (994.14) | 114.04 (57.03) | 19.12 (19.50) |
| LM | 16722 | GUM | 1959.44 (40.39) | 403.29 (36.53) | n/a |
| LM | 16722 | GLO | 2157.98 (46.90) | 276.98 (22.79) | −0.07 (0.05) |
| LM | 16722 | LN3 | 2154.46 (48.38) | 490.66 (35.96) | −0.15 (0.09) |
| LM | 2012 | GEV | 764.68 (33.65) | 220.31 (23.58) | 0.40 (0.11) |
| LM | 2012 | P3 | 1964.77 (781.80) | −39.13 (42.25) | 29.07 (22.81) |
| LM | 2012 | GUM | 728.59 (26.28) | 170.98 (22.34) | n/a |
| LM | 2012 | GLO | 839.06 (28.85) | 117.80 (13.84) | 0.06 (0.08) |
| LM | 2012 | LN3 | 840.27 (32.31) | 208.73 (22.61) | 0.12 (0.13) |
| LM | 2357 | GEV | 206.33 (10.49) | 53.49 (6.96) | 0.22 (0.13) |
| LM | 2357 | P3 | −305.49 (270.70) | 5.80 (15.19) | 91.88 (23.56) |
| LM | 2357 | GUM | 201.21 (8.01) | 45.18 (7.06) | n/a |
| LM | 2357 | GLO | 225.54 (9.51) | 31.26 (4.46) | −0.03 (0.09) |
| LM | 2357 | LN3 | 225.36 (10.37) | 55.39 (6.62) | −0.07 (0.16) |
| LM | 4 | GEV | 279.83 (10.94) | 73.16 (9.12) | −0.01 (0.10) |
| LM | 4 | P3 | 148.25 (89.07) | 50.90 (21.98) | 3.43 (8.62) |
| LM | 4 | GUM | 280.25 (10.75) | 74.02 (9.07) | n/a |
| LM | 4 | GLO | 308.19 (12.16) | 48.68 (6.18) | −0.18 (0.08) |
| LM | 4 | LN3 | 306.66 (13.10) | 85.97 (9.70) | −0.37 (0.14) |
| LM | 957 | GEV | 109.40 (5.50) | 28.50 (4.18) | 0.14 (0.13) |
| LM | 957 | P3 | −3.65 (103.66) | 7.84 (7.83) | 16.06 (20.99) |
| LM | 957 | GUM | 107.62 (4.55) | 25.38 (4.10) | n/a |
| LM | 957 | GLO | 119.92 (5.26) | 17.40 (2.54) | −0.08 (0.10) |
| LM | 957 | LN3 | 119.68 (5.90) | 30.82 (3.93) | −0.17 (0.17) |

Here, \({\hat{\mu }}\), \({\hat{\alpha }}\) and \({\hat{\kappa }}\) represent the estimated location, scale and shape parameters, respectively. The value in parentheses is the standard error of the estimated parameter. The Gumbel (GUM) distribution has no shape parameter

Table 5

Descriptive statistics (cubic metre per second)

| Station | n | Mean | Median | S | CV | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| 16722 | 108 | 2192.22 | 2140.00 | 493.03 | 0.22 | 0.37 | −0.19 |
| 2012 | 50 | 827.28 | 837.53 | 211.16 | 0.26 | −0.54 | 0.90 |
| 2357 | 35 | 227.29 | 223.53 | 54.71 | 0.24 | 0.17 | −0.07 |
| 04 | 52 | 322.98 | 296.50 | 93.32 | 0.29 | 0.93 | 0.95 |
| 957 | 34 | 122.27 | 120.67 | 31.39 | 0.26 | 0.73 | 1.24 |

Here, S represents the standard deviation and CV indicates the coefficient of variation

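Table 6

Rank score of distribution in both GOF tests and accuracy measures

| Station | Method | Distribution | AD | RMSE | MAE | RMSPE | MAPE | \(R^2\) | Total rank |
|---|---|---|---|---|---|---|---|---|---|
| Kukkolankoski Övre | MLE | GEV | 7 | 5 | 7 | 7 | 7 | 5 | 38 |
| Kukkolankoski Övre | MLE | P3 | 6 | 7 | 6 | 6 | 5 | 7 | 37 |
| Kukkolankoski Övre | MLE | GLO | 4 | 3 | 4 | 4 | 4 | 3 | 22 |
| Kukkolankoski Övre | MLE | LN3 | 5 | 6 | 5 | 5 | 6 | 6 | 33 |
| Kukkolankoski Övre | MLE | GUM | 2 | 1 | 2 | 2 | 2 | 1 | 10 |
| Kukkolankoski Övre | LM | GEV | 10 | 10 | 10 | 10 | 10 | 10 | 60* |
| Kukkolankoski Övre | LM | P3 | 9 | 9 | 9 | 9 | 9 | 9 | 54 |
| Kukkolankoski Övre | LM | GLO | 3 | 4 | 3 | 3 | 3 | 4 | 20 |
| Kukkolankoski Övre | LM | LN3 | 8 | 8 | 8 | 8 | 8 | 8 | 48 |
| Kukkolankoski Övre | LM | GUM | 1 | 2 | 1 | 1 | 1 | 2 | 8 |
| Pajala Pumphus (2012) | MLE | GEV | 3 | 4 | 7 | 4 | 4 | 4 | 26 |
| Pajala Pumphus (2012) | MLE | P3 | 6 | 5 | 5 | 5 | 5 | 5 | 31 |
| Pajala Pumphus (2012) | MLE | GLO | 4 | 8 | 3 | 8 | 8 | 8 | 39 |
| Pajala Pumphus (2012) | MLE | LN3 | 7 | 6 | 6 | 6 | 6 | 6 | 37 |
| Pajala Pumphus (2012) | MLE | GUM | 1 | 1 | 1 | 2 | 1 | 1 | 7 |
| Pajala Pumphus (2012) | LM | GEV | 5 | 3 | 8 | 3 | 3 | 3 | 25 |
| Pajala Pumphus (2012) | LM | P3 | 9 | 9 | 9 | 9 | 9 | 9 | 54 |
| Pajala Pumphus (2012) | LM | GLO | 8 | 7 | 4 | 7 | 7 | 7 | 40 |
| Pajala Pumphus (2012) | LM | LN3 | 10 | 10 | 10 | 10 | 10 | 10 | 60* |
| Pajala Pumphus (2012) | LM | GUM | 2 | 2 | 2 | 1 | 2 | 2 | 11 |
| Abisko (2357) | MLE | GEV | 3 | 3 | 3 | 3 | 4 | 3 | 19 |
| Abisko (2357) | MLE | P3 | 8 | 5 | 6 | 4 | 6 | 5 | 34 |
| Abisko (2357) | MLE | GLO | 4 | 6 | 4 | 9 | 3 | 6 | 32 |
| Abisko (2357) | MLE | LN3 | 7 | 4 | 5 | 5 | 5 | 4 | 30 |
| Abisko (2357) | MLE | GUM | 2 | 1 | 2 | 2 | 2 | 1 | 10 |
| Abisko (2357) | LM | GEV | 6 | 7 | 8 | 6 | 8 | 7 | 42 |
| Abisko (2357) | LM | P3 | 9 | 9 | 9 | 7 | 9 | 9 | 52 |
| Abisko (2357) | LM | GLO | 5 | 8 | 7 | 10 | 7 | 8 | 45 |
| Abisko (2357) | LM | LN3 | 10 | 10 | 10 | 8 | 10 | 10 | 58* |
| Abisko (2357) | LM | GUM | 1 | 2 | 1 | 1 | 1 | 2 | 8 |
| Junosuando (4) | MLE | GEV | 5 | 1 | 1 | 2 | 2 | 1 | 12 |
| Junosuando (4) | MLE | P3 | 2 | 1 | 1 | 2 | 2 | 1 | 9 |
| Junosuando (4) | MLE | GLO | 10 | 10 | 6 | 9 | 7 | 10 | 52* |
| Junosuando (4) | MLE | LN3 | 4 | 3 | 4 | 5 | 5 | 3 | 24 |
| Junosuando (4) | MLE | GUM | 8 | 6 | 9 | 6 | 8 | 6 | 43 |
| Junosuando (4) | LM | GEV | 6 | 9 | 8 | 7 | 6 | 9 | 45 |
| Junosuando (4) | LM | P3 | 1 | 4 | 5 | 1 | 1 | 4 | 16 |
| Junosuando (4) | LM | GLO | 9 | 5 | 3 | 10 | 9 | 5 | 41 |
| Junosuando (4) | LM | LN3 | 3 | 7 | 7 | 4 | 4 | 7 | 32 |
| Junosuando (4) | LM | GUM | 7 | 8 | 10 | 8 | 10 | 8 | 51 |
| Övre Abiskojokk (957) | MLE | GEV | 6 | 5 | 7 | 7 | 7 | 5 | 37 |
| Övre Abiskojokk (957) | MLE | P3 | 1 | 7 | 1 | 4 | 1 | 7 | 21 |
| Övre Abiskojokk (957) | MLE | GLO | 5 | 10 | 4 | 3 | 4 | 10 | 36 |
| Övre Abiskojokk (957) | MLE | LN3 | 4 | 6 | 5 | 6 | 6 | 6 | 33 |
| Övre Abiskojokk (957) | MLE | GUM | 3 | 9 | 3 | 5 | 3 | 9 | 32 |
| Övre Abiskojokk (957) | LM | GEV | 8 | 1 | 8 | 9 | 8 | 1 | 35 |
| Övre Abiskojokk (957) | LM | P3 | 9 | 2 | 10 | 10 | 10 | 2 | 43* |
| Övre Abiskojokk (957) | LM | GLO | 7 | 4 | 6 | 1 | 5 | 4 | 27 |
| Övre Abiskojokk (957) | LM | LN3 | 10 | 3 | 9 | 8 | 9 | 3 | 42 |
| Övre Abiskojokk (957) | LM | GUM | 2 | 8 | 2 | 2 | 2 | 8 | 24 |

* Total rank score of the best-fitted distribution at the site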

Table 7

Quantile estimates of flood with 95% confidence interval at five gauging sites of Torne River

Columns give the return period T in years, with the non-exceedance probability F = 1 − 1/T in parentheses.

| Station | Distribution | Method | Statistics | T = 5 (F = 0.80) | T = 10 (F = 0.90) | T = 25 (F = 0.96) | T = 50 (F = 0.98) | T = 100 (F = 0.99) | T = 200 (F = 0.995) | T = 500 (F = 0.998) |
|---|---|---|---|---|---|---|---|---|---|---|
| 16722 | GEV | LM | Lower | 2475.06 | 2703.84 | 2929.24 | 3056.12 | 3157.22 | 3236.94 | 3324.16 |
| 16722 | GEV | LM | Fit | 2601.25 | 2857.74 | 3142.00 | 3327.52 | 3492.74 | 3640.52 | 3812.68 |
| 16722 | GEV | LM | Upper | 2728.22 | 3011.45 | 3358.67 | 3612.61 | 3867.51 | 4120.59 | 4461.08 |
| 16722 | GEV | LM | \(\sigma _{\mathrm{s}}\) | 64.33 | 78.28 | 109.83 | 142.87 | 181.92 | 225.49 | 288.13 |
| 2012 | LN3 | LM | Lower | 943.17 | 1016.74 | 1080.04 | 1114.79 | 1141.18 | 1164.31 | 1188.27 |
| 2012 | LN3 | LM | Fit | 1007.08 | 1087.59 | 1168.75 | 1218.72 | 1262.09 | 1300.52 | 1345.53 |
| 2012 | LN3 | LM | Upper | 1067.68 | 1155.55 | 1258.97 | 1332.30 | 1402.33 | 1470.83 | 1558.80 |
| 2012 | LN3 | LM | \(\sigma _{\mathrm{s}}\) | 31.69 | 35.55 | 45.80 | 55.86 | 66.96 | 78.66 | 94.65 |
| 2357 | LN3 | LM | Lower | 250.78 | 272.75 | 293.06 | 304.31 | 313.38 | 321.13 | 330.25 |
| 2357 | LN3 | LM | Fit | 273.37 | 299.61 | 328.49 | 347.65 | 365.24 | 381.63 | 401.88 |
| 2357 | LN3 | LM | Upper | 296.48 | 327.91 | 368.54 | 400.46 | 433.38 | 466.35 | 511.51 |
| 2357 | LN3 | LM | \(\sigma _{\mathrm{s}}\) | 11.66 | 14.15 | 19.52 | 24.73 | 30.66 | 37.15 | 46.49 |
| 04 | GLO | MLE | Lower | 350.20 | 392.09 | 446.76 | 486.07 | 529.94 | 577.23 | 641.49 |
| 04 | GLO | MLE | Fit | 385.52 | 442.47 | 525.23 | 596.78 | 678.28 | 771.63 | 916.74 |
| 04 | GLO | MLE | Upper | 424.56 | 499.39 | 629.33 | 754.07 | 920.40 | 1132.33 | 1491.74 |
| 04 | GLO | MLE | \(\sigma _{\mathrm{s}}\) | 18.70 | 27.11 | 46.15 | 68.45 | 99.73 | 142.66 | 223.51 |
| 957 | P3 | LM | Lower | 133.74 | 146.98 | 159.11 | 166.19 | 171.67 | 176.38 | 181.46 |
| 957 | P3 | LM | Fit | 147.67 | 163.85 | 182.29 | 194.88 | 206.65 | 217.81 | 231.83 |
| 957 | P3 | LM | Upper | 162.08 | 183.06 | 210.76 | 231.51 | 251.71 | 272.35 | 299.50 |
| 957 | P3 | LM | \(\sigma _{\mathrm{s}}\) | 7.19 | 9.28 | 13.09 | 16.44 | 20.02 | 23.77 | 28.89 |

The lower limit is the fifth percentile and the upper limit is the 95th percentile of the quantile distribution. The standard deviation of the quantile distribution, \(\sigma _{\mathrm{s}}\), represents the standard error of the quantile estimates. The Fit rows give the flood quantile estimates for the different return periods

3.5 Goodness-of-fit (GOF) tests

Goodness-of-fit tests are used to test whether the observed data follow a particular distribution. We consider the Anderson–Darling (AD) test in this study. This test is often used in flood frequency analysis and has shown good performance for small sample sizes and heavy-tailed distributions [12, 20]. The test statistic for the AD test is defined as
$$\begin{aligned} \hbox {AD}=-n-S \end{aligned}$$
where \(S = \sum \nolimits _{i = 1}^n {\frac{{2i - 1}}{n}\left[ {\log F({y_i}) + \log \left( 1 - F({y_{n - i + 1}}) \right) } \right] }\), \(y_1 \le y_2 \le \cdots \le y_n\) are the ordered observations and \(F\) represents the cumulative distribution function (CDF) of the specified distribution.
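A minimal sketch of the AD statistic in Python is given below. The function names, the clipping constant that guards against log(0), and the synthetic Gumbel sample are assumptions for illustration. The sketch returns only the statistic; the P value that the ranking in this study relies on would need distribution-specific tables or simulation and is not computed here.

```python
import numpy as np


def anderson_darling(y, cdf):
    """AD = -n - S for a sample y and a fitted CDF (callable on arrays)."""
    x = np.sort(np.asarray(y, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    # Clip to avoid log(0) when an observation falls in the extreme tail
    F = np.clip(cdf(x), 1e-12, 1.0 - 1e-12)
    # F[::-1][i-1] equals F(y_{n-i+1}) for the ordered sample
    S = np.sum((2 * i - 1) / n * (np.log(F) + np.log(1.0 - F[::-1])))
    return -n - S


def gumbel_cdf(y, mu, alpha):
    """CDF of the Gumbel distribution of Table 2."""
    return np.exp(-np.exp(-(y - mu) / alpha))


# Synthetic sample: compare a correct model with a badly shifted one
rng = np.random.default_rng(0)
sample = rng.gumbel(loc=0.0, scale=1.0, size=200)
ad_good = anderson_darling(sample, lambda v: gumbel_cdf(v, 0.0, 1.0))
ad_bad = anderson_darling(sample, lambda v: gumbel_cdf(v, 2.0, 1.0))
print(ad_good, ad_bad)
```

A smaller statistic indicates closer agreement between the sample and the hypothesized distribution, so the shifted model should produce a much larger value than the correct one.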

3.6 Accuracy measure method

In accuracy measure (AM) methods, we have used the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), root mean squared percentage error (RMSPE) and correlation coefficient (\(R^2\)) to evaluate how adequately a given distribution fits the observed data. These measures are defined as
$$\begin{aligned} \hbox {MAE}= & {} \frac{1}{n}\sum \limits _{i = 1}^n | F({y_i}) - F({{\hat{y}}_i})|\\ \hbox {MAPE}= & {} \frac{{100}}{n}\sum \limits _{i = 1}^n \left| \frac{{(F({y_i}) - F({{{\hat{y}}}_i}))}}{{F({y_i})}}\right| \\ \hbox {RMSE}= & {} \sqrt{\frac{{\sum \nolimits _{i = 1}^n {{{(F({y_i}) - F({{{\hat{y}}}_i}))}^2}} }}{n}} \\ \hbox {RMSPE}= & {} \sqrt{\frac{1}{n}\sum \limits _{i = 1}^n {{{\left( \frac{{F({y_i}) - F({{{\hat{y}}}_i})}}{{F({y_i})}}\right) }^2}} } {\times } 100\\ {R^2}= & {} \frac{{\sum \nolimits _{i = 1}^n {{{(F({{{\hat{y}}}_i}) - {\bar{F}}({y_i}))}^2}} }}{{\sum \nolimits _{i = 1}^n {{{(F({{{\hat{y}}}_i}) - {\bar{F}}({y_i}))}^2}} + \sum \nolimits _{i = 1}^n {{{(F({y_i}) - F({{{\hat{y}}}_i}))}^2}} }} \end{aligned}$$
where \({\bar{F}}({y_i}) = \frac{{\sum \nolimits _{i = 1}^n {F({{{\hat{y}}}_i})} }}{n}\) and n represents the size of the data series. In all above accuracy measures, \(F({y_i})\) is the empirical cumulative distribution function (CDF) of the data (observed ordered values) and \(F({{\hat{y}}_i})\) indicates the theoretical CDF of the distribution (ordered estimated values from the distribution).
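The measures above can be sketched in Python as follows. The paper does not state which plotting position defines the empirical CDF, so the Weibull formula \(i/(n+1)\) used here is an assumption, as are the function and key names.

```python
import numpy as np


def accuracy_measures(y, cdf):
    """MAE, MAPE, RMSE, RMSPE and R^2 between empirical and fitted CDFs."""
    x = np.sort(np.asarray(y, dtype=float))
    n = len(x)
    # Assumed empirical CDF: Weibull plotting position i / (n + 1)
    F_emp = np.arange(1, n + 1) / (n + 1.0)
    F_fit = cdf(x)  # theoretical CDF at the ordered observations
    err = F_emp - F_fit
    rel = err / F_emp
    # Numerator of R^2: spread of the fitted CDF values around their mean
    ss_model = np.sum((F_fit - F_fit.mean()) ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(rel)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "RMSPE": 100.0 * np.sqrt(np.mean(rel ** 2)),
        "R2": ss_model / (ss_model + np.sum(err ** 2)),
    }


# Toy check: observations placed exactly at the plotting positions of a
# uniform(0, 1) model give zero errors and R^2 equal to one
m = accuracy_measures(np.arange(1, 10) / 10.0, lambda v: v)
print(m)
```

Lower values of the four error measures and a higher \(R^2\) indicate a better fit, which is the ordering used when candidates are ranked.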

3.7 Quantile estimation

After the best probability distribution has been selected, the main goal of flood frequency analysis is to estimate the quantile \({y_T}\) for a return period (T) of scientific relevance. The exceedance probability \(P(Y \geqslant {y_T}) = \frac{1}{T}\) means that the flood level \({y_T}\) is exceeded on average once in T years. The cumulative probability of non-exceedance is defined as
$$\begin{aligned} F = F({y_T}) = P(Y \leqslant {y_T}) = 1 - P(Y \geqslant {y_T}) = 1 - \frac{1}{T} \end{aligned}$$
The distribution function \(F({y_T})\) can be expressed in inverse form as \({y_T} = y(F)\), so the estimated quantile \({y_T}\) can be evaluated directly by substituting F. When the inverse of \(F({y_T})\) does not exist analytically, a numerical method is used to evaluate \({y_T}\) for the given value of F. The quantile functions of the candidate distributions are summarized in Table 2. The quantile estimate for a return period of T years is calculated by substituting \(F =(1-\frac{1}{T})\) into the quantile expressions in Table 2. The standard error of the estimated quantiles represents the uncertainty in the estimated flood magnitude for a given return period. The confidence interval of a flood quantile gives an estimated range of values that is likely to include the true flood magnitude for that return period. We use a parametric bootstrap method to estimate the standard errors of the estimated quantiles and the confidence intervals of the flood quantiles. This method is more precise than an asymptotic computation when n is small [16, p. 133]. The details of the parametric bootstrap procedure are given in [16, p. 133].
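The following Python sketch evaluates the closed-form quantile functions of Table 2 at \(F = 1 - 1/T\) and illustrates a parametric bootstrap for the Gumbel case. The function names, the Gumbel LM estimator (standard relations \(\lambda _1 = \mu + \gamma \alpha\), \(\lambda _2 = \alpha \log 2\)), the seed and the use of the 2.5th and 97.5th percentiles for a 95% interval are assumptions; the parameter values are the rounded LM estimates for station 16722 from Table 4.

```python
import numpy as np


def quantile_gev(F, mu, alpha, kappa):
    return mu + alpha / kappa * (1.0 - (-np.log(F)) ** kappa)


def quantile_glo(F, mu, alpha, kappa):
    return mu + alpha / kappa * (1.0 - ((1.0 - F) / F) ** kappa)


def quantile_gum(F, mu, alpha):
    return mu - alpha * np.log(-np.log(F))


def fit_gumbel_lm(y):
    """Gumbel LM estimates from the first two sample L-moments."""
    x = np.sort(np.asarray(y, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    alpha = (2.0 * b1 - b0) / np.log(2.0)
    return b0 - 0.5772156649 * alpha, alpha


def bootstrap_quantiles(mu, alpha, n, F, n_boot=1000, seed=0):
    """Parametric bootstrap SE and percentile interval of Gumbel quantiles."""
    rng = np.random.default_rng(seed)
    q = np.empty((n_boot, len(F)))
    for b in range(n_boot):
        sim = rng.gumbel(loc=mu, scale=alpha, size=n)  # simulate from the fit
        q[b] = quantile_gum(F, *fit_gumbel_lm(sim))    # refit, then quantiles
    se = q.std(axis=0, ddof=1)
    lower, upper = np.percentile(q, [2.5, 97.5], axis=0)
    return se, lower, upper


T = np.array([5, 10, 25, 50, 100, 200, 500], dtype=float)
F = 1.0 - 1.0 / T
q_gev = quantile_gev(F, 1990.07, 456.59, 0.15)
se, lower, upper = bootstrap_quantiles(1959.44, 403.29, n=108, F=F)
print(q_gev)
print(se)
```

Because Table 4 reports parameters rounded to two decimals, the GEV quantiles computed this way agree with the Fit row of Table 7 for station 16722 only to within about one percent.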

4 Results and discussion

The basic statistics of the five gauging sites are summarized in Table 5. All flow data in the table are in cubic metres per second. The data at all sites are skewed, which is sufficient evidence to model them with a non-normal distribution. In flood frequency analysis, the basic statistical assumptions are independence, randomness and stationarity of the data series (see e.g. [8, 11]). The independence and randomness of the data series at a given site are tested by using the lag-1 correlation coefficient (r) and the Wald–Wolfowitz (WW) test, respectively. To check the stationarity of the data series, the Mann–Kendall (MK) test has been applied. The results of these checks are summarized in Table 3. They indicate that the data series at each gauging site of the Torne River are suitable for flood frequency analysis and probability density estimation.
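Two of these checks can be sketched in Python as follows. The function names are hypothetical, the Mann–Kendall variance omits the correction for tied values, and the P value uses the usual normal approximation with continuity correction; the Wald–Wolfowitz test is not shown. The paper does not state which software or variant was used.

```python
import math

import numpy as np


def lag1_correlation(y):
    """Pearson correlation between the series and itself shifted by one year."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[:-1], y[1:])[0, 1]


def mann_kendall(y):
    """Mann-Kendall Z statistic and two-sided P value (no tie correction)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    s = 0.0
    for i in range(n - 1):
        # Count later values above minus later values below y[i]
        s += np.sum(np.sign(y[i + 1:] - y[i]))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Toy series with a perfect upward trend (not Torne River data)
z, p = mann_kendall(np.arange(10.0))
print(z, p)
```

Under this normal approximation, a test statistic of 1.85 corresponds to a two-sided P value of about 0.06, which matches the Kukkolankoski Övre entry in Table 3.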

The parameters estimated for each distribution at each gauging site with the MLE and LM methods, along with their standard errors (SE), are reported in Table 4. To identify the best distribution at each site, we use the GOF test and the accuracy measures. Each distribution with each parameter estimation method is ranked on the GOF test and on every accuracy measure in Table 6. Each of the ten candidates (five distributions, two estimation methods) is assigned a rank score between 1 and 10, with 10 for the best-fitted and 1 for the worst-fitted candidate. The rank scores are based on the relative magnitude of the accuracy measures and of the AD test P value. The candidate with the lowest RMSE, lowest MAE, lowest RMSPE, lowest MAPE or highest \(R^2\) receives the highest rank score of 10 for that measure. In the AD test, the candidate with the highest P value receives the highest rank score of 10. The best distribution and estimation method at each site is identified from the total rank score over the GOF test and the accuracy measures. The total rank scores in Table 6 indicate that GLO with the MLE method is best for Junosuando. For Pajala Pumphus and Abisko, the LN3 distribution with the LM method performs better than the other candidates. The GEV and P3 distributions with the LM method are the best-fit distributions for Kukkolankoski Övre and Övre Abiskojokk, respectively.
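The ranking scheme can be sketched as below. The function names and the three-candidate numbers are hypothetical and serve only to show the mechanics; ties are broken by position here, whereas Table 6 occasionally assigns equal scores.

```python
import numpy as np


def rank_scores(values, higher_is_better):
    """Score 1 (worst) to k (best) for k candidates on one criterion."""
    values = np.asarray(values, dtype=float)
    # Sort so that the worst candidate comes first
    order = np.argsort(values if higher_is_better else -values)
    scores = np.empty(len(values), dtype=int)
    scores[order] = np.arange(1, len(values) + 1)
    return scores


def total_rank(criteria):
    """Sum rank scores over criteria given as name -> (values, higher_is_better)."""
    return sum(rank_scores(v, hib) for v, hib in criteria.values())


# Hypothetical values for three candidates (not results from this study)
criteria = {
    "AD P value": ([0.40, 0.90, 0.10], True),
    "RMSE": ([0.030, 0.020, 0.050], False),
    "R2": ([0.990, 0.995, 0.970], True),
}
totals = total_rank(criteria)
print(totals, "best candidate index:", int(np.argmax(totals)))
```

With ten candidates and six criteria, as in Table 6, the maximum attainable total under this scheme is 60.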

In this study, no single distribution emerged as the best for all gauging sites, as was also the case in [1, 5]. Overall, the LM estimation method performed better for identifying a suitable distribution (see also [1]). The only site at which the best-suited distribution was obtained with the MLE method is the one with the highest CV and skewness, see Tables 5 and 6. It seems that sites with extreme values of mean annual maximum flow and catchment area (either very large or very small) favour the LM method of estimation, see Tables 1, 5 and 6. In terms of landscape setting, the gauging sites at extreme positions relative to the Gulf of Bothnia (closest to it or farthest from it) favour the LM estimation method. The sample size of the time series does not seem to be an important factor in favour of a particular distribution or estimation method in this study.

One major objective of flood frequency analysis is to estimate the quantiles in the extreme upper tail of the best-fitted distribution at each gauging site. The quantile estimates for return periods of 5, 10, 25, 50, 100, 200 and 500 years are calculated using the quantile function and parameter values of the best-fitted distributions. The quantile estimates \({y_T}\) with non-exceedance probability F for the best-fitted distributions are given in Table 7. The estimated uncertainty (\(\sigma _{\mathrm{s}}\)) of the quantile estimates and the 95% confidence intervals of the flood quantiles for the different return periods are also presented in Table 7. The SE indicates that the flood quantile estimates become more uncertain as the return period increases.
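
As an illustration of how a return period is turned into a quantile, the sketch below uses the standard relation \(F = 1 - 1/T\) for annual maxima and the GEV quantile function in Hosking's parametrization, \(y_T = \xi + \frac{\alpha}{k}\{1 - (-\ln F)^k\}\), with the Gumbel form as the limit \(k \rightarrow 0\). The choice of this parametrization is an assumption here, and the parameter values are hypothetical placeholders, not the estimates of Table 4.

```python
import math

RETURN_PERIODS = [5, 10, 25, 50, 100, 200, 500]  # years, as in the text


def non_exceedance_probability(T):
    """Non-exceedance probability F = 1 - 1/T for a return period T in years."""
    if T <= 1:
        raise ValueError("return period must be greater than 1 year")
    return 1.0 - 1.0 / T


def gev_quantile(F, xi, alpha, k):
    """GEV quantile (Hosking's parametrization, assumed here).

    xi: location, alpha: scale (> 0), k: shape; k = 0 gives the Gumbel case.
    """
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    y = -math.log(F)
    if abs(k) < 1e-12:
        # Gumbel limit of the GEV quantile function.
        return xi - alpha * math.log(y)
    return xi + alpha / k * (1.0 - y ** k)


# Hypothetical parameter values, for illustration only.
xi, alpha, k = 1500.0, 400.0, -0.05

quantiles = [
    gev_quantile(non_exceedance_probability(T), xi, alpha, k)
    for T in RETURN_PERIODS
]
for T, y_T in zip(RETURN_PERIODS, quantiles):
    print(f"T = {T:>3} years  F = {non_exceedance_probability(T):.3f}  y_T = {y_T:.1f}")
```

The same pattern applies to the other candidate distributions: only the quantile function changes, while the mapping from return period to non-exceedance probability stays the same.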

5 Conclusion

In this study, the annual maximum stream flow series of five gauging sites on the Torne River are examined. Flood frequency analysis is performed using the GEV, P3, GUM, GLO and LN3 distributions. The MLE and LM parameter estimation techniques are used to estimate the distributions' parameters. The study investigates the selection of the best-fit probability distribution and estimation method for at-site flood frequency analysis of the Torne River. The best-fit frequency distribution is identified at each gauging site based on the highest total rank score in goodness-of-fit tests and accuracy measures.

The results indicate that the GLO distribution using MLE for gauging site Junosuando and the LN3 distribution with the LM method for Pajala Pumphus and Abisko perform better than the other distributions of this study. The GEV and P3 distributions using the LM method are the most suitable distributions at Kukkolankoski Övre and Övre Abiskojokk, respectively. At most gauging sites, the best-suited distribution is obtained with the LM estimation method.

The results of this study for flood frequency analysis of the Torne River can be used in flood studies, water resource planning and the design of hydraulic structures within the same basin and in similar catchments. The best-fitted distributions identified in this study could also be considered as candidate distributions for regional flood frequency analysis of the Torne River basin or for at-site flood frequency analysis on other rivers in Sweden.

Notes

Acknowledgements

Open access funding provided by Stockholm University.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

References

  1. Ahmad I, Fawad M, Mahmood I (2015) At-site flood frequency analysis of annual maximum stream flows in Pakistan using robust estimation methods. Pol J Environ Stud 24(6):2345–2353
  2. Castellarin A, Kohnová S, Gaál L, Fleig A, Salinas JL, Toumazis A, Kjeldsen TR, Macdonald N (2012) Review of applied-statistical methods for flood-frequency analysis in Europe. Technical report, (NERC) Centre for Ecology & Hydrology
  3. Cicioni G, Giuliano G, Spaziani FM (1973) Best fitting of probability functions to a set of data for flood studies. In: Floods and droughts
  4. Cunnane C (1989) Statistical distributions for flood frequency analysis. Operational Hydrology Report (WMO)
  5. Drissia TK, Jothiprakash V, Anitha AB (2019) Flood frequency analysis using L moments: a comparison between at-site and regional approach. Water Resour Manag 33(3):1013–1037
  6. Elfvendahl S, Liljaniemi P, Salonen N (2006) The River Torne international watershed: common Finnish and Swedish typology, reference conditions and a suggested harmonised monitoring program: results from the TRIWA project. County Administrative Board of Norrbotten [Länsstyrelsen i Norrbottens län]
  7. Haddad K, Rahman A (2011) Selection of the best fit flood frequency distribution and parameter estimation procedure: a case study for Tasmania in Australia. Stoch Environ Res Risk Assess 25(3):415–428
  8. Hamed K, Rao AR (1999) Flood frequency analysis. CRC Press, Boca Raton
  9. Hosking JRM (1986) The theory of probability weighted moments. IBM Research Rep RC12210, IBM, Yorktown Heights, NY
  10. Hosking JRM (1990) L-moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Ser B (Methodol) 52(1):105–124
  11. Kite GW (2019) Frequency and risk analyses in hydrology. Water Resour Publications, LLC. https://books.google.se/books?id=b9OKxAEACAAJ
  12. Laio F (2004) Cramer–von Mises and Anderson–Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resour Res 40(9):W09308
  13. Leandro J, Leitão JP, de Lima JLMP (2013) Quantifying the uncertainty in the soil conservation service flood hydrographs: a case study in the Azores Islands. J Flood Risk Manag 6(3):279–288
  14. Leandro J, Gander A, Beg MNA, Bhola P, Konnerth I, Willems W, Carvalho R, Disse M (2019) Forecasting upper and lower uncertainty bands of river flood discharges with high predictive skill. J Hydrol 576:749–763
  15. Merz B, Thieken AH (2005) Separating natural and epistemic uncertainty in flood frequency analysis. J Hydrol 309(1–4):114–132
  16. Meylan P, Favre AC, Musy A (2012) Predictive hydrology: a frequency analysis approach. CRC Press, Boca Raton
  17. Mkhandi SH, Kachroo RK, Gunasekara TAG (2000) Flood frequency analysis of Southern Africa: II. Identification of regional distributions. Hydrol Sci J 45(3):449–464
  18. Młyński D, Wałęga A, Stachura T, Kaczor G (2019) A new empirical approach to calculating flood frequency in ungauged catchments: a case study of the upper Vistula basin, Poland. Water 11(3):601
  19. Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
  20. Önöz B, Bayazit M (1995) Best-fit distributions of largest available flood samples. J Hydrol 167(1–4):195–208
  21. Opere AO, Mkhandi S, Willems P (2006) At site flood frequency analysis for the Nile Equatorial basins. Phys Chem Earth Parts A/B/C 31(15–16):919–927
  22. Rahman AS, Rahman A, Zaman MA, Haddad K, Ahsan A, Imteaz M (2013) A study on selection of probability distributions for at-site flood frequency analysis in Australia. Nat Hazards 69(3):1803–1813
  23. Saf B (2009) Regional flood frequency analysis using L-moments for the West Mediterranean region of Turkey. Water Resour Manag 23(3):531–551
  24. Sevruk B, Geiger H (1981) Selection of distribution types for extremes of precipitation (No. 551.577). Secretariat of the World Meteorological Organization
  25. The Swedish Meteorological and Hydrological Institute (2019) Hydrologiska observationer. Data files retrieved from SMHI hydrological observations. https://vattenwebb.smhi.se/station/. Accessed 20 Mar 2019

Copyright information

© The Author(s) 2019

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Statistics, Stockholm University, Stockholm, Sweden
  2. Department for Education, London, UK
  3. Division of Science and Technology, University of Education, Lahore, Pakistan
