# Tail Distribution and Extreme Quantile Estimation Using Non-parametric Approaches

## Abstract

Estimation of tail distributions and extreme quantiles is important in areas such as risk management in finance and insurance in relation to extreme or catastrophic events. The main difficulty from the statistical perspective is that the available data to base the estimates on is very sparse, which calls for tailored estimation methods. In this chapter, we provide a survey of currently used parametric and non-parametric methods, and provide some perspectives on how to move forward with non-parametric kernel-based estimation.

## Keywords

Risk measures · Extreme value theory · Kernel estimation · Bandwidth selection

## 1 Introduction

This chapter presents a position survey of the state of the art in tail distribution and extreme quantile estimation, covering currently used parametric and non-parametric approaches and their application to financial risk measurement. What is envisioned is an enhanced non-parametric estimation method based on the Extreme Value Theory approach. The compounding perspectives of current challenges are addressed, such as the choice of the threshold above which data are treated as extreme, and bandwidth selection from a bias reduction perspective. The application of the kernel estimation approach and the use of Expected Shortfall as a coherent risk measure instead of Value at Risk are presented. The extension to multivariate data is addressed and its challenges are identified.

*Overview of the Following Sections.* Financial risk measures are presented in Sect. 2. Section 3 covers Extreme Value Theory; Sect. 4, parametric and semi-parametric estimation methods; Sect. 5, non-parametric estimation methods; and Sect. 6, the perspectives opened by the challenges in estimating the presented financial risk measures.

## 2 Financial Risk Measures

The Long Term Capital Management collapse and the 1998 Russian debt crisis, the Latin American and Asian currency crises and, more recently, the U.S. mortgage credit market turmoil, followed by the bankruptcy of Lehman Brothers and the world’s biggest-ever trading loss at Société Générale are some examples of financial disasters of the last twenty years. In response to serious financial crises, such as the global financial crisis of 2007–2008, regulators have become more concerned about the protection of financial institutions against catastrophic market risks. We recall that market risk is the risk that the value of an investment will decrease due to movements in market factors. The difficulty of modelling these rare but extreme events has been greatly reduced by recent advances in Extreme Value Theory (EVT). Value at Risk (VaR) and the related concept of Expected Shortfall (ES) have been the primary tools for measuring risk exposure in the financial services industry for over two decades. Additional literature can be found in [39] for Quantitative Risk Management and in [42] or [25] for the application of EVT in insurance, finance and other fields.

### 2.1 Value at Risk

Let *X* denote the loss of a portfolio over a given time period \(\delta \); then VaR is a risk statistic that measures the risk of holding the portfolio for the time period \(\delta \). Assume that *X* has a cumulative distribution function (cdf) \(F_{X}\); then we define VaR at level \(\alpha \in (0, 1)\) as

$$\begin{aligned} \mathrm {VaR}_{\alpha }(X)=\inf \{x\in \mathbb {R}:\, F_X(x)\ge \alpha \}. \end{aligned}$$

A good risk measure is expected to satisfy the following axioms [4]:

- Monotonicity: higher losses mean higher risk.
- Translation equivariance: increasing (or decreasing) the loss increases (decreases) the risk by the same amount.
- Subadditivity: diversification decreases risk.
- Positive homogeneity: doubling the portfolio size doubles the risk.

Any risk measure which satisfies these axioms is said to be coherent. A related concept to VaR, which accounts for the tail mass, is the conditional tail expectation (CVaR), or Expected Shortfall (ES). ES is the average loss conditional on the VaR being exceeded and gives risk managers additional valuable information about the tail risk of the distribution. Due to its usefulness as a risk measure, in 2013 the Basel Committee on Banking Supervision even proposed replacing VaR with ES to measure market risk exposure.
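As a concrete illustration of the two risk measures, the following sketch computes the empirical VaR and ES of a synthetic heavy-tailed loss sample; the Student-t data, the level \(\alpha \) and the seed are illustrative choices, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(6)
losses = rng.standard_t(df=4, size=100_000)  # synthetic heavy-tailed losses

alpha = 0.99
var = np.quantile(losses, alpha)             # empirical VaR at level alpha
es = losses[losses > var].mean()             # ES: average loss beyond the VaR
```

Because ES averages the losses beyond the VaR, it is always at least as large as the VaR itself, which is visible directly in this sketch.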

### 2.2 Conditional Value at Risk or Expected Shortfall
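A standard formulation of this risk measure (see, e.g., [1, 39]) expresses ES at level \(\alpha \) as the average of the VaR over all levels beyond \(\alpha \):

$$\begin{aligned} \mathrm {ES}_{\alpha }(X)=\frac{1}{1-\alpha }\int _{\alpha }^{1}\mathrm {VaR}_{u}(X)\, du, \end{aligned}$$

which, for a continuous loss distribution, coincides with the conditional tail expectation \(E[X \,|\, X \ge \mathrm {VaR}_{\alpha }(X)]\).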

## 3 Extreme Value Theory: Two Main Approaches

Both main approaches describe the behaviour of the extremes of a sample from an underlying distribution *F*.

### 3.1 Block Maxima Approach

The Fisher and Tippett [29] and Gnedenko [30] theorems are the fundamental results in EVT. The theorems state that the maximum of a sample of properly normalized independent and identically distributed random variables converges in distribution to one of three possible limits: the Weibull, the Gumbel or the Fréchet distribution.

### Theorem 1

**(Fisher, Tippett, Gnedenko).** Let \(X_1,\ldots ,X_n\) be independent and identically distributed random variables with distribution function *F*, and let \(M_n=\max (X_1,\ldots ,X_n)\). If there exist sequences of constants \(a_n>0\) and \(b_n\) and a non-degenerate distribution function *H* such that

$$\begin{aligned} \lim _{n\rightarrow \infty } P\Bigl (\frac{M_n-b_n}{a_n}\le x\Bigr )=H(x), \end{aligned}$$

then *H* belongs, up to location and scale, to the family

$$\begin{aligned} H_{\gamma }(x)=\exp \bigl (-(1+\gamma x)^{-1/\gamma }\bigr ),\quad 1+\gamma x>0, \end{aligned}$$

with the interpretation \(H_{0}(x)=\exp (-e^{-x})\) for \(\gamma =0\).

We say that *F* is in the domain of attraction of \(H_{\gamma }\) and denote this by \(F\in \mathrm {DA}(H_{\gamma })\). The distribution function \(H_{\gamma }(\cdot )\) is called the Generalized Extreme Value distribution (GEV).

If \({\, \gamma >0}\), \(\, F\in \) DA (Fréchet): This domain contains the laws for which the survival function decreases as a power function. Such tails are known as “fat tails” or “heavy tails”. In this domain of attraction, we find the laws of Pareto, Student, Cauchy, etc.

If \({\, \gamma =0}\), \(\, F\in \) DA (Gumbel): This domain groups laws for which the survival function declines exponentially. This is the case of normal, gamma, log-normal, exponential, etc.

If \({\, \gamma <0}\), \(\, F\in \) DA (Weibull): This domain corresponds to thin tails where the distribution has a finite endpoint. Examples in this class are the uniform and reverse Burr distributions.

The Weibull domain clearly corresponds to distributions with a finite endpoint (\(s_{+}(F)=\sup \{x,\, F(x)<1\}\)). This is typically the case for distributions of mortality and of insurance/re-insurance claims, for example; see [20]. The Fréchet tail is thicker than the Gumbel’s. Indeed, it is well known that the distributions of the return series in most financial markets are heavy tailed (fat tails). The term “fat tails” can have several meanings, the most common being “extreme outcomes occur more frequently than predicted by the normal distribution”.

The block maxima approach uses the maximum (or minimum) values of the observations within consecutive blocks of constant length. For a sufficiently large number *k* of blocks, the resulting peak values of these *k* blocks of equal length can be used for estimation. The procedure is rather wasteful of data, and a relatively large sample is needed for accurate estimates.
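The block maxima procedure can be sketched in a few lines; here the Student-t sample, the block length and the use of scipy's GEV fitter are illustrative assumptions (note that scipy parametrizes the GEV shape as \(c=-\gamma \)):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=5000)   # synthetic heavy-tailed observations

block_size = 50                            # block length is a modelling choice
n_blocks = len(sample) // block_size
block_maxima = sample[:n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)

# Fit the GEV to the block maxima; scipy's shape c corresponds to -gamma
c, loc, scale = genextreme.fit(block_maxima)
gamma_hat = -c                             # tail index estimate
```

The "wasteful" nature of the method is visible here: 5000 observations are reduced to only 100 block maxima before fitting.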

### 3.2 Peaks Over Threshold (POT) Approach

The POT (Peaks-Over-Threshold) approach consists of using the generalized Pareto distribution (GPD) to approximate the distribution of excesses over a threshold. The approach was originally suggested by hydrologists. It is generally preferred and forms the basis of our approach below. The two EVT approaches are equivalent in the sense made precise by the Pickands-Balkema-de Haan theorem presented in [5, 40].

### Theorem 2

**(Pickands-Balkema-de Haan).** For a large class of underlying distribution functions *F*, the conditional excess distribution function \(F_u(y)=P(X-u \le y\,|\, X>u)\) satisfies

$$\begin{aligned} \lim _{u\rightarrow s_{+}(F)}\ \sup _{0\le y<s_{+}(F)-u}\bigl |F_u(y)-G_{\gamma ,\sigma (u)}(y)\bigr |=0, \end{aligned}$$

where \(G_{\gamma ,\sigma }\) is the Generalized Pareto Distribution (GPD),

$$\begin{aligned} G_{\gamma ,\sigma }(y)=1-\Bigl (1+\frac{\gamma y}{\sigma }\Bigr )^{-1/\gamma },\qquad \sigma >0, \end{aligned}$$(1)

with \(G_{0,\sigma }(y)=1-e^{-y/\sigma }\) for \(\gamma =0\).

This means that the conditional excess distribution function \(F_u\), for *u* large, is well approximated by a Generalized Pareto Distribution. Note that the tail index \(\gamma \) is the same for both the GPD and GEV distributions. The scale parameter \(\sigma \) and the tail index are the fundamental parameters governing the extreme behavior of the distribution, and the effectiveness of EVT in forecasting depends upon their reliable and accurate estimation. By incorporating information about the tail through our estimates of \(\gamma \) and \(\sigma \), we can obtain VaR and ES estimates, even beyond the reach of the empirical distribution.

## 4 Parametric and Semi-parametric Estimation Methods

Two main classes of estimation methods are commonly distinguished:

- Semi-parametric models (e.g., the Hill estimator).
- Fully parametric models (e.g., the Generalized Pareto distribution, GPD).

### 4.1 Semi-parametric Estimation
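The Hill estimator [31] mentioned above is the classical semi-parametric tail index estimator. To make it concrete, the sketch below implements its standard form \(\hat{\gamma }^{H}_{k}=\frac{1}{k}\sum _{i=1}^{k}\log X_{n-i+1,n}-\log X_{n-k,n}\) and applies it to a synthetic Pareto sample with true tail index \(\gamma =0.5\); the data and the choice of *k* are illustrative:

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimator of the tail index gamma from the k largest order statistics."""
    x = np.sort(np.asarray(sample))          # ascending order statistics
    logs = np.log(x[-k:])                    # k largest observations (must be > 0)
    return logs.mean() - np.log(x[-k - 1])   # mean log-excess over X_{n-k,n}

rng = np.random.default_rng(1)
# np.random's pareto(a) draws Lomax; adding 1 gives a classical Pareto, gamma = 1/a
pareto = rng.pareto(a=2.0, size=10_000) + 1.0
gamma_hat = hill_estimator(pareto, k=500)
```

In practice the estimate is plotted against *k* (the Hill plot), since its bias and variance trade off through the number of order statistics used.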

### 4.2 Parametric Estimation

The parametric approach estimates the tail beyond a high threshold *u* as explained in the following two steps:

**First step—Tail distribution estimation.** Let \(X_1, \ldots , X_n\) follow a distribution *F* and let \(Y_1,\ldots , Y_{N_n}\), \(Y_i=X_i-u_n\), be the exceedances over a chosen threshold \(u_n\). The distribution of excesses \(F_{u_n}\) is given by

$$\begin{aligned} F_{u_n}(y)=P(X-{u_n} \le y\,|\, X > {u_n}), \end{aligned}$$(12)

and then the distribution *F* of the extreme observations is given by

$$\begin{aligned} F({u_n} + y) = F(u_n)+F_{u_n}(y) \times \bar{F}(u_n). \end{aligned}$$(13)

The distribution of excesses \(F_{u_n}\) is approximated by \(G_{\gamma ,\sigma (u_n)}\), and the first step consists in estimating the parameters of this distribution using the sample \((Y_1,\ldots , Y_{N_n})\). The parameter estimation can be done using MLE. Different methods have been proposed to estimate the parameters of the GPD; other estimation methods are presented in [26]. The Probability Weighted Moments (PWM) method proposed by Hosking and Wallis [32] for \(\gamma <1/2\) was extended by Diebolt et al. [21] through a generalization of the PWM estimators to \(\gamma <3/2\), since for many applications, e.g., in insurance, distributions are known to have a tail index larger than 1.

**Second step—Quantile estimation.** In order to estimate the extreme quantile \(x_{p_n}\) defined as

$$\begin{aligned} x_{p_n}:\, \bar{F} (x_{p_n})=1-F(x_{p_n})=p_n, \quad n p_n\rightarrow 0, \end{aligned}$$(14)

we estimate \(\bar{F}(u)\) by its empirical counterpart \(N_u/n\) and we approximate \(F_{u_n}\) by the Generalized Pareto Distribution \(GPD(\hat{\gamma }_n,\hat{\sigma }_n)\) in Eq. (1). Then, for the threshold \(u=X_{n-k,n}\), the extreme quantile is estimated by

$$\begin{aligned} \hat{x}_{p_n,k}=X_{n-k,n}+\hat{\sigma }_n \, \frac{\Bigl (\frac{k}{np_n}\Bigr )^{\hat{\gamma }_n}-1}{\hat{\gamma }_n}. \end{aligned}$$(15)
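The two steps above can be sketched as follows; the synthetic Student-t losses, the choice \(k=500\) and the use of scipy's GPD fitter are illustrative assumptions, not the chapter's data:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)
losses = rng.standard_t(df=4, size=20_000)    # synthetic heavy-tailed losses

# First step: fit a GPD to the excesses over the threshold u = X_{n-k,n}
k = 500                                       # number of upper order statistics
u = np.sort(losses)[-k - 1]
excesses = losses[losses > u] - u
gamma_hat, _, sigma_hat = genpareto.fit(excesses, floc=0)  # location fixed at 0

# Second step: plug the estimates into the POT quantile formula
n, p = len(losses), 1e-4                      # target tail probability p_n
x_hat = u + sigma_hat * ((k / (n * p)) ** gamma_hat - 1) / gamma_hat
```

Note that the target probability \(p=10^{-4}\) corresponds to a quantile beyond the reach of the 20,000-point empirical distribution, which is precisely the situation where the POT extrapolation is useful.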

The application of POT involves a number of challenges. The early stage of data analysis is very important in determining whether the data has the fat tail needed to apply the EVT results. Also, the parameter estimates of the limit GPD distributions depend on the number of extreme observations used. The choice of a threshold should be large enough to satisfy the conditions to permit its application (*u* tends towards infinity), while at the same time leaving sufficient observations for the estimation. A high threshold would generate few excesses, thereby inflating the variance of our parameter estimates. Lowering the threshold would necessitate using samples that are no longer considered as being in the tails which would entail an increase in the bias.

## 5 Non-parametric Estimation Methods

A main argument for using non-parametric estimation methods is that no specific assumptions on the distribution of the data are made *a priori*. That is, model specification bias can be avoided. This is relevant when there is limited information about the ‘theoretical’ data distribution, when the data can potentially contain a mix of variables with different underlying distributions, or when no suitable parametric model is available. In the context of extreme value distributions, the GPD and GEV distributions discussed in Sect. 3 are appropriate parametric models for the univariate case. However, for the multivariate case there is no general parametric form.

We restrict the discussion here to one particular form of non-parametric estimation, kernel density estimation [44]. Classical kernel estimation performs well when the data is symmetric, but has problems when there is significant skewness [9, 24, 41].

A common way to deal with skewness is transformation kernel estimation [45], which we discuss in some detail below. The idea is to transform the skewed data set into another variable that has a more symmetric distribution, allowing for efficient classical kernel estimation.

Another issue for kernel density estimation is boundary bias. This arises because standard kernel estimates do not take knowledge of the domain of the data into account, and therefore the estimate does not reflect the actual behaviour close to the boundaries of the domain. We will also review a few bias correction techniques [34].

Even though kernel estimation is non-parametric with respect to the underlying distribution, there is a parameter that needs to be decided. This is the bandwidth (scale) of the kernel function, which determines the smoothness of the density estimate. We consider techniques intended for constant bandwidth [35], and also take a brief look at variable bandwidth kernel estimation [36]. In the latter case, the bandwidth and the location are allowed to vary such that bias can be reduced compared with using fixed parameters.

Kernel density estimation can be applied to any type of application and data, but some examples where it is used for extreme value distributions are given in [8, 9]. A non parametric method to estimate the VaR in extreme quantiles, based on transformed kernel estimation (TKE) of the cdf of losses was proposed in [3]. A kernel estimator of conditional ES is proposed in [13, 14, 43].

In the following subsections, we start by defining the classical kernel estimator, then we describe a selection of measures that are used for evaluating the quality of an estimate, and are needed, e.g., in the algorithms for bandwidth selection. Finally, we go into the different subareas of kernel estimation mentioned above in more detail.

### 5.1 Classical Kernel Estimation

Expressed in words, a classical kernel estimator approximates the probability density function associated with a data set through a sum of identical, symmetric kernel density functions that are centered at each data point. Then the sum is normalized to have total probability mass one.

We formalize this in the following way: Let \(k(\cdot )\) be a bounded and symmetric probability density function (pdf), such as the normal pdf or the Epanechnikov pdf, which we refer to as the kernel function.

Given *n* independent and identically distributed observations \(X_1,\ldots ,X_n\) of a random variable *X* with pdf \(f_X(x)\), the classical kernel estimator is given by

$$\begin{aligned} \hat{f}_X(x)=\frac{1}{nb}\sum _{i=1}^{n}k\Bigl (\frac{x-X_i}{b}\Bigr ), \end{aligned}$$

where *b* is the bandwidth. Similarly, the classical kernel estimator for the cumulative distribution function (cdf) is given by

$$\begin{aligned} \hat{F}_X(x)=\frac{1}{n}\sum _{i=1}^{n}K\Bigl (\frac{x-X_i}{b}\Bigr ),\qquad K(x)=\int _{-\infty }^{x}k(t)\, dt. \end{aligned}$$
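As an illustration, the classical density estimator can be sketched directly; the Gaussian kernel, the synthetic sample and the bandwidth value are illustrative choices:

```python
import numpy as np

def kernel_density(x, data, b):
    """Classical kernel density estimate with a Gaussian kernel and bandwidth b."""
    u = (x[:, None] - data[None, :]) / b           # pairwise (x - X_i) / b
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel k(u)
    return k.mean(axis=1) / b                      # (1/(n b)) sum_i k((x - X_i)/b)

rng = np.random.default_rng(3)
data = rng.normal(size=1000)                       # synthetic sample
grid = np.linspace(-4, 4, 81)                      # evaluation grid, spacing 0.1
f_hat = kernel_density(grid, data, b=0.3)
```

Since each kernel is itself a density, the estimate integrates to one by construction.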

### 5.2 Selected Measures to Evaluate Kernel Estimates

A natural quality measure is the mean integrated squared error,

$$\begin{aligned} \mathrm {MISE}(b)=E\biggl [\int \bigl (\hat{f}_X(x)-f_X(x)\bigr )^2\, dx\biggr ]. \end{aligned}$$

The dependence on the bandwidth *b* is included to show that minimizing MISE is one criterion for bandwidth selection. However, MISE can only be computed when the true density \(f_X(x)\) is known. MISE can be decomposed into two terms: the integrated squared bias and the integrated variance.

### Example 1

Let *X* be uniformly distributed on \(\varOmega =[0,\,1]\), and let \(k(\cdot )\) be the Gaussian kernel. Then, for a point *y* we have that

$$\begin{aligned} E[\hat{f}_X(y)]=\int _0^1\frac{1}{b}\, k\Bigl (\frac{y-t}{b}\Bigr )\, dt=\varPhi \Bigl (\frac{y}{b}\Bigr )-\varPhi \Bigl (\frac{y-1}{b}\Bigr ), \end{aligned}$$

where \(\varPhi \) denotes the standard normal cdf. In the interior of \(\varOmega \) this expectation is close to the true density \(f_X(y)=1\), but at the boundary \(y=0\) it is only about \(\tfrac{1}{2}\), which illustrates the boundary bias of the classical estimator.

### 5.3 Bias-Corrected Kernel Estimation

*x*. The coefficients \(a_j=a_j(b,x)\) are computed as

where *z* is the end point of the support of the kernel function. An example with modified Gaussian kernels close to a boundary is shown in Fig. 3. At the boundary, the amplitude of the kernels becomes higher to compensate for the mass loss, while away from the boundary they resume the normal shape and size. The kernel functions closest to the boundary become negative in a small region, but this does not affect the consistency of the estimate.

### 5.4 Transformation Kernel Estimation

The objective in transformation kernel estimation is to find a transformation of the random variable *X*, which may for example have a right-skewed distribution, into a random variable *Y* with a symmetric distribution. Then classical kernel estimation can be successfully applied to *Y*.

Here *c* is the bandwidth used in this estimate. The scale *M* is determined by minimizing \(R(\hat{f}_Y^{\prime \prime })\). Given a scale *M*, \(\alpha \) is determined such that no probability mass spills over at the right boundary; that is, the resulting density does not have mass at (or beyond) infinity. Alternatively, *M* can be chosen as the median of the data, and \(\alpha \) and *c* are found by maximizing a log likelihood function, see [12].

So far, we have only considered the possibility of performing one transformation, but one can also transform the data iteratively, or perform two specific consecutive transformations. Doubly transformed kernel estimation is discussed, e.g., in [9]. The idea is to first transform the data to something close to uniform, and then to apply an inverse beta transformation. This makes the final distribution close to a beta distribution, and the optimal bandwidth can then easily be computed.
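A minimal sketch of the transformation idea, using a simple log transform rather than the parametric transformations discussed above: the skewed variable is mapped to \(Y=\log X\), a classical kernel estimate is formed for *Y*, and the density of *X* is recovered by the change-of-variables formula \(f_X(x)=f_Y(\log x)/x\). The lognormal sample and the bandwidth are illustrative:

```python
import numpy as np

def gaussian_kde(x, data, b):
    """Classical Gaussian-kernel density estimate evaluated at points x."""
    u = (x[:, None] - data[None, :]) / b
    return (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).mean(axis=1) / b

rng = np.random.default_rng(4)
x_data = rng.lognormal(mean=0.0, sigma=1.0, size=2000)  # right-skewed sample

# Transform to a symmetric variable, estimate there, then transform back:
y_data = np.log(x_data)                                 # Y = log X is N(0, 1) here
grid = np.linspace(0.05, 10.0, 200)
f_x = gaussian_kde(np.log(grid), y_data, b=0.25) / grid  # f_X(x) = f_Y(log x)/x
```

For lognormal data the log transform makes *Y* exactly normal, so classical kernel estimation works at its best; for general skewed data, parametrized transformations such as the Champernowne transformation of [12] play the same role.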

### 5.5 Bandwidth Selection

The bandwidth *b* in kernel estimation has a significant impact on the quality of the estimator, but choosing the appropriate bandwidth requires the use of one estimator or another. The rule-of-thumb bandwidth estimator of Silverman [44] for a Gaussian kernel is

$$\begin{aligned} \hat{b}=0.9\,\min \Bigl (\hat{\sigma },\ \frac{\mathrm {IQR}}{1.34}\Bigr )\, n^{-1/5}, \end{aligned}$$

where \(\hat{\sigma }\) is the sample standard deviation and IQR is the interquartile range. It is derived by minimizing the asymptotic MISE under a normal reference density.
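Silverman's rule of thumb is straightforward to compute; the sketch below uses the common \(0.9\min (\hat{\sigma },\mathrm {IQR}/1.34)\, n^{-1/5}\) form for a Gaussian kernel, applied to an illustrative synthetic sample:

```python
import numpy as np

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    data = np.asarray(data)
    n = data.size
    sigma = data.std(ddof=1)                           # sample standard deviation
    iqr = np.subtract(*np.percentile(data, [75, 25]))  # interquartile range
    return 0.9 * min(sigma, iqr / 1.34) * n ** (-1 / 5)

rng = np.random.default_rng(5)
b = silverman_bandwidth(rng.normal(size=1000))
```

The min over the two spread measures makes the rule robust to outliers and mild skewness, but it remains tied to the normal reference and tends to oversmooth heavy-tailed data.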

Many bandwidth selection methods use a normal reference at some step in the process [11], but this introduces a parametric step in the non-parametric estimation. An interesting alternative, the Improved Sheather-Jones bandwidth selection algorithm, is also described in [11], where the normal reference is eliminated by formulating a non-linear equation for the optimal bandwidth *b*.

## 6 More Challenges in Estimating the Risk Measures—Financial Time Series and Multivariate Case

*A Dynamic Approach.* Highlighting the underlying assumptions is relevant for understanding model uncertainty when estimating rare or extreme events. So far, VaR and ES have been estimated under the assumption that the distribution of asset returns does not change over time: when applying the POT approach to the returns in the previous sections, their distribution was assumed to be stationary. A dynamic model which captures current risk is more realistic. EVT can also be applied on top of a stochastic time series model. These dynamic models combine an ARCH/GARCH-type process with the POT approach to obtain VaR and ES estimates that depend on, and change with, the fluctuations of the market. This approach, studied in [2], reflects two stylized facts exhibited by most financial return series, namely stochastic volatility and the fat-tailedness of conditional return distributions over short time horizons.

*The Multivariate Case for EVT.* When estimating the VaR of a multi-asset portfolio, correlations between assets often become stronger and more positive during financial crises. Assuming that the variables are independent and identically distributed is a strong hypothesis. Portfolio losses are the result not only of the individual assets’ performance but also, and very importantly, of the interaction between assets. Hence, from the accuracy point of view, ideally we would prefer the multivariate approach.

An extension of the univariate EVT models using a dependence structure leads to a parametric model and is then expected to be less efficient for scarce data. A non-parametric approach should be preferred to estimate portfolio tail risk. Transformation kernel density estimation is used in [8] for studying multivariate extreme value distributions in temperature measurement data. Future directions involve applying this type of methodology to real and simulated portfolio data.

## References

- 1. Acerbi, C., Tasche, D.: On the coherence of expected shortfall. J. Bank. Finance **26**(7), 1487–1503 (2002)
- 2. McNeil, A.J., Frey, R.: Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. J. Empir. Finance **7**, 271–300 (2000)
- 3. Alemany, R., Bolancé, C., Guillén, M.: A nonparametric approach to calculating value-at-risk. Insur. Math. Econ. **52**(2), 255–262 (2013)
- 4. Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Math. Finance **9**(3), 203–228 (1999)
- 5. Balkema, A.A., de Haan, L.: Residual life time at great age. Ann. Probab. **2**(5), 792–804 (1974)
- 6. Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G.: Tail index estimation and an exponential regression model. Extremes **2**(2), 177–200 (1999)
- 7. Beirlant, J., Dierckx, G., Guillou, A., Stărică, C.: On exponential representations of log-spacings of extreme order statistics. Extremes **5**(2), 157–180 (2002)
- 8. Beranger, B., Duong, T., Perkins-Kirkpatrick, S.E., Sisson, S.A.: Exploratory data analysis for moderate extreme values using non-parametric kernel methods. arXiv:1602.08807 [stat.ME] (2016)
- 9. Bolancé, C., Bahraoui, Z., Alemany, R.: Estimating extreme value cumulative distribution functions using bias-corrected kernel approaches. XREAP2015-01 (2015)
- 10. Bolancé, C., Guillén, M., Perch Nielsen, J.: Kernel density estimation of actuarial loss functions. Insur. Math. Econ. **32**(1), 19–36 (2003)
- 11. Botev, Z.I., Grotowski, J.F., Kroese, D.P.: Kernel density estimation via diffusion. Ann. Stat. **38**(5), 2916–2957 (2010)
- 12. Buch-Larsen, T., Nielsen, J.P., Guillén, M., Bolancé, C.: Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics **39**(6), 503–516 (2005)
- 13. Cai, Z., Wang, X.: Nonparametric estimation of conditional VaR and expected shortfall. J. Econ. **147**(1), 120–130 (2008)
- 14. Chen, S.: Non-parametric estimation of expected shortfall. J. Financ. Econ. **6**, 87–107 (2008)
- 15. Choi, E., Hall, P.: On bias reduction in local linear smoothing. Biometrika **85**(2), 333–345 (1998)
- 16. Clements, A., Hurn, S., Lindsay, K.: Möbius-like mappings and their use in kernel density estimation. J. Am. Stat. Assoc. **98**(464), 993–1000 (2003)
- 17. Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. Springer, London (2001). https://doi.org/10.1007/978-1-4471-3675-0
- 18. Csörgő, S., Deheuvels, P., Mason, D.: Kernel estimates of the tail index of a distribution. Ann. Stat. **13**(3), 1050–1077 (1985)
- 19. Csörgő, S., Viharos, L.: Estimating the tail index. In: Szyszkowicz, B. (ed.) Asymptotic Methods in Probability and Statistics, pp. 833–881. North-Holland, Amsterdam (1998)
- 20. Danielsson, J.: Financial Risk Forecasting. Wiley, Hoboken (2011)
- 21. Diebolt, J., Guillou, A., Rached, I.: Approximation of the distribution of excesses through a generalized probability-weighted moments method. J. Stat. Plan. Infer. **137**(3), 841–857 (2007)
- 22. Dowd, K.: Measuring Market Risk. Wiley, Hoboken (2005)
- 23. Duffie, D., Pan, J.: An overview of value at risk. J. Deriv. **4**(3), 7–49 (1997)
- 24. Eling, M.: Fitting insurance claims to skewed distributions: are the skew-normal and skew-student good models? Insur. Math. Econ. **51**(2), 239–248 (2012)
- 25. Embrechts, P. (ed.): Extremes and Integrated Risk Management. Risk Books, London (2000)
- 26. Embrechts, P., Klüppelberg, C., Mikosch, T.: Modelling Extremal Events: for Insurance and Finance. Springer, Heidelberg (1997). https://doi.org/10.1007/978-3-642-33483-2
- 27. Falk, M., Marohn, F.: Efficient estimation of the shape parameter in Pareto models with partially known scale. Stat. Decis. **15**, 229–239 (1997)
- 28. Feuerverger, A., Hall, P.: Estimating a tail exponent by modelling departure from a Pareto distribution. Ann. Stat. **27**(2), 760–781 (1999)
- 29. Fisher, R.A., Tippett, L.H.C.: Limiting forms of the frequency distribution of the largest or smallest member of a sample. Math. Proc. Camb. Philos. Soc. **24**(2), 180–190 (1928)
- 30. Gnedenko, B.: Sur la distribution limite du terme maximum d’une série aléatoire. Ann. Math. **44**(3), 423–453 (1943)
- 31. Hill, B.M.: A simple general approach to inference about the tail of a distribution. Ann. Stat. **3**, 1163–1174 (1975)
- 32. Hosking, J.R., Wallis, J.R.: Parameter and quantile estimation for the generalized Pareto distribution. Technometrics **29**(3), 339–349 (1987)
- 33. Hull, J.C.: Risk Management and Financial Institutions. Prentice Hall, Upper Saddle River (2006)
- 34. Jones, M.C.: Simple boundary correction for kernel density estimation. Stat. Comput. **3**(3), 135–146 (1993)
- 35. Jones, M.C., Marron, J.S., Sheather, S.J.: A brief survey of bandwidth selection for density estimation. J. Am. Stat. Assoc. **91**(433), 401–407 (1996)
- 36. Jones, M.C., McKay, I.J., Hu, T.C.: Variable location and scale kernel density estimation. Ann. Inst. Stat. Math. **46**(3), 521–535 (1994)
- 37. Jorion, P.: Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill, New York (2001)
- 38. Kim, C., Kim, S., Park, M., Lee, H.: A bias reducing technique in kernel distribution function estimation. Comput. Stat. **21**(3–4), 589–601 (2006)
- 39. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques and Tools. Princeton Series in Finance. Princeton University Press, Princeton (2005)
- 40. Pickands, J.: Statistical inference using extreme order statistics. Ann. Stat. **3**(1), 119–131 (1975)
- 41. Pitt, D., Guillén, M., Bolancé, C.: Estimation of parametric and nonparametric models for univariate claim severity distributions - an approach using R. XREAP2011-06 (2011)
- 42. Reiss, R.D., Thomas, M.: Statistical Analysis of Extreme Values: with Applications to Insurance, Finance, Hydrology and Other Fields. Birkhäuser, Basel (2007)
- 43. Scaillet, O.: Nonparametric estimation of conditional expected shortfall. Insur. Risk Manag. J. **74**(1), 639–660 (2005)
- 44. Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability, vol. 26. Chapman & Hall/CRC, London (1986)
- 45. Wand, M.P., Marron, J.S., Ruppert, D.: Transformations in density estimation. J. Am. Stat. Assoc. **86**(414), 343–353 (1991)

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.