1 Introduction

The no-arbitrage paradigm is the cornerstone of mathematical finance. The fundamental work of Harrison, Kreps and Pliska [13,14,15, 22] and Delbaen and Schachermayer [6], to mention some of the most important contributions, paved the way for a sound theory for the pricing of contingent claims. In a general market model, the exclusion of arbitrage opportunities leads to intervals of fair prices.

Typically, the resulting no-arbitrage price bounds are too wide to provide practically meaningful information.Footnote 1 In practice, market makers need a framework for controlling the risk they accept when setting their spreads. Pioneering contributions incorporating risk into the pricing of contingent claims were made by Carr et al. [3] as well as Föllmer and Leukert [9, 10], with subsequent generalizations, e.g., by Nakano [24] or Rudloff [42]. The pricing framework of the present paper is in this spirit: by specifying acceptability functionals, an agent may control her shortfall risk in a rather intuitive manner. In particular, by varying the parameter \(\alpha \), the Average Value-at-Risk (\({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \)) yields a whole range of prices between the extreme cases of hedging with probability one (the traditional approach) and hedging in expectation.

Nowadays, there is broad awareness of the epistemic uncertainty inherent in setting up a stochastic model for a given problem. For single-stage and two-stage problems, there is a plethora of literature on different approaches to account for model ambiguity (see the lists contained in [31, pp. 232–233] or [45, p. 2]). Recently, balls w.r.t. the Kantorovich–Wasserstein distance around an estimated model have gained considerable popularity (e.g., [7, 8, 11, 12, 25, 46]), although they were originally proposed by Pflug and Wozabal [34] in 2007. However, the literature on nonparametric ambiguity sets for multistage problems is still extremely sparse. Analui and Pflug [1] were the first to study balls w.r.t. the multistage generalization of the Kantorovich–Wasserstein distance, called the nested distance,Footnote 2 for incorporating model uncertainty into multistage decision making. The aim of this article is to further explore this largely uncharted territory. The classic mathematical finance problem of contingent claim pricing is a well-suited instance for doing so. In fact, while in the traditional pointwise hedging setup only the null sets of the stochastic model for the dynamics of the underlying asset price process influence the resulting price of a contingent claim, the full specification of the model affects the claim price once acceptability is introduced. Thus, model dependence is even stronger in the latter case, which is the subject of this paper.

Stochastic optimization offers a natural framework for the problems of mathematical finance. Applications of the fundamental work of Rockafellar and Wets [35,36,37,38,39,40,41] on conjugate duality and stochastic programming have led to a stream of literature on these topics. King [19] originally formulated the problem of contingent claim pricing as a stochastic program. Extensions of this approach have been made, amongst others, by King, Pennanen and their coauthors [18,19,20,21, 26,27,28], Kallio and Ziemba [17] or Dahl [5]. The stochastic programming approach naturally allows for incorporating features and constraints of real-world markets, and it allows one to obtain numerical results efficiently by applying the powerful toolkit of available algorithms for convex optimization problems.

The main contribution of this article is the link between statistical model error and the pricing of contingent claims, where the pricing methodology allows for a controlled hedging shortfall. The setup is inspired by highly relevant practical aspects of decision making under both aleatoric and epistemic uncertainty. Given the stochastic model from which future evolutions are drawn, agents are willing to accept a certain degree of risk in their decisions. However, it may be dangerously misleading to neglect the fact that the true model cannot be identified without error. Thus, a distributionally robust framework, which takes the limitations of nonparametric statistical estimation into account, is required. In statistical terminology, balls w.r.t. the nested distance may be seen as confidence regions: by considering all models whose nested distance to the estimated baseline model does not exceed some threshold, one ensures that the true model is covered with a certain probability, and hence the decision is robust w.r.t. the statistical model estimation error. In particular, we prove a large deviations theorem for the nested distance, based on which we show that a scenario tree can be constructed from data such that it converges (in terms of the nested distance) to the true model in probability at an exponential rate. Thus, distributionally robust claim prices w.r.t. nested distance balls as ambiguity sets include a hedge under the true model with arbitrarily high probability, depending on the available data. In other words, we provide a framework for setting bid and ask prices for a contingent claim that result from hedging strategies with truly calculated risks, since the important factor of model uncertainty is not neglected.

This paper is organized as follows. In Sect. 2 we introduce our framework for acceptability pricing, i.e., we replace the traditional almost sure super-/subreplication requirement by the weaker constraint of an acceptable hedge. The acceptability condition is formulated w.r.t. one given probability model. This lowers the ask price and raises the bid price, so that the bid–ask spread may be tightened or even closed. Section 3 contains the main results of this article. We weaken the assumption of a single probability model by assuming that a collection of models is plausible. In particular, we define the distributionally robust acceptability pricing problem and derive its dual formulation under rather general assumptions on the ambiguity set. The effect of introducing acceptability and ambiguity into the classical pricing methodology is nicely mirrored by the dual formulations. Moreover, we give a strong statistical motivation for using nested distance balls as ambiguity sets by proving a large deviations theorem for the nested distance. Section 4 contains illustrative examples that visualize the effect of acceptability and model ambiguity on contingent claim prices. In Sect. 5 we discuss the algorithmic solution of the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}\)-acceptability pricing problem w.r.t. nested distance balls as ambiguity sets. In particular, we exploit the duality results of Sect. 3 and the special stagewise structure of the nested distance through a sequential linear programming algorithm, which yields approximate solutions to the originally semi-infinite non-convex problem. In this way, we go beyond current state-of-the-art computational methods for multistage stochastic optimization problems under nonparametric model ambiguity. Finally, we summarize our results in Sect. 6.

2 Acceptability pricing

2.1 Acceptability functionals

The terminology introduced in this section follows the book of Pflug and Römisch [33]. A detailed discussion of acceptability functionals and their properties can be found therein. Intuitively speaking, an acceptability functional \(\mathcal A\) maps a stochastic position \(Y \in L_p(\varOmega ), 1<p<\infty ,\) defined on a probability space \((\varOmega , \mathcal {F}, \mathbb P)\), to the real numbers extended by \(-\infty \) in such a way that higher values of the position correspond to higher values of the functional, i.e., a ‘higher degree of acceptance’. In particular, the defining properties of an acceptability functional are translation equivariance,Footnote 3 concavity, monotonicity,Footnote 4 and positive homogeneity. We assume all acceptability functionals to be version independent,Footnote 5 i.e., \(\mathcal A(Y)\) depends only on the distribution of the random variable Y.

The following proposition is well-known. It follows directly from the Fenchel–Moreau–Rockafellar Theorem (see [35, Th. 5] and [33, Th. 2.31]).

Proposition 1

An acceptability functional \(\mathcal {A}\) which fulfills the above conditions has a dual representation of the form

$$\begin{aligned} \mathcal {A}(Y) = \inf \left\{ \mathbb {E}\left[ YZ\right] :Z\in \mathcal {Z} \right\} , \end{aligned}$$

where \(\mathcal {Z}\) is a closed convex subset of \(L_q(\varOmega )\), with \(1/p+1/q=1\,\). We call \(\mathcal {Z}\) the superdifferential of \(\mathcal {A}\). Monotonicity and translation equivariance imply that all \(Z \in \mathcal {Z}\) are probability densities, i.e., \(Z \ge 0\) and \(\mathbb E[Z]=1\).

Assumption A1

There exists a constant \(K_1 \in \mathbb R\) such that \(\Vert Z \Vert _q \le K_1\) for all \(Z \in \mathcal Z\,\).

This assumption implies that \(\mathcal {A}\) is Lipschitz on \(L_p\):

$$\begin{aligned} |\mathcal {A}(Y_1)-\mathcal {A}(Y_2)| \le K_1 \; \Vert Y_1 - Y_2\Vert _p . \end{aligned} \tag{1}$$

A typical example of such an acceptability functional is the Average Value-at-Risk, \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \), whose superdifferential is given by

$$\begin{aligned} \mathcal {Z} = \{Z\in L_1(\varOmega ):0\le Z \le 1/\alpha \text { and } \mathbb {E}(Z) = 1 \}. \end{aligned}$$

The extreme cases are represented by the essential infimum (\({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_{0}(Y) := \lim _{\alpha \downarrow 0} {{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha (Y) = {\text {essinf}}(Y)\)Footnote 6) and the expectation (\(\alpha =1\)). Their superdifferentials are given by the set of all probability densities and by the constant function 1, respectively.
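On a finite probability space, the dual representation of Proposition 1 can be evaluated directly for the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \): the infimum over the superdifferential above is attained by a density that puts the maximal weight \(1/\alpha \) on the worst outcomes until probability mass \(\alpha \) is used up. The following Python sketch assumes a finite \(\varOmega \) with given state probabilities; the function name `avar_dual` is ours, not the paper's. It returns the value together with a minimizing density and reproduces the two extreme cases.

```python
from typing import List, Tuple


def avar_dual(values: List[float], probs: List[float], alpha: float) -> Tuple[float, List[float]]:
    """AV@R_alpha(Y) = min{E[Y Z] : 0 <= Z <= 1/alpha, E[Z] = 1} on a finite space.

    Returns the optimal value and a minimizing density Z (one entry per state).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    z = [0.0] * len(values)
    remaining = alpha  # probability mass that may still receive the maximal weight 1/alpha
    # Greedy: put Z = 1/alpha on the worst outcomes until the mass alpha is exhausted.
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        if remaining <= 0.0:
            break
        if probs[i] <= 0.0:
            continue
        mass = min(probs[i], remaining)
        z[i] = mass / (alpha * probs[i])  # lies in [0, 1/alpha]
        remaining -= mass
    value = sum(v * p * zi for v, p, zi in zip(values, probs, z))
    return value, z


Y = [-10.0, 0.0, 5.0, 20.0]
P = [0.25, 0.25, 0.25, 0.25]
print(avar_dual(Y, P, 1.0)[0])   # 3.75  (expectation)
print(avar_dual(Y, P, 0.5)[0])   # -5.0  (mean of the worst half)
print(avar_dual(Y, P, 0.25)[0])  # -10.0 (essential infimum)
```

The greedy choice is optimal because the constraints only bound each \(Z(\omega )\) individually and fix \(\mathbb E[Z]\), so shifting weight towards smaller outcomes can never increase \(\mathbb E[YZ]\).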

Other common names for the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}\) are Conditional-Value-at-Risk, Tail-Value-at-Risk, or Expected Shortfall. The subtleties between these terminologies are addressed, e.g., in Sarykalin et al. [43]. All our computational studies in Sects. 4 and 5 will be based on some \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \), while our theoretical results are general.
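For the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \), Assumption A1 can be verified directly with an explicit constant. A short derivation, assuming \(0<\alpha \le 1\) and \(q = p/(p-1)\), so that \((q-1)/q = 1/p\), for any Z in the superdifferential above (\(0\le Z\le 1/\alpha \), \(\mathbb E[Z]=1\)):

$$\begin{aligned} \mathbb {E}[Z^q] = \mathbb {E}[Z^{q-1} Z] \le \alpha ^{-(q-1)}\, \mathbb {E}[Z] = \alpha ^{1-q} \quad \Longrightarrow \quad \Vert Z\Vert _q \le \alpha ^{(1-q)/q} = \alpha ^{-1/p} . \end{aligned}$$

Hence A1 holds with \(K_1 = \alpha ^{-1/p}\), and (1) yields \(|{{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha (Y_1)-{{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha (Y_2)| \le \alpha ^{-1/p}\, \Vert Y_1-Y_2\Vert _p\); the constant grows without bound as \(\alpha \downarrow 0\).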

2.2 Acceptable replications

Let us now introduce the notion of acceptability in the pricing procedure for contingent claims.

As usual in mathematical finance, we consider a market model as a filtered probability space \((\varOmega ,\mathcal {F},\mathbb {P})\), where the filtration is given by the increasing sequence of sigma-algebras \(\mathcal {F}=(\mathcal {F}_0, \mathcal {F}_1, \ldots , \mathcal {F}_T)\) with \(\mathcal {F}_0=\{\emptyset ,\varOmega \}\). The liquidly traded basic asset prices are given by a discrete-time \(\mathbb {R}_+^{m}\)-valued stochastic process \(S = (S_0, \ldots , S_T)\), where \(S_t=(S_t^{(1)}, S_t^{(2)}, \ldots , S_t^{(m)})\). We assume the filtration to be generated by the asset price process.

One asset, denoted by \(S^{(1)}\), serves as numéraire (a risk-less bond, say). We assume w.l.o.g. that \(S_t^{(1)} =1\) a.s. If not, we may replace \((S_t^{(1)}, S_t^{(2)}, \ldots , S_t^{(m)})\) by \((1, S_t^{(2)}/S_t^{(1)}, \ldots , S_t^{(m)}/S_t^{(1)})\).
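A minimal Python sketch of the normalization above, assuming prices along a single sample path are given as one list of m positive numbers per period, with the numéraire in the first position; `normalize_by_numeraire` is a hypothetical helper name.

```python
from typing import List


def normalize_by_numeraire(prices: List[List[float]]) -> List[List[float]]:
    """Express the prices (S_t^(1), ..., S_t^(m)) of each period in units of asset 1.

    Replaces (S^(1), S^(2), ..., S^(m)) by (1, S^(2)/S^(1), ..., S^(m)/S^(1)).
    """
    normalized = []
    for s in prices:
        numeraire = s[0]
        if numeraire <= 0.0:
            raise ValueError("the numeraire price must be strictly positive")
        normalized.append([1.0] + [si / numeraire for si in s[1:]])
    return normalized


# Hypothetical path: a bond growing by 5% per period and one stock.
path = [[1.0, 100.0], [1.05, 110.0], [1.1025, 99.0]]
print(normalize_by_numeraire(path))
```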

A contingent claim C consists of a series of cash flows \(C=(C_{1},\ldots ,C_{T})\) measured in units of the numéraire. The fact that the payoff \(C_{t}\) is contingent on the state of the market up to time t is reflected by the condition that C is adapted to the filtration \(\mathcal {F}\), for which we write \(C \lhd \mathcal {F}\). A trading strategy \(x=(x_0,\ldots , x_{T-1})\) is an \(\mathbb {R}^{m}\)-valued process that is likewise adapted, \(x \lhd \mathcal {F}\).

To be more precise, let

$$\begin{aligned} \mathcal {L}^m_p&:= \mathbb {R}^m \times L_p^m(\varOmega ,\mathcal {F}_1) \times \dots \times L_p^m(\varOmega ,\mathcal {F}_T) , \\ \mathcal {L}^m_\infty&:= \mathbb {R}^m \times L_\infty ^m(\varOmega ,\mathcal {F}_1) \times \dots \times L_\infty ^m(\varOmega ,\mathcal {F}_{T-1}) , \end{aligned}$$

and

$$\begin{aligned} \mathcal {L}^1_q&:= L_q(\varOmega ,\mathcal {F}_1) \times \dots \times L_q(\varOmega ,\mathcal {F}_T) . \end{aligned}$$

We assume that \(S \in \mathcal {L}^m_p\), \(x \in \mathcal {L}_\infty ^m\) and \(C \in \mathcal {L}_p^1\). The norm in \(L^m_p\) is given by

$$\begin{aligned} \Vert Y\Vert _p = \sum _{i=1}^m \Vert Y^{(i)}\Vert _p , \end{aligned}$$

and similarly for \(L_\infty ^m\,\). Notice that \(x_0\) and \(S_0\) are deterministic vectors.

Assumption A2

We assume that all claims are Lipschitz-continuous functions of the underlying asset price process S.

Definition 1

Consider a contingent claim C and fix acceptability functionals \(\mathcal {A}_{t}\), for all \(t=1,\ldots ,T\). We assume that all functionals \(\mathcal {A}_t\) have a representation as in Proposition 1. Then the acceptable prices are given by the optimal values of the following stochastic optimization programs:

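  (i) the acceptable ask price of C is defined as

    $$\begin{aligned} (\mathrm{P})\qquad \text {minimize}\quad & x_0^\top S_0 \\ \text {subject to}\quad & \mathcal {A}_t\left( (x_{t-1}-x_t)^\top S_t - C_t \right) \ge 0 , && \text {(2a)}\\ & \mathcal {A}_T\left( x_{T-1}^\top S_T - C_T \right) \ge 0 , && \text {(2b)}\\ & x \lhd \mathcal {F} ; \end{aligned}$$

  (ii) the acceptable bid price of C is defined as

    $$\begin{aligned} (\mathrm{P}^\prime )\qquad \text {maximize}\quad & -x_0^\top S_0 \\ \text {subject to}\quad & \mathcal {A}_t\left( (x_{t-1}-x_t)^\top S_t + C_t \right) \ge 0 , && \text {(3a)}\\ & \mathcal {A}_T\left( x_{T-1}^\top S_T + C_T \right) \ge 0 , && \text {(3b)}\\ & x \lhd \mathcal {F} , \end{aligned}$$

    where the optimization runs over all trading strategies \(x \in \mathcal {L}_\infty ^m \) for the liquidly traded assets. The constraints in (2a) and (3a) are formulated for all \(t = 1,\ldots ,T-1\).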

To interpret Definition 1, the acceptable ask price is the minimal initial capital required to acceptably superhedge the cash flows \(C_t\), which have to be paid out by the seller. On the other hand, the acceptable bid price corresponds to the maximal amount of money that can initially be borrowed from the market to buy the claim such that, by receiving the payments \(C_t\) and always rebalancing one’s portfolio in an acceptable way, one ends up with an acceptable position at maturity.
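To make Definition 1 concrete, the following sketch computes acceptable ask and bid prices in the simplest case \(T=1\) with two assets, the numéraire bond (\(S^{(1)}\equiv 1\)) and one stock with current price \(s_0\), on a finite \(\varOmega \) with \(\mathcal A_1 = {{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \). By translation equivariance the bond holding can be eliminated, so the ask price reduces to minimizing \(-{{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha (x_s(S_1-s_0)-C_1)\) over the stock holding \(x_s\), and the bid price to maximizing \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha (x_s(S_1-s_0)+C_1)\). Both are one-dimensional convex problems, solved here by golden-section search on an assumed bounded interval; the model, the interval and all function names are illustrative assumptions, not the paper's algorithm.

```python
import math
from typing import Callable, List, Tuple


def avar(values: List[float], probs: List[float], alpha: float) -> float:
    """AV@R_alpha of a discrete random variable: mean of its worst alpha-tail."""
    remaining, total = alpha, 0.0
    for v, p in sorted(zip(values, probs)):
        take = min(p, remaining)
        total += take * v
        remaining -= take
        if remaining <= 0.0:
            break
    return total / alpha


def golden_min(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-10) -> float:
    """Approximate minimum value of a convex function f on [lo, hi] (golden-section search)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc <= fd:  # a minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:         # a minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return min(fc, fd)


def acceptable_prices(s0: float, s1: List[float], probs: List[float], payoff: List[float],
                      alpha: float, bound: float = 10.0) -> Tuple[float, float]:
    """One-period (T = 1) acceptable ask and bid prices with A_1 = AV@R_alpha.

    Assets: numeraire bond (price 1) and one stock with price s0 today and s1[i] in state i.
    The bond holding is eliminated via translation equivariance; xs is the stock holding,
    restricted to the (assumed) interval [-bound, bound].
    """
    def ask_objective(xs: float) -> float:
        # minimal x_b + xs * s0 subject to AV@R(x_b + xs * S_1 - C) >= 0
        return -avar([xs * (s - s0) - c for s, c in zip(s1, payoff)], probs, alpha)

    def negative_bid_objective(xs: float) -> float:
        # maximal -(x_b + xs * s0) subject to AV@R(x_b + xs * S_1 + C) >= 0, negated
        return -avar([xs * (s - s0) + c for s, c in zip(s1, payoff)], probs, alpha)

    ask = golden_min(ask_objective, -bound, bound)
    bid = -golden_min(negative_bid_objective, -bound, bound)
    return ask, bid


# Illustrative trinomial model (a martingale under P) and a call with strike 100.
S1, P, C = [80.0, 100.0, 120.0], [1 / 3] * 3, [0.0, 0.0, 20.0]
for level in (0.2, 0.5, 1.0):
    ask, bid = acceptable_prices(100.0, S1, P, C, level)
    print(f"alpha={level}: ask={ask:.4f}, bid={bid:.4f}")
```

In this example, \(\alpha =1\) collapses the spread to \(\mathbb E[C_1]=20/3\), \(\alpha =0.5\) gives ask 10 and bid 10/3, and for \(\alpha <1/3\) (each state has probability 1/3) the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \) is the minimum over states, so the prices 10 and 0 coincide with the classical super-/subreplication bounds.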

In what follows we mainly consider the ask price problem \((\mathrm{P})\) and its variants. The bid price problem \((\mathrm{P}^\prime )\) is its mirror image, and all assertions and proofs for problem \((\mathrm{P})\) carry over verbatim to problem \((\mathrm{P}^\prime )\).

Let \((\mathrm{P}^\beta )\) for \(\beta =(\beta _1, \ldots , \beta _T)\) be the problem \((\mathrm{P})\), where the conditions (2a) and (2b) are replaced by \(\mathcal {A}_t (\cdot ) \ge \beta _t\).

Assumption A3

The optima are attained and all solutions x to the problems \((\mathrm{P}^\beta )\), for \(\beta \) in a neighborhood of 0, are uniformly bounded, i.e., there exists \(K_2 \in \mathbb R\) such that \(\Vert x\Vert _\infty \le K_2\) for all such x.

We show the following auxiliary result for the problems \((\mathrm{P}^\beta )\).

Lemma 1

Let \(v^\beta \) be the optimal value of \((\mathrm{P}^\beta )\) and \(v^*\) be the optimal value of \((\mathrm{P})\). Then, in a neighborhood of 0,

$$\begin{aligned} |v^\beta -v^*|\le 2 {\bar{\beta }} \cdot \Vert S_0\Vert _1 \end{aligned} \tag{4}$$

where \({\bar{\beta }} = \sum _t |\beta _t|\).

Proof

If \(v^{\beta }\) is the optimal value of \((\mathrm{P}^\beta )\), then by inclusion of the feasible sets

$$\begin{aligned} v^{-|\beta |}\le & {} v^* \le v^{|\beta |} ,\\ v^{-|\beta |}\le & {} v^\beta \le v^{|\beta |} . \end{aligned}$$

We have to bound \(v^{|\beta |} - v^{-|\beta |}\). Let \(x^{*}\) be a solution of \((\mathrm{P}^{-|\beta |})\). In general, \(x^{*}\) is not feasible for \((\mathrm{P}^{|\beta |})\), so we modify it in order to obtain feasibility. Let \(a_t\), \(t=0, \ldots ,T-1\,\), be the vector with identical components \(2 \sum _{s=t+1}^T |\beta _s| \) and let \(x_t = x_t^{*}+a_t\). Then

$$\begin{aligned}&\mathbb {E}[(x_{t-1}-x_t)^\top S_t Z_t]-\mathbb {E}[(x_{t-1}^{*}-x_t^{*})^\top S_t Z_t ]\\&\quad = \mathbb {E}[ (a_{t-1} - a_t)^\top S_t Z_t] = 2|\beta _t| \sum _{i=1}^{m} \mathbb {E}\left[ S_t^{(i)} Z_t\right] \\&\quad \ge 2 |\beta _t| \cdot \biggl (\inf \sum _{i=1}^{m} S_t^{(i)}\biggr ) \cdot \mathbb {E}[Z_t] \ge 2 |\beta _t| \end{aligned}$$

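since \(\sum _i S_t^{(i)} \ge S_t^{(1)} = 1\) and \(\mathbb {E}[Z_t]=1\). Since \(x^{*}\) is feasible for \((\mathrm{P}^{-|\beta |})\), we have \(\mathbb {E}[((x_{t-1}^{*}-x_t^{*})^\top S_t - C_t) Z_t ] \ge -|\beta _t|\) for all \(Z_t \in \mathcal {Z}_t\), and hence \(\mathbb {E}[((x_{t-1}-x_t)^\top S_t - C_t) Z_t ] \ge |\beta _t|\) for all \(Z_t \in \mathcal {Z}_t\). The terminal constraint is treated in the same way, since \(a_{T-1}\) has all components equal to \(2|\beta _T|\). Thus x is feasible for \((\mathrm{P}^{|\beta |})\). Notice that \(a_0\) has all components equal to \(2\sum _t |\beta _t| = 2{\bar{\beta }}\). Now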

$$\begin{aligned} 0 \le v^{|\beta |} - v^{-|\beta |} \le x_0^\top S_0 - x_0^{*\top } S_0 = a_0^\top S_0 = 2 {\bar{\beta }} \sum _i S_0^{(i)} = 2 {\bar{\beta }} \cdot \Vert S_0 \Vert _1, \end{aligned}$$

which concludes the proof. \(\square \)
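The arithmetic of the shift used in this proof can be checked numerically. The following Python sketch uses hypothetical values for \(\beta \) and \(S_0\) (the helper name `lemma1_shift` is ours): it builds \(a_0,\ldots ,a_{T-1}\), confirms \(a_{t-1}-a_t = 2|\beta _t|\) componentwise, and evaluates the extra initial cost \(a_0^\top S_0 = 2{\bar{\beta }}\,\Vert S_0\Vert _1\).

```python
from typing import List


def lemma1_shift(beta: List[float], m: int) -> List[List[float]]:
    """Shift vectors a_0, ..., a_{T-1} from the proof of Lemma 1.

    beta = [beta_1, ..., beta_T]; a_t has m identical entries 2 * sum_{s=t+1}^T |beta_s|.
    """
    T = len(beta)
    # beta[t:] holds beta_{t+1}, ..., beta_T (Python lists are 0-based)
    return [[2.0 * sum(abs(b) for b in beta[t:])] * m for t in range(T)]


beta = [0.01, -0.02, 0.005]  # hypothetical levels beta_1, beta_2, beta_3
S0 = [1.0, 100.0]            # numeraire and one stock at time 0
a = lemma1_shift(beta, m=len(S0))
beta_bar = sum(abs(b) for b in beta)
extra_cost = sum(ai * si for ai, si in zip(a[0], S0))  # a_0^T S_0
print(extra_cost, 2 * beta_bar * sum(S0))  # both equal 2 * beta_bar * ||S_0||_1
```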

Notice that the primal program \((\mathrm{P})\) is semi-infinite if the constraints are written in the extensive form

$$\begin{aligned} \mathbb {E}\left[ \left( (x_{t-1}-x_t)^\top S_t - C_t \right) Z_t \right] \ge 0 \quad \hbox { for all } Z_t \in \mathcal {Z}_t , \end{aligned}$$

where \(Z = (Z_1, \ldots , Z_T) \in \mathcal {L}^1_q\).

Lemma 2 below demonstrates the validity of an approximation with only finitely many supergradients.

Since the spaces \(L_q(\varOmega ,\mathcal {F}_t)\) are separable, there exist sequences \((Z_{t,1}, Z_{t,2}, \ldots )\) that are dense in \(\mathcal {Z}_t\), for each \(t\,\). Let

$$\begin{aligned} \mathcal {A}_{t,n}(Y) = \min \{ \mathbb {E}[Y \cdot Z_{t,i}]:1 \le i \le n\}. \end{aligned}$$

Since \(Z \mapsto \mathbb E[YZ]\) is continuous on \(L_q\,\), for every Y in \(L_p(\varOmega , \mathcal {F}_t)\) it holds that

$$\begin{aligned} \mathcal {A}_{t,n}(Y) \downarrow \mathcal {A}_t(Y), \end{aligned}$$

as \(n \rightarrow \infty \).
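For the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}\) on a finite space with N equally likely states, the approximation \(\mathcal A_{t,n}\) can be illustrated without a genuinely dense sequence: the superdifferential is then a polytope whose vertices (for \(\alpha = k/N\)) are the densities equal to \(1/\alpha \) on k states and 0 elsewhere, and a linear functional attains its minimum over a polytope at a vertex. Enumerating the vertices therefore gives a finite sequence along which \(\mathcal A_n(Y)\) decreases and eventually equals \(\mathcal A(Y)\). This is a sketch under these assumptions; the function names are hypothetical.

```python
from itertools import combinations
from typing import List


def avar_vertices(n_states: int, k: int) -> List[List[float]]:
    """Vertices of the AV@R_alpha superdifferential for n equally likely states, alpha = k/n:
    densities equal to 1/alpha on k states and 0 on the others."""
    inv_alpha = n_states / k
    vertices = []
    for support in combinations(range(n_states), k):
        z = [0.0] * n_states
        for i in support:
            z[i] = inv_alpha
        vertices.append(z)
    return vertices


def acceptability_n(y: List[float], probs: List[float], densities: List[List[float]], n: int) -> float:
    """A_n(Y) = min_{1 <= i <= n} E[Y Z_i] over the first n densities of the sequence."""
    return min(sum(yi * pi * zi for yi, pi, zi in zip(y, probs, z)) for z in densities[:n])


y = [3.0, -1.0, 4.0, -5.0, 2.0]
probs = [0.2] * 5
Zs = avar_vertices(5, 2)  # alpha = 0.4
values = [acceptability_n(y, probs, Zs, n) for n in range(1, len(Zs) + 1)]
print(values)  # nonincreasing; the last entry equals AV@R_0.4(Y) = (-5 - 1) / 2 = -3
```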

Lemma 2

Let \(v^*\) be the optimal value of the basic problem \((\mathrm{P})\) and let \(v^*_n\) be the optimal value of the similar optimization problem \((\mathrm{P}_n)\), where \(\mathcal {A}_t\) are replaced by \(\mathcal {A}_{t,n}\). Then

$$\begin{aligned} v_n^* \uparrow v^*. \end{aligned}$$

Proof

Suppose the contrary, that is \(\sup _n v_n^* \le v^* - 3 \eta < v^*\) for some \(\eta >0\). Introduce the notation

$$\begin{aligned} Y_t(x) = \left\{ \begin{array}{ll} (x_{t-1}-x_t)^\top \, S_t - C_t &{} \quad \hbox { for } 1\le t < T\\ x_{T-1}^\top \, S_T - C_T &{} \quad \hbox { for } t=T . \end{array} \right. \end{aligned}$$

By Assumption A1 and since \(x \in \mathcal {L}_\infty ^m\), the maps \(x \mapsto \mathcal {A}_t (Y_t(x))\) and \(x \mapsto x_0^\top S_0\) are Lipschitz. Choose \(\delta := \eta \left[ 2 \Vert S_0 \Vert _1 K_1 (K_2+K_3+1) \right] ^{-1} > 0\), where \(K_3 \ge \Vert S_t\Vert _p\) for all t. Let \(x^*\) be a solution of \((\mathrm{P})\). We may find finite sub-sigma-algebras \(\tilde{\mathcal {F}}_t \subseteq \mathcal {F}_t\) such that, with

$$\begin{aligned} \tilde{S}_t= & {} \mathbb {E}[S_t|\tilde{\mathcal {F}}_t] \quad \hbox { (componentwise)},\\ \tilde{C}_t= & {} \mathbb {E}[C_t|\tilde{\mathcal {F}}_t] ,\\ \tilde{x}_t^*= & {} \mathbb {E}[x_t^*|\tilde{\mathcal {F}}_t] \quad \hbox { (componentwise)}, \end{aligned}$$

we have that

$$\begin{aligned} \Vert S_t - \tilde{S}_t \Vert _p\le & {} \delta , \\ \Vert C_t - \tilde{C}_t \Vert _p\le & {} \delta , \\ \Vert x_t^* - \tilde{x}_t^* \Vert _\infty\le & {} \delta . \end{aligned}$$

Denote by \((\tilde{\mathrm{P}})\) the variant of the problem \((\mathrm{P})\), where the processes \((S_t)\) and \((C_t)\) are replaced by \((\tilde{S}_t)\) and \((\tilde{C}_t)\). Similarly as before introduce the notation

$$\begin{aligned} \tilde{Y}_t(x) = \left\{ \begin{array}{ll} (x_{t-1}-x_t)^\top \, \tilde{S}_t - \tilde{C}_t &{} \quad \hbox { for } 1\le t < T\\ x_{T-1}^\top \, \tilde{S}_T - \tilde{C}_T &{} \quad \hbox { for } t=T. \end{array} \right. \end{aligned}$$

Notice that

$$\begin{aligned}&|\mathcal {A}_t(\tilde{Y}_t(\tilde{x}^*_t))- \mathcal {A}_t(Y_t(x^*_t))|\\&\quad \le K_1 \Vert \tilde{Y}_t(\tilde{x}^*_t) - Y_t(x^*_t) \Vert _p \\&\quad \le K_1 \left[ \Vert \tilde{x}_t^* - x_t^* \Vert _\infty \Vert \tilde{S}_t \Vert _p + \Vert x_t^*\Vert _\infty \Vert \tilde{S}_t - S_t \Vert _p + \Vert \tilde{C}_t - C_t \Vert _p\right] \\&\quad \le K_1 [\delta K_3 + \delta K_2 + \delta ] = \eta \left[ 2\Vert S_0 \Vert _1\right] ^{-1}. \end{aligned}$$

By Lemma 1 we may conclude that

$$\begin{aligned} v^* \le \tilde{v}^* + \eta , \end{aligned} \tag{5}$$

where \(\tilde{v}^*\) is the optimal value of \((\tilde{\mathrm{P}})\). Let \((\tilde{\mathrm{P}}_n)\) be the variant of problem \((\tilde{\mathrm{P}})\), where all \(\mathcal {A}_t\) are replaced by \(\mathcal {A}_{t,n}\). The optimal value of \((\tilde{\mathrm{P}}_n)\) is denoted by \(\tilde{v}_n^*\). In this finite situation we may show that \(\tilde{v}_n^* \uparrow \tilde{v}^*\). Obviously, \(\tilde{v}_{n}^*\) is a monotonically increasing sequence with \(\tilde{v}_{n}^*\le \tilde{v}^*\).

It remains to demonstrate that \(\lim _{n} \tilde{v}_{n}^*\) cannot be smaller than \(\tilde{v}^*\). For this, let \(\tilde{x}^{{n}*}\) be a solution of \((\tilde{\mathrm{P}}_n)\). Because the filtration \(\tilde{\mathcal {F}}\) is finite, the solutions of \((\tilde{\mathrm{P}}_n)\) as well as of \((\tilde{\mathrm{P}})\) are just vectors in some high- but finite-dimensional \(\mathbb {R}^N\), and they are all bounded by \(K_2\). Let \(\tilde{x}^{**}\) be an accumulation point of \((\tilde{x}^{{n}*})\), i.e., for some subsequence we have \(\tilde{x}^{{n_{i}*}}\rightarrow \tilde{x}^{**}\). We show that \(\tilde{x}^{**}\) satisfies the constraints of \((\tilde{\mathrm{P}})\).

Suppose the contrary. Then there is a t such that \(\mathcal {A}_t(\tilde{Y}_t(\tilde{x}^{**})) < 0\). This implies that there is a \(Z_{t,k} \in \{ Z_{t,1}, Z_{t,2}, \ldots \}\) such that \(\mathbb {E} [ \tilde{Y}_t(\tilde{x}^{**}) \cdot Z_{t,k}]<0\). However, for \(n_i \ge k\), by construction \(\mathbb {E}[\tilde{Y}_t (\tilde{x}^{n_i*}) \cdot Z_{t,k}] \ge 0\), and since \(\tilde{x}^{n_i*} \rightarrow \tilde{x}^{**}\) componentwise, also \(\mathbb {E}[\tilde{Y}_t (\tilde{x}^{**}) \cdot Z_{t,k}] \ge 0\), a contradiction. Hence \(\tilde{x}^{**}\) is feasible for \((\tilde{\mathrm{P}})\). Since the objective function is continuous in \(\tilde{x}\), this implies \(\tilde{v}^* \le \lim _i \tilde{v}_{n_i}^* \le \tilde{v}^*\), i.e., \(\lim _i \tilde{v}_{n_i}^*=\tilde{v}^*\) and, by monotonicity, \(\lim _{n} \tilde{v}_{n}^*=\tilde{v}^*\). We have therefore shown that we can find an index n such that

$$\begin{aligned} \tilde{v}^* < \tilde{v}^*_n+\eta . \end{aligned} \tag{6}$$

Let \(x^{n*}\) be a solution of \((\mathrm{P}_n)\) and let \({\hat{x}}^{n*}= \mathbb {E}[x^{n*}|\tilde{\mathcal {F}}_t]\,\). Analogously to the above, one may prove that \(| \mathcal {A}_{t,n} (\tilde{Y}_t({\hat{x}}^{n*})) - \mathcal {A}_{t,n} (Y_t(x^{n*})) | \le \eta \left[ 2\Vert S_0 \Vert _1\right] ^{-1}\) and hence, by Lemma 1,

$$\begin{aligned} \tilde{v}_n^* \le v_n^* + \eta . \end{aligned} \tag{7}$$

Putting (5), (6) and (7) together one sees that

$$\begin{aligned} v^* < v_n^* + 3 \eta , \end{aligned}$$

which contradicts the assumption that \(v^*_n \le v^*-3 \eta \). \(\square \)

We now turn to the duals of the problems \((\mathrm{P})\) and \((\mathrm{P}^\prime )\), called \((\mathrm{D})\) and \((\mathrm{D}^\prime )\), respectively. It turns out that, as in the case of a.s. super-/subreplication, a martingale property also appears in the dual in our general acceptability setting.

Theorem 1

For all \(t=1,\ldots ,T\), let \(\mathcal {A}_{t}\) be acceptability functionals with corresponding superdifferentials \(\mathcal {Z}_t\). Then, the acceptable ask price is given by

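$$\begin{aligned} (\mathrm{D})\qquad \text {maximize}\quad & \mathbb {E}^{\mathbb {Q}}\Bigl [ \sum _{t=1}^{T} C_t \Bigr ] \\ \text {subject to}\quad & \mathbb {E}^{\mathbb {Q}}\left[ S_t \mid \mathcal {F}_{t-1} \right] = S_{t-1} , \quad t=1,\ldots ,T, && \text {(8a)}\\ & {\left. \frac{d\mathbb {Q}}{d\mathbb {P}}\bigg \vert \right. }_{\mathcal {F}_{t}} \in \mathcal {Z}_t , \quad t=1,\ldots ,T, && \text {(8b)} \end{aligned}$$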

and the acceptable bid price is given by

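$$\begin{aligned} (\mathrm{D}^\prime )\qquad \text {minimize}\quad & \mathbb {E}^{\mathbb {Q}}\Bigl [ \sum _{t=1}^{T} C_t \Bigr ] \\ \text {subject to}\quad & \mathbb {E}^{\mathbb {Q}}\left[ S_t \mid \mathcal {F}_{t-1} \right] = S_{t-1} , \quad t=1,\ldots ,T, && \text {(9a)}\\ & {\left. \frac{d\mathbb {Q}}{d\mathbb {P}}\bigg \vert \right. }_{\mathcal {F}_{t}} \in \mathcal {Z}_t , \quad t=1,\ldots ,T. && \text {(9b)} \end{aligned}$$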

Proof

The acceptable ask/bid price is a special case of the distributionally robust acceptable ask/bid price introduced in Definition 2 below, namely the case in which the ambiguity set reduces to a singleton. Hence, Theorem 1 follows directly from the proof of Theorem 2. \(\square \)

Remark 1

(Interpretation of the dual formulations) The objective of the dual formulations \(({\mathrm{D}})\) and \(({\mathrm{D}}^\prime )\) is to maximize (minimize, resp.) the expected value of the payoffs of the claim w.r.t. some feasible measure \(\mathbb Q\). The constraints (8a) and (9a) require \(\mathbb Q\) to be such that the underlying asset price process is a martingale w.r.t. \(\mathbb Q\). This is well known from the traditional approach of pointwise super-/subreplication. The acceptability criterion enters the dual problems through the constraints (8b) and (9b), which shrink the feasible sets by imposing a condition on the density of \(\mathbb Q\) that is stronger than \(\mathbb Q\) and \(\mathbb P\) merely having the same null sets. Shrinking the feasible sets lowers the ask price, raises the bid price, and thus tightens the bid–ask spread.
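The interpretation can be checked in the smallest nontrivial case: \(T=1\), a trinomial stock plus the numéraire bond, and \(\mathcal A_1 = {{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \). Then (8a) reduces to \(\mathbb E^{\mathbb Q}[S_1]=S_0\), and (8b) to \(0\le q_i\le p_i/\alpha \) for the state probabilities \(q_i\) of \(\mathbb Q\), so the dual feasible set is an interval in one parameter and \(\mathbb E^{\mathbb Q}[C_1]\) is affine on it. The following Python sketch (model and function name are illustrative assumptions; it requires \(s_{\text{down}} < s_{\text{mid}}\)) returns the resulting ask and bid values. Raising \(\alpha \) towards 1 lowers the bounds \(p_i/\alpha \), shrinks the interval and tightens the spread.

```python
import math
from typing import List, Tuple


def trinomial_dual_prices(s1: List[float], s0: float, probs: List[float],
                          payoff: List[float], alpha: float) -> Tuple[float, float]:
    """Max and min of E^Q[C] for T = 1, states s1 = [down, mid, up] of one stock
    plus a numeraire bond with price 1.

    Feasible Q: sum q = 1, E^Q[S_1] = s0 (martingale), 0 <= q_i <= p_i / alpha
    (i.e. dQ/dP lies in the AV@R_alpha superdifferential).
    """
    s_d, s_m, s_u = s1
    # Parametrize by t = q_u; then q_i = a_i + b_i * t solves both equality constraints.
    a_d = (s_m - s0) / (s_m - s_d)
    b_d = (s_u - s_m) / (s_m - s_d)
    coeffs = [(a_d, b_d), (1.0 - a_d, -1.0 - b_d), (0.0, 1.0)]
    lo, hi = -math.inf, math.inf
    for (a, b), p in zip(coeffs, probs):
        ub = p / alpha
        if b > 0:
            lo, hi = max(lo, -a / b), min(hi, (ub - a) / b)
        elif b < 0:
            lo, hi = max(lo, (ub - a) / b), min(hi, -a / b)
        elif not 0.0 <= a <= ub:
            raise ValueError("no feasible martingale measure")
    if lo > hi + 1e-12:
        raise ValueError("no feasible martingale measure")
    hi = max(lo, hi)  # guard against round-off when the interval is a single point

    def expected_payoff(t: float) -> float:
        return sum(c * (a + b * t) for c, (a, b) in zip(payoff, coeffs))

    # E^Q[C] is affine in t, so its extrema are attained at the interval endpoints.
    candidates = [expected_payoff(lo), expected_payoff(hi)]
    return max(candidates), min(candidates)


print(trinomial_dual_prices([80.0, 100.0, 120.0], 100.0, [1 / 3] * 3, [0.0, 0.0, 20.0], 0.5))
```

For \(S_1\in \{80,100,120\}\), \(S_0=100\), equal probabilities and a call with strike 100, \(\alpha =0.5\) gives ask 10 and bid 10/3, and \(\alpha =1\) gives 20/3 for both.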

Proposition 2

For fixed acceptability functionals \(\mathcal A_1, \ldots , \mathcal A_T\), consider the acceptable ask price \(\pi ^{a}(\mathbb P)\) as a function of the underlying model \(\mathbb P\,\). This function is Lipschitz.

Proof

The assertion follows from Theorem 5 in the “Appendix”, considering the Lipschitz property of claims (Assumption A2) and the problem formulation resulting from Theorem 1. \(\square \)

3 Model ambiguity and distributional robustness

Traditional stochastic programs are based on a given and fixed probability model for the uncertainties. However, ever since the pioneering paper of Scarf [44] in the 1950s, it has been recognized that these models are based on observed data and that the resulting statistical error should be taken into account when making decisions. Ambiguity sets are typically either a finite collection of models or a neighborhood of a given baseline model. In what follows we study the latter case and, in particular, we use the nested distance to construct parameter-free ambiguity sets.

3.1 Acceptability pricing under model ambiguity

In Sect. 2.2 we defined the bid/ask price of a contingent claim as the maximal/minimal amount of capital needed in order to sub-/superhedge its payoff(s) w.r.t. an acceptability criterion. However, the result of this approach depends heavily on the particular choice of the probability model. This section weakens this strong model dependence. More specifically, acceptable bid and ask prices shall be based on an acceptability criterion that is robust w.r.t. all models contained in a certain ambiguity set.

Definition 2

Consider a contingent claim C. Then, for acceptability functionals \(\mathcal {A}_{t}\), \(t=1,\ldots ,T\), and an ambiguity set \({\mathcal P}_{\!\!\varepsilon }\) of probability models,

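  (i) the distributionally robust acceptable ask price of C is defined as

    $$\begin{aligned} (\mathrm{PP})\qquad \text {minimize}\quad & x_0^\top S_0 \\ \text {subject to}\quad & \mathcal {A}_t^{\mathbb {P}}\left( (x_{t-1}-x_t)^\top S_t - C_t \right) \ge 0 \quad \forall \, \mathbb {P} \in {\mathcal P}_{\!\!\varepsilon } , && \text {(10a)}\\ & \mathcal {A}_T^{\mathbb {P}}\left( x_{T-1}^\top S_T - C_T \right) \ge 0 \quad \forall \, \mathbb {P} \in {\mathcal P}_{\!\!\varepsilon } , && \text {(10b)}\\ & x \lhd \mathcal {F} ; \end{aligned}$$

  (ii) the distributionally robust acceptable bid price is defined as

    $$\begin{aligned} (\mathrm{PP}^\prime )\qquad \text {maximize}\quad & -x_0^\top S_0 \\ \text {subject to}\quad & \mathcal {A}_t^{\mathbb {P}}\left( (x_{t-1}-x_t)^\top S_t + C_t \right) \ge 0 \quad \forall \, \mathbb {P} \in {\mathcal P}_{\!\!\varepsilon } , && \text {(11a)}\\ & \mathcal {A}_T^{\mathbb {P}}\left( x_{T-1}^\top S_T + C_T \right) \ge 0 \quad \forall \, \mathbb {P} \in {\mathcal P}_{\!\!\varepsilon } , && \text {(11b)}\\ & x \lhd \mathcal {F} , \end{aligned}$$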

where the optimization runs over all trading strategies \(x \in \mathcal {L}_\infty ^m \) for the liquidly traded assets. The constraints in (10a) and (11a) are formulated for all \(t = 1,\ldots ,T-1\) and \(\mathcal {A}_t^{\mathbb {P}}\) denotes the value of the acceptability functional when the underlying probability model is given by \(\mathbb {P}\).

Theorem 2

Let \({\mathcal P}_{\!\!\varepsilon }\) be a convex set of probability models, which is spanned by a sequence of models \((\mathbb P_1, \mathbb P_2, \ldots )\,\). Moreover, let \({\mathcal P}_{\!\!\varepsilon }\) be dominated by some model \(\mathbb {P}_0\) and assume all densities w.r.t. \(\mathbb {P}_0\) to be bounded. For \(t=1,\ldots ,T\), let \(\mathcal {A}_{t}\) be acceptability functionals with corresponding superdifferentials \(\mathcal {Z}_{\mathcal {A}_{t}}\). Then, the distributionally robust acceptable ask price is given by

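$$\begin{aligned} (\mathrm{DD})\qquad \text {maximize}\quad & \mathbb {E}^{\mathbb {Q}}\Bigl [ \sum _{t=1}^{T} C_t \Bigr ] \\ \text {subject to}\quad & \mathbb {E}^{\mathbb {Q}}\left[ S_t \mid \mathcal {F}_{t-1} \right] = S_{t-1} , \quad t=1,\ldots ,T, \\ & {\left. \frac{d\mathbb {Q}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}} = Z_t f_t \ \text {with}\ Z_t \in \mathcal {Z}_{\mathcal {A}_{t}^{\mathbb P}},\ f_t = {\left. \frac{d\mathbb {P}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}} \ \text {for some}\ \mathbb P \in {\mathcal P}_{\!\!\varepsilon } , \quad t=1,\ldots ,T, \end{aligned}$$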

and the distributionally robust acceptable bid price is given by

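$$\begin{aligned} (\mathrm{DD}^\prime )\qquad \text {minimize}\quad & \mathbb {E}^{\mathbb {Q}}\Bigl [ \sum _{t=1}^{T} C_t \Bigr ] \\ \text {subject to}\quad & \mathbb {E}^{\mathbb {Q}}\left[ S_t \mid \mathcal {F}_{t-1} \right] = S_{t-1} , \quad t=1,\ldots ,T, \\ & {\left. \frac{d\mathbb {Q}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}} = Z_t f_t \ \text {with}\ Z_t \in \mathcal {Z}_{\mathcal {A}_{t}^{\mathbb P}},\ f_t = {\left. \frac{d\mathbb {P}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}} \ \text {for some}\ \mathbb P \in {\mathcal P}_{\!\!\varepsilon } , \quad t=1,\ldots ,T. \end{aligned}$$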

Proof

Define

$$\begin{aligned} {\mathfrak {D}}_t := \left\{ Z_t f_t:~ \exists ~ \mathbb P \in {\mathcal P}_{\!\!\varepsilon } \text { s.t. } Z_t \in \mathcal {Z}_{\mathcal {A}_{t}^{\mathbb P}},\ {\left. \frac{d\mathbb {P}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}} = f_t \right\} . \end{aligned}$$

Then, the constraints in \((\mathrm{PP})\) can be written in the form

$$\begin{aligned} \mathbb {E}^{\mathbb {P}_0}\left[ \left( (x_{t-1}-x_t)^\top S_t - C_t \right) {\mathfrak {d}}_t \right] \ge 0 \quad \forall {\mathfrak {d}}_t \in {\mathfrak {D}}_t . \end{aligned}$$

Since all densities \(f_t\) are bounded by assumption,Footnote 7 Lemma 2 holds true if we replace \(Z_t \in \mathcal Z_t\) by \({\mathfrak {d}}_t \in {\mathfrak {D}}_t\). It can easily be seen that for each t there are sequences \(({\mathfrak {d}}_{t,1},{\mathfrak {d}}_{t,2}, \ldots )\) which are dense in \({\mathfrak {D}}_t\). Let us define

$$\begin{aligned} {\mathfrak {D}}_t^n := \left\{ \sum _{i=1}^{n_1} \sum _{j=1}^{n_2^{i}} \lambda _{i,j} Z_t^{j,i} f_t^{i}:~ \lambda _{i,j} \ge 0,\ \sum _{i=1}^{n_1} \sum _{j=1}^{n_2^{i}} \lambda _{i,j} = 1, \left| \left\{ (i,j) : 1 \le i \le n_1, 1\le j \le {n_2^{i}} \right\} \right| = n \right\} . \end{aligned}$$

Then, it holds that \({\mathfrak {D}}_t^{n} \subseteq {\mathfrak {D}}_t^{n+1}\) and \(\bigcup _n {\mathfrak {D}}_t^n = {\mathfrak {D}}_t\). Thus, by Lemma 2 we may approximate \((\mathrm{PP})\) by a problem of the form

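$$\begin{aligned} ({\mathrm{PP}}_n)\qquad \text {minimize}\quad & x_0^\top S_0 \\ \text {subject to}\quad & \mathbb {E}^{\mathbb {P}_0}\left[ \left( (x_{t-1}-x_t)^\top S_t - C_t \right) {\mathfrak {d}}_t \right] \ge 0 \quad \forall {\mathfrak {d}}_t \in {\mathfrak {D}}_t^n ,\ t=1,\ldots ,T-1, \\ & \mathbb {E}^{\mathbb {P}_0}\left[ \left( x_{T-1}^\top S_T - C_T \right) {\mathfrak {d}}_T \right] \ge 0 \quad \forall {\mathfrak {d}}_T \in {\mathfrak {D}}_T^n , \\ & x \lhd \mathcal {F} . \end{aligned}$$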

Rearranging its Lagrangian leads to the following representation of \(({\mathrm{PP}}_n)\,\):

figure j

where

$$\begin{aligned} W_{t}^n:=\sum _{i=1}^{n_1}\sum _{j=1}^{n_2^{i}}\lambda _{t}^{i,j}Z_{t}^{j,i}f_{t}^{i} . \end{aligned}$$

This is a finite-dimensional bilinear problem. Notice that \(({\mathrm{PP}}_n)\) is always feasible.Footnote 8 We may thus interchange the \(\inf \) and the \(\sup \). Carrying out the minimization in x explicitly, the unconstrained minimax problem (14) can be written as the constrained maximization problem

figure k

Introducing a new probability measure \(\mathbb Q\) defined by the Radon–Nikodým derivative \(\frac{d\mathbb {Q}}{d\mathbb {P}_0} =W_T^n\), the problem can be rewritten in terms of \(\mathbb Q\) in the form

figure l

It is left to show that there is no duality gap in the limit, as \(n\rightarrow \infty \,\). Assume that the dual problem \((\mathrm{DD})\) has an optimal value \(\pi _{a}^{\prime }\ne \pi _{a}\,\). By the primal constraints in \((\mathrm{PP})\), for any dual feasible solution \(\mathbb {Q}\) it holds

$$\begin{aligned} \mathbb {E}^{\mathbb {Q}}\left[ \sum _{t=1}^{T}{C}_{t}\right] \le \mathbb {E}^{\mathbb {P}_0}\left[ \sum _{t=1}^{T-1}(x_{t-1}^{\top }{S}_{t}-x_{t}^{\top }{S}_{t})\cdot Z_{t} f_t+x_{T-1}^{\top }S_T \cdot Z_T f_T\right] =x_{0}^{\top }S_{0} . \end{aligned}$$

Thus, the optimal primal value \(\pi _{a}\) is greater than or equal to the optimal dual value \(\pi _{a}^{\prime }\,\). Now assume \(\pi _{a}^{\prime }<\pi _{a}\,\). Then, since \(\pi _{a}^{n}\uparrow \pi _{a}\) by Lemma 2, there must exist some n such that \(\pi _{a}^{n}>\pi _{a}^{\prime }\,\). Moreover, there exists some \(\mathbb {Q}^{n}\) which is dual feasible and such that \(\mathbb {E}^{\mathbb {Q}^{n}}\left[ \sum _{t=1}^{T}{C}_{t}\right] =\pi _{a}^{n}\,\). This contradicts \(\pi _{a}^{\prime }\) being the limit of the monotonically increasing sequence of optimal values of the approximate dual problems of the form \(({\mathrm{DD}}_n)\). Hence, \(\pi _{a}^{\prime }=\pi _{a}\), i.e., there is no duality gap in the limit.

Finally, considering the structure of \({\mathfrak {D}}_t\), the condition \({\left. \frac{d\mathbb {Q}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}} \in {\mathfrak {D}}_t\) means that this density is of the form \(Z_t f_t\), where there exists some \(\mathbb P \in {\mathcal P}_{\!\!\varepsilon }\) such that \(Z_t \in \mathcal Z_{\mathcal A_t^{\mathbb P}}\) and \({\left. \frac{d\mathbb {P}}{d\mathbb {P}_0}\bigg \vert \right. }_{\mathcal {F}_{t}}=f_t\). This completes the derivation of the dual problem formulation \(({\mathrm{DD}})\). \(\square \)

3.2 Nested distance balls as ambiguity sets: a large deviations result

In order to find appropriate nonparametric distances for probability models used in stochastic optimization, one has to observe that a minimal requirement is that the distance metrizes weak convergence and that empirical distributions converge in it. On the family of probability measures having a finite first moment, the Kantorovich–Wasserstein distance metrizes weak convergence together with convergence of first moments. Its multistage generalization, the nested distance, measures the distance between stochastic processes on filtered probability spaces. The “Appendix” contains the definition and interpretation of both the Kantorovich–Wasserstein distance and the nested distance.

Realistic probability models must be based on observed data. While for scalar- or vector-valued random variables with finite expectation the empirical distribution based on an i.i.d. sample converges in Kantorovich–Wasserstein distance to the underlying probability measure, the situation is more involved for stochastic processes: the plain empirical distribution of sample paths does not converge in nested distance (cf. Pflug and Pichler [32]), but a smoothed version based on density estimates does.

As we show here, combining kernel density estimation with transportation distances yields good estimates for confidence balls and ambiguity sets under suitable regularity assumptions.

Let \(\mathbb {P}\) be the distribution of the stochastic process \(\xi =(\xi _1, \dots , \xi _T)\) with values \(\xi _t \in \mathbb {R}^m\). Notice that \(\mathbb {P}\) is a distribution on \(\mathbb {R}^\ell \) with \(\ell = m\cdot T\). Let \(\mathbb {P}^n\) denote the product measure governing n independent samples from \(\mathbb {P}\). If \(\xi ^{(j)} =(\xi _1^{(j)}, \ldots , \xi _T^{(j)})\), \(j=1, \ldots ,n\), is such a sample, then the empirical distribution \({\hat{\mathbb {P}}}_n\) puts the weight \(1/n\) on each of the paths \(\xi ^{(j)}\). For the construction of nested ambiguity balls, the empirical distribution has to be smoothed by convolution with a kernel function k(x), \(x \in \mathbb {R}^\ell \). For a bandwidth \(h>0\) to be specified later, let \(k_h(x)= \frac{1}{h^\ell }k(x/h)\). In what follows we work with the kernel density estimate \({\hat{f}}_n = {\hat{\mathbb {P}}}_n * k_h\), where \(*\) denotes convolution, i.e., \({\hat{f}}_n(x)=\frac{1}{n}\sum _{j=1}^{n}k_h\bigl(x-\xi ^{(j)}\bigr)\).
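The smoothing step can be sketched in a few lines of NumPy. This is an illustration only, assuming a radial "cone" kernel \(k(x)=c_\ell (1-\|x\|)_+\), which vanishes outside the unit ball and is Lipschitz, as required in Assumption A4 below; the kernel choice and the names `cone_kernel` and `kernel_density_estimate` are hypothetical and not taken from the paper. Each sample path \(\xi ^{(j)}\) is flattened into one row of length \(\ell =m\cdot T\).

```python
import math

import numpy as np


def cone_kernel(x: np.ndarray) -> np.ndarray:
    """Radial kernel k(x) = c_l * max(1 - |x|, 0) on R^l, normalised to integrate to one.

    x has shape (q, l); the result has shape (q,).
    """
    l = x.shape[1]
    unit_ball_volume = math.pi ** (l / 2) / math.gamma(l / 2 + 1)
    # int (1 - |x|)_+ dx = unit_ball_volume / (l + 1), hence this normalising constant
    c_l = (l + 1) / unit_ball_volume
    return c_l * np.clip(1.0 - np.linalg.norm(x, axis=1), 0.0, None)


def kernel_density_estimate(samples: np.ndarray, h: float):
    """Return f_hat = P_hat_n * k_h for samples of shape (n, l), one flattened path per row."""
    samples = np.asarray(samples, dtype=float)
    n, l = samples.shape

    def f_hat(points: np.ndarray) -> np.ndarray:
        points = np.asarray(points, dtype=float)                 # shape (q, l)
        # scaled differences (x - xi^(j)) / h for every query point and sample path
        diffs = (points[:, None, :] - samples[None, :, :]) / h   # shape (q, n, l)
        k_vals = cone_kernel(diffs.reshape(-1, l)).reshape(points.shape[0], n)
        # k_h(x) = k(x / h) / h^l, averaged with weight 1/n per path
        return k_vals.sum(axis=1) / (n * h**l)

    return f_hat
```

For T = 3 stages of a scalar process (m = 1, so \(\ell =3\)), one passes an (n, 3) array of sample paths and evaluates `f_hat` at a (q, 3) array of query paths. The constant \((\ell +1)/V_\ell \), with \(V_\ell \) the volume of the unit ball, follows from integrating \((1-\|x\|)_+\) in polar coordinates.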

Assumption A4

  1. The support of \(\mathbb {P}\) is a set \(D= D_1 \times \dots \times D_T\), where \(D_i\) are compact sets in \(\mathbb {R}^m\);

  2. \(\mathbb {P}\) has a Lebesgue density f, which is Lipschitz on D with constant L;

  3. f is bounded from below and from above on D by \(0 < {\underline{c}} \le f(x) \le {\overline{c}}\);

  4. the kernel function k vanishes outside the unit ball and is Lipschitz with constant L;

  5. the conditional probabilities \(\mathbb {P}_t(A \vert x) = \mathbb {P}(\xi _t \in A \vert (\xi _1, \ldots , \xi _{t-1}) = x)\) satisfy

    $$\begin{aligned} {\mathsf {d}}\left( \mathbb {P}_t\left( \cdot |x\right) ,\mathbb {P}_t\left( \cdot |y\right) \right) \le \gamma _t\left\| x-y\right\| ,\quad x,y\in D \end{aligned}$$
    (15)

    for some \(\gamma _t>0\). Here, \({\mathsf {d}}\) denotes the Wasserstein distance for probabilities on \(\mathbb {R}^m\).

Remark 2

The proof of Theorem 3 below relies on the lower bound \({\underline{c}}\) of the density. As the denominator of the conditional density \(f(x\vert y)= f(x,y)/ f(y)\) has to be estimated by density estimation as well, this bound ensures that the denominator does not vanish. In fact, the assumption on the compact cube (point 1.) can be weakened to D being a compact set; the proof, however, then becomes slightly more involved. For the remaining technical assumption (point 5.) we refer to Mirkov and Pflug [23].

Theorem 3

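(Large deviation for the nested distance) Under Assumption A4 there exists a constant \(K >0\) such that

$$\begin{aligned} \mathbb {P}^n\left( {{\,\mathrm{\mathsf {dI}}\,}}\left( \mathbb {P},{\hat{\mathbb {P}}}_n*k_{h}\right) >\varepsilon \right) \le \exp \left( -Kn\varepsilon ^{2\ell +4}\right) \end{aligned}$$
(16)

for n sufficiently large and appropriately chosen bandwidth h. Here, \({{\,\mathrm{\mathsf {dI}}\,}}\) denotes the nested distance.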

The proof of (16) is based on several steps, presented as propositions below. To start with, we recall two results for density estimates \({\hat{f}}_n = {\hat{\mathbb {P}}}_n * k_h\) for densities f on \(\mathbb {R}^\ell \).

Proposition 3

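Under the Lipschitz conditions for f and k given above, it holds that

$$\begin{aligned} \mathbb {P}^n\left( \left\| f-{\hat{f}}_{n}\right\| _{\infty }>\varepsilon \right) \le \mathbb {P}^n\left( {\mathsf {d}}\left( {\hat{\mathbb {P}}}_{n},\mathbb {P}\right) >\frac{\varepsilon ^{\ell +2}}{(2L)^{\ell +2}}\right) \end{aligned}$$
(17)

if the bandwidth is chosen as \(h=\varepsilon /(2 L)\).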

Proof

See Bolley et al. [2, Prop. 3.1]. \(\square \)
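A short calculation indicates where the bandwidth \(h=\varepsilon /(2L)\) and the threshold \(\varepsilon ^{\ell +2}/(2L)^{\ell +2}\) for the Wasserstein deviation of \({\hat{\mathbb {P}}}_n\) come from. This is a heuristic sketch, not the argument of Bolley et al. [2]: it treats f as L-Lipschitz on all of \(\mathbb {R}^\ell \) (an idealization here, since f jumps at the boundary of D), and it assumes \(k\ge 0\) integrates to one, vanishes outside the unit ball and is L-Lipschitz, so that \(k_h\) is Lipschitz with constant \(L/h^{\ell +1}\). The second term is bounded by Kantorovich–Rubinstein duality, since \(y\mapsto k_h(x-y)\) is Lipschitz.

$$\begin{aligned} \left\| f-{\hat{f}}_{n}\right\| _{\infty }&\le \left\| f-f*k_h\right\| _{\infty }+\left\| \left( \mathbb {P}-{\hat{\mathbb {P}}}_{n}\right) *k_h\right\| _{\infty }\\&\le Lh+\frac{L}{h^{\ell +1}}\,{\mathsf {d}}\left( {\hat{\mathbb {P}}}_{n},\mathbb {P}\right) =\frac{\varepsilon }{2}+\frac{(2L)^{\ell +2}}{2\varepsilon ^{\ell +1}}\,{\mathsf {d}}\left( {\hat{\mathbb {P}}}_{n},\mathbb {P}\right) \quad \text {for } h=\frac{\varepsilon }{2L}. \end{aligned}$$

Hence \(\left\| f-{\hat{f}}_{n}\right\| _{\infty }>\varepsilon \) can only occur if \({\mathsf {d}}({\hat{\mathbb {P}}}_{n},\mathbb {P})>\varepsilon ^{\ell +2}/(2L)^{\ell +2}\).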

Proposition 4

Let f and g be densities vanishing outside a compact set D and set \(\mathbb {P}^{f}(A)=\int _{A}f(x)\mathrm {d}x\) resp. \(\mathbb {P}^{g}(A)=\int _{A}g(x)\mathrm {d}x\,\). Then their Wasserstein distance is bounded by

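$$\begin{aligned} {\mathsf {d}}\left( \mathbb {P}^{f},\mathbb {P}^{g}\right) \le 2\Delta \lambda (D)\left\| f-g\right\| _{\infty }. \end{aligned}$$
(18)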

Here \(\Delta \) is the diameter of D and \(\lambda (D)\) is the Lebesgue measure of D.

Proof

Cf. [32, Prop. 4]. \(\square \)
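The mechanism behind a bound of type (18) is Kantorovich–Rubinstein duality: a 1-Lipschitz test function can be shifted to vanish at a point of D, so it is bounded by \(\Delta \) on D, which gives \({\mathsf {d}}(\mathbb {P}^f,\mathbb {P}^g)\le \Delta \int _D|f-g|\,\mathrm {d}x\le \Delta \lambda (D)\left\| f-g\right\| _{\infty }\). In one dimension the distance also equals \(\int |F-G|\,\mathrm {d}x\) for the distribution functions F, G, so the bound can be checked numerically. The sketch below uses a hypothetical example on \(D=[0,1]\): f uniform and \(g(x)=1+a(2x-1)\), for which \(F-G=a\,x(1-x)\) and hence \({\mathsf {d}}=a/6\).

```python
import numpy as np


def wasserstein_1d_from_densities(f_vals, g_vals, dx):
    """W1 distance of two densities sampled on a uniform grid, via W1 = int |F - G| dx."""
    F = np.cumsum(f_vals) * dx
    G = np.cumsum(g_vals) * dx
    return float(np.sum(np.abs(F - G)) * dx)


# hypothetical example on D = [0, 1] (diameter 1, Lebesgue measure 1)
a = 0.3
N = 100_000
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx          # midpoints of the grid cells
f_vals = np.ones_like(x)               # uniform density
g_vals = 1.0 + a * (2.0 * x - 1.0)     # perturbed density, still integrates to one

w1 = wasserstein_1d_from_densities(f_vals, g_vals, dx)
sup_diff = float(np.max(np.abs(f_vals - g_vals)))
diameter, volume = 1.0, 1.0
print(w1, a / 6, diameter * volume * sup_diff)
```

In this example the sup-norm bound \(\Delta \lambda (D)\left\| f-g\right\| _{\infty }\approx a\) is larger than the actual distance \(a/6\) by a factor of about six, so bounds of this type are convenient rather than sharp.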

The next result extends the previous one to conditional densities.

Proposition 5

Let f and g be bivariate densities on a compact set \({\bar{D}}_1 \times {\bar{D}}_2\), bounded by \(0<{\underline{c}} \le f,g \le {\overline{c}} <\infty \), which are sufficiently close so that \(\left\| f-g\right\| _{{\bar{D}}_{1}\times {\bar{D}}_{2}}\le {\underline{c}}\lambda ({\bar{D}}_{1}\times {\bar{D}}_{2}) [2\Delta ^{\ell }]^{-1}\,\). Then there is a constant \(\kappa _1\), depending only on the set \({\bar{D}}:={\bar{D}}_{1}\times {\bar{D}}_{2}\) and the bounds \({\underline{c}}\), \({\overline{c}}\), such that the conditional densities are close as well, i.e., they satisfy

$$\begin{aligned} \left| f(x|y)-g(x|y)\right| \le \kappa _1 \sup _{x^{\prime }\in {\bar{D}}_{1},y^{\prime }\in {\bar{D}}_{2}}\left| f(x^{\prime },y^{\prime })-g(x^{\prime },y^{\prime })\right| \end{aligned}$$

for all \(x\in {\bar{D}}_{1}\) and \(y\in {\bar{D}}_{2}\), i.e.,

$$\begin{aligned} \sup _{y\in {\bar{D}}_{2}}\left\| f(\cdot |y)-g(\cdot |y)\right\| _{{\bar{D}}_{1}}\le \kappa _1 \left\| f-g\right\| _{{\bar{D}}_{1}\times {\bar{D}}_{2}}. \end{aligned}$$
(19)

Proof

To abbreviate notation, set \(\varepsilon :=\sup _{x,y}\left| f(x,y)-g(x,y)\right| \) and note that \(\varepsilon \le {\underline{c}}\lambda ({\bar{D}}) [2\Delta ^{\ell }]^{-1}\,\). Consider the marginal density \(f(y):=\int _{{\bar{D}}_{1}}f(x,y)\mathrm {d}x\) (\(g(y):=\int _{{\bar{D}}_{1}}g(x,y)\mathrm {d}x\), resp.). It holds that

$$\begin{aligned} \left| f(y)-g(y)\right| \le \int _{{\bar{D}}_{1}}\left| f(x,y)-g(x,y)\right| \mathrm {d}x\le \int _{{\bar{D}}_{1}}\varepsilon \, \mathrm {d}x\le \Delta ^{\ell }\cdot \varepsilon . \end{aligned}$$

Clearly \(|f(y)|\ge {\underline{c}}\lambda ({\bar{D}}_{1})\), where \(\lambda ({\bar{D}}_{1})\) is the Lebesgue measure of \({\bar{D}}_{1}\) and therefore

$$\begin{aligned} \left| \frac{f(y)-g(y)}{f(y)}\right| \le \frac{\Delta ^{\ell }}{{\underline{c}}\lambda ({\bar{D}}_{1})}\cdot \varepsilon \le \frac{1}{2} . \end{aligned}$$
(20)

The elementary inequality \(\frac{1}{1+x}\le 1+2\left| x\right| \) is valid for \(x\ge -\nicefrac {1}{2}\). With (20) it follows that

$$\begin{aligned} g(x|y)-f(x|y)&=\frac{g(x,y)}{g(y)}-\frac{f(x,y)}{f(y)}=\frac{g(x,y)}{f(y)}\cdot \frac{1}{1+\frac{g(y)-f(y)}{f(y)}}-\frac{f(x,y)}{f(y)}\\&\le \frac{g(x,y)}{f(y)}\left( 1+2\frac{|g(y)-f(y)|}{f(y)}\right) -\frac{f(x,y)}{f(y)}\\&=\frac{g(x,y)-f(x,y)}{f(y)}+2\frac{g(x,y)}{f(y)}\frac{|g(y)-f(y)|}{f(y)}\\&\le \frac{\varepsilon }{{\underline{c}}\lambda ({\bar{D}}_{1})}+2\frac{{\overline{c}}}{{\underline{c}} \lambda ({\bar{D}}_1)}\frac{\Delta ^{\ell }}{{\underline{c}}\lambda ({\bar{D}}_{1})}\cdot \varepsilon \le \kappa _1 \varepsilon \end{aligned}$$

with \(\kappa _1=\frac{1}{{\underline{c}}\lambda ({\bar{D}}_{1})}+\frac{2{\overline{c}}\Delta ^{\ell }}{({\underline{c}}\lambda ({\bar{D}}_{1}))^2}\). The assertion of the proposition finally follows by exchanging the roles of the densities f and g. \(\square \)
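The constant \(\kappa _1\) can be checked on a grid. The sketch below is a hypothetical example on \({\bar{D}}_1={\bar{D}}_2=[0,1]\): f is uniform and \(g(x,y)=(1+a(2x-1))(1+b(2y-1))\), and the quantity playing the role of \(\Delta ^{\ell }\) in the proof is set to 1, which bounds \(\lambda ({\bar{D}}_1)=1\) as the argument requires. The algebra of the proof carries over verbatim to the discretized densities, so the computed conditional difference must stay below \(\kappa _1\varepsilon \).

```python
import numpy as np

N = 400
dx = 1.0 / N
grid = (np.arange(N) + 0.5) * dx                 # cell midpoints of [0, 1]
X, Y = np.meshgrid(grid, grid, indexing="ij")    # x along axis 0, y along axis 1

a, b = 0.2, 0.1                                  # hypothetical perturbation sizes
f = np.ones_like(X)                              # uniform density on [0, 1]^2
g = (1 + a * (2 * X - 1)) * (1 + b * (2 * Y - 1))


def conditional(density):
    """Conditional density of x given y: joint density divided by the y-marginal."""
    marginal_y = density.sum(axis=0) * dx        # integrate out x, one value per y
    return density / marginal_y[None, :]


eps = np.max(np.abs(f - g))                      # sup-distance of the joint densities
lhs = np.max(np.abs(conditional(f) - conditional(g)))

c_low = min(f.min(), g.min())                    # lower bound for both densities
c_up = max(f.max(), g.max())                     # upper bound for both densities
lam1 = 1.0                                       # Lebesgue measure of D1 = [0, 1]
delta_pow = 1.0                                  # stands in for Delta^l; must be >= lam1
kappa1 = 1 / (c_low * lam1) + 2 * c_up * delta_pow / (c_low * lam1) ** 2

assert delta_pow * eps / (c_low * lam1) <= 0.5   # closeness condition behind (20)
print(lhs, kappa1 * eps)
```

Since g factorizes, its conditional density is \(1+a(2x-1)\), so the left-hand side is close to a while \(\kappa _1\varepsilon \) is roughly ten times larger; the constant is a worst-case bound.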

Theorem 4

Given Assumption A4 there exists a constant \(\kappa _2\) such that

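$$\begin{aligned} \mathbb {P}^n\left( \sup _{y\in {\bar{D}}_{2}}{\mathsf {d}}\left( \mathbb {P}^{f(\cdot |y)},\mathbb {P}^{{\hat{f}}_{n}(\cdot |y)}\right) >\varepsilon \right) \le \exp \left( -\kappa _2 n\varepsilon ^{2\ell +4}\right) \end{aligned}$$
(21)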

for all \(\varepsilon >0\) and n sufficiently large.

Proof

It follows from (18) and (19) that

$$\begin{aligned} {\mathsf {d}}\left( \mathbb {P}^{f(\cdot |y)},\mathbb {P}^{{\hat{f}}_{n}(\cdot |y)}\right) \le 2\Delta \lambda (D) \left\| f(\cdot |y)-{\hat{f}}_{n}(\cdot |y)\right\| _{\infty }\le \kappa _3 \left\| f-{\hat{f}}_{n}\right\| _{\infty } \end{aligned}$$

for \(\kappa _3=2 \Delta \lambda (D) \kappa _1\). Recall the large deviation result from [2, Th. 2.8], which states that

$$\begin{aligned} \mathbb {P}^n({\mathsf {d}}({\hat{\mathbb {P}}}_n,\mathbb {P}) > \eta ) \le \exp (-n \kappa ^\prime \eta ^2) , \end{aligned}$$

for some constant \(\kappa ^\prime \) depending only on the Lipschitz constants of f and k.

With (17), applied with \(\varepsilon /\kappa _3\) in place of \(\varepsilon \) (i.e., with bandwidth \(h=\varepsilon /(2L\kappa _3)\)), and the large deviation result above, it follows that

$$\begin{aligned} \mathbb {P}^n&\left( \sup _{y\in {\bar{D}}_{2}}{\mathsf {d}}\left( \mathbb {P}^{f(\cdot |y)},\mathbb {P}^{{\hat{f}}_{n}(\cdot |y)}\right)>\varepsilon \right) \le \mathbb {P}^n\left( \left\| f-{\hat{f}}_{n}\right\| _{\infty }>\frac{\varepsilon }{\kappa _3}\right) \\&\le \mathbb {P}^n\left( {\mathsf {d}} \left( {\hat{\mathbb {P}}}_{n},\mathbb {P}\right) > \frac{\varepsilon ^{\ell +2}}{(2L\kappa _3)^{\ell +2}}\right) \le \exp \left\{ -\kappa ^\prime n\left( \frac{\varepsilon ^{\ell +2}}{(2L\kappa _3)^{\ell +2}}\right) ^{2}\right\} . \end{aligned}$$

Setting \(\kappa _2:=\kappa ^\prime (2L\kappa _3)^{-2\ell -4}\) yields (21). \(\square \)

Proof of Theorem 3

The previous theorem will be applied to the conditional densities of \(\xi _t\) given the past \(\xi _1, \ldots , \xi _{t-1}\). Thus the sets \({\bar{D}}_i\) are interpreted as \({\bar{D}}_1 = D_t\) and \({\bar{D}}_2 = D_1 \times \dots \times D_{t-1}\). For the probability measure \(\mathbb {P}\) satisfying (15) and any other measure \({\tilde{\mathbb {P}}}\) satisfying \({\mathsf {d}}\left( \mathbb {P}_t\left( \cdot |x\right) ,{\tilde{\mathbb {P}}}_t\left( \cdot |x\right) \right) \le \varepsilon _{t}\) for all x at each stage t, we have that

$$\begin{aligned} {{\,\mathrm{\mathsf {dI}}\,}}\left( \mathbb {P},{\tilde{\mathbb {P}}}\right) \le \sum _{t=1}^{T}\varepsilon _{t}\gamma _{t}\prod _{s=t+1}^{T}(1+\gamma _{s}), \end{aligned}$$

see [31, Sec. 4.2] or [23].

We apply this bound to \({\tilde{\mathbb {P}}}:={\hat{\mathbb {P}}}_{n}*k_{h}\). Then

$$\begin{aligned}&\mathbb {P}^n\left( {{\,\mathrm{\mathsf {dI}}\,}}\left( \mathbb {P},{\hat{\mathbb {P}}}_n*k_{h}\right)>\varepsilon \right) \\&\quad \le \mathbb {P}^n\left( \sum _{t=1}^{T} \sup _{x_{t}}{\mathsf {d}} \left( \mathbb {P}_{t}\left( \cdot |x_{t}\right) ,{\tilde{\mathbb {P}}}_{t}\left( \cdot |x_{t}\right) \right) \gamma _{t}\prod _{s=t+1}^{T}(1+\gamma _{s})>\varepsilon \right) \\&\quad \le \sum _{t=1}^{T}\mathbb {P}^n \left( \sup _{x_{t}}{\mathsf {d}}\left( \mathbb {P}_{t}\left( \cdot |x_{t}\right) ,{\tilde{\mathbb {P}}}_{t}\left( \cdot |x_{t}\right) \right) >\frac{\varepsilon }{T\gamma _{t}\prod _{s=t+1}^{T}(1+\gamma _{s})}\right) , \end{aligned}$$

where the last step uses that a sum of T nonnegative terms can exceed \(\varepsilon \) only if at least one of them exceeds \(\varepsilon /T\).

We employ (21) to deduce that

$$\begin{aligned} \mathbb {P}^n \left( {{\,\mathrm{\mathsf {dI}}\,}}\left( \mathbb {P},{\hat{\mathbb {P}}}_n*k_{h}\right) >\varepsilon \right) \le \sum _{t=1}^{T}e^{-\kappa _2 n\varepsilon _{t}^{2\ell +4}} \end{aligned}$$

with \(\varepsilon _{t}:=\varepsilon [T\gamma _{t}\prod _{s=t+1}^{T}(1+\gamma _{s})]^{-1}\).

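Since \(\varepsilon _{t}^{2\ell +4}=\varepsilon ^{2\ell +4}\left[ T\gamma _{t}\prod _{s=t+1}^{T}(1+\gamma _{s})\right] ^{-(2\ell +4)}\), the right-hand side is bounded by \(T\exp (-c\,n\varepsilon ^{2\ell +4})\) with \(c:=\min _{t\in \left\{ 1, \ldots ,T\right\} } \kappa _2 \left[ \left( T\gamma _{t}\prod _{s=t+1}^{T}(1+\gamma _{s})\right) ^{2\ell +4}\right] ^{-1}\). For any \(0<K<c\), we have \(T\exp (-c\,n\varepsilon ^{2\ell +4})\le \exp (-Kn\varepsilon ^{2\ell +4})\) as soon as \(n\ge \log T/\bigl((c-K)\varepsilon ^{2\ell +4}\bigr)\). Hence the desired large deviation result (16) follows for n sufficiently large and any such K. \(\square \)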
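The constants in this proof can be evaluated directly. A minimal sketch, in which the stagewise errors \(\varepsilon _t\), the Lipschitz constants \(\gamma _t\) from (15), \(\kappa _2\) and \(\ell \) are hypothetical inputs: `nested_distance_bound` evaluates the stagewise bound \(\sum _t\varepsilon _t\gamma _t\prod _{s>t}(1+\gamma _s)\) used above, `rate_constant` computes \(c=\min _t\kappa _2\bigl(T\gamma _t\prod _{s>t}(1+\gamma _s)\bigr)^{-(2\ell +4)}\), and `min_sample_size` returns the smallest n with \(T e^{-cn\varepsilon ^{2\ell +4}}\le e^{-Kn\varepsilon ^{2\ell +4}}\) for a chosen \(0<K<c\), that is, \(n\ge \log T/((c-K)\varepsilon ^{2\ell +4})\).

```python
import math


def nested_distance_bound(eps, gamma):
    """sum_t eps_t * gamma_t * prod_{s>t} (1 + gamma_s); index t-1 of each list holds stage t."""
    T = len(eps)
    assert len(gamma) == T
    return sum(
        eps[t] * gamma[t] * math.prod(1 + g for g in gamma[t + 1:])
        for t in range(T)
    )


def rate_constant(kappa2, gamma, ell):
    """c = min_t kappa2 * (T * gamma_t * prod_{s>t} (1 + gamma_s)) ** -(2*ell + 4)."""
    T = len(gamma)
    p = 2 * ell + 4
    return min(
        kappa2 / (T * gamma[t] * math.prod(1 + g for g in gamma[t + 1:])) ** p
        for t in range(T)
    )


def min_sample_size(c, K, eps, ell, T):
    """Smallest n with T * exp(-c n eps**p) <= exp(-K n eps**p), where p = 2*ell + 4."""
    if not 0 < K < c:
        raise ValueError("need 0 < K < c")
    p = 2 * ell + 4
    return math.ceil(math.log(T) / ((c - K) * eps ** p))
```

For equal \(\gamma _t\), the factors \(1+\gamma _s\) amplify deviations at early stages most, and the stage with the largest amplification determines c.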

The smoothed model \(\hat{\mathbb {P}}_n*k_{h}\) is not yet a tree, but by Theorem 6 of the “Appendix” one may find (see Footnote 9) a finite tree process \(\bar{\mathbb {P}}_n\) which is arbitrarily close to it in nested distance. Therefore, possibly after increasing the probability bound in (16) by another constant factor, the bound holds true for \(\bar{\mathbb {P}}_n\,\) as well.

Remark 3

From a statistical perspective, the results of this section provide strong motivation to use nested distance balls as ambiguity sets for general stochastic optimization problems on scenario trees constructed from observed data. In particular, the distributionally robust acceptable ask price allows the seller of a claim to invest in a trading strategy which gives an acceptable superhedge of the payments to be made under the true model with arbitrarily high probability, given sufficient data.
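Read as a confidence statement, a bound of the form \(\mathbb {P}^n\left( {{\,\mathrm{\mathsf {dI}}\,}}(\mathbb {P},{\hat{\mathbb {P}}}_n*k_h)>\varepsilon \right) \le \exp (-Kn\varepsilon ^{2\ell +4})\), as obtained in the proof of Theorem 3, yields the radius \(\varepsilon _n(\delta )=\bigl(\log (1/\delta )/(Kn)\bigr)^{1/(2\ell +4)}\): with probability at least \(1-\delta \), the nested distance ball of this radius around the smoothed model contains the true model. A minimal sketch with a hypothetical value of K; it is only meaningful for n large enough for the theorem to apply.

```python
import math


def confidence_radius(n, delta, K, ell):
    """Radius eps solving exp(-K * n * eps**(2*ell + 4)) = delta."""
    p = 2 * ell + 4
    return (math.log(1.0 / delta) / (K * n)) ** (1.0 / p)
```

The radius decays only like \(n^{-1/(2\ell +4)}\); for \(\ell =1\), quadrupling the sample size shrinks it by the factor \(4^{-1/6}\approx 0.79\).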

4 Illustrative examples

One may summarize the results of the previous sections as follows: if the martingale measure is not unique (‘incomplete market’), then typically there is a positive bid–ask spread in the (pointwise) replication model. This spread also exists in the acceptability model. However, if the acceptability functional is \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_{\alpha }\), then by varying \(\alpha \) we obtain the complete range between the replication model (\(\alpha \rightarrow 0\)) and the expectation model (\(\alpha =1\)). At least in the latter case, but possibly even for some \(\alpha < 1\,\), there is no bid–ask spread and thus a unique price. On the other hand, model ambiguity widens the bid–ask spread: the more models are considered, i.e., the larger the radius of the ambiguity set, the wider the bid–ask spread. For illustrative purposes, let us look at the simplest examples that demonstrate these effects.

Example 1

Consider a three-stage ternary tree, where the paths are uniformly distributed and given by the columns of the matrix

$$\begin{aligned}\begin{bmatrix} 100&\quad 100&\quad 100&\quad 100&\quad 100&\quad 100&\quad 100&\quad 100&\quad 100 \\ 110&\quad 110&\quad 110&\quad 100&\quad 100&\quad 100&\quad 90&\quad 90&\quad 90 \\ 112&\quad 110&\quad 108&\quad 102&\quad 100&\quad 98&\quad 92&\quad 90&\quad 88 \end{bmatrix} .\end{aligned}$$

Since infinitely many equivalent martingale measures can be constructed on this tree, there is a considerable bid–ask spread for the pointwise replication model, which corresponds to the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_{\alpha }\)-acceptability pricing model with \(\alpha =0\). By increasing \(\alpha \) for both contract sides, however, the bid–ask spread becomes monotonically smaller. For \(\alpha =1\), there is no bid–ask spread, since both buyer and seller then only consider the expectation under the baseline model, which in this example is itself a martingale measure. Figure 1a visualizes this behavior for the price of a call option struck at \(95\%\): the bid price increases with \(\alpha \), while the ask price decreases. For \(\alpha =1\) they coincide.

Computationally, \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}\)–acceptability pricing on scenario trees boils down to solving a linear program (LP). It is thus straightforward to implement and the problem scales with the complexity of LPs.
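For Example 1 the LP is small enough to write out. The following is a sketch under simplifying assumptions, not the implementation behind Fig. 1a: zero interest rate, the payoff \((S_2-95)^+\) at the last stage, and the dual read as optimizing the expected payoff over probability vectors q on the nine paths that satisfy the martingale conditions and the density bound \(q_j\le p_j/\alpha \). It uses `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

# Paths of Example 1 as columns; rows are stages 0, 1, 2
S = np.array([
    [100, 100, 100, 100, 100, 100, 100, 100, 100],
    [110, 110, 110, 100, 100, 100, 90, 90, 90],
    [112, 110, 108, 102, 100, 98, 92, 90, 88],
], dtype=float)
n_paths = S.shape[1]
p = np.full(n_paths, 1.0 / n_paths)          # uniform baseline model
payoff = np.maximum(S[2] - 95.0, 0.0)        # call struck at 95, paid at the last stage


def martingale_constraints(S):
    """Rows A with A @ q = 0, i.e. E_q[S_{t+1} - S_t | node] = 0 (zero interest assumed)."""
    rows = []
    for t in range(S.shape[0] - 1):
        # a node at stage t is identified by the history S_0, ..., S_t of a path
        histories = [tuple(S[: t + 1, j]) for j in range(S.shape[1])]
        for node in dict.fromkeys(histories):
            in_node = np.array([h == node for h in histories])
            rows.append(np.where(in_node, S[t + 1] - S[t], 0.0))
    return np.array(rows)


A_eq = np.vstack([martingale_constraints(S), np.ones(n_paths)])
b_eq = np.concatenate([np.zeros(A_eq.shape[0] - 1), [1.0]])


def price_bounds(alpha):
    """(bid, ask) = (min, max) of E_q[payoff] over martingale q with 0 <= q <= p / alpha."""
    bounds = [(0.0, pj / alpha) for pj in p]
    low = linprog(payoff, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    high = linprog(-payoff, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    assert low.success and high.success
    return low.fun, -high.fun


for alpha in (0.05, 0.5, 0.8, 1.0):
    print(alpha, price_bounds(alpha))
```

Under these assumptions the result can be checked by hand: conditionally on each stage-1 node the payoff is linear in \(S_2\) (or zero), so every martingale measure prices the call at \(5+5\,\mathbb Q(S_1=110)\) with \(\mathbb Q(S_1=110)=\mathbb Q(S_1=90)\le 1/2\). This gives the interval [5, 7.5] for small \(\alpha \), shrinking to the single value 20/3 at \(\alpha =1\).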

Example 2

In contrast, one may consider a three-stage binary tree model with uniformly distributed scenarios given by the columns of the matrix

$$\begin{aligned}\begin{bmatrix}100&\quad 100&\quad 100&\quad 100 \\ 105&\quad 105&\quad 95&\quad 95 \\ 108&\quad 102&\quad 98&\quad 92 \end{bmatrix} .\end{aligned}$$

This tree admits only a single martingale measure. In such a model, changing the acceptability level does not change the price, since also under weakened acceptability the price is determined by a martingale measure, namely the unique one (provided \(\alpha \) is small enough for it to be feasible). In an ambiguity situation, however, a bid–ask spread may appear, since ambiguity sets typically contain many martingale measures. We consider nested distance balls around the baseline tree, where for simplicity we keep the uniform distribution of the scenarios but allow the values of the process to change.Footnote 10 The result for a call option struck at \(95\%\) can be seen in Fig. 1b. While there is a unique price for small radii \(\varepsilon \) of the nested distance ball, an increasing bid–ask spread appears for larger values of \(\varepsilon \).
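That the binary tree of Example 2 admits exactly one martingale measure can be checked node by node: with two successors \(u>s>d\) of a node with value s, the conditional probability q of the up move is pinned down by \(qu+(1-q)d=s\), i.e., \(q=(s-d)/(u-d)\). A minimal sketch, assuming zero interest rate as in the matrix above; the helper name is hypothetical.

```python
from fractions import Fraction


def binary_martingale_probability(s, up, down):
    """Unique q in (0, 1) with q * up + (1 - q) * down = s (zero interest rate assumed)."""
    if not down < s < up:
        raise ValueError("no equivalent martingale measure at this node")
    return Fraction(s - down, up - down)


# (node value, up successor, down successor) for the tree of Example 2
nodes = [(100, 105, 95), (105, 108, 102), (95, 98, 92)]
print([binary_martingale_probability(*node) for node in nodes])
```

All conditional probabilities equal 1/2, so the unique martingale measure coincides with the uniform baseline model of Example 2.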

Fig. 1

Distributionally robust acceptability pricing: the bid–ask spread as a function of the acceptability level \(\alpha \) and the ambiguity radius \(\varepsilon \,\)

5 Algorithmic solution

The nested distance between two given scenario trees can be obtained by solving an LP. However, the distributionally robust \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}\)–acceptability pricing problem w.r.t. nested distance balls as ambiguity sets results in a highly non-linear, in general non-convex problem. Therefore, we assume the tree structure to be given by the baseline model. In particular, it is assumed that different probability models within the ambiguity set differ only in terms of the transition probabilities; state values and the information structure are kept fixed.

Still, distributionally robust acceptability pricing is a semi-infinite non-convex problem. The only algorithmic approach available in the literature for similar problems is based on the idea of successive programming (cf. [31, Chap. 7.3.3]): an approximate solution is computed by starting with the baseline model only and alternately adding worst-case models and computing optimal solutions. For typical instances of tree models, however, this is computationally hard, as it involves the solution of a non-convex problem in each iteration step.

Hence, we tackle the dual formulation presented in Theorem 2. The structure of the nested distance enables an iterative approach: based on duality considerations and on exploiting the stagewise transportation structure inherent to the nested distance, Algorithm 1 approximates the solution of the semi-infinite non-convex problem by solving a sequence of LPs. Since the current state-of-the-art method requires the solution of a non-convex program in each iteration step, this sequential linear programming approach improves performance considerably.Footnote 11 Moreover, our algorithm found feasible solutions in many cases where our implementation of a successive programming method failed to do so.

Let us extend the concept of the nested distance to subtrees, iteratively from the leaves to the root (‘top-down’). For two scenario trees (here with identical filtration structures), define \({{\,\mathrm{\mathsf {dI}}\,}}_T(i,j)\) as the distance of the paths leading to the leaf nodes \(i{,}j \in \mathcal N_T\). Moreover, define

$$\begin{aligned}{{\,\mathrm{\mathsf {dI}}\,}}_t(k,l) := \sum _{i \in k+} \sum _{j \in l+} \pi (i,j \vert k,l){{\,\mathrm{\mathsf {dI}}\,}}_{t+1}(i,j) ,\end{aligned}$$

for all nodes \(k,l \in \mathcal N_t\), where \(0 \le t < T\,\). Then, if each conditional plan \(\pi (\cdot ,\cdot \vert k,l)\) is chosen optimally, i.e., minimizes this sum among all plans whose marginals are the conditional successor distributions of k and l, the nested distance between the two trees is given by \({{\,\mathrm{\mathsf {dI}}\,}}_0(1,1)\,\). This stagewise backwards approach (cf. [31, Alg. 2.1]) is the basic idea of Algorithm 1. As we assume the tree structure to be fixed, Algorithm 1 iterates through the tree in the same top-down manner and searches for the optimal solution in each stage, while ensuring that the nested distance constraint remains satisfied. The variables are the conditional transition probabilities under \(\mathbb Q\), i.e., \(q_i := \mathbb Q[i \vert i-]\), as well as the transportation subplans \(\pi (i,j \vert i-,j-)\), as defined in the “Appendix”. We use the notation \(n-\) for the immediate predecessor of some node n. The measure \(\mathbb P\) need not be represented explicitly, since it is determined by the transportation plan from \(\hat{\mathbb P}\,\); condition (4.3) in Algorithm 1 ensures that it is nevertheless well defined (note that some node \(\tilde{k} \in \mathcal N_{t-1}\) always needs to be fixed). Condition (1) ensures that \(\mathbb Q\) is a martingale measure, condition (2) that \(\mathbb Q\) is represented by conditional probabilities, condition (3) corresponds to the constraint on the measure change (\(d\mathbb Q / d\mathbb P \le 1 / \alpha \)) resulting from the primal \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}_\alpha \)–acceptability conditions, and (4.1)–(4.3) represent the constraint that there must be some \(\mathbb P\) in the nested distance ball for which condition (3) holds.
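The backward recursion for \({{\,\mathrm{\mathsf {dI}}\,}}_t(k,l)\) can be sketched with one small transportation LP per pair of nodes. This is a minimal sketch under stated assumptions, not Algorithm 1: trees are given as nested dicts (a hypothetical format with keys `value` and `children`, the latter holding pairs of conditional probability and child), the distance of two paths at the leaves is taken as the sum of absolute differences of their values over all stages, and each transport problem is solved with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def node(value, children=()):
    """Tree node: a state value and a list of (conditional probability, child) pairs."""
    return {"value": float(value), "children": list(children)}


def transport_cost(p, q, cost):
    """Optimal value of min sum_ij pi_ij * cost_ij with row sums p and column sums q."""
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0          # sum_j pi(i, j) = p_i
    for j in range(n):
        A_eq[m + j, j::n] = 1.0                   # sum_i pi(i, j) = q_j
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None))
    assert res.success
    return res.fun


def nested_distance(k, l, acc=0.0):
    """dI between the subtrees at k and l; acc is the path distance accumulated so far."""
    acc += abs(k["value"] - l["value"])           # l1 path distance, one term per stage
    if not k["children"] or not l["children"]:
        assert not k["children"] and not l["children"], "trees must have equal depth"
        return acc                                # dI_T: distance of the two full paths
    cost = np.array([[nested_distance(i, j, acc) for _, j in l["children"]]
                     for _, i in k["children"]])
    p = np.array([prob for prob, _ in k["children"]])
    q = np.array([prob for prob, _ in l["children"]])
    return transport_cost(p, q, cost)             # dI_t(k, l) with an optimal plan


eps = 0.1
tree_a = node(2, [(0.5, node(2 + eps, [(1.0, node(3))])),
                  (0.5, node(2 - eps, [(1.0, node(1))]))])
tree_b = node(2, [(1.0, node(2, [(0.5, node(3)), (0.5, node(1))]))])
print(nested_distance(tree_a, tree_b))
```

For these two trees the recursion gives \(1+\varepsilon =1.1\) (up to solver tolerance), whereas matching the two path distributions alone costs only 0.1: in `tree_a` the final value is already revealed at stage 1, and the nested distance accounts for this difference in information. The recursion recomputes subtree pairs and is meant for tiny trees only.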

The algorithm optimizes the variables stagewise top-down. The optimal solution at stage \(t+1\) depends on the values of the variables for all stages up to stage t, which result from the previous iteration step. Therefore, the algorithm iterates as long as further improvement is possible at some stage, given the updated variable values for the earlier stages of the tree. Otherwise, it terminates with the optimal solution of our approximate problem.


Example 3

Consider the price of a plain vanilla call option struck at 95 in the Black–Scholes model with parameters \(S_0 = 100, r = 0.01, \sigma = 0.2, T = 1\). Applying optimal quantization techniques (see, e.g., [31, Chap. 4] for an overview) to discretize the lognormal distribution, we construct a scenario tree with 500 nodes. While there exists a unique martingale measure (and thus a unique option price) in the Black–Scholes model, the discrete approximation admits several martingale measures (and thus a positive bid–ask spread). Figure 2 visualizes the bid–ask spread as a function of the \({{\,\mathrm{\mathbb A\mathbb V@R}\,}}\)–acceptability level \(\alpha \) and the radius \(\varepsilon \) of the nested distance ball used as model ambiguity set. For \(\alpha \rightarrow 1\) and \(\varepsilon = 0\), the spread closes and the resulting price approximates the true Black–Scholes price up to 4 digits. For illustrative purposes, the bid and ask price surfaces are shown from two perspectives.
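For reference, the benchmark in Example 3 is the Black–Scholes call price \(C=S_0\Phi (d_1)-Ke^{-rT}\Phi (d_2)\) with \(d_1=\bigl(\ln (S_0/K)+(r+\sigma ^2/2)T\bigr)/(\sigma \sqrt{T})\) and \(d_2=d_1-\sigma \sqrt{T}\). A minimal standard-library sketch, computing \(\Phi \) via `math.erf`; it does not reproduce the quantized tree or the acceptability prices.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    """Black-Scholes price of a European call with strike K and maturity T."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


price = black_scholes_call(S0=100.0, K=95.0, r=0.01, sigma=0.2, T=1.0)
print(price)
```

With the parameters of Example 3 this gives approximately 11.06.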

Fig. 2

The bid–ask spread as a function of acceptability and ambiguity

6 Conclusion

In this paper we extended the usual methods for contingent claim pricing in two directions. First, we replaced the replication constraint by a more realistic acceptability constraint. By doing so, the claim price depends explicitly on the stochastic model for the price dynamics of the underlying (and not just on its null sets). If the model is based on observed data, then the calculation of the claim price can be seen as a statistical estimate. Therefore, as a second extension, we introduced model ambiguity into the acceptability pricing framework and derived the dual problem formulations in the extended setting. Moreover, we used the nested distance for stochastic processes to define a confidence set for the underlying price model. In this way, we link acceptability prices of a claim to the quality of observed data. In particular, the size of the confidence region decreases with the sample size, i.e., the number of observed independent paths of the stochastic process of the underlying. For a given sample of observations, the ambiguity radius indicates how much the baseline ask/bid price should be corrected to safeguard the seller/buyer of a claim against the inherent statistical model risk, as Sect. 5 illustrates.