1 Introduction

This article is motivated by an attempt to understand the range of possible prices of an American put in a robust, or model-independent, framework. In our interpretation, this means that we assume we are given today’s prices of a family of European-style vanilla puts (for a continuum of strikes and for a discrete set of maturities). The goal is to find a consistent model for the underlying for which the American put has the highest price, where by definition a model is consistent if the discounted price process is a martingale and the model-based discounted expected values of European put payoffs match the given prices of European puts.

This notion of model-independent, or robust, bounds on the prices of exotic options was introduced in Hobson [17] in the context of lookback options, and has been applied several times since; see Brown et al. [8] (barrier options), Cox and Obłój [11] (no-touch options), Hobson and Neuberger [19] and Hobson and Klimmek [18] (forward-start straddles), Carr and Lee [9] and Cox and Wang [12] (variance options), Stebegg [27] (Asian options) and the survey article Hobson [24]. The principal idea is that the prices of the vanilla European puts determine the marginal distributions of the price process at the traded maturities (but not the joint distributions) and that these distributional requirements, coupled with the martingale property, place meaningful and useful restrictions on the class of consistent models. These restrictions lead to bounds on the expected payoffs of path-dependent functionals, or equivalently, to bounds on the prices of exotic options.

In addition to the pricing problem, there is a related dual or hedging problem. In the dual problem, the aim is to construct a static portfolio of European put options and a dynamic discrete-time hedge in the underlying which combine to form a superhedge (pathwise over a suitable class of candidate price paths) for the exotic option. The value of the dual problem is the cost of the cheapest superhedge. There is a growing literature, beginning with Beiglböck et al. [5] for discrete-time problems, and Galichon et al. [15] in continuous time, which aims to explain how to formulate the problem in such a way that there is no duality gap, i.e., the highest model-based price is equal to the cheapest superhedge, either for specific derivatives or in general.

Many of the early papers on robust hedging exploited a link with the Skorokhod embedding problem (Skorokhod [26, Sect. 7]). For example, in the study of the lookback option in Hobson [17], the consistent model which achieves the highest lookback price is constructed from the Azéma and Yor [2] solution of the Skorokhod embedding problem. More recently, Beiglböck et al. [5] (see also Dolinsky and Soner [14] and Touzi [29]) have championed the connection between robust hedging problems and martingale optimal transport. In this paper, we make use of the left-curtain martingale coupling introduced by Beiglböck and Juillet [4] and developed by Henry-Labordère and Touzi [16] and Beiglböck et al. [6].

The study of American-style claims in a robust framework was initiated by Neuberger [25]; see also Hobson and Neuberger [20], Bayraktar and Zhou [3] and Aksamit et al. [1]. (There is also a paper by Cox and Hoeggerl [10] which asks about the possible shapes of the price of an American put, considered as a function of strike, given the prices of co-maturing European puts.) The main innovation of the present paper is that rather than focusing on general American payoffs and proving that the pricing (primal) problem and the dual (hedging) problem have the same value, we focus explicitly on American puts and try to say as much as possible about the structure of the consistent price process for which the model-based American put price is maximised, and the structure of the cheapest superhedge.

Our problem can be cast as follows. Let \(M = (M_{0} = \bar{\mu }, M _{1} = X, M_{2}=Y)\) represent the discounted price of an underlying asset, where \(\bar{\mu }\) is a known constant. The laws of \(X\) and \(Y\) are presumed to be given and \(\mathcal{L}(X) = \mu \) and \(\mathcal{L}(Y)= \nu \), where \(\mu \) and \(\nu \) are (integrable) probability measures on ℝ with mean \(\bar{\mu }\). Given a martingale model (a filtered probability space supporting a stochastic process \(M\) which is a martingale), we consider an American put on \(M\) with strike \(K\). The option may only be exercised at time 1 or time 2; if the put is exercised at time 1, the payoff is \((K_{1} - X)^{+}\), if the put is exercised at time 2, the payoff is \((K_{2}-Y)^{+}\). Here \(K_{1}\) and \(K_{2}\) represent the discounted strikes of the put. For any martingale model, the model-based price of the American put is then given by the expected value of the payoff calculated under the best available stopping time (defined with respect to the filtration associated to the given model, and taking values in time 1 or time 2). Our primal problem is to find the highest possible model-based price of the American put, i.e., the highest expected payoff, where expectations are calculated under the probability measure of a consistent model (a model under which \(M\) is a martingale and has the given laws at times 1 and 2).

There is a corresponding dual or hedging problem of finding the cheapest superhedge based on static portfolios of European puts and a piecewise constant holding of the underlying asset; see Sect. 2.2.

Our main achievement is as follows:

Main result 1

Suppose \(\mu \) is continuous. The highest model-based expected payoff of the American put is equal to the cheapest superhedging price. Moreover, the highest model-based expected payoff is attained by the model associated with the left-curtain martingale coupling of Beiglböck and Juillet [4] (and a judiciously chosen stopping rule). Further, we can characterise the cheapest superhedging strategy, and it is one of four possible types.

For fixed \(\mu ,\nu \) and \(K_{1} > K_{2}\), there is typically a family of optimal models. Fixing \(\mu \) and \(\nu \) but varying \(K_{1}\) and \(K_{2}\), it turns out that there is a model which is optimal for all \(K_{1}\) and \(K_{2}\) simultaneously. This model is related to the left-curtain coupling of Beiglböck and Juillet [4].

The remainder of the paper is structured as follows. In the next section, we formulate precisely our problem of finding the robust, model-independent price of an American put. We also explain how the pricing problem is related to the dual problem of constructing the cheapest superhedge. In Sect. 3, we assume that \(\mu \) is continuous, transform the primal pricing problem into a martingale optimal transport (MOT) problem, and show by studying a series of ever more complicated setups how to determine the best model and hedge.

By weak duality, the highest model price is bounded above by the cost of the cheapest superhedge. Hence, if, on the one hand, we can identify a consistent model and stopping rule, and, on the other hand, a superhedge such that the expected payoff in that model with that stopping rule is equal to the cost of the superhedge, then we must have identified an optimal model and an optimal stopping rule together with an optimal hedging strategy. Moreover, there is no duality gap. This is the strategy of our proofs. One feature of our analysis is that wherever possible, we provide pictorial explanations and derivations of our results. In our view, this helps bring insights which may be hidden under calculus-based approaches.

2 Preliminaries and setup

2.1 The financial model and model-based prices for American puts

Suppose time is discrete and interest rates are non-stochastic. Without loss of generality, we may identify the trading times with \(t=0,1,2\). Let \(M=(M_{t})_{t=0,1,2}\) be the discounted asset price which we expect to be a martingale under a pricing measure. We assume \(M_{0}\) is known at time 0.

We are interested in pricing an American put with strike \(K\) and maturity 2. Under the bond price numeraire, this corresponds to a put with strike \(K_{1}\) at time 1 and \(K_{2}\) at time 2 (and \(K_{1}>K_{2}\) provided interest rates are strictly positive, which we now assume without further comment). Note that we do not allow exercise at \(t=0\).

We suppose we are given European put prices for maturities 1 and 2 for a continuum of strikes. By classical arguments (e.g. Breeden and Litzenberger [7]), it is possible to infer the laws of the price of the asset, and hence the laws of the discounted asset price. Denote the law of \(X=M_{1}\) by \(\mu \) and the law of \(Y=M_{2}\) by \(\nu \). It follows from Jensen’s inequality that if \(\mu \) and \(\nu \) have arisen from sets of European put options in this way, \(\mu \) and \(\nu \) are in convex order, and we write \(\mu \leq _{\mathrm{cx}} \nu \) (see Sect. 2.4 for a further discussion of the properties of \(\mu\) and \(\nu \)).
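As a toy illustration of this inference step (not taken from the paper), suppose the discounted price takes values on an equally spaced strike grid. Then the law can be read off from second differences of the put price in the strike, which is the discrete analogue of the Breeden and Litzenberger argument. The function names and the three-point law below are hypothetical choices made for the sketch.

```python
from fractions import Fraction


def put_price(law, k):
    """Discounted put price P(k) = E[(k - X)^+] for a discrete law {atom: mass}."""
    return sum(m * max(k - x, 0) for x, m in law.items())


def implied_masses(strikes, prices):
    """Recover atom masses on an equally spaced strike grid from put prices.

    Uses the second difference (P(k+h) - 2 P(k) + P(k-h)) / h, the discrete
    analogue of the second derivative of P with respect to the strike.
    """
    h = strikes[1] - strikes[0]
    return {
        strikes[i]: (prices[i + 1] - 2 * prices[i] + prices[i - 1]) / h
        for i in range(1, len(strikes) - 1)
    }


# Hypothetical three-point law for the discounted price at one maturity.
law = {Fraction(1): Fraction(1, 4), Fraction(2): Fraction(1, 2), Fraction(3): Fraction(1, 4)}
strikes = [Fraction(k) for k in range(0, 5)]
prices = [put_price(law, k) for k in strikes]
recovered = implied_masses(strikes, prices)
print(recovered)
```

Exact rational arithmetic is used so that the recovered masses can be compared with the input law without rounding issues.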

Definition 1

(Hobson and Neuberger [21])

Suppose we have \(\mu \leq _{\mathrm{cx}} \nu \) and let \(\mathcal{S}= (\varOmega , \mathcal{F}, \mathbb{P}, \mathbb{F}= ( \mathcal{F}_{0}, \mathcal{F} _{1}, \mathcal{F}_{2} ))\) be a filtered probability space. We say that \(M=(M_{0},M_{1},M_{2})=(\bar{\mu },X,Y)\) is an \((\mathcal{S},\mu,\nu )\)-consistent stochastic process and write \(M \in \mathcal{M}(\mathcal{S}, \mu , \nu )\) if

(1) \(M\) is an \(\mathcal{S}\)-martingale;

(2) \(\mathcal{L}(M_{1}) = \mu \) and \(\mathcal{L}(M_{2}) = \nu \).

We say that \((\mathcal{S},M)\) is a \((\mu ,\nu )\)-consistent model if \(\mathcal{S}\) is a filtered probability space and \(M\) is an \((\mathcal{S},\mu ,\nu )\)-consistent stochastic process. Where \(\mu \) and \(\nu \) are clear from the context, this is sometimes abbreviated to a consistent model.

Let \(B_{1} \in \mathcal{F}_{1}\). Define the stopping time \(\tau _{B_{1}}\) by \(\tau _{B_{1}} = 1\) on \(B_{1}\) and \(\tau _{B_{1}}=2\) on \(B_{1}^{c}\). (Conversely, any stopping rule taking values in \(\{1,2\}\) has a representation of this form.) Suppose \((\mathcal{S},M)\) is a \((\mu ,\nu )\)-consistent model. The \((\mathcal{S},M)\)-model-based expected payoff (\(\mathrm{MBEP}\)) of the American put under the stopping rule \(\tau _{B_{1}}\) is

$$ \mathcal{A}(B_{1}, M, \mathcal{S}) = \mathbb{E}[(K_{\tau _{B_{1}}} - M _{\tau _{B_{1}}})^{+}] . $$

Then, optimising over stopping rules under the model \((\mathcal{S},M)\), the model-based price of the American put is \(\mathcal{A}(M, \mathcal{S}) = \sup _{B_{1}\in \mathcal{F}_{1}} \mathcal{A}(B_{1},M, \mathcal{S}) \). The highest model-based expected payoff for the American put is

$$ \mathcal{P}= \mathcal{P}(\mu ,\nu ) = \sup _{\mathcal{S}} \sup _{M \in \mathcal{M}(\mathcal{S}, \mu , \nu )} \mathcal{A}(M, \mathcal{S}) . $$
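The inner optimisation over \(B_{1}\) can be sketched for a finite model in which the time-1 information is generated by \(X\) alone. This is a minimal sketch under assumptions of our own: the joint law is a hypothetical four-point coupling with atoms (the main results below concern continuous \(\mu \)), and the function name is illustrative. On each atom of \(X\) one compares the immediate exercise value with the expected continuation value.

```python
from fractions import Fraction as F


def american_put_price(pi, K1, K2):
    """Return (price, exercise set) for a discrete joint law pi = {(x, y): mass}.

    The exercise decision depends on x only, so the optimal set B is found
    atom by atom: exercise at time 1 where the mass-weighted immediate payoff
    is at least the mass-weighted expected time-2 payoff.
    """
    stop, cont = {}, {}
    for (x, y), m in pi.items():
        stop[x] = stop.get(x, 0) + m * max(K1 - x, 0)
        cont[x] = cont.get(x, 0) + m * max(K2 - y, 0)
    exercise_set = {x for x in stop if stop[x] >= cont[x]}
    price = sum(max(stop[x], cont[x]) for x in stop)
    return price, exercise_set


# Hypothetical martingale model: X is 1 or 3, and Y = X - 1 or X + 1 with equal odds.
pi = {(1, 0): F(1, 4), (1, 2): F(1, 4), (3, 2): F(1, 4), (3, 4): F(1, 4)}
price, exercise_set = american_put_price(pi, K1=F(3), K2=F(5, 2))
print(price, exercise_set)
```

In this example it is optimal to exercise at time 1 on \(\{X = 1\}\) and to continue on \(\{X = 3\}\).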

Amongst the class of consistent models, there is a natural and important class of models which we call the class of canonical models. Although we typically expect nonnegative prices in the finance context, in this definition and the mathematical analysis which follows, we allow measures \(\mu \) and \(\nu \) supported on ℝ.

Definition 2

Suppose \(\mu \leq _{\mathrm{cx}} \nu \). We call \((\hat{\mathcal{S}} = ( \hat{\varOmega }, \hat{\mathcal{F}}, \hat{\mathbb{P}}, \hat{\mathbb{F}} = ( \hat{\mathcal{F}}_{0}, \hat{\mathcal{F}}_{1}, \hat{\mathcal{F}}_{2} )), \hat{M})\) a canonical \((\mu ,\nu )\)-consistent model if \((\hat{\mathcal{S}},\hat{M})\) is a \((\mu ,\nu )\)-consistent model such that \(\hat{\varOmega } = \mathbb{R}\times \mathbb{R}\), \(\hat{\mathcal{F}} = \mathcal{B}(\hat{\varOmega })\), \(\hat{M}_{1}(\omega _{1},\omega _{2}) = \omega _{1}\), \(\hat{M}_{2}(\omega _{1},\omega _{2}) = \omega _{2}\) and such that \(\hat{\mathcal{F}}_{0}\) is trivial, \(\hat{\mathcal{F}}_{1} = \sigma (\hat{M}_{1})\) and \(\hat{\mathcal{F}}_{2} = \sigma (\hat{M} _{1},\hat{M}_{2})\). Then \(\hat{B}_{1} \in \hat{\mathcal{F}}_{1}\) can be identified with an element \(\hat{B}\) in \(\mathcal{B}(\mathbb{R})\) via \(\hat{B}_{1} = \hat{B} \times \mathbb{R}\).

In the canonical setting, different models (consistent or not) can be parametrised by a probability measure \(\pi \) on \(\mathbb{R}^{2}\). To simplify the notation, we write \(\hat{M}_{\pi }\) for the canonical model \((\hat{\mathcal{S}}_{\pi }= (\hat{\varOmega }, \hat{\mathcal{F}}, \hat{\mathbb{P}}_{\pi }, \hat{\mathbb{F}}), \hat{M})\), where \(\hat{\mathbb{P}}_{\pi }\) is the probability measure such that \(\hat{\mathbb{P}}_{\pi }[X \in dx, Y \in dy] = \pi (dx, dy)\).

Define \(\hat{\mathcal{P}}= \sup \sup _{\hat{B} \in \mathcal{B}( \mathbb{R})} \mathbb{E}[ (K_{\tau _{\hat{B}}} - M_{\tau _{\hat{B}}})^{+}]\), where the first supremum is taken over canonical \((\mu ,\nu )\)-consistent models and \(\tau _{\hat{B}}(\omega ) = 1\) if \(X(\omega ) \in \hat{B}\) and \(\tau _{\hat{B}}(\omega )=2\) otherwise. Clearly, since the set of canonical consistent models is a subset of the set of all consistent models, we have \(\hat{\mathcal{P}} \leq \mathcal{P}\).

In this paper, we concentrate on the case where \(\mu \) is continuous. In that case, we show that \(\hat{\mathcal{P}} = \mathcal{P}\). However, if \(\mu \) has atoms, the situation becomes more delicate, as pointed out in Hobson and Neuberger [20]; see also Hobson and Neuberger [21], Bayraktar and Zhou [3] and Aksamit et al. [1]. On the one hand, we must allow a wider range of possible candidates for exercise-determining sets \(B_{1}\). On atoms of \(X\), we may want to sometimes stop and sometimes continue, although we must still take stopping decisions which do not violate the martingale property of future price movements. On the other hand, the functions \(T_{d}\), \(T_{u}\) that characterise the left-curtain coupling (see Sect. 2.5) become ill-defined on the points where \(\mu \) has atoms. Then it is not clear how the optimal model can be identified. For these reasons, we must extend our notion of a martingale coupling and generalise, in a useful fashion, the left-curtain martingale coupling of Beiglböck and Juillet [4] to the case with atoms. The appropriate extension of the left-curtain coupling to the case with atoms in \(\mu \) is discussed in a companion paper [22]; in the present paper, we focus on the financial aspects of our results, namely the application to the robust hedging of American puts.

2.2 Superhedging

The following notion of a robust superhedge for an American option was first introduced by Neuberger [25]; see also Bayraktar and Zhou [3] and Hobson and Neuberger [20].

We work in discounted units over two time-points. Consider a general American-style option with payoff \(a\) if exercised at time 1, and payoff \(b\) if exercised at time 2, where \(a:\mathbb{R}\to \mathbb{R}_{+}\) and \(b: \mathbb{R}\to \mathbb{R}_{+}\) are nonnegative functions.

Definition 3

\((\phi ,\psi , (\theta _{i} )_{i = 1,2})\) is a superhedge for \((a,b)\) if for all \(x,y\in \mathbb{R}\),

$$\begin{aligned} a(x) & \leq \phi (x) + \psi (y) + \theta _{1}(x)(y-x), \end{aligned}$$
(2.1)
$$\begin{aligned} b(y) & \leq \phi (x) +\psi (y) + \theta _{2}(x)(y-x). \end{aligned}$$
(2.2)

The hedging cost (\(\mathrm{HC}\)) associated with the superhedge \((\phi ,\psi ,(\theta _{i})_{i=1,2})\) is given by

$$ \mathcal{C} = \mathcal{C}\big(\phi ,\psi , (\theta _{i} )_{i = 1,2}; \mu ,\nu \big) = \int \phi (x) \mu (dx) + \int \psi (y)\nu (dy) , $$

where we set \(\mathcal{C} = \infty \) if \(\int \phi (x)^{+} \mu (dx) + \int \psi (y)^{+} \nu (dy) = \infty \). We let \({\mathcal{H}}(a,b)\) be the set of superhedging strategies \((\phi ,\psi , (\theta _{i} )_{i = 1,2})\).

The idea behind the definition is that the hedger purchases a portfolio of maturity-1 European puts (and calls) with payoff \(\phi \) and a portfolio of maturity-2 European puts (and calls) with payoff \(\psi \). (The fact that this can be done and has cost \(\mathcal{C}\) follows from arguments of Breeden and Litzenberger [7].) In addition, if the American option is exercised at time 1, the hedger holds \(\theta _{1}\) units of the underlying between times 1 and 2; otherwise the hedger holds \(\theta _{2}\) units of the underlying over this time-period. In the former case, (2.1) implies that the strategy superhedges the American option payout; in the latter case, (2.2) implies the same.

The dual (superhedging) problem is to find

$$ \mathcal{D}={\mathcal{D}}(a,b;\mu , \nu ) = \inf _{(\phi ,\psi , (\theta _{i} )_{i = 1,2}) \in {\mathcal{H}}(a,b)} \mathcal{C}\big(\phi ,\psi , (\theta _{i} )_{i = 1,2};\mu ,\nu \big) . $$

Potentially, the space \({\mathcal{H}}= {\mathcal{H}}(a,b)\) could be very large and it is extremely useful to be able to search over a smaller space. The next lemma shows that any convex \(\psi \) with \(\psi \geq b\) can be used to generate a superhedge \((\phi ,\psi , (\theta _{i} )_{i = 1,2})\).

For a convex function \(\chi \), let \(\chi '_{+}\) denote the right derivative of \(\chi \).

Lemma 4

Suppose \(\psi \geq b\) with \(\psi \) convex. Define \(\phi = (a-\psi )^{+}\) and set \(\theta _{2}=0\) and \(\theta _{1}= - \psi '_{+}\). Then \((\phi ,\psi , (\theta _{i} )_{i = 1,2})\) is a superhedge.

Proof

We have

$$ b(y) \leq \psi (y) \leq \phi (x)+\psi (y) = \phi (x) +\psi (y) + \theta _{2}(x)(y-x) $$

and (2.2) follows. Also, by the convexity of \(\psi \), \(\psi (x) \leq \psi (y) - \psi '_{+}(x)(y-x)\) and

$$ a(x) \leq \big(a(x)-\psi (x)\big)^{+} + \psi (x) \leq \phi (x) + \psi (y) + \theta _{1}(x)(y-x). $$

Hence (2.1) follows. □
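The construction in the lemma can be checked numerically on a grid. The sketch below makes several assumptions of our own: the payoffs are those of the American put with hypothetical strikes \(K_{1}=3\) and \(K_{2}=2.5\), the convex function is the simplest admissible choice \(\psi = b\), and the inequalities (2.1) and (2.2) are tested only at finitely many points, so this is a sanity check and not a proof.

```python
K1, K2 = 3.0, 2.5  # hypothetical discounted strikes with K1 > K2


def a(x):
    """Payoff if exercised at time 1."""
    return max(K1 - x, 0.0)


def b(y):
    """Payoff if exercised at time 2."""
    return max(K2 - y, 0.0)


def psi(y):
    """Convex function dominating b; here simply psi = b."""
    return b(y)


def psi_right_derivative(x):
    """Right derivative of psi(y) = (K2 - y)^+."""
    return -1.0 if x < K2 else 0.0


def phi(x):
    return max(a(x) - psi(x), 0.0)


def theta1(x):
    return -psi_right_derivative(x)


def theta2(x):
    return 0.0


def count_violations(grid, tol=1e-12):
    """Count grid points (x, y) at which (2.1) or (2.2) fails."""
    bad = 0
    for x in grid:
        for y in grid:
            rhs1 = phi(x) + psi(y) + theta1(x) * (y - x)
            rhs2 = phi(x) + psi(y) + theta2(x) * (y - x)
            if a(x) > rhs1 + tol or b(y) > rhs2 + tol:
                bad += 1
    return bad


grid = [i * 0.25 for i in range(-20, 41)]  # points from -5 to 10
violations = count_violations(grid)
print(violations)
```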

Let \(\breve{\mathcal{H}}= \breve{\mathcal{H}}(b)\) be the set of convex functions \(\psi \) with \(\psi \geq b\). For \(\psi \in \breve{\mathcal{H}}\), we can define the associated hedging cost \(\breve{\mathcal{C}}(\psi ; \mu ,\nu )\) by

$$\begin{aligned} \breve{\mathcal{C}}(\psi ; \mu ,\nu ) &= \mathcal{C}\left ((a- \psi )^{+},\psi ,\theta _{1} = -\psi '_{+}, \theta _{2} =0;\mu ,\nu \right ) \\ & =\int \big(a(x)-\psi (x)\big)^{+} \mu (dx) + \int \psi (y) \nu (dy). \end{aligned}$$

The reduced dual hedging problem restricts attention to superhedges generated from \(\psi \in \breve{\mathcal{H}}\) and is to find

$$ \breve{\mathcal{D}} = \breve{\mathcal{D}}(a,b;\mu ,\nu ) = \inf _{\psi \in \breve{\mathcal{H}}(b)} \breve{\mathcal{C}}(\psi ; \mu ,\nu ) . $$

Clearly, we have \({\mathcal{D}} \leq \breve{\mathcal{D}}\); we shall show that \({\mathcal{D}} = \breve{\mathcal{D}}\) for the American put.

2.3 Weak and strong duality

Let \((\mathcal{S},M)\) be a \((\mu ,\nu )\)-consistent model and let \(\tau \) be an arbitrary stopping time in this framework taking values in \(\{1,2\}\). The expected payoff of the American put under this stopping rule is \(\mathbb{E}[(K_{\tau }- M_{\tau })^{+}]\). On the dual side, let \(\psi \) be any convex function with \(\psi (y) \geq (K_{2}- y)^{+}\) and let \(\phi (x)= ((K_{1}-x)^{+} - \psi (x))^{+}\) and \(\theta _{i}(x) = - \psi '_{+}(x) I_{\{ i = 1 \}}\). Then for any \(i \in \{1,2\}\), we have \((K_{i} - M_{i})^{+} \leq \psi (M_{2}) + \phi (M_{1}) + \theta _{i}(M_{1})(M_{2}-M_{1})\) and hence \((K_{\tau }- M_{\tau })^{+} \leq \psi (M_{2}) + \phi (M_{1}) + \theta _{\tau }(M_{1})(M_{2}- M_{1})\). Since \(\tau \) is a stopping time, \(\theta _{\tau }(M_{1})\) is \(\mathcal{F}_{1}\)-measurable, so the martingale property of \(M\) gives \(\mathbb{E}[\theta _{\tau }(M_{1})(M_{2}-M_{1})] = 0\) (provided the relevant expectations are well defined). Taking expectations, \(\mathbb{E}[(K_{ \tau }- M_{\tau })^{+}] \leq \mathbb{E}^{X \sim \mu , Y \sim \nu }[ \phi (X) + \psi (Y)]\). The same argument applies to any superhedge in \({\mathcal{H}}\), and we have the weak duality \(\mathcal{P}\leq \mathcal{D}\).

In Sect. 3, we show that we can find \((\hat{\mathcal{S}}^{*}, \hat{M}^{*}, \hat{B}^{*})\), with \((\hat{\mathcal{S}}^{*}, \hat{M}^{*})\) a canonical \((\mu ,\nu )\)-consistent model and \(\hat{B}^{*} \in \mathcal{B}( \mathbb{R})\), and \(\psi ^{*} \in \breve{\mathcal{H}}\) such that

$$ \mathcal{A}(\hat{B}^{*} \times \mathbb{R},\hat{M}^{*}, \hat{\mathcal{S}}^{*}) = \breve{\mathcal{C}}(\psi ^{*}; \mu , \nu ) . $$

Then \(\mathcal{A}(\hat{B}^{*}\times \mathbb{R},\hat{M}^{*}, \hat{\mathcal{S}}^{*}) \leq \hat{\mathcal{P}} \leq \mathcal{P}\leq \mathcal{D}\leq \breve{\mathcal{D}} \leq \breve{\mathcal{C}}(\psi ^{*}; \mu , \nu )\); but since the two outer terms are equal, we have \(\mathcal{P}=\mathcal{D}\) and strong duality. Moreover, \((\hat{\mathcal{S}}^{*},\hat{M}^{*})\) is a canonical, consistent model which generates the highest price for the American put (and \(\tau ^{*}\) given by \(\tau ^{*}=1\) if and only if \(X \in \hat{B}^{*}\) is the optimal exercise rule), and \(\psi ^{*}\) generates the cheapest superhedge.
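The sandwich argument can be seen at work in a small discrete example. The sketch below rests on assumptions of our own: the marginals are hypothetical and have atoms (so they fall outside the continuous setting treated in Sect. 3), the coupling and exercise set are chosen by inspection, and the dual candidate is \(\psi (y) = (K_{2}-y)^{+}\). Weak duality still applies to any consistent model, so matching values identify an optimal pair for this toy problem.

```python
from fractions import Fraction as F

K1, K2 = F(3), F(5, 2)  # hypothetical discounted strikes
mu = {1: F(1, 2), 3: F(1, 2)}
nu = {0: F(1, 4), 2: F(1, 2), 4: F(1, 4)}
# A martingale coupling of mu and nu: from x the price moves to x - 1 or x + 1.
pi = {(1, 0): F(1, 4), (1, 2): F(1, 4), (3, 2): F(1, 4), (3, 4): F(1, 4)}


def put(K, z):
    return max(K - z, 0)


def primal_value(pi, B):
    """Expected payoff when exercising at time 1 on {X in B}, else at time 2."""
    return sum(m * (put(K1, x) if x in B else put(K2, y)) for (x, y), m in pi.items())


def reduced_dual_cost(psi):
    """Cost of the superhedge generated by a convex psi dominating the time-2 payoff."""
    static_1 = sum(m * max(put(K1, x) - psi(x), 0) for x, m in mu.items())
    static_2 = sum(m * psi(y) for y, m in nu.items())
    return static_1 + static_2


lower = primal_value(pi, B={1})
upper = reduced_dual_cost(lambda y: put(K2, y))
print(lower, upper)
```

Here the model-based expected payoff and the superhedging cost coincide, so in this example neither can be improved.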

2.4 Measures and convex order

Given a finite and integrable measure \(\eta \) (not necessarily a probability measure) on ℝ, define \(\bar{\eta } = \frac{ \int _{\mathbb{R}}x \eta (dx)}{\int _{\mathbb{R}}\eta (dx)}\) to be the barycentre of \(\eta \). Let \(\mathcal{I}_{\eta }\) be the smallest interval containing the support of \(\eta \), and let \(\{ \ell _{\eta }, r_{\eta }\}\) be the endpoints of \(\mathcal{I}_{\eta }\). Define the function \(P_{\eta }: \mathbb{R}\to \mathbb{R}_{+}\) by \(P_{\eta }(k) = \int _{-\infty }^{k} (k-x) \eta (dx)\). Then \(P_{\eta }\) is convex and increasing and represents the discounted European put price, expressed as a function of the strike, if the discounted underlying has law \(\eta \) at maturity. In addition, we have \(\{k : P_{\eta }(k) > \eta (\mathbb{R})(k - \bar{\eta })^{+} \} \subseteq \mathcal{I}_{\eta }\). Note that \(P_{\eta }\) is related to the potential \(U_{\eta }\) defined by \(U_{\eta }(k) : = \int _{\mathbb{R}} |k-x| \eta (dx)\) by \(P_{\eta }(k) = \frac{1}{2}(U_{\eta }(k) + (k- \bar{ \eta }) \eta (\mathbb{R}))\).
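The relation between \(P_{\eta }\) and \(U_{\eta }\) can be checked exactly for a discrete measure. The measure below is a hypothetical example with total mass \(3/2\), chosen to emphasise that \(\eta \) need not be a probability measure; the function names are illustrative.

```python
from fractions import Fraction as F


def mass(eta):
    return sum(eta.values())


def barycentre(eta):
    return sum(x * m for x, m in eta.items()) / mass(eta)


def put_fn(eta, k):
    """P_eta(k): integral of (k - x)^+ against eta."""
    return sum(m * max(k - x, 0) for x, m in eta.items())


def potential(eta, k):
    """U_eta(k): integral of |k - x| against eta."""
    return sum(m * abs(k - x) for x, m in eta.items())


eta = {0: F(1, 2), 2: F(1, 2), 4: F(1, 2)}  # finite measure with mass 3/2
strikes = [F(-1), F(1, 2), F(2), F(3), F(5)]
gaps = [
    put_fn(eta, k) - (potential(eta, k) + (k - barycentre(eta)) * mass(eta)) / 2
    for k in strikes
]
print(gaps)
```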

For any real numbers \(c< d\) and a measure \(\eta \), let \(\eta _{c,d}\) be the measure given by \(\eta _{c,d}(A)= \eta (A \cap (c,d))\), \(A\in \mathcal{B}(\mathbb{R})\). Let \(\tilde{\eta }_{c,d} = \eta - \eta _{c,d}\).

Two measures \(\eta \) and \(\chi \) are in convex order, and we write \(\eta \leq _{\mathrm{cx}} \chi \), if and only if \(\eta (\mathbb{R})= \chi (\mathbb{R})\), \(\bar{\eta }= \bar{\chi }\) and \(P_{\eta }(k) \leq P_{\chi }(k)\) on ℝ (or equivalently, if \(\int _{ \mathbb{R}}f\,d\eta \leq \int _{\mathbb{R}}f\,d\chi \) for any convex \(f:\mathbb{R}\to \mathbb{R}\)). Necessarily, we then have \(\ell _{ \chi }\leq \ell _{\eta }\leq r_{\eta }\leq r_{\chi }\). Let \({\varPi }( \eta ,\chi )\) be the set of martingale couplings of \(\eta \) and \(\chi \), i.e.,

$$\begin{aligned} {\varPi }(\eta ,\chi ) = \{ \pi \in \mathcal{P}( \mathbb{R}^{2}) & : \mbox{$\pi$ has first marginal $\eta $ and second marginal $\chi $},\\ & \phantom{=} \mbox{and (2.3) holds}\}, \end{aligned}$$

where \(\mathcal{P}(\mathbb{R}^{2})\) is the set of probability measures on \(\mathbb{R}^{2}\) and (2.3) is the martingale condition

$$\begin{aligned} \int _{x \in B} \int _{y \in \mathbb{R}} y \pi (dx,dy) &= \int _{x \in B} \int _{y \in \mathbb{R}} x \pi (dx,dy) \\ &= \int _{B} x \eta (dx), \qquad \forall B \in \mathcal{B}(\mathbb{R}). \end{aligned}$$
(2.3)

By a classical result of Strassen [28], \({\varPi }( \mu ,\nu )\) is nonempty if and only if \(\mu \leq _{\mathrm{cx}}\nu \). Note that there is a 1–1 correspondence between canonical, \((\mu ,\nu )\)-consistent models and elements \(\pi \in \varPi (\mu , \nu )\), given by \(\hat{\mathbb{P}}[\hat{M}_{1} \in dx, \hat{M}_{2} \in dy] = \pi (dx,dy)\).
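For finitely supported measures, both the convex order and membership of \({\varPi }(\mu ,\nu )\) can be tested directly. The sketch below uses hypothetical marginals and function names. It relies on the observation that, for discrete measures of equal mass and mean, \(P_{\chi } - P_{\eta }\) is piecewise linear with kinks only at atoms, so it suffices to compare put prices at the atoms. The martingale condition (2.3) reduces to zero drift from each atom of the first marginal.

```python
from fractions import Fraction as F


def put_fn(eta, k):
    return sum(m * max(k - x, 0) for x, m in eta.items())


def in_convex_order(eta, chi):
    """Check eta <=_cx chi for discrete measures via mass, mean and put prices."""
    mass_ok = sum(eta.values()) == sum(chi.values())
    mean_ok = sum(x * m for x, m in eta.items()) == sum(y * m for y, m in chi.items())
    kinks = set(eta) | set(chi)
    return mass_ok and mean_ok and all(put_fn(eta, k) <= put_fn(chi, k) for k in kinks)


def is_martingale_coupling(pi, eta, chi):
    """Check marginals and the discrete form of the martingale condition (2.3)."""
    first, second, drift = {}, {}, {}
    for (x, y), m in pi.items():
        first[x] = first.get(x, 0) + m
        second[y] = second.get(y, 0) + m
        drift[x] = drift.get(x, 0) + m * (y - x)
    return first == eta and second == chi and all(d == 0 for d in drift.values())


mu = {1: F(1, 2), 3: F(1, 2)}
nu = {0: F(1, 4), 2: F(1, 2), 4: F(1, 4)}
pi = {(1, 0): F(1, 4), (1, 2): F(1, 4), (3, 2): F(1, 4), (3, 4): F(1, 4)}
product = {(x, y): mu[x] * nu[y] for x in mu for y in nu}  # independent coupling

print(in_convex_order(mu, nu), is_martingale_coupling(pi, mu, nu))
print(is_martingale_coupling(product, mu, nu))
```

The independent coupling has the correct marginals but fails the martingale condition, which shows that (2.3) is a genuine restriction on the set of couplings.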

Given \(\pi \in \varPi (\mu ,\nu )\), we say \(\pi \) maps \(A \subseteq \mathbb{R}\) to \(B\subseteq \mathbb{R}\) if \(\pi (A \times \mathbb{R}) = \pi (A \times B)\) (or equivalently, if under the canonical model, \(\hat{M}_{1} \in A\) implies \(\hat{M}_{2} \in B\) almost surely). We say \(\pi \) maps \(A\) onto \(B\) if \(\pi (A \times \mathbb{R}) = \pi (A \times B) =\pi (\mathbb{R}\times B)\) (or equivalently, if under the canonical model, \(\hat{M}_{1} \in A\) if and only if \(\hat{M}_{2} \in B\) almost surely). Finally, we say \(\pi \) is constant on \(A\) if \(\pi (C \times \mathbb{R}) = \pi (C \times C)\) for all \(C \subseteq A\) (or equivalently, if \(\hat{M}_{2} = \hat{M}_{1}\) almost surely on \(\{\hat{M}_{1} \in A\}\)).

For a pair of measures \(\eta ,\chi \) on ℝ, let the function \(D = D_{\eta ,\chi }: \mathbb{R}\to \mathbb{R}_{+}\) be defined by \(D_{\eta ,\chi }(k) = P_{\chi }(k) - P_{\eta }(k)\). Note that if \(\eta ,\chi \) have equal mass and equal barycentre, then \(\eta \leq _{\mathrm{cx}} \chi \) is equivalent to \(D \geq 0\) on ℝ. Let \(\mathcal{I}_{D} = [\ell _{D},r_{D}]\) be the smallest closed interval containing \(\{ k : D_{\eta ,\chi }(k)>0 \}\). If \(\mathcal{I}_{D}\) is such that \(\mathcal{I}_{D} \subseteq \mathcal{I}_{\chi }\), then we must have \(\eta =\chi \) on \([\ell _{\chi }, \ell _{D}) \cup (r_{D}, r_{ \chi }]\).

The following lemma tells us that if \(D_{\eta ,\chi }(x)=0\) for some \(x\), then in any martingale coupling of \(\eta \) and \(\chi \), no mass can cross \(x\). The result is well known and can be traced back (at least) to Hobson [23, Sect. 2].

Lemma 5

Suppose \(\eta \) and \(\chi \) are probability measures with \(\eta \leq _{\mathrm{cx}} \chi \). Suppose that \(D(x)=0\). If \(\pi \in {\varPi }( \eta ,\chi )\), then

$$ \pi \big((-\infty ,x) \times (x,\infty )\big) + \pi \big((x,\infty ) \times (-\infty ,x)\big)=0. $$

It follows from Lemma 2.5 that if there is a point \(x\) in the interior of the interval \(\mathcal{I}_{\eta }\) such that \(D_{\eta ,\chi }(x)=0\), then we can separate the problem of constructing martingale couplings of \(\eta \) to \(\chi \) into a pair of subproblems involving mass to the left and right of \(x\), respectively, always taking care to allocate the mass of \(\chi \) at \(x\) appropriately. Indeed, if there are multiple \(x_{j}\) with \(D_{\eta ,\chi }(x_{j})=0\), then we can divide the problem into a sequence of ‘irreducible’ problems, each taking place on an interval \(\mathcal{I}_{i}\) such that \(D>0\) on the interior of \(\mathcal{I}_{i}\) and \(D=0\) at the endpoints. All mass starting in a given interval is transported to a point in the same interval. However, in our setting, in addition to specifying a model (or equivalently a martingale coupling), we also need to specify a stopping rule, and this needs to be defined across all irreducible components simultaneously. For this reason, we do not insist that \(D>0\) on the interior of \(\mathcal{I}_{\chi }\), although this will be the case in the simple settings in which we build our solution.

2.5 The left-curtain coupling

The left-curtain coupling (or martingale transport) was introduced by Beiglböck and Juillet [4] and further studied by Henry-Labordère and Touzi [16] and Beiglböck et al. [6].

For real numbers \(c, x, d\) with \(c \leq x \leq d\), define the probability measure \(\chi _{c,x,d}\) by \(\chi _{c,x,d} = \frac{d-x}{d-c}\delta _{c} + \frac{x-c}{d-c} \delta _{d}\) with \(\chi _{c,x,d} = \delta _{x}\) if \((d-x)(x-c)=0\). Note that \(\chi _{c,x,d}\) has mean \(x\); it is the law of a Brownian motion started at \(x\) and evaluated at the first exit time from \((c,d)\).

Lemma 6

(Beiglböck and Juillet [4, Corollary 1.6])

Let \(\mu ,\nu \) be probability measures with \(\mu \leq _{\mathrm{cx}} \nu \) and assume that \(\mu \) is continuous. Then there exists a pair of measurable functions \(T_{d} : \mathbb{R}\to \mathbb{R}\) and \(T_{u} : \mathbb{R}\to \mathbb{R}\) with the properties that \(T_{d}(x) \leq x \leq T_{u}(x)\), that \(T_{u}(x) \leq T_{u}(x')\) and \(T_{d}(x') \notin (T_{d}(x),T_{u}(x))\) for all \(x< x'\), and that if we define \(\pi _{\mathrm{lc}}(dx,dy) = \mu (dx) \chi _{T_{d}(x),x,T_{u}(x)}(dy)\), then \(\pi _{\mathrm{lc}} \in {\varPi }( \mu ,\nu )\). The coupling \(\pi _{\mathrm{lc}}\) is called the left-curtain martingale coupling.

Note that there is no claim of uniqueness of the functions \(T_{d},T _{u}\) in Lemma 2.6. Indeed, \(T_{d},T_{u}\) are unique only \(\mu \)-a.s. (see Beiglböck et al. [6, Proposition 3.4]). For example, the definitions of \(T_{d}\) and \(T_{u}\) are immaterial outside \([\ell _{\mu },r_{\mu }]\). Further, if \(T_{u}\) has a (necessarily upward) jump at \(x'\), then it does not matter what value we take for \(T_{u}(x')\) provided \(T_{u}(x') \in [T_{u}(x'-),T_{u}(x'+)]\). (Since we are assuming \(\mu \) is continuous, the probability that we choose an \(x\)-coordinate value of \(x'\) is zero.) More importantly, if \((T_{d},T_{u})\) satisfy the properties of Lemma 2.6 and if \(T_{u}(x)= x\) on an interval \([\underline{x},\overline{x})\), then we can modify the definition of \(T_{d}\) on \([\underline{x},\overline{x})\) to either \(T_{d}(x)= x\) or \(T_{d}(x)= T_{d}(\underline{x}-)\) and still satisfy the relevant monotonicity properties. Henry-Labordère and Touzi [16] resolve this indeterminacy by setting \(T_{d}(x) = x\) on the set \(\{T_{u}(x)=x\}\) and also taking \(T_{u}\) and \(T_{d}\) to be right-continuous.

In the sequel, we follow Henry-Labordère and Touzi [16] by taking \(T_{d}(x) = x\) on the set \(\{T_{u}(x)=x\}\), but we do not make right-continuity assumptions on \(T_{d}\) and \(T_{u}\). Also we write \((f,g)\) in place of \((T_{d},T _{u})\). Our functions \(f\) and \(g\) will eventually be defined on ℝ (see Sect. 3.3.5), but for now we define them just on \([\ell _{\mu }, r_{\mu }]\).

Lemma 7

Let \((T_{d},T_{u})\) be a pair of functions satisfying the properties in Lemma 2.6. Suppose they lead to a solution \(\pi _{\mathrm{lc}} \in {\varPi }(\mu ,\nu )\). On \([\ell _{\mu }, r_{\mu }]\), set \(g(x)=T _{u}(x)\); on \(\{g(x)>x\}\), set \(f(x)=T_{d}(x)\); and on \(\{g(x)=x\}\), set \(f(x)=x\). Then \((f,g)\) are such that \(f(x) \leq x \leq g(x)\) and for all \(x'>x\), we have \(g(x') \geq g(x)\) and \(f(x') \notin (f(x),g(x))\). Moreover,

$$ \mu (dx) \chi _{f(x),x,g(x)}(dy) = \mu (dx) \chi _{T_{d}(x),x,T_{u}(x)}(dy). $$

Proof

The property \(f(x) \leq x \leq g(x)\) is immediate; so we only need to check that for \(x'>x\), we have \(g(x') \geq g(x)\) and \(f(x') \notin (f(x),g(x))\). Monotonicity of \(g\) is inherited from monotonicity of \(T_{u}\). If \(g(x)=x\), then \(f(x)=x\) and \(f(x') \notin (f(x),g(x))= \emptyset \). If \(g(x)> x\) and \(g(x')>x'\), then

$$ f(x') = T_{d}(x') \notin \big(T_{d}(x),T_{u}(x)\big) = \big(f(x),g(x) \big). $$

Finally, if \(g(x)>x\) and \(g(x') = x'\), then

$$ f(x')=x' \notin \big(f(x),x'=g(x')\big) \supseteq \big(f(x),g(x) \big). $$

 □

Figure 2.1 gives a stylised representation of \(f\) and \(g\) in the case where \(\nu \) has no atoms. (Atoms of \(\nu \) lead to horizontal sections of \(f\) and \(g\); see Sect. 3.6.) In the figure, the set \(\{ g(x)>x \}\) is a finite union of intervals whereas in general it may be a countable union of intervals. Similarly, in the figure, \(f\) has finitely many downward jumps whereas in general, it may have countably many jumps. Nonetheless Fig. 2.1 captures the essential behaviour of \(f\) and \(g\).

Fig. 2.1 Stylised plot of the functions \(f\) and \(g\) in the general case (with no atoms). Note that on the set \(\{g(x)=x\}\), we have \(f(x)=x\).

Suppose \(\nu \) is also continuous and fix \(x\). Under the left-curtain martingale coupling, mass in the interval \((f(x),x)\) at time 1 is mapped to the interval \((f(x), g(x))\) at time 2. Thus the points \(f(x), g(x) \) with \(f(x) \leq x \leq g(x)\) are solutions to

$$\begin{aligned} \int _{f}^{x} \mu (dz) & = \int _{f}^{g} \nu (dz), \end{aligned}$$
(2.4)
$$\begin{aligned} \int _{f}^{x} z \mu (dz) & = \int _{f}^{g} z \nu (dz). \end{aligned}$$
(2.5)

Essentially, (2.4) is a preservation-of-mass condition and (2.5) is preservation of the mean and the martingale property. If \(\nu \) has atoms, then (2.4) and (2.5) become

$$\begin{aligned} \int _{f}^{x} \mu (dz) & = \int _{(f,g)} \nu (dz) + \lambda _{f} + \lambda _{g}, \end{aligned}$$
(2.6)
$$\begin{aligned} \int _{f}^{x} z \mu (dz) & = \int _{(f,g)} z \nu (dz) + f \lambda _{f} + g \lambda _{g} , \end{aligned}$$
(2.7)

respectively, where \(0 \leq \lambda _{f} \leq \nu ( \{ f \} )\) and \(0 \leq \lambda _{g} \leq \nu ( \{ g \} )\).

Returning to the case of continuous \(\mu \) and \(\nu \), for fixed \(x\), there can be multiple solutions to (2.4) and (2.5). If, however, we consider \(f\) and \(g\) as functions of \(x\) and impose the additional monotonicity properties of Lemma 2.6 (i.e., \(g(x) \leq g(x')\) and \(f(x') \notin (f(x),g(x))\) for \(x< x'\)), then typically, for \(\mu \)-almost all \(x\), there is a unique solution to (2.4) and (2.5). However, there are exceptional \(x\) at which \(f\) jumps and at which there are multiple solutions; see Sect. 3.3.
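A worked example may help fix ideas. The sketch below is our own illustration, not taken from the paper: we assume \(\mu \) is uniform on \([-1,1]\) and \(\nu \) is uniform on \([-2,2]\), for which solving (2.4) and (2.5) by hand with \(f \leq -1\) suggests the candidate pair \(f(x) = -(x+3)/2\) and \(g(x) = (3x+1)/2\). Here \(g\) is increasing and \(f\) is decreasing, so the monotonicity properties hold. The code checks (2.4) and (2.5) on a grid and then verifies by midpoint quadrature that sending \(x\) to \(f(x)\) or \(g(x)\) with the weights of \(\chi _{f(x),x,g(x)}\) reproduces the put prices of \(\nu \), namely \(P_{\nu }(k) = (k+2)^{2}/8\) for \(k \in [-2,2]\).

```python
def clip(z, lo, hi):
    return max(lo, min(hi, z))


def mu_mass(a, b):      # mu = Uniform[-1, 1], density 1/2
    return 0.5 * (clip(b, -1, 1) - clip(a, -1, 1))


def mu_moment(a, b):    # integral of z mu(dz) over (a, b)
    return 0.25 * (clip(b, -1, 1) ** 2 - clip(a, -1, 1) ** 2)


def nu_mass(a, b):      # nu = Uniform[-2, 2], density 1/4
    return 0.25 * (clip(b, -2, 2) - clip(a, -2, 2))


def nu_moment(a, b):    # integral of z nu(dz) over (a, b)
    return 0.125 * (clip(b, -2, 2) ** 2 - clip(a, -2, 2) ** 2)


def f(x):               # candidate lower function (derived by hand for this example)
    return -(x + 3.0) / 2.0


def g(x):               # candidate upper function (derived by hand for this example)
    return (3.0 * x + 1.0) / 2.0


def max_residual(xs):
    """Largest violation of the mass condition (2.4) or mean condition (2.5)."""
    return max(
        max(abs(mu_mass(f(x), x) - nu_mass(f(x), g(x))),
            abs(mu_moment(f(x), x) - nu_moment(f(x), g(x))))
        for x in xs
    )


def put_under_coupling(k, n=20000):
    """E[(k - Y)^+] when X ~ mu and Y is f(X) or g(X) with mean-preserving weights."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h          # midpoint rule on [-1, 1]
        lo, hi = f(x), g(x)
        w_lo = (hi - x) / (hi - lo)       # probability of moving down to f(x)
        total += 0.5 * h * (w_lo * max(k - lo, 0.0) + (1.0 - w_lo) * max(k - hi, 0.0))
    return total


xs = [-1.0 + 0.05 * i for i in range(41)]
residual = max_residual(xs)
errors = [abs(put_under_coupling(k) - (k + 2.0) ** 2 / 8.0) for k in (-1.5, 0.0, 1.0)]
print(residual, errors)
```

In this example the down and up probabilities turn out to be constant (\(1/4\) and \(3/4\)), which is a feature of the uniform marginals and not a general property.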

Remark 2.8

In a related problem, Hobson and Klimmek [18] show how under natural simplifying assumptions, including the dispersion assumption below (see Assumption 3.3), upper and lower functions can be characterised as solutions of a pair of coupled differential equations. In our case, \((f,g)\) solve a pair of coupled differential equations on an interval \([e_{-},r_{\mu })\), obtained from differentiating (2.4) and (2.5), namely

$$\begin{aligned} \frac{df}{dx} \phantom{-} &= - \frac{g-x}{g-f} \frac{\rho (x)}{\eta (f)-\rho (f)}, \\ \frac{dg}{dx} \phantom{-} &= \phantom{-} \frac{x-f}{g-f} \frac{\rho (x)}{\eta (g)}, \end{aligned}$$

with the initial condition \(f(e_{-})= e_{-} =g(e_{-})\). In addition, see Henry-Labordère and Touzi [16, Eqs. (3.10) and (3.9)], where the construction of \(T_{d}\) and \(T_{u}\) is exactly based on the resolution of (2.4) and (2.5).
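The coupled equations of this remark can be recovered as follows. This is only a sketch, assuming (without verification here) that \(f\) and \(g\) are differentiable in \(x\) and that \(\mu \) and \(\nu \) have continuous densities \(\rho \) and \(\eta \). Differentiating (2.4) and (2.5) with respect to \(x\) gives

$$\begin{aligned} \rho (x) - \rho (f) \frac{df}{dx} &= \eta (g) \frac{dg}{dx} - \eta (f) \frac{df}{dx}, \\ x \rho (x) - f \rho (f) \frac{df}{dx} &= g \eta (g) \frac{dg}{dx} - f \eta (f) \frac{df}{dx}. \end{aligned}$$

Writing \(A = \eta (g) \frac{dg}{dx}\) and \(B = (\eta (f) - \rho (f)) \frac{df}{dx}\), this is the linear system \(A - B = \rho (x)\), \(gA - fB = x \rho (x)\), whose solution is

$$ A = \frac{x-f}{g-f} \rho (x), \qquad B = - \frac{g-x}{g-f} \rho (x). $$

Dividing by \(\eta (g)\) and by \(\eta (f)-\rho (f)\), respectively, gives the expressions for \(dg/dx\) and \(df/dx\).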

3 Robust bounds for American puts when \(\mu \) is atom-free

3.1 Problem formulation

Our goal in this section is to derive the highest consistent model price for the American put. We begin by giving a concise reformulation of the primal problem (recall Sect. 2.1) as a problem of martingale optimal transport (MOT), and stating the main theorem (Theorem 3.1). We then study the problem first in a simple special case, next in a case which exhibits all the main features, and finally in the general case.

Recall the definition of the canonical \((\mu ,\nu )\)-consistent model (abbreviated to \(\hat{M}_{\pi }\)) for which \(\hat{\mathbb{P}}[\hat{M} _{1} \in dx, \hat{M}_{2} \in dy] = \pi (dx,dy)\), where \(\pi \in \varPi (\mu ,\nu )\). For a pair of fixed constants \(K_{1}\) and \(K_{2}\), the problem we consider is to find

$$ \hat{\mathcal{P}}:= \sup _{\pi \in {\varPi }(\mu ,\nu )} \sup _{B \in \mathcal{B}(\mathbb{R})} \mathbb{E}^{\mathcal{L}(X,Y) = \pi } \big[ (K_{1} - X)^{+} I_{ \{ X \in B \} } + (K_{2} - Y)^{+} I _{ \{ X \notin B \} } \big]. $$

Note that \(\hat{\mathcal{P}}\) corresponds to the highest model-based price of the American put over the specific subset of consistent models, and therefore \(\hat{\mathcal{P}}\leq \mathcal{P}\). By weak duality (recall Sect. 2.3), it follows that \(\hat{\mathcal{P}}\leq \mathcal{P}\leq \mathcal{D}\leq \breve{\mathcal{D}}\).

Throughout this paper, we assume that \(\mu \) has no atoms. The same assumption is made in (parts of) Beiglböck and Juillet [4], Henry-Labordère and Touzi [16] and Beiglböck et al. [6]. The extension of the left-curtain martingale coupling to the case where \(\mu \) has atoms is the subject of Hobson and Norgilas [22].

Theorem 3.1

Suppose \(\mu \) has no atoms. Then \(\hat{\mathcal{P}}=\mathcal{P}= \mathcal{D}=\breve{\mathcal{D}}\).

We begin by considering a couple of degenerate cases.

We say the put is in the money at time 1 (respectively time 2) if \(X < K_{1}\) (respectively \(Y< K_{2}\)). If the inequality is reversed, then the put is out of the money. If \(K_{1} \leq \ell _{\mu }\), then the American put is always out of the money at time 1, and the American put is equivalent to the European put with strike \(K_{2}\) and maturity 2. Since puts with strike \(K\) and maturity 1 are costless for \(K\leq \ell _{\mu }\), a simple superhedging strategy is to purchase one European put with strike \(K_{1}\) and sell one European put with strike \(K_{2}\), both with maturity 1, and also purchase one European put with strike \(K_{2}\) and maturity 2. (This strategy is of the form discussed in Lemma 2.4 and is generated by \(\psi (y) = (K_{2}-y)^{+}\).) The cost of this hedge is \(P_{\nu }(K_{2})\); this is also the model-based expected payoff of the American put under any consistent model.

If \(K_{1} \leq K_{2}\), then \(\mathbb{E}[(K_{2}-Y)^{+}|X] \geq (K_{2} - X)^{+} \geq (K_{1}-X)^{+}\) and \(\tau =2\) is optimal. Again, the American put is equivalent to the European put with strike \(K_{2}\) and maturity 2. In this case, for a superhedge, it is sufficient to purchase one European put with strike \(K_{2}\) and maturity 2. By Lemma 2.4 (with \(\psi (y) = (K_{2}-y)^{+}\) and \(\phi =0\)), this generates a superhedge with cost \(P_{\nu }(K_{2})\). Again, this is the model-based expected payoff of the American put under any consistent model.
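For completeness, the chain of inequalities used in the case \(K_{1} \leq K_{2}\) can be spelled out. The first step is the conditional Jensen inequality for the convex function \(y \mapsto (K_{2}-y)^{+}\), the equality is the martingale property \(\mathbb{E}[Y|X]=X\), and the last step uses \(K_{1} \leq K_{2}\):

$$ \mathbb{E}\big[(K_{2}-Y)^{+} \big| X\big] \geq \big(K_{2}-\mathbb{E}[Y|X]\big)^{+} = (K_{2}-X)^{+} \geq (K_{1}-X)^{+}. $$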

For the remainder of the paper, we make the

Standing Assumption 3.2

\(K_{1} > \max \{ \ell _{\mu }, K_{2} \}\).

3.2 American puts under the dispersion assumption

3.2.1 The left-curtain coupling

The goal in this section is to present the theory in a simple special case, and to illustrate the main features and solution techniques of our approach unencumbered by technical issues or the consideration of exceptional cases. The following assumption is a small modification of one introduced by Hobson and Klimmek [18]; see also Henry-Labordère and Touzi [16]. For illustration, see Fig. 3.1.

Fig. 3.1

Sketch of the densities \(\rho \) and \(\eta \) and the locations of \(f=f(x)\), \(g=g(x)\) for given \(x>e_{-}\). Time-1 mass in the interval \((f,x)\) stays in the same place if possible. Mass which cannot stay constant is mapped to \((f,e_{-})\) or \((x,g)\) in a way which respects the martingale property

Assumption 3.3

(Dispersion assumption)

Laws \(\mu \) and \(\nu \) are absolutely continuous with continuous densities \(\rho \) and \(\eta \), respectively; \(\nu \) has support on \((\ell _{\nu },r_{\nu }) \subseteq (-\infty , \infty )\) and \(\eta >0\) on \((\ell _{\nu },r_{\nu })\); \(\mu \) has support on \((\ell _{\mu },r_{\mu }) \subseteq (\ell _{\nu },r_{\nu })\) and \(\rho >0\) on \((\ell _{\mu },r_{\mu })\). In addition:

  • \((\mu -\nu )^{+}\) is concentrated on an interval \(E = (e_{-},e_{+})\) and \(\rho >\eta \) on \(E\);

  • \((\nu - \mu )^{+}\) is concentrated on \((\ell _{\nu }, r_{\nu }) \setminus E\) and \(\eta >\rho \) on \((\ell _{\nu },e_{-}) \cup (e_{+},r_{\nu })\).

If \(\mu \leq _{\mathrm{cx}} \nu \) are centred normal distributions with different variances or distinct lognormal distributions with a common mean, then Assumption 3.3 is satisfied.

Under the dispersion assumption, \(\{ k:D_{\mu ,\nu }(k)>0 \}\) is an interval and \(D=D_{\mu ,\nu }\) is convex to the left of \(e_{-}\), concave on \((e_{-},e_{+})\) and again convex above \(e_{+}\).
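As a small numerical illustration of the centred normal case, the following sketch locates the points \(e_{-}\) and \(e_{+}\) where the two densities cross and checks the sign of \(D'' = \eta - \rho \) (recall that the second derivative of a put price in the strike is the density). The standard deviations 1 and 2 and all function names are illustrative choices, not taken from the paper.

```python
import math

def normal_pdf(x, sigma):
    """Density of the centred normal law with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def crossing_points(sigma_mu, sigma_nu):
    """Return (e_minus, e_plus), the two points where the N(0, sigma_mu^2)
    and N(0, sigma_nu^2) densities coincide (needs sigma_mu < sigma_nu)."""
    if not 0.0 < sigma_mu < sigma_nu:
        raise ValueError("need 0 < sigma_mu < sigma_nu")
    # rho(e) = eta(e)  <=>  e^2 = 2 log(sigma_nu / sigma_mu) / (sigma_mu^-2 - sigma_nu^-2)
    e_sq = 2.0 * math.log(sigma_nu / sigma_mu) / (sigma_mu ** -2 - sigma_nu ** -2)
    e = math.sqrt(e_sq)
    return -e, e

def second_derivative_D(k, sigma_mu, sigma_nu):
    """D''(k) = eta(k) - rho(k): positive where D is convex, negative where concave."""
    return normal_pdf(k, sigma_nu) - normal_pdf(k, sigma_mu)

# Illustrative parameters (an assumption of this sketch).
e_minus, e_plus = crossing_points(1.0, 2.0)
print(e_minus, e_plus)
```

With these parameters, \(D''\) is negative at 0 and positive at \(\pm 2\), in line with the convex, concave, convex shape described above.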

Lemma 3.4

(Henry-Labordère and Touzi [16, Sect. 3.4])

Suppose that Assumption 3.3 holds. For all \(x \in (e_{-}, r_{\mu })\), there exist \(f,g\) with \(f < e_{-} < x < g\) such that (2.4) and (2.5) hold. Moreover, if we consider \(f\) and \(g\) as functions of \(x\) on \((e_{-},r_{\mu })\), then \(f\) and \(g\) are continuous, \(f\) is strictly decreasing and \(g\) is strictly increasing, \(\lim _{x \downarrow e_{-}} f(x) = e_{-} = \lim _{x \downarrow e_{-}} g(x)\), \(\lim _{x \uparrow r_{\mu }} f(x) = \ell _{\nu }\) and \(\lim _{x \uparrow r_{\mu }} g(x) = r_{\nu }\).

The principle behind the left-curtain martingale coupling in Beiglböck and Juillet [4] is that they determine where to map mass at \(x\) at time 1 sequentially working from left to right. In our current setting, there is an interval \((\ell _{\mu },e_{-}]\) on which mass can remain unmoved between times 1 and 2. To the right of \(e_{-}\), we can define \(f,g\) in such a way that mass is moved as little as possible. This leads to the ODEs in Remark 2.8.
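For centred normal laws, the pair \((f(x),g(x))\) of Lemma 3.4 can be approximated numerically. The sketch below is one possible approach and not the construction used in the paper: for a trial value of \(f\) it solves the mass equation (2.4) for \(g\) exactly through the quantile function of \(\nu \), and then bisects in \(f\) on the residual of the mean equation (2.5). The parameters (standard deviations 1 and 2, \(x=0.5\)), the lower bracket of ten standard deviations and all function names are assumptions of the sketch.

```python
import math
from statistics import NormalDist

def partial_mean(dist, a, b):
    """Integral of z over (a, b) against the centred normal law `dist`."""
    return dist.variance * (dist.pdf(a) - dist.pdf(b))

def upper_point(mu, nu, f, x):
    """Solve the mass equation (2.4) for g given f and x; None if infeasible."""
    p = nu.cdf(f) + mu.cdf(x) - mu.cdf(f)
    return nu.inv_cdf(p) if 0.0 < p < 1.0 else None

def mean_gap(mu, nu, f, x):
    """Left side minus right side of (2.5), with g chosen so that (2.4) holds."""
    g = upper_point(mu, nu, f, x)
    if g is None:  # nu has too little mass above f, so f must be decreased
        return float("-inf")
    return partial_mean(mu, f, x) - partial_mean(nu, f, g)

def solve_fg(mu, nu, x, e_minus, tol=1e-12):
    """Bisection in f: the gap is positive far to the left and negative at e_minus."""
    lo, hi = -10.0 * nu.stdev, e_minus
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_gap(mu, nu, mid, x) > 0.0:
            lo = mid
        else:
            hi = mid
    f = 0.5 * (lo + hi)
    return f, upper_point(mu, nu, f, x)

# Illustrative laws: mu = N(0, 1), nu = N(0, 4).
mu, nu = NormalDist(0.0, 1.0), NormalDist(0.0, 2.0)
e_minus = -math.sqrt(2.0 * math.log(2.0) / (1.0 - 0.25))  # left crossing of the densities
x0 = 0.5
f_x, g_x = solve_fg(mu, nu, x0, e_minus)
print(f_x, g_x)
```

The returned pair should satisfy \(f < e_{-} < x < g\) and both (2.4) and (2.5) up to the bisection tolerance.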

3.2.2 The American put

Suppose \(K_{1} \in (e_{-},r_{\mu }]\) and suppose \(f\) and \(g\) are constructed as in Lemma 3.4. Define \(\varLambda :[g ^{-1}(K_{1}),K_{1}] \to \mathbb{R}\) by

$$\begin{aligned} \varLambda (x) &= \frac{(K_{2} - f(x))-(K_{1}-x)}{x-f(x)} - \frac{K_{1} -x}{g(x)-x} \\ &= \frac{g(x)-K_{1}}{g(x)-x} - \frac{K_{1} -K_{2}}{x-f(x)}. \end{aligned}$$
(3.1)

Pictorially \(\varLambda \) is the difference in slope of the two dashed lines in Fig. 3.2.

Fig. 3.2

Sketch of put payoffs with points \(x\), \(f\) and \(g\) marked. \(\varLambda (x)\) is the difference in slope of the two dashed lines

Lemma 3.5

Suppose \(K_{1} \in (e_{-},r_{\mu }]\) and \(f(K_{1}) < K_{2}\). Then there is a unique scalar \(x^{*}=x^{*}(\mu ,\nu ;K_{1},K_{2}) \in (g^{-1}(K_{1}),K_{1})\) such that \(\varLambda (x^{*})=0\). Moreover, \(f(x^{*})< K_{2}\) and

$$\begin{aligned} \frac{K_{2} - f(x^{*})}{g(x^{*})-f(x^{*})} = \frac{K_{1} -x^{*}}{g(x ^{*})-x^{*}} &= \frac{(x^{*} - f(x^{*})) -(K_{1} -K_{2})}{x^{*} - f(x ^{*})} \\ & = 1 - \frac{K_{1} -K_{2}}{x^{*} - f(x^{*})}. \end{aligned}$$
(3.2)

Proof

First, from the continuity and monotonicity properties of \(f\) and \(g\), we have that (see Fig. 3.2) \(\varLambda \) is continuous and strictly increasing. Moreover, \(\varLambda (g^{-1}(K_{1}))= -\frac{K_{1}-K_{2}}{g^{-1}(K_{1}) - (f \circ g^{-1})(K_{1})} < 0\) and \(\varLambda (K_{1})=\frac{K_{2}-f(K_{1})}{K_{1} - f(K_{1})}>0\) by hypothesis. Hence there is a unique root to \(\varLambda =0\). At this root, the equalities in (3.2) hold. □
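The root-finding step of Lemma 3.5 can be illustrated on a hypothetical example that is not taken from the paper: \(\mu \) uniform on \((-1,1)\) and \(\nu \) uniform on \((-2,2)\). Here \(e_{-}=\ell _{\mu }=-1\), and for \(x \in (-1,1)\), equations (2.4) and (2.5) read \((x+1)/2 = (g-f)/4\) and \((x^{2}-1)/4 = (g^{2}-f^{2})/8\), so that \(f(x) = -(x+3)/2\) and \(g(x) = (3x+1)/2\). (The uniform densities are continuous only on the interiors of their supports, so this example satisfies Assumption 3.3 only in a loose sense.) The sketch finds \(x^{*}\) by bisection on \([g^{-1}(K_{1}),K_{1}]\) for the illustrative strikes \(K_{1}=0.5\), \(K_{2}=0\).

```python
def f(x):
    """Lower function for mu = U(-1, 1), nu = U(-2, 2) (assumed example)."""
    return -(x + 3.0) / 2.0

def g(x):
    """Upper function for the same example."""
    return (3.0 * x + 1.0) / 2.0

def g_inv(k):
    """Inverse of g."""
    return (2.0 * k - 1.0) / 3.0

def Lambda(x, K1, K2):
    """Second form of (3.1)."""
    return (g(x) - K1) / (g(x) - x) - (K1 - K2) / (x - f(x))

def find_x_star(K1, K2, tol=1e-13):
    """Bisection for the root of Lambda on [g^{-1}(K1), K1]."""
    lo, hi = g_inv(K1), K1
    if not (Lambda(lo, K1, K2) < 0.0 < Lambda(hi, K1, K2)):
        raise ValueError("Lambda does not change sign; check f(K1) < K2 < K1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Lambda(mid, K1, K2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K1, K2 = 0.5, 0.0          # illustrative strikes with f(K1) = -1.75 < K2 < K1
x_star = find_x_star(K1, K2)
print(x_star, f(x_star), g(x_star))
```

In this example, \(\varLambda (x)=0\) can also be solved by hand, giving \(x^{*} = (8K_{1}-2K_{2}-3)/9 = 1/9\), which the bisection should reproduce.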

Suppose \(K_{1} > e_{-}\) and \(f(K_{1}) < K_{2}\) and that \(x^{*}=x^{*}( \mu ,\nu ;K_{1},K_{2}) \in (e_{-},K_{1})\) is such that \(\varLambda (x^{*})=0\). It is easy to find a martingale coupling \(\pi \) of \(\mu \) and \(\nu \) such that \(\pi \) maps \((f(x^{*}),x^{*})\) onto \((f(x^{*}),g(x^{*}))\), and such that \(\pi \) is constant on \((-\infty ,f(x^{*}))\). For example, we may take \(\hat{\pi } = \pi _{\mathrm{lc}} = \pi _{\mathrm{lc}}(\mu ,\nu )\), the left-curtain martingale coupling of Beiglböck and Juillet [4]. More generally, let \({\pi }_{x^{*}} \in {\varPi }(\mu ,\nu )\) be any martingale coupling such that \({\pi }_{x^{*}}\) maps \((-\infty , f(x^{*}))\) to itself, maps \((f(x^{*}),x^{*})\) onto \((f(x^{*}),g(x^{*}))\), and maps \((x^{*}, \infty )\) to \((-\infty ,f(x^{*})) \cup (g(x^{*}),\infty )\). The martingale coupling represented in Fig. 3.3 has this property.

Fig. 3.3

Sketch of functions \(f\) and \(g\) under the dispersion assumption, with the regions \(K_{2} < f(K_{1})\) and \(K_{2} > f(K_{1})\) shaded. This is a simple special case of Fig. 2.1

Consider a canonical \((\mu ,\nu )\)-consistent model \(\hat{M}_{\pi _{x ^{*}}}\), under which the corresponding probability measure \(\hat{\mathbb{P}}\) is given by \(\hat{\mathbb{P}}[X \in dx, Y \in dy] = {\pi }_{x^{*}}(dx,dy)\). Let \(\tau ^{*}\) be the stopping time such that \(\tau ^{*}=1\) on \((-\infty ,x^{*})\) and \(\tau ^{*}=2\) otherwise. Our claim in Theorem 3.6 below is that \(\hat{M}_{\pi _{x^{*}}}\) and the stopping time \(\tau ^{*}\) are such that the model-based price of the American put under this stopping time is the highest possible, over all consistent models.

Continue to suppose \(K_{1} > e_{-}\) and \(f(K_{1}) < K_{2}\). Now we define a superhedge of the American put. Let \(\psi ^{*}\) be the function

$$ \psi ^{*}(z) = \left \{ \textstyle\begin{array}{l@{\quad }l} K_{2} - z, & z \leq f(x^{*}),\\ \frac{(g(x^{*}) - z)(K_{2} - f(x^{*}))}{g(x^{*}) - f(x^{*})}, \qquad & f(x^{*}) < z \leq g(x^{*}), \\ 0, & z > g(x^{*}). \end{array}\displaystyle \right . $$
(3.3)

Note that by construction and by (3.2), \(\frac{K _{2}-f(x^{*})}{g(x^{*})-f(x^{*})} = \frac{K_{1}-x^{*}}{g(x^{*})-x^{*}}\). Therefore, we have that \(\psi ^{*}(x^{*}) = K_{1}-x^{*}\). Moreover, \(\psi ^{*}\) is convex and satisfies \(\psi ^{*}(z) \geq (K_{2}-z)^{+}\). Hence by Lemma 2.4, \(\psi ^{*}\) can be used to generate a superhedge \((\psi ^{*}, \phi ^{*},(\theta ^{*}_{i})_{i=1,2})\).
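The properties of \(\psi ^{*}\) listed above can be checked in exact arithmetic. The numbers below are hypothetical: they are chosen only so that (3.2) holds and are not derived from any particular pair \((\mu ,\nu )\). The sketch also checks that \(\psi ^{*}\) is a portfolio of two puts, with weight \(\varTheta = (K_{2}-f(x^{*}))/(g(x^{*})-f(x^{*}))\) on strike \(g(x^{*})\) and weight \(1-\varTheta \) on strike \(f(x^{*})\).

```python
from fractions import Fraction as F

# Hypothetical values satisfying (3.2); not derived from any (mu, nu).
f_star, x_star, g_star = F(0), F(2), F(4)
K1, K2 = F(5, 2), F(1)

theta = (K2 - f_star) / (g_star - f_star)

def psi_star(z):
    """Piecewise definition (3.3)."""
    if z <= f_star:
        return K2 - z
    if z <= g_star:
        return (g_star - z) * (K2 - f_star) / (g_star - f_star)
    return F(0)

def two_puts(z):
    """theta puts struck at g* plus (1 - theta) puts struck at f*."""
    return theta * max(g_star - z, 0) + (1 - theta) * max(f_star - z, 0)

grid = [F(n, 4) for n in range(-8, 25)]   # z from -2 to 6 in steps of 1/4
h = F(1, 4)

matches_two_puts = all(psi_star(z) == two_puts(z) for z in grid)
dominates_put = all(psi_star(z) >= max(K2 - z, 0) for z in grid)
midpoint_convex = all(psi_star(z - h) + psi_star(z + h) >= 2 * psi_star(z) for z in grid)
touches_time1_payoff = psi_star(x_star) == K1 - x_star

print(theta, matches_two_puts, dominates_put, midpoint_convex, touches_time1_payoff)
```

The convexity check is only a midpoint test on a grid, which is enough here because \(\psi ^{*}\) is piecewise linear with kinks at grid points.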

In the following theorem, we assume that the American put is not always strictly in the money at time 1 (or equivalently, \(K_{1} \leq r_{ \mu }\)). Discussion of the case \(K_{1}>r_{\mu }\) is postponed until Sect. 3.3.5 below.

Theorem 3.6

Suppose Assumption 3.3 holds and \(K_{1} \leq r_{\mu }\).

(1) Suppose \(K_{1} \in (e_{-}, r_{\mu }]\) and \(f(K_{1}) < K_{2}\). The model \(\hat{M}_{\pi _{x^{*}}}\) described in the previous paragraphs is a canonical \((\mu ,\nu )\)-consistent model for which the price of the American option is the highest. The stopping time \(\tau ^{*}\) is the optimal exercise time. The function \(\psi ^{*}\) defined in (3.3) defines the cheapest superhedge. Moreover, the highest model-based price is equal to the cost of the cheapest superhedge.

(2) Suppose that we have either Case A: \(K_{1} \leq e_{-}\) or Case B: \(K_{1} \in (e_{-}, r_{\mu }]\), together with \(f(K_{1}) \geq K_{2}\). Then there exists a canonical, consistent model under which

$$ \{Y< K_{2}\} = \{X< K_{2}\} \cup \{X>K_{1}, Y< K_{2}\} \qquad \textit{a.s.} $$

and any model with this property, with the stopping rule \(\tau =1\) if \(X< K_{1}\) and \(\tau =2\) otherwise, attains the highest consistent model price. The cheapest superhedge is generated from \({\psi }(x) = (K_{2}-x)^{+}\), and the highest model-based price is equal to the cost of the cheapest superhedge.

Remark 3.7

In Part 2 of Theorem 3.6, the left-curtain coupling generates a model for which \(\{Y< K_{2}\} = \{X< K_{2}\} \cup \{X>K_{1}, Y< K_{2}\}\), and hence when associated with the stopping rule of the theorem, this attains the highest consistent model price.

Proof

(1) Suppose \(K_{1} > e_{-}\) and \(f(K_{1}) < K_{2}\). Then by Lemma 3.5, there is a unique \(x^{*} \in (g^{-1}(K_{1}),K_{1})\) such that \(\varLambda (x^{*})=0\). For this \(x^{*}\), we can find \(f^{*}=f(x^{*})\) and \(g^{*}=g(x^{*})\) with \(f^{*}< K_{2}\) and \(K_{1} < g^{*}\) such that \(\frac{K_{2} -f^{*}}{g^{*}-f^{*}}= \frac{K_{1} - x^{*}}{g^{*} - x^{*}}\); see Fig. 3.4. For typographical reasons, we abbreviate \((x^{*},f^{*},g^{*})\) to \((x,f,g)\) for the rest of this proof.

Fig. 3.4

A combination of Figs. 3.3 and 3.2, showing how they jointly define the best model and best hedge. By adjusting \(x\), we can find \(x^{*}\) such that \(\varLambda (x^{*})=0\). Together the quantities \((f(x^{*}),x^{*},g(x^{*}))\) define the optimal model, stopping time and hedge

Since \(\nu \) is continuous, we have that \(f,x,g\) solve (2.4) and (2.5). The elements \(f,x,g\) can be used to define a model using the construction after Lemma 3.5 above. For this model, we can calculate the expected payoff of the American put. At the same time, we can use \((f,x,g)\) to define a superhedge. The remaining task is to show that the cost of the superhedge equals that of the model-based expected payoff. Then by the discussion in Sect. 2.3, we have found an optimal model and a cheapest superhedge.

The model-based expected payoff (\(\mathrm{MBEP}\)) of the American put (for this model and stopping rule) is

$$\begin{aligned} {\mathrm{MBEP}} & = \int _{-\infty }^{x} (K_{1}-w) \mu (dw) + \int _{-\infty }^{f} (K_{2} - w) (\nu - \mu )(dw) \\ & = P_{\mu }(x) + (K_{1} - x) P'_{\mu }(x) + D(f) + (K_{2} - f) D'(f). \end{aligned}$$

Now we consider the hedging cost (\(\mathrm{HC}\)). Set \(\varTheta = \frac{K _{2} - f}{g-f} \in (0,1)\). Note that since \(x\) has \(\varLambda (x)=0\), we have \(\varTheta = \frac{K_{1}-x}{g-x}\). Recall the definition of \(\psi ^{*}\) in (3.3). Then

$$ \psi ^{*}(y) = \varTheta (g-y)^{+} + (1- \varTheta )(f-y)^{+} . $$

Following Lemma 2.4, we can use \(\psi ^{*}\) to generate a superhedging strategy. The hedging cost (\(\mathrm{HC}\)) of this strategy is

$$ {\mathrm{HC}}= \varTheta P_{\nu }(g) + (1- \varTheta )P_{\nu }(f) + (1- \varTheta )\big( P_{\mu }(x) - P_{\mu }(f) \big), $$
(3.4)

where the first two terms arise from the purchase of the static time-2 portfolio \(\psi ^{*}\) and the third comes from the purchase of the time-1 portfolio \(((K_{1}-w)^{+} - \psi ^{*}(w))^{+}\). The expression in (3.4) can be rewritten as

$$ P_{\mu }(x) + D(f) + \varTheta \big( P_{\nu }(g) - P_{\nu }(f) - P_{ \mu }(x) + P_{\mu }(f)\big) . $$

Now we consider the difference between the hedging cost and the model-based expected payoff. First recall that \(P_{\chi }(k) = \int _{-\infty }^{k} (k - x) \chi (dx)\), \(\chi \in \{\mu ,\nu \}\), and that \(D(k)=D_{\mu ,\nu }(k) = P_{\nu }(k) - P_{\mu }(k)\). Then (2.4) and (2.5) can be rewritten as

$$\begin{aligned} P'_{\mu }(x) - P'_{\mu }(f) & = P'_{\nu }(g) - P'_{\nu }(f), \end{aligned}$$
(3.5)
$$\begin{aligned} \big(xP'_{\mu }(x) - P_{\mu }(x)\big) - \big(fP'_{\mu }(f) - P_{ \mu }(f)\big) & = \big(gP'_{\nu }(g) - P_{\nu }(g)\big) \\ & \quad{}- \big(fP'_{\nu }(f) - P_{\nu }(f)\big). \end{aligned}$$
(3.6)

We find

$$\begin{aligned} {\mathrm{HC}}-\mathrm{MBEP} & = \varTheta \big( P_{\nu }(g) - P_{\nu }(f) - P_{\mu }(x) + P_{\mu }(f) \big) \\ & \quad{}- (K_{1} - x) P'_{\mu }(x) - (K_{2} - f) D'(f) \\ & = \varTheta \big(gP'_{\nu }(g) - x P'_{\mu }(x) - f D'(f) \big) \\ & \quad{}- (K_{1} - x) P'_{\mu }(x) - (K_{2} - f) D'(f) \\ & = \varTheta \big( (g-x) P'_{\mu }(x) + (g-f) D'(f) \big) \\ & \quad{}- (K_{1} - x) P'_{\mu }(x) - (K_{2} - f) D'(f) \\ & = P'_{\mu }(x) \big( \varTheta (g-x) - (K_{1} - x) \big) + D'(f) \big( \varTheta (g-f)- (K_{2}-f) \big) \\ & = 0, \end{aligned}$$

where we use (3.6), (3.5) and the definition of \(\varTheta \), respectively. Optimality of the model, stopping rule and hedge now follows.
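The identity \(\mathrm{HC}=\mathrm{MBEP}\) can be checked in exact arithmetic on a hypothetical example that is not taken from the paper: \(\mu \) uniform on \((-1,1)\), \(\nu \) uniform on \((-2,2)\), \(K_{1}=1/2\) and \(K_{2}=0\). For these laws, (2.4) and (2.5) give \(f(x) = -(x+3)/2\) and \(g(x)=(3x+1)/2\) on \((-1,1)\), and \(\varLambda (x)=0\) gives \(x^{*} = (8K_{1}-2K_{2}-3)/9\); these closed forms are assumptions specific to this example. The uniform densities are not continuous at the endpoints of their supports, so the example sits slightly outside Assumption 3.3 as stated, but the computation above uses only (2.4), (2.5) and \(\varLambda (x)=0\).

```python
from fractions import Fraction as F

def put_price_and_cdf(k, a):
    """Return (P(k), P'(k)) for the uniform law on (-a, a), where
    P(k) = E[(k - Z)^+] and P'(k) is the distribution function."""
    if k <= -a:
        return F(0), F(0)
    if k >= a:
        return F(k), F(1)
    return F(k + a) ** 2 / (4 * a), F(k + a) / (2 * a)

K1, K2 = F(1, 2), F(0)

# Closed forms valid for this example only (see the lead-in).
x = (8 * K1 - 2 * K2 - 3) / 9      # root of Lambda
f = -(x + 3) / 2                   # f(x)
g = (3 * x + 1) / 2                # g(x)
theta = (K2 - f) / (g - f)

P_mu_x, dP_mu_x = put_price_and_cdf(x, 1)
P_mu_f, dP_mu_f = put_price_and_cdf(f, 1)
P_nu_f, dP_nu_f = put_price_and_cdf(f, 2)
P_nu_g, dP_nu_g = put_price_and_cdf(g, 2)

D_f = P_nu_f - P_mu_f              # D(f)
dD_f = dP_nu_f - dP_mu_f           # D'(f)

# Model-based expected payoff and hedging cost (3.4).
mbep = P_mu_x + (K1 - x) * dP_mu_x + D_f + (K2 - f) * dD_f
hc = theta * P_nu_g + (1 - theta) * P_nu_f + (1 - theta) * (P_mu_x - P_mu_f)

print(x, f, g, theta)   # 1/9 -14/9 2/3 7/10
print(mbep, hc)         # 13/18 13/18
```

In this example, the common value \(13/18\) exceeds both European put prices \(P_{\mu }(K_{1}) = 9/16\) and \(P_{\nu }(K_{2}) = 1/2\).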

(2) Now suppose \(K_{1} \leq e_{-}\). Consider an exercise rule in which the American put is exercised at time 1 if it is in the money, otherwise it is exercised at time 2, and a model in which mass below \(K_{1}\) at time 1 stays constant between times 1 and 2. (This is possible since \(\mu \leq \nu \) on \((-\infty , e_{-})\) and \(K_{1} \leq e_{-}\).) The expected payoff of the American put is

$$ \int _{-\infty }^{K_{1}} (K_{1} - x) \mu (dx) + \int _{-\infty }^{K_{2}} (K_{2} - y) (\nu - \mu )(dy) = P_{\mu }(K_{1}) + P_{\nu }(K_{2}) - P _{\mu }(K_{2}). $$
(3.7)

Alternatively, suppose \(K_{1} > e_{-}\) and \(f(K_{1}) \geq K_{2}\). Then under the left-curtain martingale coupling, mass below \(K_{2}\) at time 1 stays constant between times 1 and 2 (note that \(K_{2} \leq f(K_{1}) \leq e_{-}\)), and mass between \(K_{2}\) and \(K_{1}\) at time 1 is mapped to \((K_{2},\infty )\). Then, mass which is below \(K_{2}\) at time 2 was either below \(K_{2}\) at time 1, or above \(K_{1}\) at time 1. The expected payoff under this model (using a strategy of exercising at time 1 if the American put is in the money) is again given by (3.7).

Now consider the hedging cost. Let \(\psi (y) = (K_{2} - y)^{+}\). If we define \(\phi \) as in Lemma 2.4, we find \(\phi (x) = (K _{1}-x)^{+} - (K_{2}-x)^{+} = (K_{1} -(x \vee K_{2}))^{+}\), see Fig. 3.5, and the superhedging cost is

$$ {\mathrm{HC}}= P_{\nu }(K_{2}) + P_{\mu }(K_{1}) - P_{\mu }(K_{2}). $$

Hence the model-based expected payoff equals the hedging cost.

Fig. 3.5

Sketch of put payoffs with \(\psi (y)=(K_{2}-y)^{+}\) and \(\phi (x)=(K_{1}-x)^{+} - (K_{2}-x)^{+}\)

 □

3.3 Two intervals of \(g>x\) and one downward jump in \(f\)

We now relax the dispersion assumption to the case where \(f\) is not monotone. The simplest situation when this may arise is when there are two intervals on which \(g(x)>x\). We do not contend that there are many natural examples which fall into this situation, but rather that this intermediate case illustrates phenomena which are to be found in the general case, but which were not to be found under the dispersion assumption.

Assumption 3.8

(Single-jump assumption)

Laws \(\mu \) and \(\nu \) are absolutely continuous with continuous densities \(\rho \) and \(\eta \), respectively; \(\nu \) has support on \((\ell _{\nu },r_{\nu }) \subseteq (-\infty ,\infty )\) and \(\eta >0\) on \((\ell _{\nu },r_{\nu })\); \(\mu \) has support on \((\ell _{\mu },r_{ \mu }) \subseteq (\ell _{\nu },r_{\nu })\) and \(\rho >0\) on \((\ell _{ \mu },r_{\mu })\). In addition:

  • \((\mu -\nu )^{+}\) is concentrated on \(E = (e^{1}_{-},e ^{1}_{+})\cup (e^{2}_{-},e^{2}_{+})\) with \(e^{1}_{+}< e^{2}_{-}\), and \(\rho >\eta \) on \(E\);

  • \((\nu - \mu )^{+}\) is concentrated on \((\ell _{\nu }, r_{\nu }) \setminus E\) and \(\eta >\rho \) on \((\ell _{\nu },e^{1}_{-}) \cup (e^{1}_{+},e^{2}_{-})\cup (e^{2}_{+},r_{\nu })\);

  • there exist \(f^{\prime }< e^{1}_{-}\) and \(x^{\prime }\in (e^{1}_{+},e^{2}_{-})\) such that

    $$ \int ^{x^{\prime }}_{f^{\prime }}\mu (dz) = \int ^{x^{\prime }}_{f^{ \prime }}\nu (dz) \quad \mbox{and} \quad \int ^{x^{\prime }}_{f^{ \prime }}z\mu (dz) = \int ^{x^{\prime }}_{f^{\prime }}z\nu (dz). $$
    (3.8)

Under Assumption 3.8, it is possible to find functions \(g:(\ell _{\mu },r_{\mu }) \rightarrow (\ell _{\nu },r_{\nu })\) and \(f:(\ell _{\mu },r_{\mu }) \rightarrow (\ell _{\nu },r_{\nu })\) with the properties (see the lower part of Fig. 3.6)

Fig. 3.6

Picture of \(f\) and \(g\) under Assumption 3.8

(1) \(g(x) = x\) on \((\ell _{\mu }, e^{1}_{-}] \cup [x', e^{2}_{-}]\);

(2) \(g(x) > x\) on \((e^{1}_{-},x') \cup (e^{2}_{-},r_{\mu })\);

(3) \(g\) is continuous and strictly increasing;

(4) \(f(x) = x\) on \((\ell _{\mu }, e^{1}_{-}] \cup [x', e^{2}_{-}]\);

(5) \(f :(e^{1}_{-},x') \to (f',x')\) is continuous and strictly decreasing;

(6) \(f :(e^{2}_{-},r_{\mu }) \to (\ell _{\nu },e^{2}_{-}) \setminus (f',x')\) is strictly decreasing;

(7) there exists \(x'' \in (e^{2}_{-},r_{\mu })\) with the property that \(f\) jumps at \(x''\) and satisfies \(f(x''-)=x' > f' = f(x''+)\). Away from \(x''\), \(f\) is continuous on \((e_{-}^{2},r_{\mu })\).

By construction, we have that

$$ \int ^{x^{\prime \prime }}_{x^{\prime }}\mu (dz)= \int ^{g(x^{\prime \prime })}_{x^{\prime }}\nu (dz)\quad \text{and} \quad \int ^{x^{\prime \prime }}_{x^{\prime }}z\mu (dz)= \int ^{g(x^{\prime \prime })}_{x^{\prime }}z\nu (dz), $$
(3.9)

so that if mass in \((x',x'')\) at time 1 is mapped to \((x',g(x''))\) at time 2, then the total mass and mean are preserved. Further, given that \((f',x')\) satisfy (3.8), we also have that \(\int ^{x^{\prime \prime }}_{f^{\prime }}\mu (dz)= \int ^{g(x^{\prime \prime })}_{f^{\prime }}\nu (dz)\) and \(\int ^{x^{\prime \prime }}_{f^{\prime }}z\mu (dz)= \int ^{g(x^{\prime \prime })}_{f^{\prime }}z\nu (dz)\). In particular, given (3.8) and (3.9), the pair of equations

$$ \int ^{x^{\prime \prime }}_{f}\mu (dz)=\int ^{g(x^{\prime \prime })} _{f}\nu (dz) \quad \text{and} \quad \int ^{x^{\prime \prime }}_{f}z \mu (dz)=\int ^{g(x^{\prime \prime })}_{f}z\nu (dz) $$

has two solutions for \(f\), namely \(f{=}x'\) and \(f{=}f'\). Hence, in defining the left-curtain martingale coupling, there are two choices for \(f\) at \(x''\): we may take \(f(x'')= x'\) or \(f(x'')=f'\). Rather than assuming one of these choices (for example by requiring left-continuity of \(f\)), it is convenient to allow \(f\) to be multi-valued. For each \(x\) such that \(g(x)>x\), let \(\aleph (x) = \{ f : (f,x,g(x))\mbox{ solves (2.4) and (2.5)} \}\). In the setting of Assumption 3.8, \(|\aleph (x)| = 1\) for every such \(x\) except \(x''\), where \(\aleph (x'') = \{ f(x''+),f(x''-) \} = \{ f',x' \}\).

Recall the definition \(\varLambda (x) = \frac{g(x)-K_{1}}{g(x)-x} - \frac{K _{1}-K_{2}}{x-f(x)}\). If \(f\) is multi-valued, then \(\varLambda \) is also multi-valued. In Sect. 3.2, one of our main steps was to find \(x\) such that \(\varLambda (x)=0\), and our aim is similar here.

Introduce \(\varUpsilon =\varUpsilon _{K_{1},K_{2}}(f,x,g)\) which is defined for \(f\leq K_{2},x \leq K_{1} \leq g\) by

$$ \varUpsilon (f,x,g) = \frac{(K_{2}-f)-(K_{1}-x)}{x-f} - \frac{K_{1}-x}{g-x} = \frac{g-K_{1}}{g-x} - \frac{K_{1}-K_{2}}{x-f} . $$

Instead of seeking \(x\) which is a root of \(\varLambda (x)=0\), our goal is to find \((f,x,g)\) with \(g=g(x)\) and \(f \in \aleph (x)\) such that \(\varUpsilon (f,x,g)=0\).

For a fixed \(K_{1}\), the value of \(K_{2}\) such that \(\varUpsilon (f'=f(x''+),x'',g(x''))=0\) is given by \(K_{2} =f'+(K_{1}-x ^{\prime \prime })\frac{g(x^{\prime \prime })-f'}{g(x^{\prime \prime })-x^{\prime \prime }}\). On the other hand, if we define \(K_{2}\) by \(K _{2} =x'+(K_{1}-x^{\prime \prime })\frac{g(x^{\prime \prime })-x'}{g(x ^{\prime \prime })-x^{\prime \prime }}\), we get \(\varUpsilon (x'=f(x''-),x'',g(x''))=0\). This motivates the introduction of the linear increasing functions \(L_{u}\), \(L_{d}:[x^{\prime \prime },g(x^{\prime \prime })]\to \mathbb{R}\) defined by

$$\begin{aligned} L_{u}(x) &=x^{\prime }+(x-x^{\prime \prime })\frac{g(x^{\prime \prime })-x^{\prime }}{g(x^{\prime \prime })-x^{\prime \prime }}, \end{aligned}$$
(3.10)
$$\begin{aligned} L_{d}(x) &=f'+(x-x^{\prime \prime })\frac{g(x^{\prime \prime })-f'}{g(x ^{\prime \prime })-x^{\prime \prime }}. \end{aligned}$$
(3.11)

Pictorially, \(L_{d}\) and \(L_{u}\) are the lower and upper boundaries, respectively, of the dotted triangular area \(\mathcal{G}\) in Fig. 3.7.
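The role of \(L_{u}\) and \(L_{d}\) can be illustrated in exact arithmetic. The values of \(f'\), \(x'\), \(x''\) and \(g(x'')\) below are hypothetical placeholders chosen only to respect the ordering \(f' < x' < x'' < g(x'')\); they are not derived from any pair \((\mu ,\nu )\). The sketch checks that \(\varUpsilon \) vanishes at \(x''\) with \(f=x'\) when \(K_{2}=L_{u}(K_{1})\), and with \(f=f'\) when \(K_{2}=L_{d}(K_{1})\).

```python
from fractions import Fraction as F

def upsilon(f, x, g, K1, K2):
    """Second form of Upsilon_{K1,K2}(f, x, g)."""
    return (g - K1) / (g - x) - (K1 - K2) / (x - f)

def line_from(start, k, x2, g2):
    """start + (k - x'') (g(x'') - start) / (g(x'') - x''), as in (3.10) and (3.11)."""
    return start + (k - x2) * (g2 - start) / (g2 - x2)

# Hypothetical values for f', x', x'' and g(x'').
f1, x1, x2, g2 = F(-2), F(0), F(1), F(3)
K1 = F(2)                             # any strike in (x'', g(x''))

K2_u = line_from(x1, K1, x2, g2)      # L_u(K1)
K2_d = line_from(f1, K1, x2, g2)      # L_d(K1)
K2_mid = (K2_u + K2_d) / 2            # a strike strictly between the two lines

print(K2_d, K2_u)
print(upsilon(x1, x2, g2, K1, K2_u), upsilon(f1, x2, g2, K1, K2_d))
print(upsilon(x1, x2, g2, K1, K2_mid), upsilon(f1, x2, g2, K1, K2_mid))
```

For a strike \(K_{2}\) strictly between \(L_{d}(K_{1})\) and \(L_{u}(K_{1})\), the two values of \(\varUpsilon \) at \(x''\) have opposite signs, so \(\varLambda \) jumps over zero at \(x''\).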

Fig. 3.7

Picture of \(f\) and \(g\) in the single-jump case, now with 4 regions shaded (cross-hatched, diagonally, dotted and blank)

From Fig. 3.7, we identify four regions (and various subregions) on which four different hedging strategies will be needed in order to find the cheapest superhedge for the American put. (Compare this with two regimes under the dispersion assumption in Fig. 3.3.)

Define

$$ \mathcal{R}_{1} = \{ (k_{1},k_{2}): e^{1}_{-} < k_{1} < x', f(k_{1}) < k_{2} < k_{1} \}, $$

which we write more compactly as \(\mathcal{R}_{1} = \{ e^{1}_{-} < k _{1} < x', f(k_{1}) < k_{2} < k_{1} \}\). Using the same compact notation, define

$$\begin{aligned} \mathcal{R}_{2} & = \{ e^{2}_{-} < k_{1} < x'', f(k_{1}) < k_{2} < k _{1} \} \cup \{ k_{1} = x'' , x' < k_{2} < k_{1} \}, \\ \mathcal{R}_{3} & = \{ x'' < k_{1} < g(x''), L_{u}(k_{1}) \leq k_{2} < k_{1} \}, \\ \mathcal{R}_{4} & = \{ x'' < k_{1} < g(x''), f(k_{1}) < k_{2} \leq L _{d}(k_{1}) \}, \\ \mathcal{R}_{5} & = \{ g(x'') \leq k_{1} \leq r_{\mu }, f(k_{1}) < k _{2} < k_{1} \}, \\ \mathcal{B}_{1} & = \{ \ell _{\mu }\leq k_{1} \leq e^{1}_{-}, k_{2} < k_{1} \} \cup \{ e^{1}_{-} < k_{1} < x', k_{2} \leq f(k_{1}) \} \\ & \phantom{ = \{ \ell _{\mu }\leq k_{1} \leq e^{1}_{-}, k_{2} < k_{1} \}\,} \cup \{ x'' < k_{1} \leq r_{\mu }, k_{2} \leq f(k_{1}) \}, \\ \mathcal{B}_{2} & = \{ x' \leq k_{1} \leq x'', k_{2} \leq f' \}, \\ \mathcal{B}_{3} & = \{ x' \leq k_{1} \leq e^{2}_{-}, x' \leq k_{2} < k_{1} \} \cup \{ e^{2}_{-} < k_{1} \leq x'', x' \leq k_{2} \leq f(k _{1}) \}, \\ \mathcal{G} & = \{ x'' < k_{1} < g(x''), L_{d}(k_{1}) < k_{2} < L_{u}(k _{1}) \}, \\ \mathcal{W} & = \{ x' \leq k_{1} \leq x'', f' < k_{2} < x' \}, \end{aligned}$$

and set \(\mathcal{R}= \bigcup _{i=1}^{5} \mathcal{R}_{i}\) and \(\mathcal{B}= \bigcup _{i=1}^{3} \mathcal{B}_{i}\). In general, on the boundaries between the regions, the boundaries could be allocated to either region. However, we allocate points on the boundary to the region where the hedge is simplest.

Note that \(\mathcal{R}\cup \mathcal{B}\cup \mathcal{G}\cup \mathcal{W}= \{ (k_{1},k_{2}) : \ell _{\mu }\leq k_{1} \leq r_{\mu }, k _{2} < k_{1} \}\).

3.3.1 Case \((K_{1},K_{2}) \in \mathcal{R}\)

Lemma 3.9

Suppose that we have \((K_{1},K_{2}) \in \mathcal{R}\). Then there exist a unique \(x^{*}=x^{*}(\mu ,\nu ;K_{1},K_{2}) \in (g^{-1}(K_{1}),K_{1})\) and \(f^{*} \in \aleph (x^{*})\) such that \(\varUpsilon (f^{*},x^{*},g^{*}= g(x^{*}))=0\).

Proof

Suppose that \((K_{1},K_{2}) \in \mathcal{R}_{1} \cup \mathcal{R}_{2} \cup \mathcal{R}_{5}\). Consider \(\varLambda :[g^{-1}(K_{1}),K_{1}] \to \mathbb{R}\) defined by (3.1). Note that for this choice of \((K_{1},K_{2})\), \(f\) and \(g\) are both continuous on \([g^{-1}(K_{1}),K_{1}]\); see Fig. 3.6. Hence \(\varLambda (x)=\varUpsilon (f(x),x,g(x))\) is also continuous. Then the same argument as in the proof of Lemma 3.5 shows that there exists a unique \(x^{*}=x^{*}(\mu ,\nu ;K_{1},K_{2}) \in (g^{-1}(K_{1}),K _{1})\) such that \(\varLambda (x^{*})=0\).

Now suppose \((K_{1},K_{2}) \in \mathcal{R}_{3} \cup \mathcal{R}_{4}\) and consider \(\varLambda \) as before. Recall that \(\varLambda \) is increasing, \(\varLambda (g^{-1}(K_{1}))<0\) and \(\varLambda (K_{1})>0\). On the other hand, \(g^{-1}(K_{1})< x^{\prime \prime }\) and hence \(\varLambda \) has an upward jump at \(x^{\prime \prime }\) (since \(f\) has a downward jump at \(x^{\prime \prime }\)). There are two cases depending on whether \((K_{1},K_{2}) \in \mathcal{R}_{3}\) or \(\mathcal{R}_{4}\).

(1) Suppose that \(K_{2}>L_{u}(K_{1})\). Then we have \(\varLambda (x^{\prime \prime }-)>0\). Moreover, since \(\varLambda (g^{-1}(K_{1}))<0\), the continuity of \(\varLambda \) on \((g^{-1}(K_{1}),x^{\prime \prime })\) guarantees that there exists a unique scalar \(x^{*}=x^{*}(\mu ,\nu ;K_{1},K_{2}) \in (g^{-1}(K_{1}),x^{\prime \prime })\) such that \(\varLambda (x^{*})=0\). If \(K_{2} = L_{u}(K_{1})\), then \(\varUpsilon (x',x'',g(x''))=0\) and we take \(x^{*} = x''\), \(g^{*} = g(x'')\) and \(f^{*} = f(x''-) = x'\).

(2) Suppose that \(K_{2}< L_{d}(K_{1})\). Then \(\varLambda (x^{\prime \prime }+)<0\). Further, since \(\varLambda (K_{1})>0\), there exists a unique \(x^{*}=x^{*}(\mu ,\nu ;K_{1},K_{2}) \in (x^{\prime \prime },K_{1})\) such that \(\varLambda (x^{*})=0\). If \(K_{2} = L_{d}(K_{1})\), then \(\varUpsilon (f',x'',g(x''))=0\) and we take \(x^{*} = x''\), \(g^{*} = g(x'')\) and \(f^{*} = f(x''+) = f'\). □

By Lemma 3.9, for \((K_{1},K_{2}) \in \mathcal{R}\), there exists \(( f^{*} \in \aleph (x^{*}),x^{*}, g^{*}=g(x^{*}) )\) such that \(\varUpsilon (f^{*},x^{*},g^{*})=0\). Suppose \((K_{1},K_{2}) \in \mathcal{R}_{1} \cup \mathcal{R}_{4} \cup \mathcal{R}_{5}\). In this case, we let \(\hat{M}_{\pi _{x^{*}}}\) be a canonical \((\mu ,\nu )\)-consistent model (recall that \(\hat{M}_{\pi }\) is the abbreviated notation for the canonical model \((\hat{\mathcal{S}}_{ \pi }, \hat{M})\) for which \(\hat{\mathbb{P}}[X \in dx, Y \in dy] = \pi (dx,dy)\)). Here \({\pi }_{x^{*}} \in {\varPi }(\mu ,\nu )\) is a martingale coupling that is constant on \((-\infty , f^{*})\), maps \((f^{*},x^{*})\) onto \((f^{*},g^{*})\) and maps \((x^{*},\infty )\) to \((-\infty ,f^{*}) \cup (g^{*},\infty )\).

Recall the proof of Theorem 3.6. There, to show that \(\mathrm{MBEP}=\mathrm{HC}\), we used the fact that for the canonical model \(\hat{M}_{\pi _{x^{*}}}\), \(\pi _{x^{*}}\) is constant on \((-\infty , f^{*})\) and maps \((f^{*},x^{*})\) onto \((f^{*},g^{*})\). In fact, the equality \(\mathrm{MBEP}=\mathrm{HC}\) holds for any canonical model for which the associated martingale coupling has the same property. Then the mass that is ‘unexercised’ at time 1 and is in the money at time 2 has time-2 law given by \((\nu -\mu ) \lvert _{(-\infty ,f^{*})}\), where \(f^{*}< e_{-}\). When \(f^{*}< e_{-}^{1}\) (as is the case when \((K_{1},K_{2}) \in \mathcal{R}_{1} \cup \mathcal{R}_{4} \cup \mathcal{R}_{5}\)), the same proof applies, so that \(\mathrm{MBEP}=\mathrm{HC}\) and we have optimality. On the other hand, if \((K_{1},K_{2}) \in \mathcal{R}_{2} \cup \mathcal{R}_{3}\), then it is not the case that \(f^{*}< e^{1}_{-}\) and thus, in order to specify the optimal model, we need to impose additional structure on the coupling \(\tilde{\mu }_{f^{*},x^{*}} \mapsto \tilde{\nu }_{f^{*},g^{*}}\).

Suppose that \((K_{1},K_{2}) \in \mathcal{R}_{2} \cup \mathcal{R}_{3}\). Then \(x^{\prime } \leq f^{*}\) so that \((f^{\prime },x^{\prime }) \cap (f^{*},g^{*})=\emptyset \). From the defining properties of \(f^{\prime }\) and \(x^{\prime }\), we see that there exists a martingale coupling, which we term \({\pi }_{x^{\prime },x^{*}} \in {\varPi }(\mu , \nu )\), which is constant on \((-\infty ,f')\) and \((x',f^{*})\), maps \((f',x')\) onto itself and \((f^{*},x^{*})\) onto \((f^{*},g^{*})\), and maps \((x^{*},\infty )\) to \((-\infty ,f') \cup (x',f^{*}) \cup (g^{*},\infty )\).

If \((K_{1},K_{2}) \in \mathcal{R}_{1} \cup \mathcal{R}_{4} \cup \mathcal{R}_{5}\), we have the canonical model \(\hat{M}_{\pi _{x^{*}}}\), and in the case when \((K_{1},K_{2}) \in \mathcal{R}_{2} \cup \mathcal{R}_{3}\), we have \(\hat{M}_{{\pi }_{x^{\prime },x^{*}}}\). For both models, we consider a candidate stopping time \(\tau ^{*}=1\) if \(X< x^{*}\) and \(\tau ^{*}=2\) otherwise, and a candidate superhedge \((\psi ^{*}, \phi ^{*},(\theta ^{*}_{i})_{i=1,2})\) generated by the function \(\psi ^{*}\) defined in (3.3).

Theorem 3.10

Suppose Assumption 3.8 holds and \((K_{1},K_{2}) \in \mathcal{R}\). Then, depending on whether \((K_{1},K_{2})\) is in \(\mathcal{R}_{1} \cup \mathcal{R}_{4} \cup \mathcal{R}_{5}\) or \(\mathcal{R}_{2} \cup \mathcal{R}_{3}\), the model \(\hat{M}_{\pi _{x^{*}}}\) or \(\hat{M}_{{\pi }_{x^{\prime },x^{*}}}\), respectively, together with the stopping time \(\tau ^{*}\), is a consistent model for which the price of the American option is the highest. The function \(\psi ^{*}\) defined in (3.3) defines the cheapest superhedge. Moreover, the highest model-based price is equal to the cost of the cheapest superhedge.

Proof

If \((K_{1},K_{2}) \in \mathcal{R}_{1} \cup \mathcal{R}_{4} \cup \mathcal{R}_{5}\), then the proof is essentially the same as the proof of the first case in Theorem 3.6. We repeat the main steps for convenience. First find \(x^{*}\in (g^{-1}(K_{1}),K_{1})\) and \(f^{*} \in \aleph (x^{*})\) such that \(\varUpsilon (f^{*},x^{*},g^{*}=g(x^{*}))=0\). If \(x^{*}=x''\), we find \(f^{*} = f(x''+)=f'\). Under the candidate model \(\hat{M}_{\pi _{x^{*}}}\), mass below \(f^{*}\) at time 1 is mapped to the same point at time 2 (which is possible since \(f^{*}< e^{1}_{-}\)), and mass in \((f^{*},x^{*})\) is mapped onto \((f^{*},g^{*})\), while mass above \(x^{*}\) is either mapped to below \(f^{*}\) or to above \(g^{*}\). Then under the candidate stopping rule \(\tau ^{*}\), the model-based expected payoff is equal to the cost of the hedging strategy generated by \(\psi ^{*}\), i.e.,

$$\begin{aligned} {\mathrm{MBEP}} & = \int _{-\infty }^{x^{*}} (K_{1}-w)^{+} \mu (dw) + \int _{-\infty }^{f^{*}} (K_{2} - w)^{+} (\nu - \mu )(dw) \\ & = P_{\mu }(x^{*}) + (K_{1} - x^{*}) P'_{\mu }(x^{*}) + D(f^{*}) + (K _{2} - f^{*}) D'(f^{*}) \\ & = \mathrm{HC}. \end{aligned}$$
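The middle equality can be unpacked as follows. This is only a sketch under the assumption, suggested by the notation but not restated in this section, that \(P_{\mu }(k)=\int (k-w)^{+}\mu (dw)\), \(D=P_{\nu }-P_{\mu }\) and, by continuity of the measures, \(P'_{\mu }(k)=\mu ((-\infty ,k])\) and \(D'(k)=(\nu -\mu )((-\infty ,k])\). Since \(x^{*}< K_{1}\) and \(f^{*}< K_{2}\), the positive parts can be dropped on the ranges of integration, and splitting \(K_{1}-w=(x^{*}-w)+(K_{1}-x^{*})\) and \(K_{2}-w=(f^{*}-w)+(K_{2}-f^{*})\) gives

$$\begin{aligned} \int _{-\infty }^{x^{*}} (K_{1}-w) \mu (dw) & = \int _{-\infty }^{x^{*}} (x^{*}-w) \mu (dw) + (K_{1}-x^{*}) \mu \big((-\infty ,x^{*}]\big) \\ & = P_{\mu }(x^{*}) + (K_{1}-x^{*}) P'_{\mu }(x^{*}), \\ \int _{-\infty }^{f^{*}} (K_{2}-w) (\nu -\mu )(dw) & = \int _{-\infty }^{f^{*}} (f^{*}-w) (\nu -\mu )(dw) + (K_{2}-f^{*}) (\nu -\mu )\big((-\infty ,f^{*}]\big) \\ & = D(f^{*}) + (K_{2}-f^{*}) D'(f^{*}). \end{aligned}$$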

Now suppose that \((K_{1},K_{2}) \in \mathcal{R}_{2} \cup \mathcal{R} _{3}\). Then by Lemma 3.9, there exist a unique \(x^{*}\in (g^{-1}(K_{1}),x^{\prime \prime }]\) and \(f^{*} \in \aleph (x ^{*})\) such that \(\varUpsilon (f^{*},x^{*},g^{*}=g(x^{*}))=0\). If \(x^{*}=x''\), then we have \(f^{*}=f(x''-) = x'\). Then, since \(\nu \) is continuous, we have that \(f^{*},x^{*},g^{*}\) solve (2.4) and (2.5). Note, however, that \(x^{\prime } \leq f^{*}< e^{2}_{-}\).

Under the candidate model \(\hat{M}_{{\pi }_{x^{\prime },x^{*}}}\), mass in \((f^{\prime },x^{\prime })\) at time 1 is mapped onto the same interval at time 2. Also, mass below \(f^{\prime }\) and mass in \((x^{\prime },f^{*})\) at time 1 is mapped to the same point at time 2, and mass in \((f^{*},x^{*})\) is mapped onto \((f^{*},g^{*})\). Mass above \(x^{*}\) is either mapped to below \(f^{\prime }\), to \((x^{\prime },f ^{*})\), or to above \(g^{*}\). In particular, \((\nu - \mu ) \lvert _{(-\infty ,f^{\prime })\cup (x^{\prime },f^{*})}\) is the law of the mass that was not ‘exercised’ at time 1 and is in the money at time 2. From (3.8), we have \(\int _{f'}^{x'} (K_{2}-w) (\nu - \mu )(dw)=0\). Then

$$\begin{aligned} {\mathrm{MBEP}} & = \int _{-\infty }^{x^{*}} (K_{1}-w) \mu (dw) + \int _{-\infty }^{f^{\prime }} (K_{2} - w) (\nu - \mu )(dw) \\ & \quad{} +\int _{x^{\prime }}^{f^{*}} (K_{2} - w) (\nu - \mu )(dw) \\ & = \int _{-\infty }^{x^{*}} (K_{1}-w) \mu (dw) + \int _{-\infty }^{f ^{*}} (K_{2} - w) (\nu - \mu )(dw) \\ & \quad{} - \int _{f^{\prime }}^{x^{\prime }} (K_{2} - w) (\nu - \mu )(dw) \\ & = \int _{-\infty }^{x^{*}} (K_{1}-w) \mu (dw) + \int _{-\infty }^{f ^{*}} (K_{2} - w) (\nu - \mu )(dw) \\ & = P_{\mu }(x^{*}) + (K_{1} - x^{*}) P'_{\mu }(x^{*}) + D(f^{*}) + (K _{2} - f^{*}) D'(f^{*}) \\ & = \mathrm{HC}. \end{aligned}$$

 □

3.3.2 \((K_{1},K_{2}) \in \mathcal{B}= \mathcal{B}_{1} \cup \mathcal{B}_{2} \cup \mathcal{B}_{3}\)

Theorem 3.11

Suppose that Assumption 3.8 holds and \((K_{1},K_{2}) \in \mathcal{B}\). Then there is a consistent model for which \(\{Y< K_{2}\} = \{X< K_{2}\}\cup \{X>K_{1}, Y< K_{2}\}\) a.s. and, if \(x^{\prime }< K_{2}\), \(\{f^{\prime }< X< x^{\prime }\}=\{f^{\prime }< Y< x ^{\prime }\}\). Any model with these properties and with the stopping rule \(\tau =1\) if \(X< K_{1}\) and \(\tau =2\) otherwise attains the highest consistent model price. The cheapest superhedge is generated from \({\psi }(x) = (K_{2}-x)^{+}\), and the highest model-based price is equal to the cost of the cheapest hedge.

Proof

Let \(\psi (y) = (K_{2} - y)^{+}\). As in Lemma 2.4, define a corresponding \(\phi \). We find that \(\phi (x) = (K_{1}-x)^{+} - (K_{2}-x)^{+}\), and the superhedging cost (which is the same for all the cases) is

$$ {\mathrm{HC}}= P_{\nu }(K_{2}) + P_{\mu }(K_{1}) - P_{\mu }(K_{2}). $$

Suppose \((K_{1},K_{2}) \in \mathcal{B}_{1}\). Then using the properties of \(f\) and \(g\) and the left-curtain coupling, we see that the proof that the model-based expected payoff is equal to the hedging cost is the same as in the second case of Theorem 3.6. In particular,

$$\begin{aligned} {\mathrm{MBEP}} &=\int _{-\infty }^{K_{1}} (K_{1} - x) \mu (dx) + \int _{-\infty }^{K_{2}} (K_{2} - y) (\nu - \mu )(dy) \\ &= P_{\mu }(K_{1}) + P_{\nu }(K_{2}) - P_{\mu }(K_{2}). \end{aligned}$$

Now suppose \((K_{1},K_{2}) \in \mathcal{B}_{2}\). Then under the left-curtain coupling, mass from \((f^{\prime },x^{\prime })\) at time 1 is mapped onto the same interval at time 2. Therefore mass which is below \(K_{2}\) at time 2 was either below \(K_{2}\) at time 1, or above \(K_{1}\) at time 1. Therefore, we again have

$$ {\mathrm{MBEP}}=\int _{-\infty }^{K_{1}} (K_{1} - x) \mu (dx) + \int _{-\infty }^{K_{2}} (K_{2} - y) (\nu - \mu )(dy). $$

Finally, suppose \((K_{1},K_{2}) \in \mathcal{B}_{3}\). We again utilise the fact that under the left-curtain coupling, mass from \((f^{\prime },x^{\prime })\) at time 1 is mapped onto the same interval at time 2. In both cases, the mass which is below \(K_{2}\) at time 2 was either below \(K_{2}\) at time 1, or above \(K_{1}\) at time 1. In particular, the law of the mass that can be ‘exercised’ at time 2 is given by \((\nu -\mu )\lvert _{(-\infty ,f^{\prime })\cup (x^{\prime },K_{2})}\). Then using \(\int ^{x^{\prime }}_{f^{\prime }}(K_{2}-z)(\nu -\mu )(dz)=0\), we again have

$$\begin{aligned} {\mathrm{MBEP}} &=\int _{-\infty }^{K_{1}} (K_{1} - x) \mu (dx) + \int _{-\infty }^{f^{\prime }} (K_{2} - y) (\nu - \mu )(dy) \\ & \quad{}+ \int _{x^{\prime }}^{K_{2}} (K_{2} - y) (\nu - \mu )(dy) \\ &=\int _{-\infty }^{K_{1}} (K_{1} - x) \mu (dx) + \int _{-\infty }^{K _{2}} (K_{2} - y) (\nu - \mu )(dy), \end{aligned}$$

which ends the proof. □

3.3.3 \((K_{1},K_{2}) \in \mathcal{W}\)

Suppose \((K_{1},K_{2}) \in \mathcal{W}\). For this case, we associate the following superhedge: let \(\psi ^{x^{\prime }}\) be given by

$$ \psi ^{x^{\prime }}(z) = \left \{ \textstyle\begin{array}{l@{\quad }l} K_{2} - z, \qquad & z \leq f^{\prime }, \\ (K_{2}-f^{\prime })-(z-f^{\prime })\frac{K_{2}-f^{\prime }}{x^{\prime }-f^{\prime }}, \qquad & f^{\prime }< z \leq x^{\prime }, \\ 0, \qquad & z > x^{\prime }; \end{array}\displaystyle \right . $$
(3.12)

see Fig. 3.8. Since \(\psi ^{x^{\prime }}\) is convex and \(\psi ^{x^{\prime }}(z)\geq (K_{2}-z)^{+}\), we can use Lemma 2.4 to generate a corresponding superhedging strategy \((\psi ^{x^{\prime }}, \phi ^{x^{\prime }}, (\theta ^{x^{\prime }}_{i})_{i=1,2})\).

Fig. 3.8 Picture of \(f\) and \(g\) along with superhedge for \((K_{1},K_{2}) \in \mathcal{W}\)

Theorem 3.12

Suppose Assumption 3.8 holds and \((K_{1},K_{2})\in \mathcal{W}\). Then there is a consistent model for which \(\{f' < X< x' \} = \{f'< Y< x'\}\), and any model with this property and with the stopping rule \(\tau =1\) if \(X< K_{1}\) and \(\tau =2\) otherwise attains the highest consistent model price. The cheapest superhedge is generated from \({\psi }^{x^{\prime }}\) defined in (3.12), and the highest model-based price is equal to the cost of the cheapest hedge.

Proof

First note that

$$ \psi ^{x^{\prime }}(z)=\varTheta (x^{\prime }-z)^{+} +(1-\varTheta )(f^{ \prime }-z)^{+}, $$

where \(\varTheta = \frac{K_{2} - f'}{x'-f'}\). Since \(x^{\prime }< K_{1}\), we have

$$\begin{aligned} \phi ^{x^{\prime }}(w)+\psi ^{x^{\prime }}(z) & = (K_{1}-w)^{+}- \psi ^{x^{\prime }}(w)+\psi ^{x^{\prime }}(z) \\ & = (K_{1}-w)^{+} + \varTheta \big((x^{\prime }-z)^{+} - (x^{\prime }-w)^{+} \big) \\ & \quad{}+(1-\varTheta )\big((f^{\prime }-z)^{+}-(f^{\prime }-w)^{+} \big). \end{aligned}$$

It follows that \(\mathrm{HC}=P_{\mu }(K_{1}) +\varTheta D(x^{\prime })+(1- \varTheta )D(f^{\prime })\) is the cost of this strategy (under any consistent model).
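As a numerical sanity check of the two-put representation used above, the following minimal Python sketch compares the piecewise definition (3.12) with \(\varTheta (x^{\prime }-z)^{+}+(1-\varTheta )(f^{\prime }-z)^{+}\) on a grid. The values of \(f^{\prime }\), \(x^{\prime }\) and \(K_{2}\) are hypothetical, chosen only so that \(f^{\prime }< K_{2}< x^{\prime }\); they do not come from any pair of measures in the text, and the function names are illustrative.

```python
def psi_piecewise(z, f1, x1, k2):
    """Piecewise definition (3.12) of psi^{x'}; f1 and x1 stand for f' and x'."""
    if z <= f1:
        return k2 - z
    if z <= x1:
        return (k2 - f1) - (z - f1) * (k2 - f1) / (x1 - f1)
    return 0.0


def psi_puts(z, f1, x1, k2):
    """Theta puts struck at x' plus (1 - Theta) puts struck at f'."""
    theta = (k2 - f1) / (x1 - f1)
    return theta * max(x1 - z, 0.0) + (1.0 - theta) * max(f1 - z, 0.0)


# Hypothetical illustrative values with f' < K_2 < x' (an assumption, not from the paper).
F1, X1, K2 = 0.5, 2.0, 1.25
GRID = [i / 100.0 for i in range(-100, 301)]

# Largest discrepancy between the two representations on the grid.
MAX_GAP = max(
    abs(psi_piecewise(z, F1, X1, K2) - psi_puts(z, F1, X1, K2)) for z in GRID
)

# The hedge payoff should dominate the time-2 put payoff (K_2 - z)^+.
DOMINATES = all(psi_puts(z, F1, X1, K2) >= max(K2 - z, 0.0) - 1e-12 for z in GRID)
```

With these inputs \(\varTheta =1/2\), and the two representations should agree up to floating-point error while dominating the put payoff.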

Now consider the model-based expected payoff. From (3.8), it follows that \(\mu _{{f^{\prime },x^{\prime }}}\) and \(\nu _{{f^{\prime },x^{\prime }}}\) have the same mean and mass and are in convex order. Moreover, the same holds for \(\tilde{\mu }_{{f^{\prime },x^{\prime }}}\) and \(\tilde{\nu }_{{f^{\prime },x^{\prime }}}\). Therefore, there exists a martingale coupling, which we term \({\pi }_{x^{\prime }}\in {\varPi }( \mu ,\nu )\), which is constant on \((-\infty ,f^{\prime })\) and maps \((f^{\prime },x^{\prime })\) onto itself. It follows that under this model, the law of the mass that can be ‘exercised’ at time 2 is given by \((\nu -\mu )\lvert _{(-\infty ,f^{\prime })}\).

Note that since \(f^{\prime }\) and \(x^{\prime }\) satisfy (3.8), and hence \(\int _{f'}^{x'} (x'-w) (\nu - \mu )(dw)=0\),

$$\begin{aligned} D(x^{\prime })-D(f^{\prime }) & = \int _{-\infty }^{x^{\prime }} (x ^{\prime }-w)^{+} (\nu -\mu )(dw) - \int _{-\infty }^{f^{\prime }} (f ^{\prime }- w)^{+} (\nu - \mu )(dw) \\ & = \int _{-\infty }^{f^{\prime }} (x^{\prime }-f^{\prime }) (\nu - \mu )(dw) + \int _{f^{\prime }}^{x^{\prime }} (x^{\prime }- w) (\nu - \mu )(dw) \\ & = (x^{\prime }-f^{\prime }) \int _{-\infty }^{f^{\prime }} (\nu - \mu )(dw). \end{aligned}$$

Then given that we stop at time 1 if \(X< K_{1}\) and at time 2 otherwise, we have

$$\begin{aligned} {\mathrm{MBEP}} &= \int _{-\infty }^{K_{1}} (K_{1}-w)^{+} \mu (dw) + \int _{-\infty }^{f^{\prime }} (K_{2} - w)^{+} (\nu - \mu )(dw) \\ & = \int _{-\infty }^{K_{1}} (K_{1}-w) \mu (dw) + \int _{-\infty }^{f ^{\prime }} (f^{\prime }- w) (\nu - \mu )(dw) \\ & \quad{}+ (K_{2}- f^{\prime }) \int _{-\infty }^{f^{\prime }} ( \nu -\mu )(dw) \\ & = P_{\mu }(K_{1}) +D(f^{\prime })+ \varTheta \big(D(x^{\prime })-D(f ^{\prime })\big) \\ & = P_{\mu }(K_{1}) + \varTheta D(x^{\prime })+(1-\varTheta )D(f^{\prime }) = \mathrm{HC}, \end{aligned}$$

as required. □

3.3.4 \((K_{1},K_{2}) \in \mathcal{G}\)

Recall from (3.10), (3.11) the construction of \(L_{u}\) and \(L_{d}\). For \(K_{1}\in (x^{\prime \prime },g(x^{\prime \prime }))\) and \(K_{2}\in (L_{d}(K_{1}),L_{u}(K_{1}))\), there does not exist any \(x^{*}\in (g^{-1}(K_{1}),K_{1})\) such that \(\varLambda (x^{*})=0\); instead we have that \(\varLambda (x''-) < 0 < \varLambda (x''+)\). On the other hand, from (3.9) we have that there exists a martingale coupling of \(\mu _{{x^{\prime },x^{\prime \prime }}}\) and \(\nu _{{x^{\prime },g(x^{\prime \prime })}}\). Moreover, note that the restrictions of \(\tilde{\mu }_{{f^{\prime },x^{\prime }}}\) to \((x^{\prime },x^{\prime \prime })\) and \(\tilde{\nu }_{{f^{\prime },x ^{\prime }}}\) to \((x^{\prime },g(x^{\prime \prime }))\) are equal to \(\mu _{{x^{\prime },x^{\prime \prime }}}\) and \(\nu _{{x^{\prime },g(x^{\prime \prime })}}\), respectively. Then we define a martingale coupling \({\pi }_{x^{\prime },x^{\prime \prime }} \in {\varPi }(\mu ,\nu )\) which is constant on \((-\infty ,f')\), maps \((f',x')\) onto itself, \((x', x'')\) onto \((x', g(x''))\), and \((x'',\infty )\) to \((-\infty ,f') \cup (g(x''),\infty )\). Let \(\hat{M}_{{\pi }_{x^{\prime },x^{\prime \prime }}}\) be the canonical model under which \(\hat{\mathbb{P}}[X\in dx, Y\in dy]= \pi _{x^{\prime },x^{\prime \prime }}(dx,dy)\). Note that the model \(\hat{M}_{{\pi }_{x^{\prime },x^{\prime \prime }}}\) is a refinement of \(\hat{M}_{{\pi }_{x^{\prime }}}\) used in the proof of Theorem 3.12.

Given \(x^{\prime }\) and thus also \(x^{\prime \prime }\), we define the superhedge as follows. First define linear functions \(\Delta _{1}:[f ^{\prime },x^{\prime }]\to \mathbb{R}\) and \(\Delta _{2}:[x^{\prime },g(x ^{\prime \prime })]\to \mathbb{R}\) by

$$\begin{aligned} \Delta _{1}(x) &=(K_{2}-f^{\prime })-(x-f^{\prime })\frac{(K_{2}-f^{ \prime })-\Delta _{2}(x^{\prime })}{x^{\prime }-f^{\prime }}, \\ \Delta _{2}(x) &=\big(g(x^{\prime \prime })-x\big)\frac{K_{1}-x^{ \prime \prime }}{g(x^{\prime \prime })-x^{\prime \prime }}. \end{aligned}$$

Then \(\Delta _{1}(f^{\prime })=(K_{2}-f^{\prime })\), \(\Delta _{1}(x^{ \prime })=\Delta _{2}(x^{\prime })\), \(\Delta _{2}(x'')=K_{1}-x''\) and \(\Delta _{2}(g(x^{\prime \prime }))=0\). Moreover, direct calculation shows that \(-1<\Delta ^{\prime }_{1}(x)<\Delta ^{\prime }_{2}(x)<0\). Now define a function \(\psi ^{x^{\prime },x^{\prime \prime }}\) by

$$ \psi ^{x^{\prime },x^{\prime \prime }}(z) = \left \{ \textstyle\begin{array}{l@{\quad }l} K_{2} - z, \qquad & z \leq f^{\prime }, \\ \Delta _{1}(z), \qquad & f^{\prime }< z \leq x^{\prime }, \\ \Delta _{2}(z), \qquad & x^{\prime }< z \leq g(x^{\prime \prime }), \\ 0, \qquad & z > g(x^{\prime \prime }). \end{array}\displaystyle \right . $$
(3.13)

By construction, \(\psi ^{x^{\prime },x^{\prime \prime }}\) is convex and \(\psi ^{x^{\prime },x^{\prime \prime }}(z)\geq (K_{2}-z)^{+}\) (see Fig. 3.9), and thus by Lemma 2.4, it can be used to construct a superhedge \((\psi ^{x^{\prime },x^{\prime \prime }}, \phi ^{x^{\prime },x^{\prime \prime }},\theta _{1,2}^{x^{ \prime },x^{\prime \prime }})\).
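The properties of \(\Delta _{1}\), \(\Delta _{2}\) and \(\psi ^{x^{\prime },x^{\prime \prime }}\) listed above can be checked numerically. The sketch below does so for one hypothetical configuration; the numbers standing for \(f^{\prime }\), \(x^{\prime }\), \(x^{\prime \prime }\), \(g(x^{\prime \prime })\), \(K_{1}\) and \(K_{2}\) are assumptions chosen so that the slope ordering \(-1<\Delta _{1}^{\prime }<\Delta _{2}^{\prime }<0\) holds, and are not derived from any specific \(\mu \) and \(\nu \).

```python
def make_psi(f1, x1, x2, g2, k1, k2):
    """Build Delta_1, Delta_2 and psi^{x',x''} as in (3.13).

    f1, x1, x2, g2 stand for f', x', x'' and g(x''); all inputs are
    hypothetical numbers supplied by the caller.
    """
    slope2 = (k1 - x2) / (g2 - x2)  # equals -Delta_2'

    def delta2(x):
        return (g2 - x) * slope2

    slope1 = ((k2 - f1) - delta2(x1)) / (x1 - f1)  # equals -Delta_1'

    def delta1(x):
        return (k2 - f1) - (x - f1) * slope1

    def psi(z):
        if z <= f1:
            return k2 - z
        if z <= x1:
            return delta1(z)
        if z <= g2:
            return delta2(z)
        return 0.0

    return delta1, delta2, psi, slope1, slope2


# Hypothetical configuration (an assumption made for illustration only).
F1, X1, X2, G2, K1, K2 = 0.0, 1.0, 2.0, 4.0, 3.0, 2.25
delta1, delta2, psi, s1, s2 = make_psi(F1, X1, X2, G2, K1, K2)

GRID = [i / 50.0 for i in range(-50, 251)]
# psi should dominate the put payoff (K_2 - z)^+ everywhere on the grid.
DOMINATES = all(psi(z) >= max(K2 - z, 0.0) - 1e-12 for z in GRID)
```

Here the absolute slopes are \(3/4\) and \(1/2\), so the piecewise-linear function has slopes \(-1\), \(-3/4\), \(-1/2\), \(0\) from left to right, which is the convexity claimed in the text.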

Fig. 3.9 Picture of \(f\) and \(g\) along with superhedge for the dotted region \(\mathcal{G}\). The hedge function \(\psi ^{x',x''}\) has a kink at \(x'\)

Theorem 3.13

Suppose Assumption 3.8 holds and \((K_{1},K_{2}) \in \mathcal{G}\). The model \(\hat{M}_{{\pi }_{x^{\prime },x^{\prime \prime }}}\) and the stopping time \(\tau =1\) if \(X< x^{\prime \prime }\) and \(\tau =2\) otherwise attain the highest consistent model price. Moreover, \(\psi ^{x^{\prime },x^{\prime \prime }}\) defined in (3.13) generates the cheapest superhedge, and the highest model-based price is equal to the cost of the cheapest superhedge.

Proof

The candidate canonical model is associated with the martingale coupling \({\pi }_{x^{\prime },x^{\prime \prime }}\) which is constant on \((-\infty ,f')\), maps \((f^{\prime },x^{\prime })\) onto itself, maps \((x^{\prime },x^{\prime \prime })\) onto \((x^{\prime },g(x^{\prime \prime }))\), and \((x^{\prime \prime },\infty )\) to \((-\infty ,f^{ \prime }) \cup (g(x''),\infty )\). Then for the candidate stopping time \(\tau \) (exercise at time 1 if \(X< x^{\prime \prime }\) and at time 2 otherwise), we have that the law of \(Y\) (under \(\hat{M}_{{\pi }_{x ^{\prime },x^{\prime \prime }}}\)), on the event that the option was not exercised at time 1, is given by \((\nu -\mu ) \lvert _{(-\infty ,f^{\prime })} + \nu \lvert _{(g(x^{\prime \prime }),\infty )}\). Therefore

$$\begin{aligned} {\mathrm{MBEP}} &= \int _{-\infty }^{x^{\prime \prime }} (K_{1}-w)^{+} \mu (dw) + \int _{-\infty }^{f^{\prime }} (K_{2} - w)^{+} (\nu - \mu )(dw) \\ & = P_{\mu }(x^{\prime \prime }) +(K_{1}-x^{\prime \prime })P_{ \mu }^{\prime }(x^{\prime \prime })+ D(f^{\prime }) +(K_{2}-f^{\prime })D^{\prime }(f^{\prime }). \end{aligned}$$

Now consider the hedging cost generated by \(\psi ^{x^{\prime },x^{\prime \prime }}\). Let us define the quantities \(\varTheta _{1}=\frac{K_{2}-f^{\prime }-\Delta _{2}(x^{\prime })}{x^{\prime }-f^{\prime }}= - \Delta _{1}'\) and \(\varTheta _{2}=\frac{K_{1}-x^{\prime \prime }}{g(x^{\prime \prime })-x^{\prime \prime }}= - \Delta _{2}'\). Note that we can rewrite (3.13) as

$$ \psi ^{x^{\prime },x^{\prime \prime }}(z)=\varTheta _{2}\big(g(x^{\prime \prime })-z\big)^{+} + (\varTheta _{1}-\varTheta _{2})(x^{\prime }-z)^{+} +(1- \varTheta _{1})(f^{\prime }-z)^{+}. $$

Then

$$ \phi (z)=(1-\varTheta _{1})\big((x^{\prime }-z)^{+}-(f^{\prime }-z)^{+} \big) +(1-\varTheta _{2})\big((x^{\prime \prime }-z)^{+}-(x^{\prime }-z)^{+} \big) , $$

and thus the hedging cost is

$$\begin{aligned} {\mathrm{HC}} & = \varTheta _{2}P_{\nu }\big(g(x^{\prime \prime })\big) + (1- \varTheta _{1})D(f^{\prime }) + (1-\varTheta _{2})P_{\mu }(x^{\prime \prime }) + (\varTheta _{1}-\varTheta _{2})D(x^{\prime }) \\ & = P_{\mu }(x^{\prime \prime }) + D(f^{\prime }) +\varTheta _{1}\big(D(x ^{\prime })-D(f^{\prime })\big) \\ & \quad{}+\varTheta _{2}\Big(P_{\nu }\big(g(x^{\prime \prime })\big)-P _{\nu }(x^{\prime })-P_{\mu }(x^{\prime \prime })+P_{\mu }(x^{\prime })\Big). \end{aligned}$$

Now using (3.8) and the fact that \(g(x')=x'\), we have that \(D'(f')=D'(x')\) and \(f'D'(f') - D(f') = x'D'(x')- D(x')\). Hence

$$ \varTheta _{1}\big(D(x^{\prime })-D(f^{\prime })\big)=(K_{2}-f^{\prime })D ^{\prime }(f^{\prime })-\Delta _{2}(x^{\prime })D^{\prime }(f^{\prime }). $$
(3.14)

Moreover, (3.9) gives that

$$\begin{aligned} &\varTheta _{2}\Big(P_{\nu }\big(g(x^{\prime \prime })\big) - P_{\nu }(x ^{\prime })-P_{\mu }(x^{\prime \prime })+P_{\mu }(x^{\prime })\Big) \\ &\quad = \varTheta _{2}\Big( g(x'') P'_{\nu }\big(g(x'')\big) - x'' P'_{ \mu }(x'') - x' D'(x')\Big) \\ &\quad = \varTheta _{2}\Big( \big(g(x'') - x''\big) P'_{\mu }(x'') + \big(g(x'') - x'\big)D'(x')\Big) \\ &\quad=(K_{1}-x^{\prime \prime })P^{\prime }_{\mu }(x^{\prime \prime })+ \Delta _{2}(x^{\prime })D^{\prime }(f^{\prime }). \end{aligned}$$
(3.15)

Then combining (3.14) and (3.15), we conclude that \(\mathrm{HC}=\mathrm{MBEP}\). □
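For the reader's convenience, the final combination can be written out explicitly. Substituting (3.14) and (3.15) into the second expression for \(\mathrm{HC}\), the two terms involving \(\Delta _{2}(x^{\prime })D^{\prime }(f^{\prime })\) cancel:

$$\begin{aligned} {\mathrm{HC}} & = P_{\mu }(x^{\prime \prime }) + D(f^{\prime }) + \big((K_{2}-f^{\prime })D^{\prime }(f^{\prime })-\Delta _{2}(x^{\prime })D^{\prime }(f^{\prime })\big) \\ & \quad{}+ \big((K_{1}-x^{\prime \prime })P^{\prime }_{\mu }(x^{\prime \prime })+\Delta _{2}(x^{\prime })D^{\prime }(f^{\prime })\big) \\ & = P_{\mu }(x^{\prime \prime }) +(K_{1}-x^{\prime \prime })P_{\mu }^{\prime }(x^{\prime \prime })+ D(f^{\prime }) +(K_{2}-f^{\prime })D^{\prime }(f^{\prime }) = \mathrm{MBEP}. \end{aligned}$$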

3.3.5 \(K_{1} > r_{\mu }\)

In Lemma 3.4 and under the dispersion assumption, we have constructed \(f\) and \(g\), but only on the interval \((e_{-},r_{ \mu }]\). More generally, when \(\mu \) is continuous, the arguments of Beiglböck and Juillet [4] and Henry-Labordère and Touzi [16] allow us to construct \(T_{d}=f\) and \(T_{u}=g\) on \([\ell _{\mu },r_{\mu }]\) for arbitrary laws \(\mu \leq _{\mathrm{cx}} \nu \). For their purposes, the definitions of \(f\) and \(g\) outside the range of \(\mu \) are not important since they have no impact on the construction of the left-curtain martingale coupling.

Nonetheless, we can extend the definitions of \(f\) and \(g\) to ℝ in a way which respects the conditions in Lemma 2.6, by setting

$$ \textstyle\begin{array}{ccl} f(x)=x=g(x), & \hspace{10mm} & -\infty < x \leq \ell _{\mu },\\ f(x) = \ell _{\nu }\quad \text{and} \quad g(x)=r_{\nu }, && r_{\mu }< x< r_{\nu },\\ f(x)=x=g(x), && r_{\nu }\leq x < \infty . \end{array} $$

We shall show that with these definitions for \(f\) and \(g\), the analysis of the previous sections extends to the case \(K_{1}> r_{\mu }\); see Fig. 3.10.

Fig. 3.10 The various cases for \(K_{1} > r_{\mu }\) in the setting of Sect. 3.3

Suppose that \(r_{\nu }>r_{\mu }\) and \(r_{\mu }< K_{1} < r_{\nu }\). Then \(\varLambda (r_{\mu }) = \frac{r_{\nu }- K_{1}}{r_{\nu }- r_{\mu }} - \frac{K _{1}-K_{2}}{r_{\mu }- \ell _{\nu }}\) and \(\varLambda (r_{\nu }-)=\infty \). If \(\varLambda (r_{\mu }) \geq 0\) and \(\varLambda \) is continuous, then there exists \(x^{*} \in [\ell _{\mu },r_{\mu }]\) such that \(g(x^{*})>x^{*}\) and \(\varLambda (x^{*})=0\). Then, exactly as in Sect. 3.2.2, we can construct a model, stopping time and superhedge such that the model-based expected payoff equals the hedging cost, and hence the model, stopping time and hedge are all optimal. The model could be based on the left-curtain coupling, and the optimal exercise rule is to exercise the American put at time 1 if \(X < x^{*}\). Even if \(\varLambda \) is not continuous, there may exist \(x^{*}\) such that \(\varLambda (x^{*})=0\) and the same arguments apply (see Sect. 3.3.1). If not, then we are in the setting of Sect. 3.3.4, but again we can identify the optimal model and hedge. Essentially, the case \(\varLambda (r_{\mu }) \geq 0\) is covered by a direct extension of existing arguments. Note that \(\varLambda (r_{ \mu }) \geq 0\) is equivalent to

$$ K_{2} \geq K_{1} - \frac{(r_{\mu }- \ell _{\nu })(r_{\nu }- K_{1})}{r_{\nu }- r_{\mu }}. $$
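The equivalence is a one-line rearrangement of the expression for \(\varLambda (r_{\mu })\) given above, using only that \(r_{\mu }-\ell _{\nu }>0\) and \(r_{\nu }-r_{\mu }>0\):

$$\begin{aligned} \varLambda (r_{\mu }) \geq 0 & \iff \frac{r_{\nu }- K_{1}}{r_{\nu }- r_{\mu }} \geq \frac{K_{1}-K_{2}}{r_{\mu }- \ell _{\nu }} \\ & \iff K_{1}-K_{2} \leq \frac{(r_{\mu }- \ell _{\nu })(r_{\nu }- K_{1})}{r_{\nu }- r_{\mu }} \\ & \iff K_{2} \geq K_{1} - \frac{(r_{\mu }- \ell _{\nu })(r_{\nu }- K_{1})}{r_{\nu }- r_{\mu }}. \end{aligned}$$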

Now suppose \(r_{\mu }< K_{1} < r_{\nu }\) and \(K_{2} < K_{1} - \frac{(r _{\mu }- \ell _{\nu })(r_{\nu }- K_{1})}{r_{\nu }- r_{\mu }}\). Then \(\varLambda (r_{\mu })<0\), and since \(\varLambda (r_{\nu }-)=\infty \) and \(\varLambda \) is continuous on \([r_{\mu },r_{\nu }]\) (note that we have defined \(f\) and \(g\) to be constant on this range), there must exist \(x^{*} \in (r_{\mu },K_{1})\) such that \(\varLambda (x^{*})=0\). It is always optimal to exercise at time 1, and any martingale coupling can be used to generate a model which attains the highest model-based price \(P_{\mu }(K_{1})=(K_{1} - \overline{\mu })\). A cheapest superhedge is generated by

$$ \psi (y) = \frac{K_{2}-\ell _{\nu }}{r_{\nu }-\ell _{\nu }} (r_{\nu }- y)^{+} + \frac{r_{\nu }- K_{2}}{r_{\nu }-\ell _{\nu }} (\ell _{\nu }- y)^{+}. $$
(3.16)

The cost of this hedge is

$$\begin{aligned} & \frac{K_{2}-\ell _{\nu }}{r_{\nu }-\ell _{\nu }} P_{\nu }(r_{\nu }) + \frac{r_{\nu }- K_{2}}{r_{\nu }-\ell _{\nu }} P_{\nu }(\ell _{\nu }) + P _{\mu }(K_{1}) - \frac{K_{2}-\ell _{\nu }}{r_{\nu }-\ell _{\nu }} P_{ \mu }(r_{\nu }) - \frac{r_{\nu }- K_{2}}{r_{\nu }-\ell _{\nu }} P_{ \mu }(\ell _{\nu }) \\ &\quad = \frac{K_{2}-\ell _{\nu }}{r_{\nu }-\ell _{\nu }} (r_{\nu }- \bar{ \mu })+ (K_{1} - \bar{\mu }) - \frac{K_{2}-\ell _{\nu }}{r_{\nu }- \ell _{\nu }}(r_{\nu }- \bar{\mu }) = K_{1} - \bar{\mu }. \end{aligned}$$
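The simplification in the second line rests on a few elementary facts, recorded here as a sketch under the assumption that \(P_{\chi }(k)=\int (k-w)^{+}\chi (dw)\) for \(\chi \in \{\mu ,\nu \}\). Since \(\mu \leq _{\mathrm{cx}} \nu \), the two means agree and \(\ell _{\nu }\leq \ell _{\mu }\), \(r_{\mu }\leq r_{\nu }\), so neither measure charges \((-\infty ,\ell _{\nu })\) or \((r_{\nu },\infty )\). Hence, with \(r_{\mu }< K_{1}\),

$$\begin{aligned} P_{\nu }(\ell _{\nu }) & = P_{\mu }(\ell _{\nu }) = 0, \\ P_{\nu }(r_{\nu }) & = r_{\nu }- \bar{\nu } = r_{\nu }- \bar{\mu } = P_{\mu }(r_{\nu }), \\ P_{\mu }(K_{1}) & = K_{1} - \bar{\mu }. \end{aligned}$$

The terms with strike \(r_{\nu }\) therefore cancel between the time-2 and time-1 legs, and only \(P_{\mu }(K_{1})\) remains.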

Finally, if \(K_{1}>r_{\nu }\), then \(Y< K_{1}\) almost surely under any consistent model and

$$ \mathbb{E}[(K_{2}-Y)^{+}| \mathcal{F}_{1}] \leq \mathbb{E}[(K_{1}-Y)^{+}| \mathcal{F}_{1}] = \mathbb{E}[K_{1}-Y| \mathcal{F}_{1}] = K_{1}-X. $$

Therefore, it is always optimal to exercise the American put at time 1. If \(K_{2}> r_{\nu }\) or \(K_{2}< \ell _{\nu }\) then we are in the case studied in Sect. 3.3.2 and the cheapest hedge is generated by a time-2 payoff \(\psi (y) = (K_{2}-y)^{+}\). If \(K_{2} \in [\ell _{ \nu }, r_{\nu }]\), then we are in the case studied in Sect. 3.3.3 and the cheapest superhedge is generated by \(\psi =\psi (y)\), where \(\psi \) is given by (3.16). In either case, the highest model-based expected payoff is \(P_{\mu }(K _{1})= (K_{1} - \bar{\mu })\), and this is also the cost of the superhedge; see Fig. 3.10.

3.4 Intervals where \(\nu \) has no mass, or \(\nu = \mu \)

The definition of the left-curtain martingale coupling (recall Lemma 3.4) only requires that \(g=T_{u}\) is increasing, and not that it is continuous. In general, \(g\) may have jumps; such jumps occur when there is an interval on which \(\nu \) places no mass.

If \(g\) has a jump, then we need to adapt the superhedge. Suppose \(g\) has a jump at \(\hat{x}\) (which has to be upwards since \(g\) is increasing) and \(f\) is continuous at \(\hat{x}\). Suppose further that \(K_{1}\) is such that \(\hat{x}\in (g^{-1}(K_{1}),K_{1})\). Then as before, we should like to find \(x^{*}\in (g^{-1}(K_{1}),K_{1})\) such that \(\varLambda (x^{*})=0\). Recall that \(\varLambda \) is increasing and suppose \(\varLambda (g^{-1}(K _{1}))<0<\varLambda (K_{1})\). If \(\varLambda (\hat{x}-)<0\) and \(\varLambda ( \hat{x}+)>0\), then there will be no solution to \(\varLambda =0\). However, by keeping \(x=\hat{x}, \hat{f}=f(\hat{x})\) fixed in (3.1) and varying \(g\) only, we can find \(\hat{g}\in (g(\hat{x}-),g(\hat{x}+))\) such that \((\hat{g}-K_{1})/(\hat{g}-\hat{x})=(K_{1}-K_{2})/( \hat{x}-\hat{f})\) so that \(\varUpsilon (f(\hat{x}), \hat{x}, \hat{g}) = 0\). Then the candidate (and indeed optimal) superhedging strategy is generated by \(\psi ^{*}\) given in (3.3), with \((f^{*},x^{*},g ^{*})=(\hat{f},\hat{x},\hat{g})\); see Fig. 3.11. Moreover, since \(\nu \) does not charge \((g(\hat{x}-),g(\hat{x}+))\), the triple \((\hat{f},\hat{x},\hat{g})\) solves the mass and mean equations (2.4) and (2.5). The strong duality between the model-based expected payoff and the hedging cost follows as before.
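The equation defining \(\hat{g}\) is linear in \(\hat{g}\) and can be solved in closed form: writing \(c=(K_{1}-K_{2})/(\hat{x}-\hat{f})\), we get \(\hat{g}=(K_{1}-c\hat{x})/(1-c)\), which requires \(c<1\). The following minimal Python sketch implements this rearrangement; the function name and the numerical inputs are hypothetical, and the check that \(\hat{g}\) lies in the jump interval \((g(\hat{x}-),g(\hat{x}+))\) is left to the caller, since that interval depends on \(\mu \) and \(\nu \).

```python
def solve_g_hat(k1, k2, x_hat, f_hat):
    """Solve (g - K1) / (g - x_hat) = (K1 - K2) / (x_hat - f_hat) for g.

    Rearranging gives g = (K1 - c * x_hat) / (1 - c) with
    c = (K1 - K2) / (x_hat - f_hat); a solution g > K1 > x_hat needs 0 < c < 1.
    """
    if not f_hat < x_hat < k1:
        raise ValueError("expected f_hat < x_hat < K1")
    c = (k1 - k2) / (x_hat - f_hat)
    if not 0.0 < c < 1.0:
        raise ValueError("no solution with g > K1 for these inputs")
    return (k1 - c * x_hat) / (1.0 - c)


# Hypothetical inputs, for illustration only.
K1, K2, X_HAT, F_HAT = 3.0, 2.25, 2.0, 0.0
G_HAT = solve_g_hat(K1, K2, X_HAT, F_HAT)

# Both sides of the defining equation, evaluated at the solution.
LHS = (G_HAT - K1) / (G_HAT - X_HAT)
RHS = (K1 - K2) / (X_HAT - F_HAT)
```

For these inputs \(c=0.375\) and the two sides of the defining equation should coincide at the returned value, up to rounding.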

Fig. 3.11 Sketch of put payoffs with points \(\hat{x}\), \(\hat{f}\) and \(\hat{g}\) marked

Alternatively, suppose \(f\) has a downward jump at \(\bar{x}\). This can happen if \(\nu =\mu \) on \((f(\bar{x}+),f(\bar{x}-))\). Suppose that \(K_{1}\) is such that \(\bar{x}\in (g^{-1}(K_{1}),K_{1})\) and \(\varLambda (\bar{x}-)<0\) and \(\varLambda (\bar{x}+)>0\), so that again we cannot find \(x\in (g^{-1}(K_{1}),K_{1})\) with \(\varLambda (x)=0\). We can deal with this in the same way as for a discontinuity in \(g\): we first choose \(\bar{f}\in (f(\bar{x}+),f(\bar{x}-))\) such that \(\varUpsilon (\bar{f},\bar{x},g(\bar{x}))=0\), and then consider a hedging strategy generated by \(\psi ^{*}\) with \((f^{*},x^{*},g^{*})=(\bar{f}, \bar{x},g(\bar{x}))\). Note that \(\mu = \nu \) on \((f(\bar{x}+),f( \bar{x}-))\); so if (2.4) and (2.5) hold for some \(f \in [f(\bar{x}+),f(\bar{x}-)]\) (with \(\bar{x}\) and \(\bar{g}=g(\bar{x})\)), then they hold for all \(f\) in this interval. It follows that we can construct a coupling in which \((\bar{f},\bar{x})\) is mapped to \((\bar{f},\bar{g})\), and strong duality holds.

When \(f\) or \(g\) has a jump, we have a pictorial representation of the regions of pairs \((K_{1},K_{2})\) which lead to a hedging strategy which has to be adapted as above; see Fig. 3.12. If \(g\) has a jump at \(\hat{x}\), then \(\varLambda ( \hat{x}-)<0\) and \(\varLambda (\hat{x}+)>0\) is equivalent to the point \((K_{1},K_{2})\) lying in the interior of a triangle with vertices \(\{(g(\hat{x}-),g(\hat{x}-)),(g(\hat{x}+),g(\hat{x}+)),(\hat{x},f( \hat{x}))\}\). On the other hand, if \(f\) jumps downwards at \(\bar{x}\), then \(\varLambda (\bar{x}-)<0\) and \(\varLambda (\bar{x}+)>0\) is equivalent to the point \((K_{1}, K_{2})\) lying in the interior of a triangle with vertices \(\{(\bar{x},f(\bar{x}-)), (\bar{x},f(\bar{x}+)), (g(\bar{x}),g(\bar{x}))\}\) (compare this with the region \(\mathcal{G}\)).

Fig. 3.12 Atoms of \(\nu \) correspond to flat sections in \(f\) and \(g\). Regions of no mass of \(\nu \) correspond to jumps of \(f\) and \(g\)

Exceptionally, we may have simultaneous jumps in \(g\) and \(f\) at \(\check{x}\). Then the set of \((K_{1},K_{2})\) for which these arguments are needed is a quadrilateral with vertices \(\{(\check{x}, f(\check{x}-)), (\check{x}, f(\check{x}+)), (g(\check{x}+), g( \check{x}+)), (g(\check{x}-), g(\check{x}-))\}\). In particular, then there are multiple pairs \((\check{f},\check{g})\) with \(\check{f} \in (f(\check{x}+), f(\check{x}-))\) and \(\check{g}\in (g(\check{x}-), g(\check{x}+))\) such that \(\varUpsilon (\check{f}, \check{x}, \check{g}) = 0\), so that an optimal hedging strategy is not unique.

3.5 The general case for continuous \(\nu \)

In the previous sections, we have shown how the left-curtain coupling can be used to find an optimal model, exercise strategy and a superhedge, under the assumption that both \(\mu \) and \(\nu \) are continuous together with further regularity and simplifying assumptions which we labelled the dispersion assumption and the single-jump assumption. Under the latter assumption, the existence of points that solve (3.8) led us to identify two further types of hedging strategy that were not present under the dispersion assumption, making four in total.

If we relax the assumptions further and require only that both \(\mu \) and \(\nu \) are continuous, then we expect that in some cases there may exist multiple pairs \((f^{\prime }_{i},x^{\prime }_{i})\), \(i=1,2,3,\dots \), that solve (3.8). Note that from the monotonicity of \(g\), we can write \(\{ x : g(x) > x \}\) as a countable union of intervals, and on each such interval, \(f\) is decreasing. The function \(f\) jumps over the intervals \((f_{i}',x_{i}')\) identified above (at least those with \(x'\) to the left of the current value of \(x\)). In particular, \(f\) has only countably many downward jumps. Figure 2.1 is a stylised representation of the general left-curtain martingale coupling, not least because in the figure \(f\) has only finitely many jumps. Starting from Fig. 2.1 and using the constructions in Sect. 3.3, we can divide the area of all \((K_{1}, K_{2}< K_{1})\) into four regions; see Fig. 3.13. The key point is that these four regions are characterised exactly as in the cases described in Sect. 3.3. For given \((K_{1},K_{2})\), we can determine which of the types of hedging strategy is a candidate optimal superhedge, and determine a candidate optimal stopping rule. (We can always use the model associated with the left-curtain martingale coupling \(\pi _{\mathrm{lc}}\).) The fact that these candidates are indeed optimal can be proved using exactly analogous techniques to those used in Sect. 3.3.

Fig. 3.13 General picture of \(f,g\) with shading of regions. There remain four types of shading corresponding to four forms of optimal hedge

More specifically, we can divide \(\{(k_{1},k_{2}): k_{2}< k_{1} \}\) into two disjoint regions, \(\{ (k_{1},k_{2}):k_{2} \leq f(k_{1})\}\) and \(\{(k_{1},k_{2}): f(k_{1}) < k_{2} < k_{1} \}\). We can divide the former into two further regions \(\mathcal{W}= \{ (k_{1},k_{2}) : k_{2}< k_{1}, \exists x \leq k_{1} \mbox{ with }f(x)< k_{2}< g(x) \}\) and \(\mathcal{B}= \{ (k_{1},k_{2}): k_{2} \leq f(k_{1}) \} \setminus \mathcal{W}\). The latter we again divide into two regions \(\mathcal{G}\) and \(\mathcal{R}= \{ (k_{1},k_{2}): f(k_{1}) < k_{2} < k _{1}\} \setminus \mathcal{G}\). Here we can write \(\mathcal{G}= \bigcup _{x:f(x-)>f(x+) } \Delta (x)\), where \(\Delta (x)\) is a triangle with vertices \(\{(x, f(x+)), (x, f(x-)), (g(x),g(x))\}\). Then on each of the regions \(\mathcal{W}\), \(\mathcal{B}\), \(\mathcal{G}\) and \(\mathcal{R}\), we have a superhedge exactly as described in Sect. 3.3. Moreover, again by the arguments of Sect. 3.3, we can show that the hedging cost associated with the superhedging strategy is precisely the model-based expected payoff of the American put under the martingale coupling \(\pi _{\mathrm{lc}}\) (and candidate stopping rule), thus proving the optimality of the hedge and of the model/exercise rule.

For example, suppose \((K_{1},K_{2})\in \mathcal{W}\). (The cases for \((K_{1},K_{2}) \in \mathcal{R}\cup \mathcal{B}\) are generally even simpler, and for \((K_{1},K_{2}) \in \mathcal{G}\), the story is roughly equally involved.) Recall that under the (single-jump) Assumption 3.8, in order to show that \(\mathrm{MBEP}= \mathrm{HC}\), we used the existence of \(\bar{x}\) and \(\bar{f}\) satisfying (3.8) together with the fact that

$$ \int _{y}\int _{x>K_{1}}(K_{2}-y)^{+}\pi _{\mathrm{lc}}(dx,dy)= \int _{-\infty }^{\bar{f}}(K_{2}-y)(\nu -\mu )(dy). $$
(3.17)

Then for general probability measures \(\mu \) and \(\nu \), provided we can find \(\bar{x}\), \(\bar{f}\) satisfying (3.8) and (3.17), the proof that \(\mathrm{MBEP}=\mathrm{HC}\) and hence of optimality follows exactly as in Theorem 3.12.

Lemma 3.14

Suppose \((K_{1},K_{2})\in \mathcal{W}\). Then there exist \(\bar{x}, \bar{f}\) such that

$$ \bar{f}< K_{2}< \bar{x}\leq K_{1} \qquad \textit{and} \qquad \int _{\bar{f}}^{\bar{x}}z^{i}\mu (dz)=\int _{\bar{f}}^{\bar{x}}z^{i} \nu (dz),\quad i\in \{0,1\}, $$
(3.18)

and such that under the left-curtain coupling, we have

$$ \{X>K_{1},Y\leq K_{2}\}=\{X>K_{1},Y\leq \bar{f}\}=\{Y\leq \bar{f}\} \setminus \{X\leq \bar{f}\} \qquad \textit{a.s.}, $$

so that (3.17) holds.

Proof

Define \(\mathcal{X}=\mathcal{X}_{K_{1},K_{2}}=\{x:x\leq K_{1}, f(x)< K _{2}< g(x)\}\). Since \((K_{1},K_{2})\) is in \(\mathcal{W}\), \(\mathcal{X} _{K_{1},K_{2}}\) is nonempty. Define \(\hat{x}=\sup \{x:x\in \mathcal{X}\}\). We show that \(\hat{x}\) and a suitably defined \(\hat{f}\) are such that (3.17) and (3.18) hold.

First suppose that \(\hat{x}< K_{1}\). Suppose further that \(g(\hat{x})> \hat{x}\). Take \(\tilde{x}\in (\hat{x},g(\hat{x})\wedge K_{1})\). Then \(g(\tilde{x})\geq g(\hat{x})>\tilde{x}\). Also \(f(\tilde{x})\notin (f( \hat{x}),g(\hat{x}))\) and if \(f(\tilde{x})\geq g(\hat{x})\), then we have \(f(\tilde{x})\geq g(\hat{x})>\tilde{x}\) which is a contradiction. So \(f( \tilde{x})\leq f(\hat{x})< K_{2}< g(\hat{x})\leq g(\tilde{x})\), and then \(\tilde{x}\in \mathcal{X}\), contradicting the maximality of \(\hat{x}\). Hence \(g(\hat{x})\leq \hat{x}\) (and thus \(g(\hat{x})= \hat{x}\)). But then \(f(\hat{x})=\hat{x}\) and \(\hat{x}\notin \mathcal{X}\).

Hence there exists \((x_{n})_{n\geq 1}\) such that \(x_{n}\in \mathcal{X}\) and \(x_{n}\uparrow \hat{x}\). Let \(g(\hat{x}-)=\lim g(x _{n})\). By the same argument as above, we cannot have \(g(\hat{x}-)> \hat{x}\). Hence \(\hat{x} = g(\hat{x}-) > K_{2}\).

Now suppose \(\hat{x}=K_{1}>K_{2}\). Then \(K_{1}\notin \mathcal{X}\) because we cannot have both \(K_{2}\leq f(K_{1})\) and \(f(K_{1})< K_{2}< g(K_{1})\). Hence there exists \((x_{n})_{n\geq 1}\) such that \(x_{n}\in \mathcal{X}\) and \(x_{n}\uparrow \hat{x}\). Let \(g_{n}=g(x_{n})\) and \(f_{n}=f(x_{n})\). If \(g(K_{1})>K_{1}\), then there exists \(n_{0}\) such that for all \(n\geq n_{0}\), \(g_{n}>K_{1}\). Then \(f(K_{1})\notin (f_{n},g_{n})\) and therefore \(f(K_{1})\leq f_{n}< K _{2}\), contradicting \(K_{2}\leq f(K_{1})\). Hence \(g(\hat{x}-)=g(K_{1}-)=K _{1}\).

In either case, \(\hat{x}\notin \mathcal{X}\) and there exists \((x_{n})_{n\geq 1}\) such that \(x_{n}\in \mathcal{X}\), \(x_{n}\uparrow \hat{x}\) and \((f_{n})_{n\geq 1}\) is a decreasing sequence while \((g_{n})_{n\geq 1}\) satisfies \(g_{n}\uparrow \hat{x}\). Let \(\hat{f}= \lim _{n\to \infty }f_{n}\). Then

$$ \int _{f_{n}}^{x_{n}}z^{i}\mu (dz)=\int _{f_{n}}^{g_{n}}z^{i}\nu (dz), \qquad i\in \{0,1\}, $$

and by taking limits, we have that \(\hat{x}\) and \(\hat{f}\) solve (3.18). Note also that \(\hat{x}>K_{2}\).

We are left to show that \(\hat{x}\) and \(\hat{f}\) solve (3.17). This follows from the fact that \(\hat{f} < K_{2}\), together with the set identifications

$$ \{X>K_{1},Y\leq K_{2}\} = \{X>K_{1}, Y \leq \hat{f}\} = \{X > \hat{f}, Y \leq \hat{f}\}=\{Y\leq \hat{f}\}\setminus \{X\leq \hat{f}\} . $$

 □
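The step from the set identities to (3.17) can be spelled out as follows. This is only a sketch: we assume, as the notation \(\{Y\leq \hat{f}\}\setminus \{X\leq \hat{f}\}\) suggests, that \(\{X\leq \hat{f}\}\subseteq \{Y\leq \hat{f}\}\) almost surely under the left-curtain coupling, and we write \(\mathbb{E}\) for expectation under \(\pi _{\mathrm{lc}}\) (notation introduced here for illustration). Since \(\hat{f}< K_{2}\), we have \((K_{2}-Y)^{+}=K_{2}-Y\) on \(\{Y\leq \hat{f}\}\), so that

$$ \int _{y}\int _{x>K_{1}}(K_{2}-y)^{+}\pi _{\mathrm{lc}}(dx,dy) = \mathbb{E}\big[(K_{2}-Y)\mathbf{1}_{\{Y\leq \hat{f}\}}\big] - \mathbb{E}\big[(K_{2}-Y)\mathbf{1}_{\{X\leq \hat{f}\}}\big] = \int _{-\infty }^{\hat{f}}(K_{2}-y)\nu (dy) - \int _{-\infty }^{\hat{f}}(K_{2}-x)\mu (dx), $$

where the last equality uses the martingale property \(\mathbb{E}[Y|X]=X\) on the event \(\{X\leq \hat{f}\}\). The right-hand side equals \(\int _{-\infty }^{\hat{f}}(K_{2}-y)(\nu -\mu )(dy)\), which is (3.17) with \(\bar{f}=\hat{f}\).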

Remark 15

The set \(\{x: g(x)>x\}\) is a collection of intervals and we let \(I_{+}\) denote the set of right endpoints of these intervals. As remarked above, Fig. 3.13 is drawn in the case of ‘finite complexity’ in the sense that the set \(I_{+}\) contains a finite number of elements. The results extend easily to countable \(I_{+}\), provided \(I_{+}\) contains no accumulation points.

In general, \(I_{+}\) may contain an accumulation point, and as discussed in Henry-Labordère and Touzi [16], care is needed in the construction of the left-curtain mappings \((T_{d},T _{u})\) in this case. However, from our perspective, such subtleties do not cause a problem. The reason is that we do not aim to derive the left-curtain coupling, but rather take the left-curtain coupling as a given and use it to solve the put pricing problem.

Our construction of the best model and the cheapest hedge is local: to decide in which region of Fig. 3.13 the point \((K_{1},K_{2})\) lies, the fine detail of the picture in other parts of \((k_{1},k_{2})\)-space is not important. So the existence of accumulation points can only be an issue if \(K_{1}\) is equal to one of those accumulation points.

Let \(x_{\infty }\) be such an accumulation point in \(I_{+}\) and suppose \(K_{1}=x_{\infty }\). Depending on the value of \(K_{2}\), either there exists \((x',f')\) with \(f' < K_{2} < x'\) such that (3.8) holds, or there does not. In the former case, we can follow the analysis of Sect. 3.3.3, and in the latter, that of Sect. 3.3.2; in either case, we construct a model and hedge such that the model price and hedging cost agree, thus proving optimality of both.

3.6 Atoms in the target law

When \(\nu \) has atoms, the preservation-of-mass and mean conditions become (2.6) and (2.7), respectively. In particular, atoms of \(\nu \) correspond to the flat sections in \(f\) or \(g\); see Fig. 3.12. In this case, we can still find all the optimal quantities as before. In particular, \(\varLambda (x):= \frac{g(x)-K _{1}}{g(x)-x} - \frac{K_{1} - K_{2}}{x-f(x)}\) is strictly increasing in \(x\), even if \(f\) and/or \(g\) is constant. Hence we can find solutions to \(\varLambda =0\) (more generally solutions \(x,f \in \aleph (x)\) to \(\varUpsilon (f,x,g=g(x))=0\)) exactly as before. The superhedge is unchanged. A little care is needed in constructing the optimal model, but under the associated martingale coupling, mass in \((f(x^{*}),x ^{*})\) is mapped onto \((f(x^{*}),g(x^{*}))\) together with (potentially) atoms at \(f(x^{*})\) or \(g(x^{*})\). Specifically, given \(f^{*},x^{*},g ^{*}\), we can find \(\lambda ^{*}_{f}\) and \(\lambda ^{*}_{g}\) such that (2.6) and (2.7) hold. Then in any optimal canonical model \(\hat{M}_{\pi }\), \(\pi \) is constant on \((-\infty , f ^{*})\), and the law of \(\hat{M}_{2}\) on the event \(\{\hat{M}_{1} \in (f^{*},x^{*})\}\) is \(\nu _{x^{*}}\) which is defined to be \(\nu _{x^{*}} = \nu |_{(f^{*},g^{*})} + \lambda ^{*}_{f^{*}} \delta _{f ^{*}} + \lambda ^{*}_{g^{*}} \delta _{g^{*}}\). We also find that the law of \(\hat{M}_{2}\) on the event \(\{\hat{M}_{1} \in (x^{*},\infty )\}\) is \(\nu - \nu _{x^{*}}- \mu |_{(-\infty ,f^{*})}\).

4 Discussion and extensions

4.1 The role of the left-curtain coupling

For any pair of strikes \((K_{1},K_{2})\), the left-curtain model attains the highest expected payoff for the American put. However, although it optimises simultaneously across all pairs of strikes, it is not (in general) optimal for linear combinations of American puts. For example, if we consider a generalised American option with payoff \(a\) if exercised at time 1 and \(b\) if exercised at time 2, where \(a(x) = \sum _{j=1}^{J}(K^{j}_{1}- x)^{+}\) and \(b(y)=\sum _{j=1}^{J}(K_{2}^{j} - y)^{+}\) (with \(K^{j}_{2} \leq K^{j}_{1}\) for each \(j\)), then the model associated with the left-curtain coupling is typically not optimal. The reason is that a model \((\mathcal{S},M)\) is only optimal when it is combined with the best stopping rule, and the optimal stopping rule does depend on \((K_{1}, K_{2})\).

Conversely, although the model associated with the left-curtain coupling is optimal (simultaneously across all pairs \((K_{1},K_{2})\)), we do not need the full power of this coupling when we work with fixed \((K_{1},K_{2})\). In the dispersion assumption case, all we need is a coupling in which \((f(x^{*}), x^{*})\) is mapped onto \((f(x^{*}),g(x ^{*}))\) where \(x^{*}\) is such that \(\varLambda (x^{*})=0\), and \((-\infty , f^{*})\) is mapped to itself, but not necessarily in a constant fashion. There are many martingale couplings which have this property.
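To make the role of \(x^{*}\) concrete, the following Python sketch works through a toy example that is not taken from the article: we assume \(\mu = U[-1,1]\) and \(\nu = U[-2,2]\), for which solving the mass and mean balance conditions by hand gives \(f(x)=-(x+3)/2\) and \(g(x)=(3x+1)/2\) on \([-1,1]\). The function names, the choice of marginals and the strike pair are all hypothetical. The code checks the two balance conditions numerically and then locates the root of \(\varLambda \) by bisection, relying on the monotonicity of \(\varLambda \). Whether a given strike pair lies in the regime where \(\varLambda (x^{*})=0\) is the relevant condition is assumed here rather than checked against the case analysis of Sect. 3.

```python
"""Sketch: left-curtain boundaries and the root of Lambda in a toy example.

Assumption (not from the article): mu = U[-1, 1] and nu = U[-2, 2].
"""


def f(x: float) -> float:
    """Lower boundary f(x) for the uniform toy example (hand-derived)."""
    return -(x + 3.0) / 2.0


def g(x: float) -> float:
    """Upper boundary g(x) for the uniform toy example (hand-derived)."""
    return (3.0 * x + 1.0) / 2.0


def uniform_moment(a: float, b: float, i: int, half_width: float) -> float:
    """Integral of z**i over (a, b) against U[-half_width, half_width]."""
    lo, hi = max(a, -half_width), min(b, half_width)
    if hi <= lo:
        return 0.0
    density = 1.0 / (2.0 * half_width)
    return density * (hi ** (i + 1) - lo ** (i + 1)) / (i + 1)


def curtain_gap(x: float, i: int) -> float:
    """mu-moment on (f(x), x) minus nu-moment on (f(x), g(x)).

    i = 0 is the mass condition, i = 1 the mean condition; both should vanish.
    """
    return uniform_moment(f(x), x, i, 1.0) - uniform_moment(f(x), g(x), i, 2.0)


def Lambda(x: float, k1: float, k2: float) -> float:
    """Lambda(x) = (g - K1)/(g - x) - (K1 - K2)/(x - f), as in Sect. 3.6."""
    return (g(x) - k1) / (g(x) - x) - (k1 - k2) / (x - f(x))


def solve_x_star(k1: float, k2: float, lo: float = -1.0 + 1e-9,
                 iterations: int = 200) -> float:
    """Bisection for Lambda(x) = 0 on (lo, k1], using that Lambda is increasing."""
    hi = k1
    if Lambda(lo, k1, k2) >= 0.0 or Lambda(hi, k1, k2) <= 0.0:
        raise ValueError("Lambda does not change sign on (lo, k1]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if Lambda(mid, k1, k2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    k1, k2 = 0.5, 0.0  # hypothetical strikes with k2 < k1
    x_star = solve_x_star(k1, k2)
    print("x* =", x_star, " f(x*) =", f(x_star), " g(x*) =", g(x_star))
    print("mass gap:", curtain_gap(x_star, 0), " mean gap:", curtain_gap(x_star, 1))
```

In this toy example \(\varLambda (x)=0\) can also be solved by hand, giving \(x^{*}=(8K_{1}-2K_{2}-3)/9\), so that for \(K_{1}=1/2\), \(K_{2}=0\) the bisection should return a value close to \(1/9\). Only the triple \((f(x^{*}),x^{*},g(x^{*}))\) enters the computation, in line with the observation that the full left-curtain coupling is not needed for fixed strikes.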

The intuition behind the optimality of the left-curtain coupling is as follows. With American puts, there is a tension between the time-decay of the option payout promoting early exercise, and the convexity of the payoff function promoting delay. If the aim is to maximise the payoff of the option, then any paths which are in the money at time 1 and will remain in the money are best exercised at time 1. However, once a path has been exercised, any further volatility is irrelevant. In particular, when designing a candidate optimal model, we should try to keep paths which are exercised at time 1 constant (or near constant) whenever possible. Thus the probability space should be split into two regions: one region where the put is in the money at time 1 and is exercised, and thereafter paths move little, and a second region where the put is out of the money at time 1 (and sometimes just in the money, but left unexercised at time 1), and then paths move a long way between times 1 and 2. The left-curtain coupling has this property.

4.2 Multiple exercise times

It is natural to ask if it is possible to extend the analysis to American puts which can be exercised at multiple dates \((T_{1}, T_{2}, \ldots , T_{N})\) where \(N>2\), or equivalently to martingales \(M = (M_{n})_{0 \leq n \leq N}\) with marginals \((\mu _{n})\) where \(\mu _{1}\) has mean \(M_{0} = \bar{\mu }\) and \(\mu _{n} \leq _{ \mathrm{cx}} \mu _{n+1}\) for \(1 \leq n \leq N-1\). It is clear that many of the ideas extend naturally to the multi-marginal case. However, the number of types of hedging strategy may grow exponentially with \(N\). This is left as future work.