Problem-driven scenario generation: an analytical approach for stochastic programs with tail risk measure
Abstract
Scenario generation is the construction of a discrete random vector to represent the uncertain parameters of a stochastic program. Most approaches to scenario generation are distribution-driven; that is, they attempt to construct a random vector which captures the uncertainty well in a probabilistic sense. On the other hand, a problem-driven approach may be able to exploit the structure of a problem to provide a more concise representation of the uncertainty. In this paper we propose an analytic approach to problem-driven scenario generation. This approach applies to stochastic programs where a tail risk measure, such as conditional value-at-risk, is applied to a loss function. Since tail risk measures depend only on the upper tail of a distribution, standard methods of scenario generation, which typically spread their scenarios evenly across the support of the random vector, struggle to represent tail risk adequately. Our scenario generation approach works by targeting the construction of scenarios in areas of the distribution corresponding to the tails of the loss distributions. We provide conditions under which our approach is consistent with sampling, and as proof-of-concept demonstrate how our approach could be applied to two classes of problem, namely network design and portfolio selection. Numerical tests on the portfolio selection problem demonstrate that our approach yields better and more stable solutions than standard Monte Carlo sampling.
Mathematics Subject Classification
90C15 (stochastic programming)

1 Introduction
Stochastic programming is a tool for making decisions under uncertainty. Under this modeling paradigm, uncertain parameters are modeled as a random vector, and one attempts to minimize (or maximize) the expectation or risk measure of some loss function which depends on the initial decision. However, what distinguishes stochastic programming from other stochastic modeling approaches is its ability to explicitly model future decisions based on outcomes of stochastic parameters and initial decisions, and the associated costs of these future decisions. The power and flexibility of the stochastic programming approach comes at a price: stochastic programs are usually analytically intractable, and often not susceptible to solution techniques for deterministic programs.
Typically, a stochastic program can only be solved when it is scenario-based, that is when the random vector for the problem has a finite discrete distribution. For example, stochastic linear programs become large-scale linear programs when the underlying random vector is discrete. In the stochastic programming literature, the mass points of this random vector are referred to as scenarios, the discrete distribution as the scenario set and the construction of this set as scenario generation. Scenario generation can consist of discretizing a continuous probability distribution, or directly modeling the uncertain quantities as discrete random variables. The more scenarios in a set, the more computational power that is required to solve the problem. The key issue of scenario generation is therefore how to represent the uncertainty to ensure that the solution to the problem is reliable, while keeping the number of scenarios low so that the problem is computationally tractable.
A common approach to scenario generation is to fit a statistical model to the uncertain problem parameters and then generate a random sample from this for the scenario set. This has desirable asymptotic properties [22, 33], but may require large sample sizes to ensure the reliability of the solutions it yields. This can be mitigated somewhat by using variance reduction techniques such as stratified sampling and importance sampling [24]. Sampling also has the advantage that it can be used to construct confidence intervals on the true solution value [25]. Another approach is to construct a scenario set whose distance from the true distribution, with respect to some probability metric, is small [12, 19, 28]. These approaches tend to yield better and much more stable solutions to stochastic programs than does sampling.
A characteristic of these approaches to scenario generation is that they are distribution-driven; that is, they only aim to approximate a distribution and are divorced from the stochastic program for which they are producing scenarios. By exploiting the structure of a problem, it may be possible to find a more parsimonious representation of the uncertainty. Note that such a problem-driven approach may not yield a discrete distribution which is close to the true distribution in a probabilistic sense; the aim is only to find a discrete distribution which yields a high quality solution to our problem.
Stochastic programs often have the objective of minimizing the expectation of a loss function. This is particularly appropriate when the initial decision represents a strategic decision that is going to be used again and again, and individual large losses do not matter in the long term. For example, in a stochastic facility location problem (e.g. see [5]) the locations of several facilities must be chosen subject to the unknown demands of customers in a way which minimizes fixed investment costs and future distribution costs. In other cases, the decision may only be used once or a few times, and the occurrence of large losses may have serious consequences such as bankruptcy. This is characteristic of the portfolio selection problem [26] studied in detail in the latter part of this paper. In this latter case, minimizing the expectation alone is not appropriate as this does not necessarily mitigate the possibility of large losses. One possible remedy is to use a risk measure which penalizes in some way the likelihood and severity of potential large losses.
In this paper we are interested in stochastic programs which use tail risk measures. A precise definition of a tail-risk measure will be given in Sect. 3 but for now, one can think of a tail risk measure as a function of a random variable which only depends on the upper tail of its distribution function. Tail risk measures are useful as they summarize the extent of potential losses in the worst possible outcomes. Examples of tail risk measures include the Value-at-Risk (VaR) [21] and the Conditional Value-at-Risk (CVaR) [29], both of which are commonly used in financial contexts. Although the methodology developed in this paper can in principle be applied to any loss function, in this work we are mainly interested in loss functions which arise in one and two-stage stochastic programs.
Distribution-driven scenario generation methods are particularly problematic for stochastic programs involving tail risk measures. This is because these methods tend to spread their scenarios evenly across the support of the distribution and so struggle to adequately represent the tail risk without using a potentially prohibitively large number of scenarios.
In this paper, we propose an analytic problem-driven approach to scenario generation applicable to stochastic programs which use tail risk measures of a form made precise in Sect. 3. We observe that the value of a tail risk measure depends only on scenarios confined to an area of the distribution that we call the risk region. This means that all scenarios that are not in the risk region can be aggregated into a single point. By concentrating scenarios in the risk region, we can calculate the value of a tail risk measure more accurately.
Given a risk region for a problem, we propose a simple algorithm for generating scenarios which we call aggregation sampling. This algorithm draws samples from the random vector until a specified number of samples lie in the risk region; all samples outside the risk region are aggregated into a single scenario. We state and prove conditions under which this method is asymptotically consistent with standard Monte Carlo sampling.
In general, finding a risk region is difficult as it is determined by the loss function, problem constraints and the distribution of the uncertain parameters. Therefore, we derive risk regions for two classes of problem as a proof-of-concept of our methodology. The first class of problems are those with monotonic loss functions which, as will be shown, occur naturally in the context of network design. The second class are portfolio selection problems. For both types of risk regions we run numerical tests which demonstrate that our methodology yields better quality solutions with greater reliability than standard Monte Carlo sampling.
This paper is organized as follows: in Sect. 2 we discuss related work; in Sect. 3 we define tail risk measures and their associated risk regions; in Sect. 4 we discuss how these risk regions can be exploited for the purposes of scenario generation; in Sect. 5 we prove that our scenario generation method is consistent with standard Monte Carlo sampling; in Sects. 6 and 7 we derive risk regions for the two classes of problems described above; in Sect. 8 we present numerical tests; finally in Sect. 9 we summarize our results and make some concluding remarks.
Notation Throughout this paper random variables and vectors are represented by bold (mainly Greek) letters: \(\varvec{\theta },\ \varvec{\xi },\ \varvec{\zeta }\) and outcomes of these are represented by the corresponding non-bold letters: \(\theta ,\ \xi ,\ \zeta \). Inequalities used with vectors and matrices always apply component-wise. \(\left\| \cdot \right\| \) represents the standard Euclidean norm.
2 Related work
There are relatively few cases of problem-driven scenario generation in the literature. The earliest example of which we are aware is the importance sampling approach of [8] which constructs a sampler from the loss function. Importance sampling has been used more recently for scenario generation for problems which, like our own, concern rare events. In [23] an importance sampling scheme is used for a multistage problem involving the CVaR risk measure. In [4], an importance sampling approach is proposed for chance-constrained stochastic programs where the permitted probabilities of constraint violation are very small.
There is an interesting connection between problem-driven scenario generation and distributionally robust optimization [11, 38, 39]. In distributionally robust optimization, the distribution of the random variables in a stochastic program is itself uncertain, and one must optimize for the worst-case distribution. Solving a distributionally robust optimization problem thus involves finding, at least implicitly, the worst-case distribution or scenario set for given objective and constraints. In this sense, distributionally robust optimization could be considered as a problem-driven scenario generation method. Of particular relevance for this work, the paper [9] solves a distributionally robust portfolio selection problem involving the CVaR risk measure where the distribution of asset returns has specified discrete marginals, but unknown joint distribution.
The idea that in stochastic programs with tail risk measures some scenarios do not contribute to the calculation of the tail risk measure was also exploited in [17]. However, they propose a solution algorithm rather than a method of scenario generation. Their approach is to iteratively solve the problem with a subset of scenarios, identify the scenarios which have loss in the tail, update their scenario set appropriately and resolve, until the true solution has been found. Their method has the benefit that it is not distribution dependent. On the other hand, their method works only for the \(\beta {\text {-CVaR}}\) risk measure, while our approach works in principle for any tail risk measure.
3 Tail risk measures and risk regions
In this section we present the core theory to our scenario generation methodology. Specifically, in Sect. 3.1 we formally define tail-risk measures of random variables and in Sect. 3.2 we define risk regions and present some key results related to these.
3.1 Tail risk of random variables
In our set-up we suppose we have some random variable representing an uncertain loss. For our purposes, we take a risk measure to be any function of a random variable. The following formal definition is adapted from [35].
Definition 1
(Risk measure) Let \((\varOmega , {\mathbb {P}})\) be a probability space, and \(\varTheta \) be the set of measurable real-valued random variables on \((\varOmega , {\mathbb {P}})\). Then, a risk measure is some function \(\rho : \varTheta \rightarrow {\mathbb {R}}\cup \{\infty \}\).
For a risk measure to be useful, it should in some way penalize potential large losses. For example, in the classical Markowitz problem [26], one aims to minimize the variance of the return of a portfolio. By choosing a portfolio with a low variance, we reduce the probability of large losses as a direct consequence of Chebyshev’s inequality (see for instance [6]). Various criteria for risk measures have been proposed; in [3] a coherent risk measure is defined to be a risk measure which satisfies axioms such as positive homogeneity and subadditivity; another perhaps desirable criterion for risk measures is that the risk measure is consistent with respect to first and second order stochastic dominance, see [27] for instance.
Besides not satisfying some of the above criteria, a major drawback with using variance as a measure is that it penalizes all large deviations from the mean, that is, it penalizes large profits as well as large losses. This motivates the idea of using risk measures which depend only on the upper tail of the loss distribution. To formalize this idea, we first recall the definition of quantile function.
Definition 2
The \(\beta \)-quantile can be interpreted as the smallest value for which the distribution function is greater than or equal to \(\beta \). The \(\beta \)-tail of a distribution is the restriction of the distribution function to values equal to or above the \(\beta \)-quantile. In the context of risk management, we typically have \(0.9 \le \beta < 1.0\). The following definition says that a tail risk measure is a risk measure that only depends on the \(\beta \)-tail of a distribution.
Definition 3
(Tail risk measure) Let \(\rho _\beta : \varTheta \rightarrow {\mathbb {R}}\cup \{\infty \}\) be a risk measure per Definition 1. Then \(\rho _\beta \) is a \(\beta \)-tail risk measure if \(\rho _\beta (\varvec{\theta })\) depends only on the restriction of the quantile function of \(\varvec{\theta }\) above \(\beta \), in the sense that if \(\varvec{\theta }\) and \(\varvec{\tilde{\theta }}\) are random variables with \(\mathrel {F_{\varvec{\theta }}^{-1}|_{[\beta ,1]}}= \mathrel {F_{\varvec{\tilde{\theta }}}^{-1}|_{[\beta ,1]}}\) then \(\rho _\beta (\varvec{\theta }) = \rho _\beta (\varvec{\tilde{\theta }})\).
To show that \(\rho _\beta \) is a \(\beta \)-tail risk measure, we must show that \(\rho _\beta (\varvec{\theta })\) can be written as a function of the quantile function above or equal to \(\beta \). Two very popular tail risk measures are the value-at-risk [21] and the conditional value-at-risk [30]:
Example 1
Example 2
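Examples 1 and 2 refer to the VaR and CVaR. As a purely illustrative sketch (not part of the paper), both admit simple empirical estimators from a sample of losses; here CVaR is computed via the identity \(\text {CVaR}_\beta = \text {VaR}_\beta + {\mathbb {E}}[(\varvec{\theta } - \text {VaR}_\beta )^+]/(1-\beta )\), and the function names are our own:

```python
import numpy as np

def var_beta(losses, beta):
    """Empirical beta-VaR: smallest loss whose empirical CDF is >= beta."""
    losses = np.sort(np.asarray(losses, dtype=float))
    cdf = np.arange(1, len(losses) + 1) / len(losses)
    return losses[np.searchsorted(cdf, beta)]

def cvar_beta(losses, beta):
    """Empirical beta-CVaR: VaR + E[(loss - VaR)^+] / (1 - beta)."""
    losses = np.asarray(losses, dtype=float)
    v = var_beta(losses, beta)
    return v + np.mean(np.maximum(losses - v, 0.0)) / (1.0 - beta)

losses = np.arange(1.0, 101.0)      # equiprobable losses 1, ..., 100
print(var_beta(losses, 0.95))       # 95.0
print(cvar_beta(losses, 0.95))      # ~98.0, the mean of the worst 5% of losses
```

For this equiprobable sample the two estimators agree with the direct "mean of the worst \((1-\beta )\) fraction" computation.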
The observation that we exploit in this work is that very different random variables will yield the same value of a \(\beta \)-tail risk measure as long as their \(\beta \)-tails are the same.
When showing that two distributions have the same \(\beta \)-tails, it is convenient to use distribution functions rather than quantile functions. The following result gives conditions which ensure that the \(\beta \)-tails of two distributions are the same. We will make use of these in proofs later in this paper.
Lemma 1
- (i)
\(F_{\varvec{\tilde{\theta }}}(\theta ) = F_{\varvec{\theta }}(\theta )\) for all \(\theta \ge F_{\varvec{\theta }}^{-1}(\beta )\) and \(F_{\varvec{\tilde{\theta }}}(\theta ) < \beta \) for all \(\theta < F_{\varvec{\theta }}^{-1}(\beta )\).
- (ii)
\(F_{\varvec{\tilde{\theta }}}(\theta ) = F_{\varvec{\theta }}(\theta )\) for all \(\theta \ge L\) for some \(L < F_{\varvec{\theta }}^{-1}(\beta )\).
Proof
We first prove that condition (i) implies that the \(\beta \)-tails are the same. Since \(F_{\varvec{\tilde{\theta }}}(\theta ) = F_{\varvec{\theta }}(\theta ) \ge \beta \) for all \(\theta \ge F_{\varvec{\theta }}^{-1}(\beta )\), we must have \(F_{\varvec{\tilde{\theta }}}^{-1}(\beta ) \le F_{\varvec{\theta }}^{-1}(\beta )\). Also, given \(F_{\varvec{\tilde{\theta }}}(\theta ) < \beta \) for all \(\theta < F_{\varvec{\theta }}^{-1}(\beta )\) we must have \(F_{\varvec{\tilde{\theta }}}^{-1}(\beta ) \ge F_{\varvec{\theta }}^{-1}(\beta )\) and so \(F_{\varvec{\tilde{\theta }}}^{-1}(\beta ) = F_{\varvec{\theta }}^{-1}(\beta )\).
In the case condition (ii) holds, we have for \(L< \theta < F_{\varvec{\theta }}^{-1}(\beta )\) that \(F_{\varvec{\tilde{\theta }}}(\theta ) = F_{\varvec{\theta }}(\theta ) < \beta \), and since distribution functions are non-decreasing this means that \(F_{\varvec{\tilde{\theta }}}(\theta ) < \beta \) for all \(\theta < F_{\varvec{\theta }}^{-1}(\beta )\). The result now follows by application of condition (i). \(\square \)
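Lemma 1 can also be checked numerically: two loss distributions that agree above the \(\beta \)-quantile give identical values for a tail risk measure such as CVaR. A small sketch under our own naming, using the equiprobable-sample form of CVaR (the mean of the worst \((1-\beta )\) fraction of losses):

```python
import numpy as np

beta = 0.95

def cvar(losses, beta):
    # empirical CVaR for an equiprobable sample:
    # mean of the worst (1 - beta) fraction of losses
    losses = np.sort(losses)
    k = int(round((1 - beta) * len(losses)))
    return losses[-k:].mean()

a = np.arange(1.0, 101.0)      # losses 1..100, each with probability 1/100
b = a.copy()
b[a < 90] = 0.0                # alter the distribution strictly below its beta-tail

print(cvar(a, beta), cvar(b, beta))   # both 98.0: the beta-tails coincide
```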
3.2 Risk regions
In order to solve these problems accurately, we need to be able to approximate well the tail risk measure of the loss function \(f(x, \varvec{\xi })\) for all feasible decisions \(x\in {\mathcal {X}}\).
Since tail risk measures depend only on those outcomes which are in the \(\beta \)-tail, we aim to identify which outcomes lead to a loss in the \(\beta \)-tails for a feasible decision. This motivates the following definition.
Definition 4
Definition 5
The motivation for the term aggregation condition comes from Theorem 1 which follows. This result ensures that if a set satisfies the aggregation condition then we can transform the probability distribution of \(\varvec{\xi }\) so that all the mass in the complement of this set can be aggregated into a single point without affecting the value of the tail risk measure. This property is particularly relevant to scenario generation since, given such a set, all scenarios which it does not contain can be aggregated, reducing the size of the stochastic program. Note that the \(\beta \)-aggregation condition does not hold if \(\varvec{\xi }\) is a discrete random vector. However, in this case, the conclusion of the theorem holds without any extra conditions on \({\mathcal {R}}\).
Theorem 1
- (a)
\({\mathcal {R}}\) satisfies the \(\beta \)-aggregation condition,
- (b)
\(\varvec{\xi }\) is a discrete random vector.
Proof
It is difficult to verify that a set \({\mathcal {R}}\supseteq {\mathcal {R}}_{{\mathcal {X}}}(\beta )\) satisfies the \(\beta \)-aggregation condition by directly checking that the condition (8) holds. The following proposition gives conditions under which it holds immediately for \({\mathcal {R}}_{{\mathcal {X}}}(\beta ')\) when \(\beta ' < \beta \).
Proposition 1
Proof
For convenience, we now drop \(\beta \) from our notation and terminology. Thus, we refer to the \(\beta \)-risk region and \(\beta \)-aggregation condition as simply the risk region and aggregation condition respectively, and write \({\mathcal {R}}_{{\mathcal {X}}}(\beta )\) as \({\mathcal {R}}_{{\mathcal {X}}}\).
All sets satisfying the aggregation condition must contain the risk region; however, the aggregation condition does not necessarily hold for the risk region itself.
We must impose extra conditions on the problem to avoid some degenerate cases where the aggregation condition and the conclusion of Theorem 1 do not hold. The following example demonstrates such a degenerate case.
Example 3
The following result provides extra conditions for continuous distributions which ensure that the aggregation condition holds for the risk region \({\mathcal {R}}_{{\mathcal {X}}}\).
Proposition 2
- (i)
\(\xi \mapsto f(x,\xi )\) is continuous for all \(x\in {\mathcal {X}}\),
- (ii)For each \(x\in {\mathcal {X}}\) there exists \(x'\in {\mathcal {X}}\) such that$$\begin{aligned} {\text {int}}\left( \varXi \right) \cap {\text {int}}\left( {\mathcal {R}}_x\cap {\mathcal {R}}_{x'}\right) \ne \emptyset \text { and } {\text {int}}\left( \varXi \right) \cap {\text {int}}\left( {\mathcal {R}}_{x'}{\setminus } {\mathcal {R}}_{x}\right) \ne \emptyset , \end{aligned}$$(11)
- (iii)
\({\text {int}}\left( \varXi \right) \cap {\text {int}}\left( {\mathcal {R}}_{{\mathcal {X}}}\right) \) is connected.
Proof
The following proposition gives a condition under which the non-risk region is convex.
Proposition 3
Suppose that for each \(x\in {\mathcal {X}}\) the function \(\xi \mapsto f(x,\xi )\) is convex. Then, the non-risk region \({\mathcal {R}}_{{\mathcal {X}}}^{c}\) is convex.
Proof
For \(x\in {\mathcal {X}}\), if \(\xi \mapsto f(x,\xi )\) is convex then the set \({\mathcal {R}}_{x}^{c} = \{\xi \in \varXi : f(x,\xi ) < F_{x}^{-1}(\beta )\}\) must be convex. The intersection of convex sets is convex, hence \({\mathcal {R}}_{{\mathcal {X}}}^{c} = \bigcap _{x\in {\mathcal {X}}}{\mathcal {R}}_{x}^{c}\) is convex. \(\square \)
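The argument in the proof can be illustrated numerically: strict sublevel sets of a convex function are convex, so the midpoint of two non-risk outcomes stays in the non-risk region. A sketch with a stand-in convex loss and threshold (both our own, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)

f = lambda xi: np.sum(xi ** 2)   # a convex function of xi
c = 1.0                          # stand-in for the quantile F_x^{-1}(beta)

# for convex f, f(u) < c and f(v) < c imply f((u+v)/2) < c
for _ in range(1000):
    u, v = rng.uniform(-1, 1, size=(2, 3))
    if f(u) < c and f(v) < c:
        assert f(0.5 * (u + v)) < c
```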
The random vector in the following definition plays a special role in our theory.
Definition 6
If \({\mathcal {R}}\) satisfies the aggregation condition and \({\mathbb {E}}_{} \left[ \varvec{\xi }|\varvec{\xi }\in {\mathcal {R}}^{c} \right] \in {\mathcal {R}}_{{\mathcal {X}}}^{c}\) then Theorem 1 guarantees that \({\rho _{\beta }\left( f\left( x,\psi _{{\mathcal {R}}}(\varvec{\xi })\right) \right) = \rho _{\beta }\left( f\left( x,\varvec{\xi }\right) \right) }\) for all \(x\in {\mathcal {X}}\). The latter condition holds, for example, if \(\xi \mapsto f(x, \xi )\) is convex for all \(x\in {\mathcal {X}}\), since by Proposition 3 we have that \({\mathcal {R}}^{c}_{{\mathcal {X}}}\) is convex and also \({\mathcal {R}}^{c}\subseteq {\mathcal {R}}_{{\mathcal {X}}}^{c}\). Under these conditions, as well as preserving the value of the tail risk measure, the function \(\psi _{{\mathcal {R}}}\) will also preserve the expectation for affine loss functions.
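As a minimal sketch of the aggregated random vector \(\psi _{{\mathcal {R}}}\) of Definition 6 applied to a sample: outcomes in \({\mathcal {R}}\) are kept, and all outcomes in \({\mathcal {R}}^{c}\) are collapsed to the conditional expectation \({\mathbb {E}}[\varvec{\xi }|\varvec{\xi }\in {\mathcal {R}}^{c}]\). The norm-based risk region below is a stand-in for illustration, not one derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(size=(100_000, 2))           # samples of the random vector

# stand-in risk region: outcomes whose norm exceeds a threshold
in_risk = np.linalg.norm(xi, axis=1) > 1.5

cond_mean = xi[~in_risk].mean(axis=0)        # estimate of E[xi | xi in R^c]

# aggregated random vector psi_R: identity on R, constant on R^c
psi = np.where(in_risk[:, None], xi, cond_mean)
```

Risk-region outcomes are untouched while the whole non-risk region maps to a single point, so the empirical distribution of `psi` has the structure exploited by aggregation sampling.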
Corollary 1
Proof
4 Scenario generation
In the previous section, we showed that under mild conditions the value of a tail risk measure only depends on the distribution of outcomes in the risk region. In this section we demonstrate how this feature may be exploited for the purposes of scenario generation.
We assume throughout this section that our scenario sets are constructed from some underlying probabilistic model from which we can draw independent identically distributed samples. We also assume we have a set \({\mathcal {R}}_{{\mathcal {X}}}\subseteq {\mathcal {R}}\subset \varXi \) which satisfies the aggregation condition for the problem under consideration, and for which we can easily test membership. The set \({\mathcal {R}}\) may be an exact risk region, that is \({\mathcal {R}}={\mathcal {R}}_{{\mathcal {X}}}\), or it could be a conservative risk region, that is \({\mathcal {R}}\supset {\mathcal {R}}_{{\mathcal {X}}}\). To avoid repeating cumbersome terminology, we simply refer to \({\mathcal {R}}\) as a risk region, differentiating between the conservative and exact cases only where necessary. The complement \({\mathcal {R}}^{c}\) will be referred to as the aggregation region for reasons which will become clear. Our general approach to scenario generation is to prioritize the construction of scenarios in the risk region \({\mathcal {R}}\).
In Sect. 4.1 we present and analyse a scenario generation method which we call aggregation sampling. In Sect. 4.2 we briefly discuss alternative ways of exploiting risk regions for scenario generation.
4.1 Aggregation sampling
Aggregation sampling can be thought of as equivalent to sampling from the aggregated random vector from Definition 6 for large sample sizes. Aggregation sampling is thus consistent with standard Monte Carlo sampling only if \({\mathcal {R}}\) satisfies the aggregation condition and \({{\mathbb {E}}_{} \left[ \varvec{\xi }|\varvec{\xi }\in {\mathcal {R}}^{c} \right] \in {\mathcal {R}}^{c}}\). In Sect. 5, we provide conditions under which we can prove consistency. Note that it is possible that the algorithm could terminate without sampling any scenario in the aggregation region. This could happen in cases where \({\mathbb {P}}\left( \varvec{\xi }\in {\mathcal {R}}^{c}\right) \) is very small, and the number of specified risk scenarios n is relatively small. In this case, to ensure that the algorithm terminates in a reasonable amount of time and that the scenario set which the algorithm outputs always has a consistent number of scenarios, we sample an arbitrary scenario in place of a scenario representing the aggregated scenarios. This situation is irrelevant for the asymptotic analysis of the algorithm.
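The procedure described above can be sketched as follows. This is an illustrative implementation under our own names (`sampler`, `in_risk_region` are user-supplied placeholders); the edge case of an empty aggregation region is simply omitted here rather than handled by sampling an arbitrary scenario as in the paper:

```python
import numpy as np

def aggregation_sampling(sampler, in_risk_region, n, rng):
    """Draw samples until n land in the risk region; aggregate all other
    samples into a single scenario at their mean, with mass (N - n)/N.
    Returns (scenarios, probabilities)."""
    risk, non_risk = [], []
    while len(risk) < n:
        xi = sampler(rng)
        (risk if in_risk_region(xi) else non_risk).append(xi)
    N = len(risk) + len(non_risk)          # effective sample size N(n)
    scenarios = list(risk)
    probs = [1.0 / N] * n
    if non_risk:                           # one aggregated scenario
        scenarios.append(np.mean(non_risk, axis=0))
        probs.append(len(non_risk) / N)
    return np.array(scenarios), np.array(probs)

rng = np.random.default_rng(1)
scen, p = aggregation_sampling(
    sampler=lambda r: r.normal(size=2),
    in_risk_region=lambda xi: np.linalg.norm(xi) > 1.5,  # stand-in region
    n=50, rng=rng)
print(len(scen))   # 51: 50 risk scenarios plus 1 aggregated scenario
```

Each risk scenario keeps its empirical weight \(1/N\), so the output is a proper probability distribution concentrated on the risk region.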
4.2 Alternative approaches
Alternative sampling methods The above algorithms and analyses assume that the samples of \(\varvec{\xi }\) are independently and identically distributed. However, in principle the algorithms will work for any unbiased sequence of samples. This opens up the possibility of enhancing the scenario aggregation and reduction algorithms by using them in conjunction with variance reduction techniques such as importance sampling or antithetic sampling [20]. The formulae (12) and (13) will still hold, but a will be the probability of a sample occurring in the aggregation region rather than the actual probability of the aggregation region itself.
Alternative representations of the aggregation region The above algorithms can also be generalized in how they represent the non-risk region. Because aggregation sampling and aggregation reduction represent the non-risk region with only a single scenario, they do not in general preserve the overall expectation of the loss function, or any other statistics of the loss function except for the value of a tail risk measure. These algorithms should therefore generally only be used for problems which only involve tail risk measures. However, if the loss function is affine (in the sense of Corollary 1), then collapsing all points in the non-risk region to the conditional expectation preserves the overall expectation.
If the expectation or any other statistic of the loss function is used in the optimization problem then one could represent the non-risk region with many scenarios. For example, instead of aggregating all scenarios in the non-risk region into a single point we could apply a clustering algorithm to them such as k-means. The ideal allocation of points between the risk and non-risk regions will be problem dependent and is beyond the scope of this paper.
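For illustration, a minimal sketch of this alternative: the non-risk region is represented by k centroids found with Lloyd's algorithm rather than a single aggregated point. Both the norm-based non-risk region and the bare-bones k-means below are our own stand-ins, not constructions from the paper:

```python
import numpy as np

def kmeans(points, k, iters=50, rng=None):
    """Minimal Lloyd's algorithm: returns (centroids, labels)."""
    rng = rng or np.random.default_rng(0)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid, then recenter
        d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(2)
xi = rng.normal(size=(5000, 2))
non_risk = xi[np.linalg.norm(xi, axis=1) <= 1.5]   # stand-in non-risk region

cents, labels = kmeans(non_risk, k=5, rng=rng)
# each centroid carries the empirical mass of its cluster
probs = np.bincount(labels, minlength=5) / len(xi)
```

The k centroids then replace the single aggregated scenario, at the cost of k extra scenarios in the final set.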
5 Consistency of aggregation sampling
The reason that aggregation sampling and aggregation reduction work is that, for large sample sizes, they are equivalent to sampling from the aggregated random vector, and if the aggregation condition holds then the aggregated random vector yields the same optimization problem as the original random vector. We only prove consistency for aggregation sampling and not aggregation reduction as the proofs are very similar. Essentially, the only difference is that aggregation sampling has the additional complication of terminating after a random number of samples.
We suppose in this section that we have a sequence of independently identically distributed (i.i.d.) random vectors \(\varvec{\xi }_{1}, \varvec{\xi }_{2}, \ldots \) with the same distribution as \(\varvec{\xi }\), and which are defined on the product probability space \(\varOmega ^{\infty }\).
5.1 Uniform convergence of empirical \(\beta \)-quantiles
Theorem 2
- (i)
For each \(x\in {\mathcal {X}}\), \(F_{x}\) is strictly increasing and continuous at \(F_{x}^{-1}(\beta )\),
- (ii)
For all \({\bar{x}}\in {\mathcal {X}}\) with probability 1 the mapping \(x\mapsto f(x,\varvec{\xi })\) is continuous at \({\bar{x}}\),
- (iii)
\({\mathcal {X}} \subset {\mathbb {R}}^k\) is compact.
The proof of this result relies on various continuity properties of the distribution and quantile functions which are provided in “Appendix A”. Some elements of the proof below have been adapted from [34, Theorem 7.48], a result which concerns the uniform convergence of expectation functions.
Proof
Corollary 2
- (iv)
\({\mathbb {E}}_{} \left[ \ \varvec{\xi }| \varvec{\xi }\in {\mathcal {R}}^c\ \right] \in {\text {int}}\left( {\mathcal {R}}_{{\mathcal {X}}}^c\right) \).
- (v)
The mapping \(x \mapsto f\left( x, {\mathbb {E}}_{} \left[ \ \varvec{\xi }| \varvec{\xi }\in {\mathcal {R}}^c\ \right] \right) \) is continuous.
Proof
Since \({\mathcal {R}}\) satisfies the aggregation condition, and condition (a) holds, by Theorem 1, we have that \({\tilde{F}}_{x}^{-1}(\beta ) = F_{x}^{-1}(\beta )\) for all \(x\in {\mathcal {X}}\). Therefore, to prove this result, we will apply Theorem 2 to \(f(x, \psi _{{\mathcal {R}}}(\varvec{\xi }))\) and so must show that conditions (i)–(iii) from Theorem 2 also hold for \(f(x, \psi _{{\mathcal {R}}}(\varvec{\xi }))\). Condition (iii) holds immediately, and condition (ii) holds for \(f(x, \psi _{{\mathcal {R}}}(\varvec{\xi }))\) since \(x\mapsto f(x,\varvec{\xi })\) is continuous with probability 1, and \(x \mapsto f\left( x, {\mathbb {E}}_{} \left[ \ \varvec{\xi }| \varvec{\xi }\in {\mathcal {R}}^c\ \right] \right) \) is continuous.
In the next subsection this result will be used to show that any point in the interior of the non-risk region \({\mathcal {R}}^{c}\) will, with probability 1, be in the non-risk region of the sampled scenario set as the sample size grows large.
5.2 Equivalence of aggregation sampling with sampling from aggregated random vector
The main obstacle in showing that aggregation sampling is equivalent to sampling from the aggregated random vector is to show that the aggregated scenario in the non-risk region converges almost surely to the conditional expectation of the non-risk region as the number of specified risk scenarios tends to infinity. Recall from Sect. 4 that N(n) denotes the effective sample size in aggregation sampling when we require n risk scenarios and is distributed as \(n+\mathcal {NB}(n, a)\) where a is the probability of the non-risk region. The purpose of the next lemma is to show that as \(n\rightarrow \infty \) the number of samples drawn from the non-risk region almost surely tends to infinity.
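The stated distribution of the effective sample size can be checked by simulation. Note that NumPy parameterizes the negative binomial by the success probability, which here is \(1-a\) (the probability of the risk region), so \(M(n)\), the number of non-risk draws before the n-th risk draw, has mean \(na/(1-a)\) and \({\mathbb {E}}[N(n)] = n/(1-a)\); the numbers below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, a = 1000, 0.7        # risk scenarios required; a = P(xi in non-risk region)

# M(n): non-risk samples drawn before the n-th risk sample
# (numpy counts failures before n successes with success probability 1 - a)
M = rng.negative_binomial(n, 1 - a, size=10_000)
N = n + M               # effective sample size N(n)

print(N.mean())         # close to n / (1 - a) = 3333.3
```

As the lemma below formalizes, \(M(n)\) grows without bound as \(n\rightarrow \infty \), so the aggregated scenario is eventually an average of arbitrarily many non-risk samples.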
Lemma 2
Suppose \(M(n)\sim \mathcal {NB}(n, p)\) where \(0<p<1\). Then with probability 1 we have that \({\lim _{n\rightarrow \infty }M(n) = \infty }\).
Proof
The next corollary shows that the strong law of large numbers still applies to the conditional expectation of the non-risk region in aggregation sampling, despite the sample size being a random quantity.
Corollary 3
Proof
To show that aggregation sampling yields solutions consistent with the underlying random vector \(\varvec{\xi }\), we show that with probability 1, for n large enough, it is equivalent to sampling from the aggregated random vector \(\psi _{{\mathcal {R}}}(\varvec{\xi })\), as defined in Definition 6. If the region \({\mathcal {R}}\) satisfies the aggregation condition, and \({\mathbb {E}}_{} \left[ \varvec{\xi } | \varvec{\xi }\in {\mathcal {R}}^{c} \right] \in {\mathcal {R}}_{{\mathcal {X}}}^{c}\), Theorem 1 tells us that \(\rho _\beta \left( f(x,\psi _{{\mathcal {R}}}(\varvec{\xi }))\right) = \rho _\beta \left( f(x,\varvec{\xi })\right) \) for all \(x\in {\mathcal {X}}\). Hence, if sampling is consistent for the risk measure \(\rho _{\beta }\), then aggregation sampling is also consistent.
Theorem 3
- (vi)
For each \(x\in {\mathcal {X}}\), \(\xi \mapsto f(x, \xi )\) is continuous at \({\mathbb {E}}_{} \left[ \ \varvec{\xi }| \varvec{\xi }\in {\mathcal {R}}^c\ \right] \)
Proof
Note that although the continuity conditions (ii), (v) and (vi) look complicated, the loss function \(f : {\mathcal {X}}\times \varXi \rightarrow {\mathbb {R}}\) will typically be continuous everywhere, and so these will be satisfied automatically.
6 A conservative risk region for monotonic loss functions
In order to use risk regions for scenario generation, we need a characterization of the risk region which conveniently allows us to test membership. In general this is difficult, as the risk region depends on the loss function, the distribution and the problem constraints. Therefore, as a proof-of-concept, in the following two sections we derive risk regions for two classes of problems. In this section we propose a conservative risk region for problems which have monotonic loss functions.
Definition 7
(Monotonic loss function) A loss function \(f: {\mathcal {X}}\times \varXi \rightarrow {\mathbb {R}}\) is monotonic increasing if for all \(x\in {\mathcal {X}}\) and \(\xi , \tilde{\xi }\in \varXi \) such that \(\xi < \tilde{\xi }\) we have \(f(x, \xi ) < f(x, \tilde{\xi })\). Similarly, we say it is monotonic decreasing if for all \(x\in {\mathcal {X}}\) and \(\xi , \tilde{\xi }\in \varXi \) such that \(\xi <\tilde{\xi }\) we have \(f(x,\xi ) > f(x, \tilde{\xi })\).
Monotonic loss functions occur naturally in stochastic linear programming. The following result presents a class of loss functions which arise in the context of network design, and gives conditions under which they are monotonic.
Proposition 4
- 1.
\(q, u > 0\),
- 2.
\(b\ge 0\),
- 3.
\(W, B, T, V \ge 0\).
Proof
This recourse function arises in stochastic network design; the problem formulation in the previous proposition was adapted from a model in [31]. In this type of problem, we have a network consisting of suppliers, processing units, and customers, and decisions must be made relating to opening facilities and the capacities of nodes and arcs. The problem which defines the recourse function \(Q(x, \xi )\) depends on the capacity and opening decisions x of the first stage, and the demand of the customers \(\xi \). The aim of the problem is to construct a flow of products y which minimizes the transportation costs of satisfying customer demand, plus penalties for any unsatisfied demand z.
For a problem with a monotonic loss function, the following result defines a conservative risk region.
Theorem 4
Proof
7 An exact risk region for the portfolio selection problem
In this section, we characterize exactly the risk region of the portfolio selection problem when the distribution of asset returns belongs to a certain class of distributions.
In the portfolio selection problem one aims to choose a portfolio of financial assets with uncertain returns. For \(i = 1,\ldots , d\), let \(x_{i}\) denote the amount to invest in asset i, and \(\varvec{\xi }_{i}\) the random return of asset i. The loss function in this problem is the negative total return, that is \(f(x, \varvec{\xi }) = \sum _{i=1}^d -x_i \varvec{\xi }_i = -x^T \varvec{\xi }\), and \(\varXi = {\mathbb {R}}^d\). The set \({\mathcal {X}}\subset {\mathbb {R}}^d\) of feasible portfolios may encompass constraints like no short-selling (\(x \ge 0\)), total investment (\(\sum _{i=1}^{d} x_{i} = 1\)) and quotas on certain stocks (\(x \le c\)).
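For a scenario-based version of this loss function, the \(\beta \)-CVaR can be evaluated directly from the scenarios, following the standard discrete-distribution definition of [29, 30]. The sketch below is illustrative: the return distribution and portfolio are made up for the example.

```python
import numpy as np

def cvar(losses, probs, beta):
    """beta-CVaR of a discrete loss distribution: the expected loss
    over the worst (1 - beta) fraction of probability mass."""
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    order = np.argsort(losses)[::-1]      # worst losses first
    tail, acc, cv = 1.0 - beta, 0.0, 0.0
    for i in order:
        w = min(probs[i], tail - acc)     # mass taken from scenario i
        cv += w * losses[i]
        acc += w
        if acc >= tail - 1e-12:
            break
    return cv / tail

# portfolio loss f(x, xi) = -x^T xi over equiprobable sampled return scenarios
rng = np.random.default_rng(1)
xi = rng.normal(0.05, 0.2, size=(10_000, 4))  # hypothetical asset returns
x = np.full(4, 0.25)                          # equal-weight portfolio
losses = -(xi @ x)
print(cvar(losses, np.full(losses.size, 1.0 / losses.size), 0.95))
```

For instance, with equiprobable losses \((1,2,3,4)\) and \(\beta =0.5\), the worst half of the mass covers the two largest losses, giving \(\mathrm {CVaR}_{0.5} = 3.5\).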
The following corollary gives sufficient conditions for the risk region to satisfy the aggregation condition, and for aggregation sampling to be consistent.
Corollary 4
- 1.
\(\varvec{\xi }\) is continuous with support \({\mathbb {R}}^d\),
- 2.
There exists \(x_{1},x_{2}\in {\mathcal {X}}\) which are linearly independent,
- 3.
\(0\notin {\mathcal {X}}\),
- 4.
\({\mathcal {X}}\) is compact.
Proof
To prove that \({\mathcal {R}}\) satisfies the aggregation condition, it is enough to show that \({\mathcal {R}}_{{\mathcal {X}}}\) satisfies the aggregation condition. We prove this by showing that all the conditions of Proposition 2 hold. Note that \(x\mapsto -x^{T}\xi \) is continuous so condition (i) of Proposition 2 holds immediately.
Since \({\mathcal {R}}_{x_{1}}\) and \({\mathcal {R}}_{x_{2}}\) are non-parallel half-spaces, their union \({\mathcal {R}}_{x_{1}}\cup {\mathcal {R}}_{x_{2}}\) is connected. Similarly, for any \(x\in {\mathcal {X}}\), we must have \({\mathcal {R}}_{x}\) being non-parallel with either \({\mathcal {R}}_{x_{1}}\) or \({\mathcal {R}}_{x_{2}}\) and so \({\mathcal {R}}_{x}\cup {\mathcal {R}}_{x_{1}}\cup {\mathcal {R}}_{x_{2}}\) must also be connected. Hence, \({\mathcal {R}}_{{\mathcal {X}}} = \bigcup _{x\in {\mathcal {X}}}\left( {\mathcal {R}}_{x}\cup {\mathcal {R}}_{x_{1}}\cup {\mathcal {R}}_{x_{2}}\right) \) is connected so condition (iii) of Proposition 2 is also satisfied. Hence \({\mathcal {R}}\) satisfies the aggregation condition.
By Proposition 3, \({\mathcal {R}}_{{\mathcal {X}}}^{c}\) is convex, and since \({\mathcal {R}}^{c}\subseteq {\mathcal {R}}_{{\mathcal {X}}}^{c}\) and \({\mathcal {R}}_{{\mathcal {X}}}\) is open, we have \({\mathbb {E}}_{} \left[ \varvec{\xi } | \varvec{\xi }\in {\mathcal {R}}^c \right] \in {\text {int}}\left( {\mathcal {R}}_{{\mathcal {X}}}^c\right) \), and so condition (iv) of Theorem 3 holds. Finally, condition (v) of Theorem 3 holds by assumption, and so aggregation sampling with the set \({\mathcal {R}}\) is consistent in the sense of Theorem 3. \(\square \)
Elliptical distributions are a general class of distributions which includes, among others, the multivariate normal and multivariate t-distributions. See [16] for a full overview of the subject.
Definition 8
Definition 9
(Projection) Let \(C \subset {\mathbb {R}}^d\) be a closed convex set. Then for any point \(\xi \in {\mathbb {R}}^d\), we define the projection of \(\xi \) onto C to be the unique point \(p_C(\xi )\in C\) such that \(\inf _{x\in C} \left\| x-\xi \right\| = \left\| p_C(\xi ) - \xi \right\| \).
By a slight abuse of notation, for a set \({\mathcal {A}}\subset {\mathbb {R}}^d\) and a matrix \(T\in {\mathbb {R}}^{d\times d}\), we write \({T\left( {\mathcal {A}}\right) := \{ T\xi : \xi \in {\mathcal {A}}\}}\). Finally, recall that the conic hull of a set \({\mathcal {A}}\subset {\mathbb {R}}^d\), which we denote \({\text {conic}}\left( {\mathcal {A}} \right) \), is the smallest convex cone containing \({\mathcal {A}}\).
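When C is the conic hull of finitely many generators, the projection of Definition 9 reduces to a nonnegative least-squares problem. The sketch below uses SciPy's NNLS solver for illustration; as discussed in Sect. 8, the paper instead solves an equivalent linear complementarity problem via the Siconos library.

```python
import numpy as np
from scipy.optimize import nnls

def project_onto_conic_hull(generators, xi):
    """Projection onto conic(A) = {A @ lam : lam >= 0}: solve the
    nonnegative least-squares problem min ||A lam - xi||, lam >= 0."""
    A = np.column_stack(generators)
    lam, _ = nnls(A, xi)
    return A @ lam

# projecting (-1, 2) onto the cone generated by e1 and e2 (the
# nonnegative orthant) clips the negative coordinate to zero
p = project_onto_conic_hull([np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                            np.array([-1.0, 2.0]))
print(p)  # -> [0. 2.]
```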
Theorem 5
Proof
8 Numerical tests
In this section, we test the performance of the methodology developed in this paper. For the portfolio selection problem, when \({\mathcal {X}}\subseteq {\mathbb {R}}_{+}^{d}{\setminus }\{0\}\) the loss function \(f(x,\xi ) = -x^{T}\xi \) is monotonic decreasing. We therefore use this problem throughout this section to test both the conservative risk region presented in Sect. 6, and the exact risk region presented in Sect. 7.
Testing whether a point belongs to the exact non-risk region in (41) requires the projection of the point onto a convex cone. This can be done by solving a small linear complementarity problem; see [36] or our follow-up paper [15] for more details. We solve linear complementarity problems using code from the Siconos numerics library [1]. Testing whether a point \(\xi \in \varXi \) belongs to the conservative risk region in (38) involves evaluating the probability \({\mathbb {P}}\left( \varvec{\xi }< \xi \right) \). Since calculating this probability exactly involves evaluating a multidimensional integral, we approximate it by taking a large sample from \(\varvec{\xi }\) and using the empirical distribution function of this sample. Repeatedly testing membership of both types of risk region is therefore computationally intensive. Ways of mitigating this issue are discussed in our follow-up paper [15]. These membership tests and the aggregation sampling algorithm have been implemented and made available as a package for the Julia programming language [14]. All experiments were conducted on a laptop with an Intel Core i7-720QM CPU at 1.6 GHz.
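The empirical approximation of \({\mathbb {P}}\left( \varvec{\xi }< \xi \right) \) described above amounts to the following sketch; the distribution and sample size here are illustrative.

```python
import numpy as np

# Monte Carlo estimate of P(xi_vec < xi) via the empirical
# distribution function of a large sample
rng = np.random.default_rng(2)
sample = rng.multivariate_normal(np.zeros(3), np.eye(3), size=200_000)

def prob_below(point, sample):
    """Empirical P(xi_vec < point): fraction of samples strictly
    below `point` in every coordinate."""
    return np.mean(np.all(sample < point, axis=1))

print(prob_below(np.zeros(3), sample))  # approx 0.5**3 = 0.125 here
```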
8.1 Probability of risk regions
As discussed in Sect. 4.1, the performance of the aggregation sampling algorithm with respect to standard Monte Carlo sampling improves as the probability of the aggregation region increases. In this first experiment we observe the behavior of this probability over a range of dimensions.
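Such a sweep over dimensions can be sketched with a quick Monte Carlo estimate; the membership oracle below is a purely illustrative orthant-style stand-in, not the regions of (38) or (41).

```python
import numpy as np

rng = np.random.default_rng(3)

def region_probability(in_region, d, n=50_000):
    """Monte Carlo estimate of P(xi in region) for a d-dimensional
    standard normal vector, given a membership oracle."""
    sample = rng.standard_normal((n, d))
    return np.mean([in_region(xi) for xi in sample])

for d in (2, 5, 10):
    # stand-in region {xi : all coordinates < 1}, probability Phi(1)^d
    print(d, region_probability(lambda xi: np.all(xi < 1.0), d))
```

Even for this stand-in region the estimated probability shrinks geometrically with the dimension, mirroring the behavior reported for the aggregation regions below.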
The results of this experiment are plotted in Fig. 1. Figure 1a, b plot the probabilities of the conservative and exact aggregation regions. To aid the readers’ intuition, we have also plotted reduced scenario sets in two dimensions using the conservative and exact risk regions in Fig. 1c, d for \(\rho =0.3\) and \(\beta =0.95\).
Probabilities of conservative and exact aggregation regions
8.2 Performance of aggregation sampling
We now test the performance of the aggregation sampling algorithm using conservative and exact risk regions against standard Monte Carlo sampling in terms of the quality of the solutions each method yields.
We are interested in the quality and stability of the solutions that are yielded by our scenario generation method as compared to standard Monte Carlo sampling for a given scenario set size. To this end, in each experiment, for a range of scenario set sizes, we construct 100 scenario sets using sampling and aggregation sampling with conservative and exact risk regions, solve the resulting problems, and calculate the optimality gaps for the solutions that these yield.
Error bar plot of optimality gaps yielded by sampling and aggregation sampling
In both cases, both aggregation sampling methods outperform standard Monte Carlo sampling for all scenario set sizes in terms of the size and variability of the calculated optimality gaps. This is because aggregation sampling effectively samples more scenarios than standard Monte Carlo sampling for the same scenario set size. Aggregation sampling with exact risk regions also significantly outperforms aggregation sampling with conservative risk regions. This improvement is to be expected, since the aggregation region of the exact risk region has greater probability than that of the conservative risk region, which gives a greater effective sample size.
9 Conclusions
In this paper we have demonstrated that for stochastic programs which use a tail risk measure, a significant portion of the support of the random vector in the problem may not participate in the calculation of that tail risk measure, whatever feasible decision is used. As a consequence, for scenario-based problems, if we concentrate our scenarios in the region of the distribution which is important to the problem, the risk region, we can represent the uncertainty in our problem in a more parsimonious way, thus reducing the computational burden of solving it.
We have proposed and analyzed two specific methods of scenario generation using risk regions: aggregation sampling and aggregation reduction. Both of these methods were shown to be more effective, in comparison to standard Monte Carlo sampling, as the probability of the non-risk region increases: in essence the higher this probability the more redundancy there is in the original distribution. The application of our methodology relies on having a convenient characterization of a risk region. For portfolio selection problems we derived the exact risk region when returns have an elliptical distribution. However, a characterization of the exact risk region will generally not be possible. Nevertheless, it is sufficient to have a conservative risk region. For stochastic programs with monotonic loss functions, a wide problem class which includes some network design problems, we were able to derive such a region.
The effectiveness of our methodology depends on the probability of the aggregation region, that is, the exact or conservative non-risk region used in our scenario generation algorithms. We observed, for both stochastic programs with monotonic loss functions and portfolio selection problems, that this probability tends to zero as the dimension of the random vector in the problem increases. However, in some circumstances this effect is mitigated: we observed that small positive correlations slowed this convergence for the portfolio selection problem.
We tested the performance of our aggregation sampling algorithm for portfolio selection problems using both the exact non-risk region and the conservative risk region for monotonic loss functions. This demonstrated a significant improvement over standard Monte Carlo sampling, particularly when an exact non-risk region was used.
The methodology has much potential. For some small to moderately-sized network design problems this methodology could yield much better solutions. In particular the methodology is agnostic to the presence of integer variables, and so could be used to solve difficult mixed integer programs.
In our follow-up paper [15] we demonstrate that our methodology may be applied to more difficult and realistic portfolio selection problems such as those involving integer variables, and for which the asset returns are no longer elliptically distributed. In the same paper we also discuss some of the technical issues involved in applying the method, such as finding the conic hull of the feasible region, and methods of projecting points onto this. We also investigate the use of artificial constraints as a way of making our methodology more effective.
Footnotes
- 1.
For simplicity of exposition we discount the event that the while loop of the algorithm terminates with \(n_{{\mathcal {R}}^{c}} = 0\) which occurs with probability \((1-a)^{n}\).
- 2.
Batch sampling methods such as stratified sampling will not work with aggregation sampling which requires samples to be drawn sequentially.
Acknowledgements
We would like to thank the reviewers and guest editor for their very thorough feedback which has allowed us to much improve this paper. Thanks also to Burak Buke and David Leslie who also gave feedback on an earlier version of the paper. Finally, we gratefully acknowledge the support of the EPSRC funded EP/H023151/1 STOR-i Centre for Doctoral Training.
References
- 1. Acary, V., Pérignon, F.: Siconos: a software platform for modeling, simulation, analysis and control of nonsmooth dynamical systems. Simul. News Eur. 17(3/4), 19–26 (2007)
- 2. Acerbi, C., Tasche, D.: On the coherence of expected shortfall. J. Bank. Finance 26(7), 1487–1503 (2002)
- 3. Artzner, P., Delbaen, F., Eber, J., Heath, D.: Coherent measures of risk. Math. Finance 9(3), 203–228 (1999)
- 4. Barrera, J., Homem-de-Mello, T., Moreno, E., Pagnoncelli, B.K., Canessa, G.: Chance-constrained problems and rare events: an importance sampling approach. Math. Program. 157(1), 153–189 (2016)
- 5. Bieniek, M.: A note on the facility location problem with stochastic demands. Omega 55, 53–60 (2015)
- 6. Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995)
- 7. Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (1997)
- 8. Dantzig, G.B., Glynn, P.W.: Parallel processors for planning under uncertainty. Ann. Oper. Res. 22(1), 1–21 (1990)
- 9. Doan, X.V., Li, X., Natarajan, K.: Robustness to dependency in portfolio optimization using overlapping marginals. Oper. Res. 63(6), 1468–1488 (2015)
- 10. Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
- 11. Dupačová, J.: Uncertainties in minimax stochastic programs. Optimization 60(10–11), 1235–1250 (2011)
- 12. Dupačová, J., Gröwe-Kuska, N., Römisch, W.: Scenario reduction in stochastic programming: an approach using probability metrics. Math. Program. 95(3), 493–511 (2003)
- 13. Fairbrother, J.: Distributions modelling FTSE100 stock returns (2017). https://dx.doi.org/10.17635/lancaster/researchdata/158. Accessed 24 Nov 2019
- 14. Fairbrother, J.: TailRiskScenGen.jl: a Julia package for scenario generation for stochastic programs with tail risk measure (2017). https://github.com/STOR-i/TailRiskScenGen.jl. Accessed 24 Nov 2019
- 15. Fairbrother, J., Turner, A., Wallace, S.W.: Scenario generation for single-period portfolio selection problems with tail risk measures: coping with high dimensions and integer variables. INFORMS J. Comput. 30(3), 472–491 (2018)
- 16. Fang, K.T., Kotz, S., Ng, K.W.: Symmetric Multivariate and Related Distributions. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, vol. 11. Chapman and Hall, London (1989)
- 17. García-Bertrand, R., Mínguez, R.: Iterative scenario based reduction technique for stochastic optimization using conditional value-at-risk. Optim. Eng. 15(2), 355–380 (2014)
- 18. Gurobi Optimization Inc.: Gurobi optimizer reference manual (2016)
- 19. Heitsch, H., Römisch, W.: Scenario tree reduction for multistage stochastic programs. CMS 6(2), 117–133 (2009)
- 20. Higle, J.L.: Variance reduction and objective function evaluation in stochastic linear programs. INFORMS J. Comput. 10(2), 236–247 (1998)
- 21. Jorion, P.: Value at Risk: The New Benchmark for Controlling Market Risk. Irwin Professional, Norman (1996)
- 22. King, A.J., Rockafellar, R.T.: Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1), 148–162 (1993)
- 23. Kozmík, V., Morton, D.P.: Evaluating policies in risk-averse multi-stage stochastic programming. Math. Program. 152(1), 275–300 (2015)
- 24. Linderoth, J., Shapiro, A., Wright, S.: The empirical behavior of sampling methods for stochastic programming. Ann. Oper. Res. 142(1), 215–241 (2006)
- 25. Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24, 47–56 (1999)
- 26. Markowitz, H.M.: Portfolio selection. J. Finance 7, 77–91 (1952)
- 27. Ogryczak, W., Ruszczyński, A.: Dual stochastic dominance and related mean-risk models. SIAM J. Optim. 13(1), 60–78 (2002)
- 28. Pflug, G.C.: Scenario tree generation for multiperiod financial optimization by optimal discretization. Math. Program. 89(2), 251–271 (2001)
- 29. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2(3), 21–41 (2000)
- 30. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26(7), 1443–1471 (2002)
- 31. Santoso, T., Ahmed, S., Goetschalckx, M., Shapiro, A.: A stochastic programming approach for supply chain network design under uncertainty. Eur. J. Oper. Res. 167(1), 96–115 (2005)
- 32. Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley Series in Probability and Statistics. Wiley, Hoboken (1980)
- 33. Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbooks in Operations Research and Management Science, vol. 10, chap. 6, pp. 353–425. Elsevier Science B.V., Amsterdam (2003)
- 34. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. MPS-SIAM Series on Optimization, vol. 9. SIAM, Philadelphia (2009)
- 35. Tasche, D.: Expected shortfall and beyond. J. Bank. Finance 26(7), 1519–1533 (2002)
- 36. Ujvari, M.: On the projection onto a finitely generated cone. Acta Cybern. 22. https://doi.org/10.14232/actacyb.22.3.2016.7
- 37. Wächter, A., Biegler, L.T.: On the implementation of a primal–dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
- 38. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62(6), 1358–1376 (2014)
- 39. Žáčková, J.: On minimax solutions of stochastic linear programming problems. Časopis pro pěstování matematiky 91(4), 423–430 (1966)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.