Anyone writing a book will rarely follow a plan that was not revised several times during the process. This was definitely the case when this book was written. We discussed many different versions before we arrived at the current format. In some of these versions, mathematical terms like “convergence of functions” or “cardinality of sets” played an important role. In the end, we found a way to discuss Brownian motion without using these terms explicitly. The obvious consequence would have been to simply drop this material.

Discussions with students and colleagues taught us, however, that these topics can also be useful in several other areas of economics. We therefore decided to keep the supplements in our book. The four subsequent sections can be read independently of each other, and the entire chapter can be skipped without impairing the understanding of Brownian motion.

7.1 Cardinality of Sets

Imagine adding 0 as another element to the set of possible scores of a die:

$$\displaystyle \begin{aligned} \{0, 1, 2, 3, 4, 5, 6\}. \end{aligned}$$

Obviously, this set is larger than the original set: instead of six, it now has seven elements. With this simple fact in mind, one is inclined to conclude that the same idea applies to infinite sets. For example, if we compare the set \(\mathbb N\) of all natural numbers with the set \(\mathbb Z\) of integers, it seems reasonable to suppose that \(\mathbb Z\) is larger than \(\mathbb N\).

However, one cannot decide whether such a proposition is true or false by looking at the number of elements. This number is infinite for \(\mathbb Z\) as well as for \(\mathbb N\), and we have already seen that infinity is not a number with which simple arithmetic operations such as addition or comparison can be performed. Thus, one needs a different concept to compare infinite sets. This concept is called cardinality.

For infinite sets, results that seem self-evident for finite sets no longer hold, and the correct results seem to contradict common sense. At first, one might think that the set of natural numbers is smaller than the set of integers, since all negative values −1, −2, … are missing. However, a simple consideration proves that this conclusion is mistaken. Rather, the set of integers turns out to be exactly as large as the set of natural numbers: both have “the same cardinality,” a term we explain below. This underlines the fact that infinity must be handled very carefully. It is better not to rely on common sense or “intuition”!

The idea of cardinality is to compare two sets by means of a one-to-one relation rather than by counting their elements. Two sets are said to have the same cardinality (or to be “equal in size”) if and only if there exists a one-to-one relation that pairs every element of one set with exactly one element of the other.

For finite sets, counting elements and using one-to-one relations lead to the same result. Figure 7.1 illustrates that the set with seven elements is larger than the set with six elements: one element of the set {0, 1, …, 6} will never find a “partner.”

Fig. 7.1 The two finite sets {0, 1, …, 6} and {1, 2, …, 6} do not have equal cardinality

In the case of the two infinite sets, however, the outcome is surprising. This is demonstrated by the assignment in Fig. 7.2: each natural number is mapped to exactly one integer, and this mapping is one-to-one. One can clearly see that every natural number and every integer appears exactly once. Readers who prefer formulas might use

$$\displaystyle \begin{aligned} f:\mathbb N \,\rightarrow\,\mathbb Z, \qquad f(n)= \begin{cases} -\frac{n}{2}, \quad \text{if }n\text{ is even}\,;\\ \frac{n+1}{2}, \quad \text{if }n\text{ is odd}\,. \end{cases} \end{aligned} $$
(7.1)

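For f to capture the integer 0 as well, the natural numbers must be taken to include zero here, \(\mathbb N=\{0, 1, 2, \ldots\}\), with \(f(0)=0\). The function f obviously assigns an integer to each natural number n, and f is also invertible in the sense that every integer in \(\mathbb Z\) is captured exactly once.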

Fig. 7.2 The infinite sets \( \mathbb N\) and \( \mathbb Z\) have equal cardinality
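The assignment (7.1) can be checked mechanically. The following Python sketch assumes, as the formula requires, that the natural numbers start at 0; the names `f` and `f_inverse` are our own illustrative choices. It lists the first few images and confirms on a finite range that the inverse recovers every n.

```python
def f(n: int) -> int:
    """Map a natural number n = 0, 1, 2, ... to an integer as in (7.1)."""
    if n % 2 == 0:
        return -(n // 2)      # even n -> 0, -1, -2, ...
    return (n + 1) // 2       # odd n  -> 1, 2, 3, ...

def f_inverse(z: int) -> int:
    """Return the natural number that (7.1) maps to the integer z."""
    if z > 0:
        return 2 * z - 1      # positive integers come from odd n
    return -2 * z             # zero and negative integers come from even n

print([f(n) for n in range(9)])   # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```

A finite check of this kind cannot replace the argument in the text, but it makes visible that no integer is hit twice and none is skipped.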

The idea of cardinality will be further illustrated with another example.

Example 7.1 (Cantor’s Diagonal Argument)

The set of nonnegative rational numbers \(\mathbb Q_+\) has the same cardinality as the set of natural numbers. To show this, one has to prove, analogously to Fig. 7.2, that all nonnegative rational numbers can be assigned one-to-one to the natural numbers.

Apart from 0, which can be treated separately, the rational numbers in \(\mathbb Q_+\) are the fractions \(\frac {m}{n}\) with m and n being positive natural numbers. These fractions are now arranged in an infinite two-dimensional matrix as shown in Fig. 7.3.Footnote 1 The arrows illustrate how one may imagine the one-to-one correspondence between the natural and the rational numbers: the 1 is assigned to the fraction \(\frac {1}{1}\), the 2 to the fraction \(\frac {2}{1}\), the 3 to the fraction \(\frac {1}{2}\), the 4 to the fraction \(\frac {1}{3}\), the 5 to the fraction \(\frac {2}{2}\), and so on.

Fig. 7.3 Cantor’s diagonal argument to prove that \( \mathbb N\) and \( \mathbb Q_+\) have equal cardinality

This procedure would create a one-to-one relation were it not for an annoying blemish: the matrix on the right contains too many elements. The fractions \(\frac {1}{1}\), \(\frac {2}{2}\), \(\frac {3}{3},\) … or \(\frac {3}{17}, \frac {6}{34}, \frac {9}{51},\ldots \) are actually identical and do not represent different rational numbers at all. Therefore, they must not be assigned to different natural numbers; one has to make sure that each rational number is counted only once. This is achieved by “thinning out” the matrix: all fractions \(\frac {m}{n}\) whose numerator m and denominator n are not coprime are deleted, and the diagonal construction is carried out only for the remaining fractions. This “thinning out” makes the formal proof considerably more complicated; to be fully precise, it must be conducted by mathematical induction. We will not present the details of this proof.
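The thinned-out diagonal walk can be sketched in a few lines of Python. The exact path of the arrows in Fig. 7.3 is an assumption here: we reverse direction on every anti-diagonal m + n = d, which reproduces the order 1/1, 2/1, 1/2, 1/3, 2/2 described above, and we skip every fraction whose numerator and denominator are not coprime. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd
from typing import Iterator

def zigzag_fractions() -> Iterator[tuple[int, int]]:
    """Walk the matrix of fractions m/n along the anti-diagonals m + n = d,
    reversing direction on every diagonal: 1/1, 2/1, 1/2, 1/3, 2/2, 3/1, ..."""
    for d in count(2):
        ms = range(1, d) if d % 2 == 0 else range(d - 1, 0, -1)
        for m in ms:
            yield m, d - m

def positive_rationals() -> Iterator[Fraction]:
    """The 'thinned-out' walk: keep m/n only if m and n are coprime."""
    for m, n in zigzag_fractions():
        if gcd(m, n) == 1:
            yield Fraction(m, n)

first = list(islice(positive_rationals(), 10))
print([str(q) for q in first])
# ['1', '2', '1/2', '1/3', '3', '4', '3/2', '2/3', '1/4', '1/5']
```

Every positive rational number is reached after finitely many steps, because only finitely many fractions lie on the diagonals before its own, and the coprimality filter ensures it is counted exactly once.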

A set whose cardinality equals the cardinality of the natural numbers is called countable. In this sense, the natural numbers, the integers, and the rational numbers are countable. Countable sets are of great importance because they can serve as index sets of sums and products. An expression of the form \(\sum_{i\in A} a_i\) can only make sense if A is finite or countable. If \(A=\mathbb N\), one can even write \(\lim _{n\to \infty }\sum _{i=1}^n a_i\) for this sum.

One might suspect that every infinite set can be shown, with sufficiently ingenious tricks, to be countable. However, this is not the case, and we will now show that a very prominent set is larger than the set of natural numbers.

Example 7.2 (Uncountability)

We prove that the set of real numbers \(\mathbb R\) has a different cardinality than the set of natural numbers. The proof is quite simple.

To this end, we assume that someone claims to be able to map the set of real numbers one-to-one onto the set of natural numbers. This person would then be able to list all real numbers one after the other, i.e., to arrange all real numbers in a sequence. In particular, this person could name a unique predecessor and successor for each real number. We will show that at least one real number is missing from this list, which is a contradiction. This proves that the set of real numbers must be larger than the set of natural numbers.

In Fig. 7.4, we present the sequence of real numbers, written in their (possibly infinite) decimal representation, which the above person claims to be complete, i.e., to contain all real numbers. Instead of the digits 0, 1, …, 9, we use the symbols \(a_i, b_i, c_i, d_i, \ldots\) for the decimals of the first, second, third, fourth, … real number in the list.Footnote 2

Fig. 7.4 Cantor’s diagonal argument to prove the uncountability of real numbers

The missing number can be constructed very easily. We regard Fig. 7.4 as a matrix of digits and focus on its diagonal (the diagonal elements are printed in red). Using the diagonal, we form a new real number of the form \(0.z_1z_2z_3z_4\ldots\). As the first decimal \(z_1\) of this new number, we select a digit that differs from \(a_1\). The second decimal must satisfy \(z_2\neq b_2\), the third decimal \(z_3\neq c_3\), and so on. (If, in addition, the digits 0 and 9 are never chosen, the new number cannot have a second decimal representation, such as \(0.4999\ldots =0.5\), that might coincide with a listed number.) The new real number formed in this way cannot match any of the numbers in our person's supposedly complete list: it differs from the nth number of the list at least in the nth decimal. We have found the missing number!
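The diagonal construction can be illustrated on a finite excerpt of such a list. The sketch below is hypothetical in every detail: the list entries are arbitrary decimal strings, and the rule "take 5, unless the diagonal digit is 5, then take 4" is just one convenient choice that never uses the digits 0 or 9.

```python
def diagonal_number(listing: list[str]) -> str:
    """Build 0.z1z2z3... whose k-th decimal differs from the k-th decimal of the
    k-th entry of `listing` (entries are strings like '0.1415926')."""
    digits = []
    for k, x in enumerate(listing):
        diagonal_digit = x.split(".")[1][k]   # k-th decimal of the k-th number
        # Hypothetical rule: use 5, or 4 if the diagonal digit is 5 (never 0 or 9)
        digits.append("4" if diagonal_digit == "5" else "5")
    return "0." + "".join(digits)

# Hypothetical excerpt of the supposedly complete list
listing = ["0.5000000", "0.3550000", "0.1415926", "0.3333333"]
z = diagonal_number(listing)
print(z)  # 0.4455
```

Each entry of the excerpt differs from `z` in at least one decimal place, so `z` does not appear in it; the argument in the text applies the same rule to the whole infinite list.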

These considerations show that the set of real numbers cannot be counted; the real numbers are said to be uncountable. It follows that an expression of the form \(\sum _{i\in \mathbb R}a_i\) does not represent a mathematically meaningful term: each element i of an index set must have a unique predecessor and a unique successor, which is impossible for the real numbers \(\mathbb R\).

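Example 7.2 shows that there exist infinite sets with different cardinalities. The set of real numbers \(\mathbb R\) is “larger” than \(\mathbb N\), while the set of natural numbers is “as large” as the sets \(\mathbb Z\) and \(\mathbb Q_+\). In mathematics, this is indicated by appropriate symbols. The cardinality of the natural numbers is not denoted by the rather fuzzy infinity sign but by the symbol \(\aleph_0\).Footnote 3 Since the cardinality of the real numbers is greater than \(\aleph_0\), the symbol \(\aleph_1\) is used for it. Strictly speaking, \(\aleph_1\) denotes the smallest cardinality greater than \(\aleph_0\); the statement that the real numbers have exactly this cardinality is the continuum hypothesis, which can be neither proved nor disproved from the standard axioms of set theory.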

Concluding Remark

Finally, we would like to draw the reader’s attention to an interesting issue. We have shown that the set of natural numbers is smaller than the set of real numbers. Instead of the set of natural numbers itself, one could consider its power set \({\mathcal {P}}(\mathbb N)\), i.e., the set of all subsets of the natural numbers. This power set contains the set of all even numbers, the set of all odd numbers, the set of all natural numbers less than 5, and so on. Without presenting the mathematical details, we note that the power set has the same cardinality as the set of real numbers. On page 20 we made it clear that a finite set of n elements has exactly \(2^n\) subsets. By analogy, this relationship is expressed in the symbols just introduced by the following equation:

$$\displaystyle \begin{aligned} 2^{\aleph_0}=\aleph_1. \end{aligned} $$
(7.2)

However, this symbolic notation should not be confused with genuine arithmetic operations. One must not, for example, write \(\aleph_0=\log_2(\aleph_1)\).

What do these considerations tell us? When mathematicians transfer a symbolic notation from one subject area to another, as in (7.2), one is tempted to use it in every respect. Unfortunately, such a practice can be not only wrong but even dangerous. We have already encountered this situation when discussing the notation for Brownian motion.
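For finite sets, the relationship between a set of n elements and its \(2^n\) subsets, which motivates the notation in (7.2), is easy to verify by brute force. The following Python sketch enumerates all subsets with the standard-library function `itertools.combinations`; the helper name `power_set` is our own.

```python
from itertools import combinations

def power_set(elements):
    """Return all subsets of a finite collection as a list of frozensets."""
    items = list(elements)
    return [frozenset(c)
            for r in range(len(items) + 1)       # subset sizes 0, 1, ..., n
            for c in combinations(items, r)]     # all subsets of size r

for n in range(6):
    print(n, len(power_set(range(n))), 2 ** n)
```

Such a count only works for finite sets; for \(\mathbb N\) itself, (7.2) is a symbolic statement about cardinalities and not the result of a computation.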

7.2 Continuous and Almost Nowhere Differentiable Functions

In order to discuss Brownian motion thoroughly, it is useful to deal with some remarkable features of functions. The paths of Brownian motion are continuous functions that cannot be differentiated at (almost) any point. Anyone wanting to handle such functions properly must recognize that many operations known from ordinary analysis may not be applied to them. Compared with ordinary analysis, dealing with Brownian paths can be considered “exotic.”

Non-mathematicians can hardly imagine continuous functions that are not differentiable (almost) anywhere. We would like to aid this understanding with an example developed by Weierstraß.Footnote 4 He also showed that in mathematics such functions are anything but rare. Prior to Weierstraß, these functions had been regarded as “monster curves.”Footnote 5 It was assumed that they were either isolated special cases or that the points at which differentiation is impossible were indeed rare.

Weierstraß considered the function

$$\displaystyle \begin{aligned} w(x)=\sum_{n=0}^\infty \frac{\sin (3^n\, x)}{2^n}. \end{aligned} $$
(7.3)

To give an idea of what this function looks like, Fig. 7.5 shows the sum of only the first seven summands of the series.Footnote 6 We concentrate on two characteristics of the Weierstraß function: first, its continuity and, second, its differentiability.

Fig. 7.5 Approximation of the Weierstraß function w(x) using the first seven summands
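A seven-term approximation like the one in Fig. 7.5 can be computed directly from (7.3). The sketch below assumes that the summation starts at n = 0 as in (7.3), so that "seven summands" means n = 0, …, 6; the function name is hypothetical.

```python
import math

def weierstrass_partial(x: float, terms: int = 7) -> float:
    """Sum of the first `terms` summands of (7.3), i.e. n = 0, ..., terms - 1."""
    return sum(math.sin(3**n * x) / 2**n for n in range(terms))

# Seven-term approximation on a coarse grid of x-values
for i in range(9):
    x = i * math.pi / 8
    print(f"x = {x:.4f}   w_7(x) = {weierstrass_partial(x):+.6f}")
```

Plotting the values on a much finer grid, and with more terms, shows the increasingly jagged shape; each additional summand oscillates three times faster but with half the amplitude.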

Non-mathematicians often say that a function is continuous if one can draw its graph without lifting the pen. Although this is not a precise definition, looking at Fig. 7.5 one may suspect that the Weierstraß function is continuous. A more precise argument leads to the same result: the numerator of each fraction is at most 1 in absolute value, while the denominator grows exponentially. Therefore, the sum converges for each x. Furthermore, it converges uniformly. This means that the difference between \(\sum _{n=0}^m \frac {\sin (3^n\, x)}{2^n}\) and w(x) can be bounded independently of x by a bound that goes to zero as m grows; here, \(\sum_{n=m+1}^\infty 2^{-n}=2^{-m}\) is such a bound. In such cases, the continuity of the summands \(\frac {\sin (3^n\, x)}{2^n}\) carries over to the function w(x).

The above considerations do not constitute a complete proof but only sketch the argument; the result is intuitively appealing. Looking at the definition of the function w(x), the following observation is decisive. The numerator of each additional summand lies in the interval [−1, 1], whereas the denominator of each new summand grows exponentially. Hence, each new summand (however it may behave) changes the function value only marginally. Therefore, continuity is preserved in the limit.
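The uniform estimate can also be checked numerically. Since each summand satisfies \(\left|\frac{\sin(3^n x)}{2^n}\right|\le 2^{-n}\), the tail after the summand with index m is at most \(\sum_{n=m+1}^\infty 2^{-n}=2^{-m}\) for every x. The sketch below compares a long partial sum (up to the hypothetical cutoff n = 30, standing in for infinity) with shorter ones on a grid of points in [−π, π] and confirms that the largest deviation stays below \(2^{-m}\). A finite grid cannot replace the proof; it only illustrates the estimate.

```python
import math

def partial_sum(x: float, m: int) -> float:
    """S_m(x) = sum over n = 0..m of sin(3**n x) / 2**n."""
    return sum(math.sin(3**n * x) / 2**n for n in range(m + 1))

def max_gap(m: int, M: int = 30, points: int = 2001) -> float:
    """Largest |S_M(x) - S_m(x)| over an even grid on [-pi, pi]; M stands in for infinity."""
    xs = [-math.pi + 2 * math.pi * i / (points - 1) for i in range(points)]
    return max(abs(partial_sum(x, M) - partial_sum(x, m)) for x in xs)

for m in (2, 5, 10):
    print(f"m = {m:2d}   max gap = {max_gap(m):.6f}   bound 2**-m = {2.0**-m:.6f}")
```

The point of the comparison is that the bound on the right does not depend on x, which is exactly what uniform convergence requires.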

Let us turn to the second characteristic of the function w(x). Weierstraß was able to show that the function cannot be differentiated except at a few values of x. While the proof is difficult, one can illustrate the result as follows: differentiating the sum term by term with respect to x, one formally obtainsFootnote 7

$$\displaystyle \begin{aligned} \frac{dw(x)}{dx}=\lim_{N\to\infty}\sum_{n=0}^N \left(\frac{3}{2}\right)^n\cos (3^n\, x). \end{aligned} $$
(7.4)

To examine this limit in more detail, we first ignore the factor \(\left (\frac {3}{2}\right )^n\) and draw the graphs of the function \(\cos {}(3^n x)\) for several values of n (see Fig. 7.6).

Fig. 7.6 Cosine functions \(\cos {}(3^1 x)\), \(\cos (3^2 x)\), and \(\cos {}(3^3 x)\)

It can easily be seen that the frequency of the cosine function increases with the exponent n. Since these increasingly rapid fluctuations are multiplied by the factor \(\left (\frac {3}{2}\right )^n\), their impact on the sum grows with n. Obviously, the sum can converge only at points x at which \(\left(\frac{3}{2}\right)^n\cos(3^n x)\) tends to zero, i.e., at which \(\cos(3^n x)\) vanishes faster than \(\left(\frac{2}{3}\right)^n\). Such points are very thinly scattered.Footnote 8 For all other x, which represent the typical case, the summands do not tend to zero, and the sum diverges. Thus, the formal first derivative (7.4) fails to exist almost everywhere, which indicates that the function cannot be differentiated anywhere.
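At a few special points, the partial sums of the formal derivative (7.4) can be computed exactly, because \(\cos(k\pi/2)\) only takes the values 1, 0, −1, 0 for integers k. The Python sketch below evaluates them at \(x=0\), \(x=\pi/2\), and \(x=\pi\) using exact fractions; the function names are our own. These points are hand-picked and not typical; moreover, convergence of the formal series at a point would not by itself prove differentiability there, since term-by-term differentiation is not justified for this series.

```python
from fractions import Fraction

def cos_half_pi_multiple(k: int) -> int:
    """Exact value of cos(k * pi / 2) for an integer k."""
    return (1, 0, -1, 0)[k % 4]

def derivative_partial_sums(q: int, N: int) -> list[Fraction]:
    """Exact partial sums of (7.4) at x = q * pi / 2 for N' = 0, ..., N."""
    sums, total = [], Fraction(0)
    for n in range(N + 1):
        total += Fraction(3, 2) ** n * cos_half_pi_multiple(3**n * q)
        sums.append(total)
    return sums

for q, label in ((0, "x = 0"), (1, "x = pi/2"), (2, "x = pi")):
    print(label, [float(s) for s in derivative_partial_sums(q, 5)])
```

At x = 0 the partial sums grow without bound, at x = π they fall without bound, and at x = π/2 every summand vanishes because \(3^n\) is odd, an example of the thinly scattered points at which the summands go to zero.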

7.3 Convergence Terms

From numerous discussions with students and colleagues, we learned that there is considerable interest in looking more closely at the convergence of functions. For sequences of numbers, the precise definition of convergence hardly matters: the usual definitions all turn out to be equivalent. The situation is entirely different for sequences of functions. There are many different ways to define convergence, and these definitions differ fundamentally from one another. Moreover, while most non-mathematicians can imagine what a sequence of numbers is, a sequence of functions is much harder to picture.

To illustrate this phenomenon, we use an analogy. The shortest route from Berlin to San Francisco depends on how the earth is represented. Using a conventional map of the world, one would conclude that the shortest route between the two cities lies entirely south of 53° North. However, using a globe, you will find that the shortest route in fact passes over Greenland. The convergence of functions is similar: there is not just one but several ways of defining the convergence of a sequence of functions, and the results depend on the chosen definition.

Convergence is important in the context of limits. To understand its applications, it is useful to recall how proofs are conducted in the theory of Lebesgue integrationFootnote 9: if one wants to prove that a certain proposition holds in general, one can make life easier by first proving it for particularly simple functions, for instance piecewise constant or piecewise linear functions. To establish its general validity, one then has to move from these simple functions to more general ones. To this end, one considers the limit of a sequence of functions: a proposition that holds for each (simple) element of a function sequence should then also hold for the limit of this sequence and thus for a general function. For this to work, it must not matter whether one first integrates and then passes to the limit or vice versa. Integration and limit must be interchangeable:

$$\displaystyle \begin{aligned} \lim_n \int_\Omega \stackrel{!}{=} \int_\Omega \lim_n. \end{aligned} $$
(7.5)

Let us look at random variables as an example of functions. For random variables expectation and variance are (Lebesgue) integrals.Footnote 10 From (7.5) it should follow

$$\displaystyle \begin{aligned} \lim_{n\to\infty} \operatorname*{\mathrm{E}} \left[Z_n\right] \stackrel{!}{=}\operatorname*{\mathrm{E}} \left[\lim_{n\to\infty} Z_n\right] \end{aligned} $$
(7.6)

and

$$\displaystyle \begin{aligned} \lim_{n\to\infty} \operatorname*{\mathrm{Var}} \left[Z_n\right] \stackrel{!}{=} \operatorname*{\mathrm{Var}} \left[\lim_{n\to\infty} Z_n\right]. \end{aligned} $$
(7.7)

Remember that \(Z_n\) is a random variable and thus a measurable function.

The above claims deserve two remarks. First, there is an exclamation mark above the equal signs. It signals a requirement: we need a definition of the limit under which the left and right sides are identical, i.e., under which limit and expectation, or limit and variance, can be swapped. Second, the left sides of Eqs. (7.6) and (7.7) are limits of sequences of numbers, since expected values and variances are numbers. The right sides, in contrast, contain the limit of a sequence of functions rather than of numbers. While students of economics know how to determine the limit of a sequence of numbers, they may not know what a sequence of functions is, let alone how to determine its limit.

Before introducing two important concepts of convergence, namely pointwise convergence and mean square convergence,Footnote 11 we will start with sequences of numbers.

Sequences of Numbers

In mathematical analysis, a sequence of numbers is said to converge to a limit if the numbers with a sufficiently large index come arbitrarily close to a particular value. For example, if you look at the sequence of numbers

$$\displaystyle \begin{aligned} s_n = a+\frac{1}{n} \qquad \text{with } n=1, 2, \ldots, \end{aligned} $$
(7.8)

we have

$$\displaystyle \begin{aligned} s_1=a+1, \quad s_2=a+\frac{1}{2}, \quad s_3=a+\frac{1}{3}, \end{aligned} $$
(7.9)

and so on. As n increases, the second summand decreases and approaches zero.Footnote 12 For \(n\to\infty\), the summand can be neglected. Thus, the sequence converges to a, which is written as

$$\displaystyle \begin{aligned} \lim_{n\to\infty}s_n=\lim_{n\to\infty}\left(a+\frac{1}{n}\right)=a. \end{aligned} $$
(7.10)
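The convergence in (7.10) can be made concrete: for every tolerance ε > 0 there is an index beyond which all \(s_n\) lie within ε of a. Since \(|s_n-a|=\frac{1}{n}\), the smallest such index is \(\lfloor 1/\varepsilon\rfloor+1\). The sketch below computes it with exact fractions to avoid rounding issues; the function name and the value of a are our own illustrative choices.

```python
import math
from fractions import Fraction

def first_index_within(eps: Fraction) -> int:
    """Smallest n >= 1 with |s_n - a| = 1/n < eps for s_n = a + 1/n as in (7.8)."""
    return math.floor(1 / eps) + 1

a = Fraction(2)  # hypothetical value of the constant a
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    N = first_index_within(eps)
    # 1/n decreases in n, so every later term stays within eps of a as well
    print(f"eps = {eps}:  N = {N},  s_N = {float(a + Fraction(1, N)):.6f}")
```

The smaller the tolerance, the later the index from which the sequence stays close to a; convergence means that such an index exists for every tolerance.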

After exploring sequences of numbers we will now concentrate on sequences of functions.

Sequences of Functions

We look at the simple example

$$\displaystyle \begin{aligned} f_n(t) = a+\frac{t}{n}. \end{aligned} $$
(7.11)

With increasing n one obtains

$$\displaystyle \begin{aligned} f_1(t)=a+t, \quad f_2(t)=a+\frac{t}{2}, \quad f_3(t)=a+\frac{t}{3}, \end{aligned} $$
(7.12)

and so on. It seems clear that such a sequence of functions converges and how its limit is determined. For a sequence of numbers, the individual values should approach a certain value. For a sequence of functions, it is quite plausible to expect that with increasing n the functions “cling to a limit function.” In the above example, the functions \(f_n(t)\) approach the limit function \(f(t)=a\). Figure 7.7 illustrates this vividly. With increasing n, the influence of the term \(\frac {t}{n}\) in Eq. (7.11) becomes less and less significant. The limit function takes the form \(\lim_{n\to\infty} f_n(t)=a\).

Fig. 7.7 What is the limit of a function sequence?

Pointwise Convergence

This definition can be regarded as a “natural” candidate based on the above example.

Definition 7.1 (Pointwise Convergence)

Consider a sequence of functions of the form \(f_n: \Omega \,\rightarrow \,\mathbb {R}\).

A sequence of functions \(f_n\) converges pointwise Footnote 13 to a function f if and only if the following is validFootnote 14:

$$\displaystyle \begin{aligned} \lim_{n\to\infty}f_n(\omega)=f(\omega)\qquad \forall\omega\in\Omega\,. \end{aligned} $$
(7.13)

With this definition of convergence, integration and limit can be swapped only under certain conditions.Footnote 15

We will now present an example demonstrating that the interchangeability of integration and limit is lost under pointwise convergence: the integral of the limit does not equal the limit of the integrals.

Let us consider the state space \(\Omega =\mathbb {R}\) and a function \(f_n\) that is zero on the entire real line except on the interval \(\left[n-\frac{1}{2}, n+\frac{1}{2}\right]\) around the point n, where it equals one, so that the area below the function is exactly one. Figure 7.8 illustrates such a function, which has the shape of a rectangle at position n. With increasing index n, the rectangle moves to infinity.Footnote 16

Fig. 7.8 An example regarding pointwise convergence

We look at this sequence of functions and apply the definition of pointwise convergence. Doing so, we will show that the limit of this sequence is zero, although the rectangle neither changes its shape nor disappears. This might be surprising.

  • The functions \(f_n\) converge pointwise to zero: consider a fixed value t. For this t, the following holds

    $$\displaystyle \begin{aligned} \lim_{n\to\infty}f_n(t)=0\,, \end{aligned} $$
    (7.14)

    because every index n will eventually be greater than \(t+\frac{1}{2}\), so that the rectangle lies entirely to the right of t and \(f_n(t)=0\). Consequently:

    $$\displaystyle \begin{aligned} \lim_{n\to\infty}f_n(t)=0\quad \Longrightarrow \quad \int_{-\infty}^\infty \lim_{n\to\infty}f_n(t)\,dt=0. \end{aligned} $$
    (7.15)
  • On the other hand, the area under each function is 1:

    $$\displaystyle \begin{aligned} \int_{-\infty}^\infty f_n(t)\,dt=\int_{n-\frac{1}{2}}^{n+\frac{1}{2}} 1\,dt=n+\frac{1}{2}-\left(n-\frac{1}{2}\right)=1, \end{aligned} $$
    (7.16)

    and therefore

    $$\displaystyle \begin{aligned} \lim_{n\to\infty} \int_{-\infty}^{\infty} f_n(t)\,dt=\lim_{n\to\infty} 1 = 1. \end{aligned} $$
    (7.17)

Equations (7.15) and (7.17) show that one must not interchange integration and limit for the sequence of functions considered here. This conclusion can be expressed as

$$\displaystyle \begin{aligned} \lim_n\int \neq \int\lim_n. \end{aligned} $$
(7.18)
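The example can be reproduced numerically under the assumption made explicit in (7.16) that \(f_n\) equals 1 on \(\left[n-\frac{1}{2}, n+\frac{1}{2}\right]\) and 0 elsewhere. The sketch below evaluates the functions at a fixed point t and approximates the integrals with a midpoint rule on the hypothetical window [−100, 100], which covers the rectangles for the indices used.

```python
def f(n: int, t: float) -> float:
    """Rectangle of height 1 on [n - 1/2, n + 1/2], zero elsewhere."""
    return 1.0 if n - 0.5 <= t <= n + 0.5 else 0.0

def integral(n: int, a: float = -100.0, b: float = 100.0, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f_n over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(n, a + (i + 0.5) * h) for i in range(steps))

t = 3.7  # a fixed point on the real line
print([f(n, t) for n in range(1, 10)])               # only n = 4 covers t = 3.7
print([round(integral(n), 6) for n in (1, 5, 20)])   # every area stays 1
```

For the fixed t, the values are eventually all zero, while each integral stays 1: the limit of the integrals is 1, but the integral of the pointwise limit is 0.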

For the purposes described above, such a result is useless. We must therefore conclude that pointwise convergence is not an appropriate concept here. Rather, it is advisable to find another concept of convergence that permits interchanging integration and limit.

Mean Square Convergence

This concept of convergenceFootnote 17 is used to ensure that expectation (i.e., expected value and variance) and limit can be interchanged. To this end, we assume a measure space \((\Omega \,, {\mathcal {F}},\, \mu )\) and a sequence of measurable functions \(f_n\).

Mean square convergence measures the distance between a function of the sequence and its limit by the integral of their squared difference. For random variables, requiring this integral to vanish in the limit amounts to requiring that both the expectation and the variance of the difference go to zero, as shown below. The formal definition reads as follows.

Definition 7.2 (Mean Square Convergence)

A sequence of measurable functions \(f_n\) converges in mean square to a function f,

$$\displaystyle \begin{aligned} \lim_{n\to\infty}f_n=f\,, \end{aligned} $$
(7.19)

if and only if

$$\displaystyle \begin{aligned} \lim_{n\to\infty}\int_\Omega \left|f_n(\omega)-f(\omega)\right|{}^2\,d\mu(\omega)=0 \end{aligned} $$
(7.20)

applies.

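We will now show that mean square convergence ensures that integration and limit can be interchanged. For this, we again concentrate on a probability measure, i.e., we consider random variables. We use the definition of mean square convergence and rely on the identity (5.36) for the variance, which we use in the form \(\operatorname*{\mathrm{E}}[Y^2]=\operatorname*{\mathrm{Var}}[Y]+\operatorname*{\mathrm{E}}[Y]^2\). Assume \(\lim_{n\to\infty} f_n=f\) in mean square. Then (7.20) yields

$$\displaystyle \begin{aligned} 0=\lim_{n\to\infty}\operatorname*{\mathrm{E}}\left[\left(f_n-f\right)^2\right]=\lim_{n\to\infty}\left(\operatorname*{\mathrm{Var}}[f_n-f]+\operatorname*{\mathrm{E}}[f_n-f]^2\right). \end{aligned} $$
(7.21)

Since neither of the two summands can be negative, both \(\lim _{n\to \infty } \operatorname *{\mathrm {Var}}[f_n-f]=0\) and \(\lim_{n\to\infty}\operatorname*{\mathrm{E}}[f_n-f]^2=0\) hold. If the squared expectation goes to zero, \(\lim _{n\to \infty } \operatorname *{\mathrm {E}}[f_n-f]=0\) must hold as well. The expectation is linear, and therefore \(\lim _{n\to \infty } \operatorname *{\mathrm {E}}[f_n]= \operatorname *{\mathrm {E}}[f]\) is true. Thus \(\lim _{n\to \infty } \operatorname *{\mathrm {E}}[f_n]= \operatorname *{\mathrm {E}}[\lim _{n\to \infty }f_n]\). That was what we had to show.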
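The decomposition used above can be checked on a small finite probability space. Everything in the following sketch is hypothetical: four equally likely states, the limit \(f(\omega)=\omega^2\), and a perturbed sequence \(f_n\). It verifies the identity \(\operatorname*{\mathrm{E}}[Y^2]=\operatorname*{\mathrm{Var}}[Y]+\operatorname*{\mathrm{E}}[Y]^2\) for \(Y=f_n-f\), assuming this is the content of (5.36), and shows the mean square error and the difference of the expectations shrinking together.

```python
from fractions import Fraction

# Hypothetical finite probability space: four equally likely states
omegas = [Fraction(k, 4) for k in range(4)]
p = Fraction(1, 4)

def E(values):
    """Expectation of a random variable given by its values on the four states."""
    return sum(p * v for v in values)

def Var(values):
    """Variance, computed from its definition E[(Y - E[Y])^2]."""
    m = E(values)
    return E([(v - m) ** 2 for v in values])

f = [w ** 2 for w in omegas]    # limit random variable f(omega) = omega^2

def f_n(n):
    """Hypothetical sequence f_n = f + d/n with the state-dependent d = 1, -2, 3, -4."""
    return [v + Fraction((-1) ** k * (k + 1), n) for k, v in enumerate(f)]

for n in (1, 10, 100):
    y = [a - b for a, b in zip(f_n(n), f)]            # Y = f_n - f
    mse = E([v ** 2 for v in y])                       # integral in (7.20)
    assert mse == Var(y) + E(y) ** 2                   # E[Y^2] = Var[Y] + E[Y]^2
    print(n, float(mse), float(E(f_n(n)) - E(f)))
```

Here the mean square error equals \(\frac{15}{2n^2}\) and the difference of the expectations equals \(-\frac{1}{2n}\); both go to zero, as the argument above predicts.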

7.4 Conditional Expectations Are Random Variables

Finally, we want to draw the reader’s attention to an aspect of conditional expectations that goes back to Kolmogoroff.Footnote 18 So far, we have treated a conditional expectation as a real number that refers to an event A (the condition).Footnote 19 The expectation depends on this event A: if we choose a different event, a different expectation will usually result. Therefore, Kolmogoroff proposed interpreting the conditional expectation as a random variable.Footnote 20

To understand this idea, we need to remember how we defined random variables: we wanted to regard them as functions of elementary events. On page 15 we showed that a random variable X can be characterized as a function

$$\displaystyle \begin{aligned} X: \Omega\,\rightarrow\, \mathbb{R} \end{aligned} $$
(7.22)

Analogously, its conditional expectation with respect to a σ-algebra \({\mathcal{F}}\),

$$\displaystyle \begin{aligned} \operatorname*{\mathrm{E}}[X| {\mathcal{F}}]: \Omega\,\rightarrow\, \mathbb{R}, \end{aligned} $$
(7.23)

can also be interpreted as a random variable, i.e., as a function of the elementary events. The following two examples will help to better understand this concept.

Example 7.3 (Binomial Model)

With Table 7.1 we refer to Example 5.6 from page 15. The first column of this table shows the states, and the second column shows the cash flows \(CF_3\). The conditional expectation (at time t = 2) is given in the third column.

Table 7.1 States, cash flows \(CF_3\), and conditional expectations of the cash flows in the binomial model of Example 5.6

The σ-algebra \({\mathcal {F}}_2\) corresponds to the information that, from today’s perspective, the decision-maker will have available at time t = 2. On the basis of this information, the decision-maker forms his expectations. In Table 7.1, we have grouped by parentheses those states that cannot be distinguished at time t = 2. Let us call such a group of two states a “box.” At time t = 2, the decision-maker only knows which box he is in, but he cannot distinguish the states within the box.

Example 7.3 demonstrates the following: given a specific elementary event ω, ω is combined with all other elementary events that the decision-maker cannot distinguish from it on the basis of his information into a set A (the above-mentioned “box”). In this example, he is able to observe the uu node at t = 2 but does not (yet) know whether the state uuu or uud will occur at t = 3. The conditional expected value \( \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]\) assigns the number \( \operatorname *{\mathrm {E}}[X|A]\) to the elementary event ω. To determine these conditional expected values, the payments associated with the elementary events in A are weighted with their probabilities of occurrence relative to the probability of A.
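The box construction translates directly into code. Because the numbers of Table 7.1 are not reproduced here, the following Python sketch uses a hypothetical three-period tree with an up-probability of 3/5 and made-up cash flows \(CF_3\); only the mechanics carry over. For each elementary event, it collects the states sharing the first two moves (the box) and forms their probability-weighted average.

```python
from fractions import Fraction
from itertools import product

p_up = Fraction(3, 5)   # hypothetical probability of an up move
# Hypothetical cash flows CF_3 at t = 3 (not the numbers of Example 5.6)
cf3 = {"uuu": 180, "uud": 120, "udu": 120, "udd": 80,
       "duu": 120, "dud": 80, "ddu": 80, "ddd": 50}

def prob(state: str) -> Fraction:
    """Probability of a path such as 'udu', assuming independent moves."""
    return p_up ** state.count("u") * (1 - p_up) ** state.count("d")

def cond_exp_F2(state: str) -> Fraction:
    """Value of E[CF_3 | F_2] at an elementary event: the probability-weighted
    average of CF_3 over the box of states sharing the first two moves."""
    box = [s for s in cf3 if s[:2] == state[:2]]
    return sum(prob(s) * cf3[s] for s in box) / sum(prob(s) for s in box)

for s in ("".join(moves) for moves in product("ud", repeat=3)):
    print(s, cf3[s], cond_exp_F2(s))
```

With these numbers, for example, both uuu and uud are mapped to \(0.6\cdot 180+0.4\cdot 120=156\), so the conditional expectation is indeed a function of the elementary event that is constant on each box.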

Example 7.4 (Share Price)

To deepen our reflections further, we consider the state space Ω = [0, 1]. Each real number ω ∈ [0, 1] represents an elementary event. If we choose the Lebesgue measureFootnote 21 λ with the corresponding σ-algebra, we obtain a probability space, since λ(Ω) = 1 holds.

Let us consider the random variable

$$\displaystyle \begin{aligned} X(\omega)=\omega^2. \end{aligned} $$
(7.24)

For the elementary event \(\omega =\frac {1}{2}\), the random variable takes the value \(X(\omega )=\frac {1}{4}\). We present the graph of this random variable in Fig. 7.9 as a dashed curve.

Fig. 7.9 Illustration of the conditional expectation \( \operatorname *{\mathrm {E}}[X| {\mathcal {F}}]\)

Let us determine the conditional expectation for the following σ-algebra

$$\displaystyle \begin{aligned} {\mathcal{F}}=\left\{ \emptyset,\; \left[0, \frac{1}{2}\right), \;\left[\frac{1}{2}, 1\right],\; [0, 1] \;\right\}. \end{aligned} $$
(7.25)

In this case, the decision-maker cannot tell which specific elementary event ω ∈ [0, 1] is present; instead, he only receives the information whether the elementary event is smaller than \(\frac {1}{2}\) or not.Footnote 22 This is all he knows. What is the conditional expectation of the random variable X?

Concentrating on the first subinterval we get according to (5.37)Footnote 23 a conditional expectation of

$$\displaystyle \begin{aligned} \operatorname*{\mathrm{E}}\left[X|\omega<\frac{1}{2}\right]=\frac{1}{\frac{1}{2}}\int_{0}^{\frac{1}{2}}\omega^2\,d\lambda(\omega)=2\left[\frac{\omega^3}{3}\right]_{0}^{\frac{1}{2}}=\frac{1}{12} \end{aligned} $$
(7.27)

and for the second subinterval

$$\displaystyle \begin{aligned} \operatorname*{\mathrm{E}}\left[X|\omega\ge\frac{1}{2}\right]=\frac{1}{\frac{1}{2}}\int_{\frac{1}{2}}^{1}\omega^2\,d\lambda(\omega)=2\left[\frac{\omega^3}{3}\right]_{\frac{1}{2}}^{1}=\frac{7}{12}\,. \end{aligned} $$
(7.28)

Thus, we can present the conditional expectation simply by

$$\displaystyle \begin{aligned} \operatorname*{\mathrm{E}}[X|{\mathcal{F}}]=\begin{cases} \frac{1}{12}, & \text{if } \omega\in \left[0,\,\frac{1}{2}\right)\,, \\ & \\ \frac{7}{12}, & \text{if } \omega \in \left[\frac{1}{2},\,1\right]. \end{cases} \end{aligned} $$
(7.29)

Figure 7.9 shows the conditional expectation, which is a piecewise constant function (a step function) with a jump at \(\omega =\frac {1}{2}\).

As before, we recognize the idea of conditional expectation. Beginning with an elementary event ω, one first determines the smallest set A that belongs to the σ-algebra \({\mathcal {F}}\) and contains ω. The conditional expectation \( \operatorname *{\mathrm {E}}[X|A]\), calculated using Eq. (5.37), is the value of the random variable \( \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]\) at ω.
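The recipe just described can be sketched in Python for Example 7.4. The sketch approximates the integrals in (7.27) and (7.28) with a midpoint rule instead of integrating exactly (our own choice), and the hypothetical function `cond_exp` returns, for each ω, the value of \(\operatorname*{\mathrm{E}}[X|\mathcal F]\) by first locating the block of the partition that contains ω.

```python
def X(omega: float) -> float:
    """The random variable (7.24)."""
    return omega ** 2

def midpoint_integral(g, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# Blocks of the partition that generates the sigma-algebra (7.25)
blocks = [(0.0, 0.5), (0.5, 1.0)]
# E[X | A] = integral of X over A divided by lambda(A) = b - a
block_values = [midpoint_integral(X, a, b) / (b - a) for a, b in blocks]

def cond_exp(omega: float) -> float:
    """E[X | F] as a random variable: find the block containing omega, return E[X | block]."""
    return block_values[0] if omega < 0.5 else block_values[1]

print(block_values)                    # approximately [1/12, 7/12]
print(cond_exp(0.25), cond_exp(0.9))
```

The result is a step function of ω, as in Fig. 7.9. Averaging it over [0, 1] gives \(\frac{1}{2}\cdot\frac{1}{12}+\frac{1}{2}\cdot\frac{7}{12}=\frac{1}{3}\), the unconditional expectation of \(\omega^2\).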

Finally, let us present the following rules for calculating with conditional expectations.

Expected value of known quantities:

If \(X \in {\mathcal {F}}\) (it is also said that X is \({\mathcal {F}}\)-measurable), then \( \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]=X\) applies.

To illustrate this theorem, imagine having to determine the conditional expectation of an uncertain quantity X(ω). Suppose, however, that the information contained in \({\mathcal {F}}\) already reveals the value of X. Given this information, X is not really uncertain, and its conditional expectation is X itself, which is what the first theorem states.

Further, if Z is \({\mathcal {F}}\)-measurable and bounded, then \( \operatorname *{\mathrm {E}}[Z\cdot X|{\mathcal {F}}]=Z\cdot \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]\) holds.

Linearity:

For any numbers a, b the following is true: \( \operatorname *{\mathrm {E}}[aX+bY|{\mathcal {F}}]=a \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]+b \operatorname *{\mathrm {E}}[Y|{\mathcal {F}}]\,\).

Since the conditional expectation is a generalization of the classical (unconditional) expectation, the property of linearity remains valid. That is the substance of this theorem.

Monotonicity:

If X ≥ 0, then \( \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]\ge 0\) applies.

Since probabilities are nonnegative, the expected value of a nonnegative variable remains nonnegative. This applies to conditional expectations as well.

Limit almost everywhere:

If \(X_n\) is a monotonically increasing sequence of random variables that converges to X almost everywhere, and if X has a finite expectation, then \(\lim _{n\to \infty } \operatorname *{\mathrm {E}}[X_n|{\mathcal {F}}]= \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]\) holds.

We emphasized in Sect. 7.3 that the interchangeability of limit and expectation is of considerable importance in probability theory. This theorem shows one of the strengths of the concept of conditional expectation: under certain conditions, limit and conditional expectation can be swapped under almost-everywhere convergence.

Iterated expectation:

If \({\mathcal {F}}\subset {\mathcal {G}}\), then \( \operatorname *{\mathrm {E}}[ \operatorname *{\mathrm {E}}[X|{\mathcal {G}}]|{\mathcal {F}}]= \operatorname *{\mathrm {E}}[X|{\mathcal {F}}]\).

When iterated conditional expectations are to be calculated, the inner conditioning on the finer information \({\mathcal {G}}\) can be omitted: only the coarser information \({\mathcal {F}}\) matters.
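The rule can be checked exactly in the setting of Example 7.4. In the following sketch, \(\mathcal F\) is the σ-algebra (7.25) generated by the two halves of [0, 1], and \(\mathcal G\supset\mathcal F\) is a hypothetical finer σ-algebra generated by the four quarters. Averaging the values of \(\operatorname*{\mathrm{E}}[X|\mathcal G]\) over each half reproduces \(\operatorname*{\mathrm{E}}[X|\mathcal F]\), i.e., 1/12 and 7/12. The closed form \((b^3-a^3)/(3(b-a))\) for the average of \(\omega^2\) over an interval follows from integrating \(\omega^2\); the function names are our own.

```python
from fractions import Fraction

def cond_exp_on(a: Fraction, b: Fraction) -> Fraction:
    """E[X | omega in [a, b)] for X(omega) = omega**2 under the uniform measure on [0, 1]:
    (1 / (b - a)) * integral_a^b omega^2 d omega = (b**3 - a**3) / (3 * (b - a))."""
    return (b ** 3 - a ** 3) / (3 * (b - a))

quarters = [Fraction(k, 4) for k in range(5)]                           # block endpoints of G
halves = [(Fraction(0), Fraction(1, 2)), (Fraction(1, 2), Fraction(1))]  # blocks of F

def iterated(a: Fraction, b: Fraction) -> Fraction:
    """E[ E[X|G] | F ] on the half [a, b): average the values of E[X|G] on its two
    quarters, which are equally likely under the uniform measure."""
    inner = [cond_exp_on(quarters[k], quarters[k + 1])
             for k in range(4) if a <= quarters[k] < b]
    return sum(inner) / len(inner)

for a, b in halves:
    print(f"[{a}, {b}): E[E[X|G]|F] = {iterated(a, b)},  E[X|F] = {cond_exp_on(a, b)}")
```

On the first half, for instance, the quarter values 1/48 and 7/48 average to 1/12, which is exactly the value of \(\operatorname*{\mathrm{E}}[X|\mathcal F]\) found in (7.29).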