
Part of the book series: UNITEXT for Physics ((UNITEXTPH))


Abstract

The axiomatic definition of probability was introduced by A.N. Kolmogorov in 1933 and starts with the concepts of sample space \((\Omega )\) and space of events \((\mathcal{B}_{\Omega })\) with the structure of a \({\sigma }\)-algebra. When the pair \((\Omega ,\mathcal{B}_{\Omega })\) is equipped with a measure \(\mu \) we have a measure space \((\Omega ,\mathcal{B}_{\Omega },{\mu })\) and, if the measure is a probability measure P, we talk about a probability space \((\Omega ,\mathcal{B}_{\Omega },P)\). Let us discuss all these elements.

The Theory of Probabilities is basically nothing else but common sense reduced to calculus

P.S. Laplace


Notes

  1.

    Given two sets \(A,B{\subset }{\Omega }\), we shall denote by \(A^c\) the complement of A (that is, the set of all elements of \(\Omega \) that are not in A) and by \(A{\setminus }B\equiv A{\cap }B^c\) the set difference or relative complement of B in A (that is, the set of elements that are in A but not in B). It is clear that \(A^c=\Omega {\setminus }A\).

  2.

    This is not completely true if the sample space is non-denumerable, since there are subsets that cannot be considered as events. It is however true for the subsets of \(\mathcal{R}^n\) we shall be interested in. We shall talk about that in Sect. 1.1.2.2.

  3.

    It is not difficult to show the existence of Lebesgue non-measurable sets in \(\mathcal R\). One simple example is the Vitali set, constructed by G. Vitali in 1905, although there are other interesting examples (Hausdorff, Banach–Tarski) and they all assume the Axiom of Choice. In fact, the work of R.M. Solovay around 1970 shows that one cannot prove the existence of Lebesgue non-measurable sets without it. However, one cannot specify the choice function, so one can prove their existence but cannot make an explicit construction in the sense set theorists would like. In Probability Theory we are interested only in Lebesgue measurable sets, so those which are not play no role in this business, and Borel’s algebra contains only measurable sets.

  4.

    The same algebra is obtained if one starts with \((a,b)\), \((a,b]\) or \([a,b]\).

  5.

    It is important to note that a random variable \(X(w):\Omega {\longrightarrow }\mathcal R\) is measurable with respect to the \(\sigma \)-algebra \(\mathcal{B}_\Omega \).

  6.

    In fact, for the events \(A,B\,{\in }\,\mathcal{B}_{\Omega }\) we should talk about conditional independence, for it is true that, given \(C\,{\in }\,\mathcal{B}_{\Omega }\), it may happen that \(P(A,B)=P(A)P(B)\) but, conditioned on C, \(P(A,B|C)\,{\ne }\,P(A|C)P(B|C)\), so A and B are related through the event C. On the other hand, that \(P(A|B)\,{\ne }\,P(A)\) does not imply that B has a “direct” effect on A. Whether this is the case or not has to be determined by reasoning on the process and/or additional evidence. Bernard Shaw said that we all should buy an umbrella because there is statistical evidence that by doing so you have a higher life expectancy. And this is certainly true. However, rather than the umbrellas having any mysterious influence on our health, it is more reasonable to suppose that in London, at the beginning of the twentieth century, if you could afford to buy an umbrella you most likely had a well-off status, healthy living conditions, access to medical care,...

  7.

    Although it is usually the case, the terms prior and posterior do not necessarily imply a temporal ordering.

  8.

    The condition \(P(X\le x)\) is due to the requirement that F(x) be continuous on the right. This is not essential in the sense that any non-decreasing function G(x), defined on \(\mathcal R\), bounded between 0 and 1 and continuous on the left \((G(x)=\lim _{{\epsilon }{\rightarrow }0^+}G(x-{\epsilon }))\) determines a distribution function F(x), defined as \(G(x)\) for all x where G(x) is continuous and as \(\lim _{{\epsilon }{\rightarrow }0^+}G(x+{\epsilon })\) where G(x) is discontinuous. In fact, in the general theory of measure it is more common to consider continuity on the left.

  9.

    Note that the representation of a real number \(r\,{\in }\,[0,1]\) as \((a_1,a_2,\ldots )\), with \(r=\sum _{n=1}^{\infty }a_n3^{-n}\) and \(a_i\,{\in }\,\{0,1,2\}\), is not unique. In fact \(x=1/3\,{\in }\,Cs(0,1)\) and can be represented by \((1,0,0,0,\ldots )\) or \((0,2,2,2,\ldots )\).

  10.

    It is customary to omit the indices and write p(x), meaning “the probability density function of the variable x”, since the distinctive features are clear from the context.

  11.

    Recall that for continuous random quantities \(P(X_2=x_2)=P(X_1=x_1)=0\). One can justify this expression with a kind of heuristic argument; essentially considering \(X_1\,{\in }\,{\Lambda }_1=(-\infty ,x_1]\), \(X_2\,{\in }\,{\Delta }_{\epsilon }(x_2)=[x_2,x_2+{\epsilon }]\) and taking the limit \(\epsilon \rightarrow 0^+\) of

    $$\begin{aligned} P({X}_1\,{\le }\,x_1|{X}_2\,{\in }\,{\Delta }_{\epsilon }(x_2))= \frac{P({X}_1\,{\le }\,x_1,{X}_2\,{\in }\,{\Delta }_{\epsilon }(x_2))}{P({X}_2\,{\in }\,{\Delta }_{\epsilon }(x_2))}= \frac{F(x_1,x_2+{\epsilon })-F(x_1,x_2)}{F_2(x_2+{\epsilon })-F_2(x_2)} \end{aligned}$$

    See however [1]; Vol 2; Chap. 10, for the Radon–Nikodym density with conditional measures.

  12.

    In what follows we consider the Stieltjes-Lebesgue integral so \(\int \rightarrow \sum \) for discrete random quantities and in consequence:

    $$\begin{aligned} \int _{-\infty }^{\infty }g(x)\,dP(x)= \int _{-\infty }^{\infty }g(x)\,p(x)\,dx \longrightarrow \sum _{\forall x_k} g(x_k)\,P(X=x_k). \end{aligned}$$
  13.

    In the following examples, \(-{\pi }\,{\le }\,\arg (z)<{\pi }\).

  14.

    If the random quantities \(X_i\) are not identically distributed the idea is the same but one has to deal with permutations and the expressions are more involved.

References

  1. V.I. Bogachev, Measure Theory (Springer, Berlin, 2006)


  2. A. Gut, Probability: A Graduate Course, Springer Texts in Statistics (Springer, Berlin, 2013)


  3. H.L. Lebesgue, Sur le développement de la notion d’intégrale (1926)


Author information

Corresponding author

Correspondence to Carlos Maña.

Appendices

Appendix 1: Indicator Function

This is one of the most useful functions in maths. Given a subset \(A{\subset }{\Omega }\), we define the Indicator Function \({\mathbf {1}}_{A}(x)\) for all elements \(x\,{\in }\,{\Omega }\) as:

$$\begin{aligned} {\mathbf {1}}_{A}(x) = \left\{ \begin{array}{l} 1 \qquad \mathrm{if}\,\,x\,{\in }\,A \\ 0 \qquad \mathrm{if}\,\,x{\notin }A \end{array} \right. \nonumber \end{aligned}$$

Given two sets \(A,B{\subset }{\Omega }\), the following relations are obvious:

$$\begin{aligned} {\mathbf {1}}_{A{\cap }B}(x)= & {} \mathrm{min}\{{\mathbf {1}}_{A}(x), {\mathbf {1}}_{B}(x)\} = {\mathbf {1}}_{A}(x)\,{\mathbf {1}}_{B}(x) \nonumber \\ {\mathbf {1}}_{A{\cup }B}(x)= & {} \mathrm{max}\{{\mathbf {1}}_{A}(x), {\mathbf {1}}_{B}(x)\} = {\mathbf {1}}_{A}(x)+{\mathbf {1}}_{B}(x)- {\mathbf {1}}_{A}(x)\,{\mathbf {1}}_{B}(x) \nonumber \\ {\mathbf {1}}_{A^c}(x)= & {} 1 - {\mathbf {1}}_{A}(x) \nonumber \end{aligned}$$

It is also called “Characteristic Function” but in Probability Theory we reserve this name for the Fourier Transform.
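The relations above can be checked by brute force on a small finite set. The following is only a sketch: the sample space `omega` and the subsets `A` and `B` are arbitrary choices made for illustration, and `indicator` is a hypothetical helper name.

```python
def indicator(subset):
    """Return the indicator function 1_A of `subset` as a callable."""
    return lambda x: 1 if x in subset else 0

# Hypothetical finite sample space and two subsets (illustration only).
omega = set(range(10))
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}

one_A, one_B = indicator(A), indicator(B)
one_A_and_B = indicator(A & B)      # intersection
one_A_or_B = indicator(A | B)       # union
one_A_c = indicator(omega - A)      # complement of A in omega

def relations_hold(x):
    """Check the three identities of the text at the point x."""
    a, b = one_A(x), one_B(x)
    return (
        one_A_and_B(x) == min(a, b) == a * b
        and one_A_or_B(x) == max(a, b) == a + b - a * b
        and one_A_c(x) == 1 - a
    )

all_ok = all(relations_hold(x) for x in omega)
print(all_ok)
```

Since the indicator only takes the values 0 and 1, checking the four combinations of \(({\mathbf {1}}_{A},{\mathbf {1}}_{B})\) is already a proof; the loop simply visits each of them.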

Appendix 2: Lebesgue Integral and Lebesgue Measure

The Lebesgue integral extends the Riemann theory of integration and is the natural integral in the Theory of Probability. There are many good references on the subject; Chap. 2, Vol. 1 of “Measure Theory” by Bogachev [1] is recommended reading, but let’s have a short and rather informal introduction for those who have not yet looked into this.

Even though we use the Lebesgue integral in Probability, can we survive with a rough idea of what it is without entering into the mathematical formalism? Yes. The reason is that for the problems we have to deal with in experimental Particle Physics we either have probability density functions that are Riemann integrable (and, when the Riemann integral exists, it coincides with that of Lebesgue) or we actually do Lebesgue integrals “unconsciously”. Suppose for instance that we have a probability space \(({\Omega },\mathcal{B},\mu )\), a partition \(\Omega =A_1\cup A_2\), the algebra \(\mathcal{B}=\{{\emptyset },{\Omega },A_1,A_2\}\) and a probability measure such that \(\mu (A_1)=1/3\) and \(\mu (A_2)=2/3\). What is the expected value of the random quantity \(X(A_k)=k\) (which is measurable with respect to \(\mathcal B\))? We have that

$$\begin{aligned} E[X] = \sum _{k=1}^2\,X(A_k)\,{\mu }(A_k) = \sum _{k=1}^2\,k\,(k/3) = 5/3 \end{aligned}$$
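As a quick check of this sum, here is a minimal sketch using exact rational arithmetic. The dictionary `mu` and the function `X` are illustrative names; the values are those given in the text.

```python
from fractions import Fraction

# Probability measure on the atoms A_1, A_2 of the partition, keyed by k.
mu = {1: Fraction(1, 3), 2: Fraction(2, 3)}

def X(k):
    """Random quantity X(A_k) = k, constant on each atom A_k."""
    return k

# E[X] = sum over atoms of (value on the atom) * (measure of the atom).
expected_value = sum(X(k) * mu[k] for k in mu)
print(expected_value)  # 5/3
```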

Essentially, this is a Lebesgue integral so, like Molière’s Bourgeois Gentleman, we have been speaking prose and didn’t even know it! Let’s look at another example following the original idea of Lebesgue [3]. In fact, there are more axiomatic ways to define the Lebesgue integral but this is the most intuitive of all. Suppose we want to evaluate

$$\begin{aligned} \int _{1}^{2}\ln \,x\,dx \end{aligned}$$

In principle, for a real valued function f(x) defined on \([a,b]\) the basic approach to evaluate \(\int _a^b f\,dx\) is that of Riemann and goes as follows. Consider a partition of the interval \([a,b]={\cup }_{k=0}^{n-1}\Delta _k\) with

$$\begin{aligned} \Delta _k=[x_k,x_{k+1})\,\,\, \mathrm{for}\,\,\, k=0,\ldots n-2;\qquad \Delta _{n-1}=[x_{n-1},x_{n}];\quad x_0=a, \quad x_n=b \end{aligned}$$

and the sequence \(\{x_k'\,{\in }\,{\Delta }_k\}_{k=0}^{n-1}\) of sample points, one in each subinterval. Note that the length of each subinterval \(\Delta _k\) is \((x_{k+1}-x_k)\). Now, define the Riemann sum

$$\begin{aligned} S_n = \sum _{k=0}^{n-1}\,f(x_k')\,(x_{k+1}-x_k) \end{aligned}$$

and the limit of \(S_n\) as the partition gets finer and finer in such a way that \(\max (x_{k+1}-x_k)\rightarrow 0\). If the limit exists, we say that the function f(x) is Riemann integrable and the limit is the (Riemann) integral. Therefore, for the posed problem:

  1. (1)

    Take a partition of the domain [1, 2] where \(x_k=1+k{\epsilon }\) with \(x_0=1\) and \(x_n=2\rightarrow {\epsilon }=1/n\);

  2. (2)

    For each subinterval \(\Delta _k\), of length \(\epsilon \), take \(x_k'=x_k\) so \(f(x_k')= \ln x_k=\ln (1+k\epsilon )\);

  3. (3)

    Evaluate the sum

    $$\begin{aligned} S_n = \sum _{k=0}^{n-1}[\ln x_k]\,\epsilon = \frac{1}{n}\, \sum _{k=0}^{n-1}\ln (1+k/n) = \frac{1}{n}\,\ln \prod _{k=0}^{n-1}(1+k/n) = \frac{1}{n}\,\ln \left\{ \frac{\Gamma (2n)}{\Gamma (n)\,n^n}\right\} \end{aligned}$$

    and take the limit \(\epsilon {\rightarrow }0^{+}\) \((n\rightarrow \infty )\). You can check that \(\lim _{n{\rightarrow }\infty }S_{n}=2\ln \,2-1\).
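The three steps above can be reproduced numerically. This is a sketch only: the function names are illustrative, and the closed form is evaluated with `math.lgamma` (the logarithm of the Gamma function) to avoid overflow for large n.

```python
import math

def riemann_sum(n):
    """Left-endpoint Riemann sum of ln(x) on [1, 2] with n equal subintervals."""
    eps = 1.0 / n
    return sum(math.log(1.0 + k * eps) for k in range(n)) * eps

def riemann_sum_gamma(n):
    """Same sum via the closed form (1/n) ln[Gamma(2n) / (Gamma(n) n^n)]."""
    return (math.lgamma(2 * n) - math.lgamma(n) - n * math.log(n)) / n

exact = 2.0 * math.log(2.0) - 1.0  # value of the integral quoted in the text

for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(n), riemann_sum_gamma(n), exact)
```

Because \(\ln x\) is increasing, the left-endpoint sum approaches the limit from below, with an error that shrinks roughly like 1/n.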

Consider now a measure space \((\mathcal{R},\mathcal{B},{\mu })\) and a non-negative, bounded and Borel measurable function f(x). Lebesgue’s definition of the integral rests on partitioning the range of f(x) instead of the domain. Thus, we start with a partition of \([0,\sup f]={\cup }_{k=0}^{n-1}\Delta _k\) where

$$\begin{aligned} \Delta _k=[y_k,y_{k+1})\,\,\, \mathrm{for}\,\,\, k=0,\ldots n-2;\qquad \Delta _{n-1}=[y_{n-1},y_{n}];\quad y_0=0, \quad y_n=\sup f \end{aligned}$$

Since f is Borel measurable, \(\mu \{f^{-1}(\Delta _k)\}\) exists so we can evaluate the sum

$$\begin{aligned} S_n = \sum _{k=0}^{n-2}y_k\,\mu [f^{-1}(\Delta _k)] + y_{n-1}\,\mu [f^{-1}(\Delta _{n-1})] \end{aligned}$$

Again, as the partition gets finer in such a way that \(\max (y_{k+1}-y_k)\rightarrow 0\), the limit will be the Lebesgue integral provided it exists. For the problem at hand:

  1. (1)

    Take a partition of the range \([0,\ln 2]\) where \(y_k=k{\epsilon }\), \(y_0=0\) and \(y_n=\ln 2 \rightarrow {\epsilon }=n^{-1}\,\ln 2\);

  2. (2)

    For each subinterval \(\Delta _k\) determine the length of the corresponding interval on the domain; that is, \(\mu [f^{-1}(\Delta _k)]=f^{-1}(y_{k+1})-f^{-1}(y_{k}) =e^{k{\epsilon }}(e^{\epsilon }-1)\), since \(f^{-1}(y)=e^{y}\);

  3. (3)

    Evaluate the sum

    $$\begin{aligned} S_n = \sum _{k=0}^{n-1}y_k\,{\mu }[f^{-1}(\Delta _k)] = {\epsilon }(e^{\epsilon }-1)\, \sum _{k=0}^{n-1}k\,e^{k\epsilon } = 2\,\ln \,2+ {\epsilon }e^{\epsilon }(1-e^{\epsilon })^{-1} \end{aligned}$$

    and take the limit \(\epsilon {\rightarrow } 0^{+}\) \((n{\rightarrow } \infty )\). As expected, \(\lim _{\epsilon {\rightarrow } 0^+}S_n=2\ln \,2-1\).
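The same integral, now obtained by partitioning the range, can be sketched as follows. The function names are illustrative; the measure of each preimage is the length of the interval \([e^{y_k},e^{y_{k+1}})\), as in step (2).

```python
import math

def lebesgue_sum(n):
    """Sum of y_k * (length of the preimage of [y_k, y_{k+1})) for f = ln on [1, 2]."""
    eps = math.log(2.0) / n
    total = 0.0
    for k in range(n):
        y_k = k * eps
        # Preimage of [y_k, y_{k+1}) under ln is [e^{y_k}, e^{y_{k+1}}).
        measure = math.exp((k + 1) * eps) - math.exp(k * eps)
        total += y_k * measure
    return total

def lebesgue_sum_closed(n):
    """Closed form 2 ln 2 + eps e^eps / (1 - e^eps) from the text."""
    eps = math.log(2.0) / n
    return 2.0 * math.log(2.0) + eps * math.exp(eps) / (1.0 - math.exp(eps))

exact = 2.0 * math.log(2.0) - 1.0

for n in (10, 100, 1000, 10000):
    print(n, lebesgue_sum(n), lebesgue_sum_closed(n), exact)
```

Both partitions give sums that converge to the same value, which is what one expects for a function that is Riemann integrable.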

Partitioning the range of the function and determining the “length” (measure) of each corresponding set on the domain allows us to integrate functions defined over sets for which the Riemann integral does not exist. The typical example that you have almost certainly seen is the integral over [0, 1] of the function \(f(x)={\mathbf {1}}_{\mathcal{Q}{\cap }[0,1]}(x)\). It is nowhere continuous and therefore is not Riemann integrable, but \(\mathcal{Q}\) is countable so \(\mu ({\mathcal{Q}{\cap }[0,1]})=0\) and therefore

$$\begin{aligned} \int _{[0,1]}\,{\mathbf {1}}_{\mathcal{Q}{\cap }[0,1]}\,d{\mu } = {\mu }(\mathcal{Q}{\cap }[0,1]) = 0 \nonumber \end{aligned}$$

Nevertheless, the crucial difference with respect to Riemann’s integral is not the partition of the range but the possibility to perform integrals over “wilder” sets and, for us, the chance to consider arbitrary probability measures over arbitrary sets. But, for this, we have to clarify how to define the measure of a set. In general, we shall be concerned only with \(\mathcal{R}^n\) and it turns out that there is a unique measure \(\lambda \) on \(\mathcal{R}^n\) that is invariant under translations and such that for the unit cube \(\lambda ([0,1]^n)=1\): the Lebesgue measure, which assigns to an interval \([a,b]\,{\subset }\,\mathcal R\) what we would intuitively guess: \(\lambda ([a,b])=(b-a)\). However, as explained in Sect. 1.1.2.2, if we want to satisfy these conditions there is a price to pay: not all subsets of \(\mathcal R\) are measurable.

Let’s finish with a more axiomatic introduction and some properties. Consider the measure space \((\Omega ,\mathcal{B},{\mu })\); possibly a probability space with \(\mu \) a probability measure. Then, for \(S\,{\in }\,\mathcal B\) we define

$$\begin{aligned} {\mu }(S)\,\mathop {=}\limits ^{def}\,\int _S\,d{\mu } = \int _{\Omega }\, {\mathbf {1}}_{S}\,d{\mu } \end{aligned}$$

where \({\mu }(S)\) may be \(+{\infty }\) (unless it is a finite measure). Now, given a finite partition \(\{S_k;k=1,{\ldots },n\}\) of \(\Omega \) and a simple function

$$\begin{aligned} S = \sum _{k=1}^{n}\,a_k\,{\mathbf {1}}_{S_k} \qquad \mathrm{where}\qquad a_k\,{\ge }\,0\,\,\,\forall k \quad \mathrm{and}\quad {\mu }(S_k)<+{\infty } \ \ \mathrm{if}\ \ a_k\,{\ne }\,0 \end{aligned}$$

it is natural to define for a measurable set \(A{\subset }\Omega \):

$$\begin{aligned} \int _{A}\,S\,d{\mu }\,\mathop {=}\limits ^{def}\, \int _{\Omega }\,S\,{\mathbf {1}}_{A}\,d{\mu } = \sum _{k=1}^{n}\,a_k\,{\mu }(S_k{\cap }A) \nonumber \end{aligned}$$
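To make this definition concrete, here is a minimal sketch on a finite measure space. Everything in it is an assumption made for illustration: six points with equal point masses, a three-set partition, and the coefficients \(a_k\).

```python
from fractions import Fraction

# Hypothetical finite measure space: equal point masses on Omega = {0,...,5}.
point_mass = {x: Fraction(1, 6) for x in range(6)}

def mu(subset):
    """Measure of a subset as the sum of its point masses."""
    return sum((point_mass[x] for x in subset), Fraction(0))

# Finite partition {S_k} of Omega and non-negative coefficients a_k.
partition = [{0, 1}, {2, 3, 4}, {5}]
coeffs = [Fraction(2), Fraction(0), Fraction(5)]

def simple_function(x):
    """Value of S = sum_k a_k 1_{S_k} at the point x."""
    return sum(a for a, S_k in zip(coeffs, partition) if x in S_k)

def integral_over(A):
    """Integral of S over A, computed as sum_k a_k mu(S_k intersected with A)."""
    return sum(a * mu(S_k & A) for a, S_k in zip(coeffs, partition))

A = {1, 2, 5}
value = integral_over(A)
# Cross-check: on a finite space the same number is a weighted pointwise sum.
pointwise = sum(simple_function(x) * point_mass[x] for x in A)
print(value, pointwise)  # 7/6 7/6
```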

Then:

  1. (1)

    Let f be a non-negative measurable function with respect to \(\mathcal B\) (that may take the value \(+{\infty }\)). We define:

    $$\begin{aligned} \int _{\Omega }\,f\,d{\mu }\,\mathop {=}\limits ^{def}\,\mathrm{sup}\,\left( \int _{\Omega }\,S\,d{\mu };\,\,0\,{\le }\,S{\le }f;\,\,S\,\,\mathrm{simple} \right) \nonumber \end{aligned}$$

    which, obviously, may be \(+{\infty }\) in some cases.

  2. (2)

    Let f be a measurable function that may take negative values and denote by

    $$\begin{aligned} f^{+} = f\,{\mathbf {1}}_{(f>0)} \,\,\,\,\,\mathrm{and}\,\,\,\,\, f^{-} = -f\,{\mathbf {1}}_{(f<0)} \end{aligned}$$

    so that \(f=f^{+}-f^{-}\) and \(|f|=f^{+}+f^{-}\). Then, if

    $$\begin{aligned} \int _{\Omega }\,f^{+}\,d{\mu }< +{\infty } \qquad \mathrm{and}\qquad \int _{\Omega }\,f^{-}\,d{\mu } < +{\infty } \nonumber \end{aligned}$$

    we have that

    $$\begin{aligned} \int _{\Omega }\,|f|\,d{\mu } = \int _{\Omega }\,f^+\,d{\mu } + \int _{\Omega }\,f^-\,d{\mu }\, <\,+{\infty } \nonumber \end{aligned}$$

    and we say that the function f is Lebesgue integrable with integral

    $$\begin{aligned} \int _{\Omega }\,f\,d{\mu } = \int _{\Omega }\,f^{+}\,d{\mu } - \int _{\Omega }\,f^{-}\,d{\mu } \nonumber \end{aligned}$$

    Observe that f defined on \(\mathcal R\) is Lebesgue integrable iff it belongs to the Banach space \(L_1(\mathcal{R})\).
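As a simple worked illustration of this decomposition (the function and the interval are chosen here only as an example), take \(f(x)=x\) on \(\Omega =[-1,2]\) with the Lebesgue measure \(\lambda \). Then \(f^{+}=x\,{\mathbf {1}}_{(0,2]}\), \(f^{-}=-x\,{\mathbf {1}}_{[-1,0)}\) and

$$\begin{aligned} \int _{\Omega }\,f^{+}\,d{\lambda } = \int _{0}^{2}x\,dx = 2, \qquad \int _{\Omega }\,f^{-}\,d{\lambda } = \int _{-1}^{0}(-x)\,dx = \frac{1}{2} \nonumber \end{aligned}$$

Both are finite, so f is Lebesgue integrable with

$$\begin{aligned} \int _{\Omega }\,|f|\,d{\lambda } = 2+\frac{1}{2} = \frac{5}{2} \qquad \mathrm{and}\qquad \int _{\Omega }\,f\,d{\lambda } = 2-\frac{1}{2} = \frac{3}{2} \nonumber \end{aligned}$$

in agreement with the elementary result \(\int _{-1}^{2}x\,dx=(4-1)/2\).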

Some of the main properties of the Lebesgue integral are:

  1. (1)

    If f(x) and g(x) are two non-negative measurable functions such that \(f=g\) almost everywhere; that is, \({\mu }\left( \{x\,{\in }\,{\Omega }|f(x)\,{\ne }\,g(x)\} \right) = 0\) then

    $$\begin{aligned} \int \,f\,d{\mu } = \int \,g\,d{\mu } \nonumber \end{aligned}$$

    The function f(x) is integrable iff g(x) is integrable and both integrals are the same;

  2. (2)

    If f(x) and g(x) are two integrable functions then

    • if \(a,b\,{\in }\,\mathcal{R}\), it holds that \(\int \,(a\,f + b\,g)\,d{\mu } = a\,\int \,f\,d{\mu } + b\,\int \,g\,d{\mu } \)

    • if \(f\,{\le }\,g\) it holds that \(\int \,f\,d{\mu }\,{\le }\, \int \,g\,d{\mu }\)

  3. (3)

    If \(\{f_k(x)\}_{k\,{\in }\,\mathcal{N}}\) is a sequence of non-negative measurable functions such that \(f_k(x)\,{\le }\,f_{k+1}(x)\) for all \(k\,{\in }\,\mathcal{N}\) and \(x\,{\in }\,\Omega \), then

    $$\begin{aligned} {\lim }_k\,\int \,f_k\,d{\mu } = \int {\lim }_k\,f_k\,d{\mu } \nonumber \end{aligned}$$

    (The integrals can be infinite)

  4. (4)

    If \(\{f_k(x)\}_{k\,{\in }\,\mathcal{N}}\) is a sequence of measurable functions that converge pointwise to f(x) (i.e. \(\lim _{k{\rightarrow }{\infty }}f_k(x)=f(x)\) for all x) and there exists an integrable function g such that \(|f_k|\,{\le }\,g\) for all k, then f is integrable and

    $$\begin{aligned} {\lim }_{k{\rightarrow }{\infty }}\,\int \,f_k\,d{\mu } = \int \,f\,d{\mu } \nonumber \end{aligned}$$
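Property (4) can be illustrated with the sequence \(f_k(x)=x^k\) on [0, 1], an example chosen here and not taken from the text. Each \(f_k\) is bounded by the integrable function \(g=1\), and the pointwise limit is 0 for \(x<1\) and 1 at \(x=1\), so its integral is 0. The sketch below approximates the integrals with a midpoint rule, which is only a numerical stand-in for the Lebesgue integral, and compares them with the exact values \(1/(k+1)\).

```python
def midpoint_integral(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def f_k(k):
    """The k-th function of the sequence, f_k(x) = x**k."""
    return lambda x: x ** k

ks = (1, 10, 100, 1000)
integrals = {k: midpoint_integral(f_k(k), 0.0, 1.0) for k in ks}

for k in ks:
    # The numerical value should track the exact one, 1/(k+1), which tends to 0.
    print(k, integrals[k], 1.0 / (k + 1))
```

The integrals decrease towards 0, the integral of the limit function, as the dominated convergence property states.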

Appendix 3: Some properties of Radon–Nikodym derivatives

Consider the \(\sigma \)-additive measures \(\mu _1\), \(\mu _2\) and \(\mu _3\) on the measurable space \((\Omega ,\mathcal{B}_{\Omega })\). Then

[Figure: list of properties of the Radon–Nikodym derivatives; not reproduced here.]

We shall use some of these properties in different places; for instance, relating mathematical expectations under different probability measures or justifying some techniques used in Monte Carlo Sampling.
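A minimal sketch of how a Radon–Nikodym derivative relates expectations under two probability measures is given below. It assumes the standard change-of-measure identity \(E_P[g]=E_Q[g\,dP/dQ]\), valid when P is absolutely continuous with respect to Q. The finite sample space, the measures `P` and `Q` and the function `g` are all hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical finite sample space with two probability measures.
omega = ["a", "b", "c"]
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

# Q is positive wherever P is, so on a finite space the Radon-Nikodym
# derivative dP/dQ is just the ratio of point masses.
dP_dQ = {w: P[w] / Q[w] for w in omega}

# Arbitrary function whose expectation we want.
g = {"a": 1, "b": 4, "c": 9}

E_P = sum(g[w] * P[w] for w in omega)                      # direct expectation under P
E_Q_weighted = sum(g[w] * dP_dQ[w] * Q[w] for w in omega)  # reweighted expectation under Q
print(E_P, E_Q_weighted)  # 10/3 10/3
```

The same reweighting, with samples drawn from Q in place of the exact sum, is the idea behind importance sampling in Monte Carlo methods.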

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Maña, C. (2017). Probability. In: Probability and Statistics for Particle Physics. UNITEXT for Physics. Springer, Cham. https://doi.org/10.1007/978-3-319-55738-0_1
