Encyclopedia of Social Network Analysis and Mining

Living Edition
| Editors: Reda Alhajj, Jon Rokne

Probabilistic Analysis

Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_155-1

Keywords

Random Graph · Sample Space · Simple Graph · Probability Mass Function · Mathematical Induction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Glossary

Asymptotically Almost Surely (a.a.s.)

The property that ℙ(E n ) → 1 as n → ∞, where {E n } denotes a sequence of events defined on a random structure (e.g., a random graph) that depends on n

Event

A subset of the sample space

$$\mathbb{G}\left( n, p\right)$$

The probability space of simple random graphs that contain n vertices and for which each of the $$\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill 2\hfill \end{array}\right)$$ edges occurs with probability p ∈ [0, 1]

Independent and Identically Distributed (i.i.d.)

The hypothesis that some given random variables are mutually independent, and each is described by the same probability mass function

Probability Mass Function (p.m.f.)

A function that assigns a probability to the event that a random variable assumes a given value, e.g., p X (x) = ℙ({ω ∈ Ω : X(ω) = x})

Probability Measure

(ℙ) A function that assigns a probability (a number between 0 and 1) to every event contained in ℰ

Random Variable (r.v.)

A mapping X : Ω → ℛ ⊆ ℝ that assigns a numerical value to every element within the sample space

Sample Space (Ω)

The complete set of mutually disjoint outcomes of a random experiment

Set of Events

(ℰ) A set of subsets of the sample space that is algebraically closed under both complements and countable unions

Simple Graph

An undirected graph described by a set of vertices V and a set of edges E, such that each edge connects a pair of distinct vertices, and no more than one edge connects any pair of vertices

Statistical Independence

The property that the probability of every joint event equals the product of the corresponding probabilities of the individual events

Introduction

The subject of probabilistic analysis is vast. Thus, the following article presents only a synopsis of the foundations of probability theory, discrete random variables, generating functions, branching processes, and probability inequalities. The utility of these concepts is illustrated by several examples interspersed throughout the article. The capstone example in section “Example: Random Graphs” derives the conditions for the existence of a giant component in the family of random graphs, $$\mathbb{G}\left( n, p\right)$$, analyzed by Erdős and Rényi (1960) and more recently by Janson et al. (2000). Although the proof of this theorem is rather technical, the diligent reader will discover that the frequent emergence of a single giant component in practical networks is an immediate consequence of elementary properties of probability. Furthermore, this proof illustrates how the analytic methods described in this entry are applied in a nontrivial context.

Comprehensive treatments of the theory of probability can be found in the books by Feller (1968), Grimmett and Stirzaker (2001), and Venkatesh (2013). Applications of probabilistic methods to social networks are treated by Newman (2010) and Vega-Redondo (2007).

Foundations

The theory of modern probability (Kolmogorov 1956) is founded on the theory of sets, which in turn is based on first-order logic. The fundamental precept of Boolean logic is that variables can assume one of two values: T (true) and F (false). The negation of one value results in the other; thus, ¬T = F and ¬F = T. Logical theorems, such as double negation, a =  ¬ (¬a), are proven exhaustively using truth tables:

a   ¬a   ¬(¬a)
T   F    T
F   T    F
As the left and right columns in every row agree, the theorem of double negation is demonstrated.

The common binary logical operations, conjunction (∧), disjunction (∨), implication (⇒), and equivalence (≡), are defined by the following truth table:

a   b   a ∧ b   a ∨ b   a ⇒ b   a ≡ b
T   T   T       T       T       T
T   F   F       T       F       F
F   T   F       T       T       F
F   F   F       F       T       T

Equivalence is also called the biconditional (⇔). From the above, it is evident that ∧, ∨, and ≡ are commutative (e.g., a ∧ b ≡ b ∧ a) and, by substitution, are also associative (e.g., a ∧ (b ∧ c) ≡ (a ∧ b) ∧ c). Likewise, using a truth table, the following theorems are readily verified:
$$a\wedge \left( b\vee c\right)\equiv \left( a\wedge b\right)\vee \left( a\wedge c\right),$$
(1)
$$a\vee \left( b\wedge c\right)\equiv \left( a\vee b\right)\wedge \left( a\vee c\right),$$
(2)
$$\neg \left( a\wedge b\right)\equiv \left(\neg a\right)\vee \left(\neg b\right),$$
(3)
$$\neg \left( a\vee b\right)\equiv \left(\neg a\right)\wedge \left(\neg b\right),$$
(4)
$$a\Rightarrow b\equiv \left(\neg b\right)\Rightarrow \left(\neg a\right)$$
(5)

Equations (1) and (2) are known as the distributive laws; (3) and (4) are known as DeMorgan’s laws; and (5) is the law of the contrapositive. Usually, the pairs of parentheses on the right sides of (3) through (5) are omitted as Boolean operations conventionally follow a prescribed order of precedence: negation, conjunction, disjunction, implication, and, finally, equivalence.
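Because each of these laws involves only finitely many Boolean variables, it can be checked exhaustively by machine, just as the truth tables above do by hand. The following Python sketch verifies Eqs. (1) through (5) over all eight assignments of a, b, and c:

```python
from itertools import product

def implies(p, q):
    # Material implication: p => q is false only when p is true and q is false.
    return (not p) or q

for a, b, c in product([True, False], repeat=3):
    assert (a and (b or c)) == ((a and b) or (a and c))   # (1) distributive law
    assert (a or (b and c)) == ((a or b) and (a or c))    # (2) distributive law
    assert (not (a and b)) == ((not a) or (not b))        # (3) DeMorgan's law
    assert (not (a or b)) == ((not a) and (not b))        # (4) DeMorgan's law
    assert implies(a, b) == implies(not b, not a)         # (5) contrapositive
```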

A set is a collection of distinct objects or elements, such as the natural numbers ℕ $$\triangleq$$ {1, 2, 3, …}, the nonnegative integers ℕ0$$\triangleq$$ {0, 1, 2, …}, the (signed) integers ℤ $$\triangleq$$ {…, −1, 0, 1, …}, and the reals ℝ. (N.B., the notation s$$\triangleq$$x specifies that the symbol s is being defined to represent the expression x). In the following, we will denote sets with uppercase letters, e.g., A, B, …. We will use the special symbol Ω to denote the universal set, that is, the set of all objects under consideration.

A predicate is a function that assigns a unique Boolean value {T, F} to each element of a set. Thus, one can define the predicate Even (x) to return T whenever x is an integer that is evenly divisible by 2, and F otherwise.

If x is a member of a set A, one writes x ∈ A, which should be interpreted as a predicate function that maps x to T if x is a member of A, and F otherwise. Likewise, the predicate x ∉ A is defined as ¬(x ∈ A), indicating that x is not a member of set A. For any set A, its cardinality, denoted by |A|, equals the number of elements it contains. Sets are often specified as the truth sets of predicates. Thus the set of even integers can be written as {x ∈ ℤ : Even(x)}.

With these notions, an algebra of sets is constructed using the previous logical operations and theorems. Thus, the complement of A, denoted by A c , is defined by
$${A}^c\triangleq \left\{ x\in \Omega :\neg \left( x\in A\right)\right\}.$$
The complement of the universal set is defined as the empty set, ∅ $$\triangleq$$ Ω c . Likewise, the intersection of two sets A and B is defined as the truth set of the conjunction of their membership predicates:
$$A\cap B\triangleq \left\{ x\in \Omega :\left( x\in A\right)\wedge \left( x\in B\right)\right\}.$$
The intersection A ∩ B is thus the set that contains all of the members that are common to A and B. The sets A and B are said to be disjoint if A ∩ B = ∅. Similarly, the union of two sets A and B is defined as the truth set of the disjunction of their membership predicates:
$$A\cup B\triangleq \left\{ x\in \Omega :\left( x\in A\right)\vee \left( x\in B\right)\right\}.$$
Set operations can be clarified visually using Venn diagrams. By convention, the universal set Ω is usually represented by the points belonging to a rectangular region (Fig. 1a). Other sets are likewise represented by the points contained in simple regions, e.g., ellipses. Fig. 1b shows how an arbitrary set A and its complement A c partition Ω into two disjoint regions. Likewise, the union and intersection of two sets are illustrated in Fig. 1c, d, respectively. Fig. 1 Venn diagrams illustrating (a) the universal set Ω (shaded green); (b) a set A (shaded in red) and its complement A c (shaded in green); (c) the union of two disjoint sets, A ∪ B (shaded in green); and (d) the intersection of two intersecting sets, A ∩ B (shaded in green)
The set expression A ⊆ B, read “A is a subset of B,” is defined by the logical expression x ∈ A ⇒ x ∈ B; in other words, every member of A is also a member of B (see Fig. 2a). With this definition, we obtain, for example, {1, 2, 3} ⊆ ℕ, Ω ⊆ Ω, and, from the validity of F ⇒ T, ∅ ⊆ Ω. Fig. 2 Venn diagrams of (a) A ⊆ B, (b) A\B, (c) A ∪ (B ∩ C), and (d) (A ∩ B) c
Similarly, set equality, A = B, is defined by (A ⊆ B) ∧ (B ⊆ A), or equivalently, x ∈ A ⇔ x ∈ B, for all x ∈ Ω. Many Boolean theorems based on equivalence (≡) can thus be translated into corresponding theorems involving sets. For example, from x ≡ x ∧ T, one obtains,
$$A= A\cap \Omega .$$
(6)
From double negation, it immediately follows that A = (A c ) c . The commutativity properties of conjunction and disjunction yield A ∩ B = B ∩ A and A ∪ B = B ∪ A, respectively; and from the laws of associativity, A ∩ (B ∩ C) = (A ∩ B) ∩ C, and A ∪ (B ∪ C) = (A ∪ B) ∪ C. In addition, Eqs. (1) through (5) yield the analogous theorems:
$$A\cap \left( B\cup C\right)=\left( A\cap B\right)\cup \left( A\cap C\right),$$
(7)
$$A\cup \left( B\cap C\right)=\left( A\cup B\right)\cap \left( A\cup C\right),$$
(8)
$${\left( A\cap B\right)}^c={A}^c\cup {B}^c,$$
(9)
$${\left( A\cup B\right)}^c={A}^c\cap {B}^c,$$
(10)
$$A\subseteq B\equiv {B}^c\subseteq {A}^c.$$
(11)
Equations (9) and (10) are known as DeMorgan’s laws for sets. Using mathematical induction, Eqs. (7), (8), (9), and (10) can be extended to arbitrary numbers of sets, e.g.,
$$\begin{array}{c} A\cap \left(\underset{i=1}{\overset{n}{\cup }}{B}_i\right)=\underset{i=1}{\overset{n}{\cup }}\left( A\cap {B}_i\right), \mathrm{and}\\ {}{\left(\underset{i=1}{\overset{n}{\cap }}{A}_i\right)}^c=\underset{i=1}{\overset{n}{\cup }}{A}_i^c.\end{array}$$
It is also customary to define
$$A\backslash B\triangleq \left\{ x\in \Omega :\left( x\in A\right)\wedge \neg \left( x\in B\right)\right\}= A\cap {B}^c$$
as the set difference: the subset of set A that is not contained in B, e.g., {1, 2, 3}\{2, 4, 6} = {1, 3}. In the following, we let $$\mathcal{P}(A)$$ denote the power set of A: the set of all subsets contained in A. Specifically, $$\mathcal{P}(A)\triangleq \left\{ E: E\subseteq A\right\}$$. It is not difficult to show that $$\left|\mathcal{P}(A)\right|={2}^{\left| A\right|}$$ whenever |A| is finite. The binomial coefficient,
$$\left(\begin{array}{c}\hfill N\hfill \\ {}\hfill m\hfill \end{array}\right)=\frac{N!}{m!\left( N- m\right)!},$$
(12)
enumerates the number of ways that m objects can be selected from a set of N distinct objects and thus equals the number of subsets of size m that are contained in a finite set of cardinality N.

Example 1

The power set of A = {1, 2, 3} is given by,
$$\mathcal{P}(A)=\left\{\varnothing, \left\{1\right\},\left\{2\right\},\left\{3\right\},\left\{1,2\right\},\left\{1,3\right\},\left\{2,3\right\},\left\{1,2,3\right\}\right\}.$$

Furthermore, one readily verifies that $$\left|\mathcal{P}(A)\right|={2}^{\left|\mathrm{A}\right|}={2}^3=8$$. From (12), $$\left(\begin{array}{c}\hfill 3\hfill \\ {}\hfill 0\hfill \end{array}\right)=\left(\begin{array}{c}\hfill 3\hfill \\ {}\hfill 3\hfill \end{array}\right)=1$$, indicating that exactly one element in $$\mathcal{P}(A)$$ has cardinality 0 (∅), and exactly one has cardinality 3 (viz., {1, 2, 3}). Likewise, since $$\left(\begin{array}{c}\hfill 3\hfill \\ {}\hfill 1\hfill \end{array}\right)=\left(\begin{array}{c}\hfill 3\hfill \\ {}\hfill 2\hfill \end{array}\right)=3$$, three subsets of A are found to contain exactly one element ({1}, {2}, {3}), and another three subsets of A contain exactly two elements ({1, 2}, {1, 3}, {2, 3}).
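The counts in Example 1 are easy to reproduce programmatically; the following sketch enumerates 𝒫(A) by cardinality and checks it against the binomial coefficients of (12):

```python
from itertools import combinations
from math import comb

A = {1, 2, 3}

# Group the subsets of A by their cardinality m = 0, 1, ..., |A|.
subsets_by_size = {m: list(combinations(sorted(A), m)) for m in range(len(A) + 1)}

# |P(A)| = 2^|A|, and exactly C(|A|, m) subsets have cardinality m.
total = sum(len(subs) for subs in subsets_by_size.values())
assert total == 2 ** len(A)                       # 2^3 = 8
for m, subs in subsets_by_size.items():
    assert len(subs) == comb(len(A), m)           # 1, 3, 3, 1
```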

Theory of Probability

Axioms of Probability

Most modern treatments of probability theory are based on Kolmogorov’s definition of a probability space, (Ω, ℰ, ℙ) (Kolmogorov 1956). In the following, we define each of its three components in sequence.

The Sample Space, Ω

The fundamental notion in probability is the sample space, also known as the set of elementary outcomes. Since the sample space acts as a universal set, we denote it by Ω. This set is by definition complete: it contains every possible outcome under consideration. In addition, its elements are mutually exclusive, meaning that exactly one of them occurs each time the experiment is performed. For an experiment consisting of the roll of a standard six-sided die, Ω would equal the set {1, 2, 3, 4, 5, 6}. For a simple graph of n vertices and m undirected edges, where the n vertices are fixed but the m edges are selected at random, Ω would equal the set of all $$\left(\begin{array}{c}\hfill \left(\begin{array}{c}\hfill n\hfill \\ {}\hfill 2\hfill \end{array}\right)\hfill \\ {}\hfill m\hfill \end{array}\right)$$ possible configurations.

The Set of Events, ℰ

An event is defined as a combination of outcomes, or more formally as a subset of Ω. When rolling a die, a player might be interested in the event “even,” {2, 4, 6}, or “odd,” {1, 3, 5}. We define the symbol ℰ to denote the set of possible events. Thus, A ∈ ℰ implies that A ⊆ Ω. A singleton event is defined by {ω} ∈ ℰ where ω ∈ Ω. (Thus, {1}, {2},   …, {6} are the singleton events for a die.) For self-consistency, we assume that ℰ satisfies the following three axioms:
• A1. Ω ∈ ℰ,

• A2. A ∈ ℰ ⇒ A c  ∈ ℰ,

• A3. A, B ∈ ℰ ⇒ A ∪ B ∈ ℰ.

As a consequence of A1 and A2, ∅ ∈ ℰ. From A2, A3, and DeMorgan’s laws (9) and (10), it follows that joint events (i.e., intersections of events) are also events:
$$A, B\in \mathrm{\mathcal{E}}\Rightarrow A\cap B\in \mathrm{\mathcal{E}}.$$
(13)
By means of mathematical induction and the associative laws, A3 and (13) can be extended to finite unions and intersections. Thus, A i  ∈ ℰ for i = 1, 2,  … , n implies
$$\underset{i=1}{\overset{n}{\cup }}{A}_i\in \mathrm{\mathcal{E}} \mathrm{and} \underset{i=1}{\overset{n}{\cap }}{A}_i\in \mathrm{\mathcal{E}}.$$
(14)

If ℰ satisfies A1–A3 above, it is said to form an algebra. In the event that Ω is infinitely countable, i.e., if its elements can be placed in a one-to-one correspondence with the elements of ℕ, then it is desirable to adopt the additional axiom, A4. If A i  ∈ ℰ, for all i ∈ ℕ, then ∪ i A i  ∈ ℰ. Any collection of events ℰ that satisfies Axioms A1–A4 is called a σ-algebra. In the event that Ω is either finite or infinitely countable, one often chooses ℰ to be the power set of the sample space, $$\mathcal{P}\left(\Omega \right)$$.

The Probability Measure, ℙ

The third and final component of the probability space is the probability measure, denoted by ℙ, which assigns a real number, or probability, to every possible event. Thus, one writes ℙ : ℰ → ℝ to indicate that ℙ(A) is a well-defined real number for every event A ∈ ℰ. In addition, we assume that ℙ satisfies the following three postulates:
• P1. For any A ∈ ℰ, ℙ(A) ≥ 0;

• P2. ℙ(Ω) = 1;

• P3. For any finite or infinitely countable sequence of mutually disjoint events A i  ∈ ℰ (i.e., j ≠ k ⇒ A j  ∩ A k  = ∅), ℙ(∪ i A i ) = ∑ i ℙ(A i ).

Postulate P3 is known as complete additivity. As a consequence of these three postulates, 0 ≤ ℙ(A) ≤ 1, for any event A ∈ ℰ. If Ω is either finite or infinitely countable, then ℰ may include all singleton events. In this case, the probability measure ℙ is uniquely defined by specifying the probabilities of the singleton events ℙ({ω}) for each ω ∈ Ω.

Together, the triple (Ω, ℰ, ℙ) is said to form a probability space. Henceforth, unless stated otherwise, we assume that the probability space (Ω, ℰ, ℙ) is well defined.

Example 2 (A Pair of Dice)

Consider an experiment in which two six-sided dice are rolled. Let
$$\Omega ={\left\{1,2,\dots, 6\right\}}^2\triangleq \left\{\left( i, j\right): i, j\in \left\{1,2,\dots, 6\right\}\right\},$$
where the exponent above expresses the Cartesian product of {1, 2, …, 6} with itself. Likewise, let $$\mathrm{\mathcal{E}}=\mathcal{P}\left(\Omega \right)$$. Since |Ω| = 6² = 36, the number of distinct events is |ℰ| = 2³⁶ = 68,719,476,736. If the dice are fair, then the probabilities of the singleton events should all agree: ℙ({ω}) = 1/36 for every ω ∈ Ω. Then, as a consequence of P3,
$$\mathrm{\mathbb{P}}(A)=\sum_{\omega \in A}\mathrm{\mathbb{P}}\left(\left\{\omega \right\}\right)=\frac{\left| A\right|}{36},$$
for every A ∈ ℰ.
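Because Ω here is small, the measure of Example 2 can be evaluated by direct counting. The sketch below builds Ω and computes ℙ(A) = |A|/36 for two illustrative events (the particular events are arbitrary choices for this sketch):

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (i, j) of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))
assert len(omega) == 36

def prob(event):
    # P(A) = |A| / 36, as derived from postulate P3 in Example 2.
    return Fraction(len(event), 36)

doubles = {(i, j) for (i, j) in omega if i == j}
sum_is_7 = {(i, j) for (i, j) in omega if i + j == 7}
assert prob(doubles) == Fraction(1, 6)
assert prob(sum_is_7) == Fraction(1, 6)
```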

Theorem 1

If A ∈ ℰ, then ℙ(A c ) = 1 − ℙ(A).

Proof

By A2, A c  ∈ ℰ and thus has a well-defined probability. From the definition of complement, A ∪ A c  = Ω and A ∩ A c  = ∅. Thus, by P2 and P3, ℙ(A) + ℙ(A c ) = ℙ(Ω) = 1, from which the theorem follows.

Setting A = Ω in Theorem 1 and applying P2, one obtains ℙ(∅) = 0.

Theorem 2 (Monotonicity)

If A, B ∈ ℰ, with A ⊆ B, then ℙ(A) ≤ ℙ(B).

Proof

By A2, A c  ∈ ℰ. Equation (6) and the distributive law (7) yield
$$B= B\cap \Omega = B\cap \left( A\cup {A}^c\right)=\left( A\cap B\right)\cup \left({A}^c\cap B\right).$$
Since A ∩ B and A c  ∩ B are disjoint, by P3,
$$\mathrm{\mathbb{P}}(B)=\mathrm{\mathbb{P}}\left( A\cap B\right)+\mathrm{\mathbb{P}}\left({A}^c\cap B\right).$$
(15)
From the definition of subset, A ⊆ B ⇒ A ∩ B = A. After this substitution, Eq. (15) can be written as
$$\mathrm{\mathbb{P}}(B)=\mathrm{\mathbb{P}}(A)+\mathrm{\mathbb{P}}\left({A}^c\cap B\right)\ge \mathrm{\mathbb{P}}(A).$$
(16)

The last inequality follows from ℙ(A c  ∩ B) ≥ 0, by P1.

Definition

A collection of events {E i  ∈ ℰ : i ∈ ℐ ⊆ ℕ} is said to form a partition of Ω if both of the following hold:
1. E i  ∩ E j  = ∅ whenever i ≠ j;

2. ∪ i ∈ ℐ E i  = Ω.

Theorem 3 (Total Probability)

If {E i  ∈ ℰ : i ∈ ℐ} forms a partition of Ω, then for any A ∈ ℰ,
$$\mathrm{\mathbb{P}}(A)=\sum_{i\in \mathrm{\mathcal{I}}}\mathrm{\mathbb{P}}\left( A\cap {E}_i\right).$$
(17)

Proof

$$\begin{array}{cc}\hfill \mathrm{\mathbb{P}}(A)\ \hfill & \hfill =\mathrm{\mathbb{P}}\left( A\cap \Omega \right)=\mathrm{\mathbb{P}}\left( A\cap \underset{i\in \mathrm{\mathcal{I}}}{\cup }{E}_i\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\underset{i\in \mathrm{\mathcal{I}}}{\cup}\left( A\cap {E}_i\right)\right)=\sum_{i\in \mathrm{\mathcal{I}}}\mathrm{\mathbb{P}}\left( A\cap {E}_i\right),\hfill \end{array}$$

The last equality results from P3 as (A ∩ E i ) ∩ (A ∩ E j ) = ∅, whenever i ≠ j.

Theorem 4 (Inclusion/Exclusion)

Let A, B ∈ ℰ. Then,
$$\mathrm{\mathbb{P}}\left( A\cup B\right)=\mathrm{\mathbb{P}}(A)+\mathrm{\mathbb{P}}(B)-\mathrm{\mathbb{P}}\left( A\cap B\right).$$

Proof

Using the definition of set equality, A ∪ B can be expressed as the union of three mutually disjoint sets:
$$A\cup B=\left( A\cap {B}^c\right)\cup \left( A\cap B\right)\cup \left({A}^c\cap B\right),$$
(18)
(see Fig. 3). By P3,
$$\mathrm{\mathbb{P}}\left( A\cup B\right)=\mathrm{\mathbb{P}}\left( A\cap {B}^c\right)+\mathrm{\mathbb{P}}\left( A\cap B\right)+\mathrm{\mathbb{P}}\left({A}^c\cap B\right).$$ Fig. 3 A Venn diagram illustrating how the union A ∪ B in (18) is partitioned into three mutually disjoint subsets: A ∩ B c , A ∩ B, and A c  ∩ B. From (15),
$$\mathrm{\mathbb{P}}\left({A}^c\cap B\right)=\mathrm{\mathbb{P}}(B)-\mathrm{\mathbb{P}}\left( A\cap B\right),$$
and by symmetry,
$$\mathrm{\mathbb{P}}\left( A\cap {B}^c\right)=\mathrm{\mathbb{P}}(A)-\mathrm{\mathbb{P}}\left( A\cap B\right).$$
Thus, by substitution,
$$\begin{array}{ll}\mathrm{\mathbb{P}}\left( A\cup B\right)\hfill & =\left(\mathrm{\mathbb{P}}(A)-\mathrm{\mathbb{P}}\left( A\cap B\right)\right)+\mathrm{\mathbb{P}}\left( A\cap B\right)\hfill \\ {}\ \hfill & +\left(\mathrm{\mathbb{P}}(B)-\mathrm{\mathbb{P}}\left( A\cap B\right)\right),\hfill \\ {}\ \hfill & =\mathrm{\mathbb{P}}(A)+\mathrm{\mathbb{P}}(B)-\mathrm{\mathbb{P}}\left( A\cap B\right).\hfill \end{array}$$
This principle readily generalizes to more than two events. For example, one obtains by recursion,
$$\begin{array}{l}\mathrm{\mathbb{P}}\left( A\cup B\cup C\right)=\mathrm{\mathbb{P}}(A)+\mathrm{\mathbb{P}}(B)+\mathrm{\mathbb{P}}(C)\\ {}-\mathrm{\mathbb{P}}\left( A\cap B\right)-\mathrm{\mathbb{P}}\left( A\cap C\right)\\ {}-\mathrm{\mathbb{P}}\left( B\cap C\right)+\mathrm{\mathbb{P}}\left( A\cap B\cap C\right),\end{array}$$
and, by mathematical induction,
$$\mathrm{\mathbb{P}}\left({\cup}_{i=1}^n{A}_i\right)=\sum_{i=1}^n\mathrm{\mathbb{P}}\left({A}_i\right)-\sum_{1\le i< j\le n}\mathrm{\mathbb{P}}\left({A}_i\cap {A}_j\right)+\sum_{1\le i< j< k\le n}\mathrm{\mathbb{P}}\left({A}_i\cap {A}_j\cap {A}_k\right)-\dots +{\left(-1\right)}^{n+1}\mathrm{\mathbb{P}}\left({\cap}_{i=1}^n{A}_i\right).$$
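The three-event identity can be confirmed numerically on any small probability space; here is a sketch using a single fair die, where ℙ(A) = |A|/6 (the particular events chosen are arbitrary):

```python
from fractions import Fraction

omega = set(range(1, 7))                 # one fair six-sided die
P = lambda E: Fraction(len(E), len(omega))

A, B, C = {1, 2, 3}, {2, 4, 6}, {3, 4, 5}

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
assert lhs == rhs                        # inclusion/exclusion for three events
```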

The principle of inclusion/exclusion can be applied to obtain an upper bound on the probability of a finite union of events.

Theorem 5 (Boole’s Inequality)

Let A 1, A 2,  … , A n  ∈ ℰ denote a collection of events. Then,
$$\mathrm{\mathbb{P}}\left(\underset{i=1}{\overset{n}{\cup }}{A}_i\right)\le \sum_{i=1}^n\mathrm{\mathbb{P}}\left({A}_i\right).$$
(19)

Proof

The case for n = 1 is trivial. If n = 2, the inclusion/exclusion principle (Theorem 4) stipulates that
$$\mathrm{\mathbb{P}}\left({A}_1\cup {A}_2\right)=\mathrm{\mathbb{P}}\left({A}_1\right)+\mathrm{\mathbb{P}}\left({A}_2\right)-\mathrm{\mathbb{P}}\left({A}_1\cap {A}_2\right)\le \mathrm{\mathbb{P}}\left({A}_1\right)+\mathrm{\mathbb{P}}\left({A}_2\right).$$
(20)
For n > 2, we rely on mathematical induction. Assume that the hypothesis holds for A 1, A 2,  ⋯ , A k with k ≥ 2. Then,
$$\mathrm{\mathbb{P}}\left(\underset{i=1}{\overset{k+1}{\cup }}{A}_i\right)=\mathrm{\mathbb{P}}\left(\underset{i=1}{\overset{k}{\cup }}{A}_i\cup {A}_{k+1}\right),$$
(21)
$$\le \mathrm{\mathbb{P}}\left(\underset{i=1}{\overset{k}{\cup }}{A}_i\right)+\mathrm{\mathbb{P}}\left({A}_{k+1}\right),$$
(22)
$$\le \left(\sum_{i=1}^k\mathrm{\mathbb{P}}\left({A}_i\right)\right)+\mathrm{\mathbb{P}}\left({A}_{k+1}\right),$$
(23)
$$=\sum_{i=1}^{k+1}\mathrm{\mathbb{P}}\left({A}_i\right).$$
(24)

In the above, (21) follows from the definition of the serial union; (22), from (20); (23) from the inductive hypothesis; and (24), from the definition of the summation operation.
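Boole's inequality can likewise be checked on a small example. The sketch below uses one fair die with ℙ(A) = |A|/6 and deliberately overlapping events, so the bound of (19) is strict:

```python
from fractions import Fraction

omega = set(range(1, 7))                 # one fair six-sided die
P = lambda E: Fraction(len(E), len(omega))

events = [{1, 2}, {2, 3}, {3, 4, 5}]
union = set().union(*events)             # {1, 2, 3, 4, 5}

# P(union) = 5/6, while the right side of (19) sums to 7/6.
assert P(union) <= sum(P(E) for E in events)
```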

Conditional Probability

One of the challenging tasks in probabilistic analysis is to discover and quantify dependences that might exist between events. For A, B ∈ ℰ, the conditional probability of A given B, defined by
$$\mathrm{\mathbb{P}}\left( A| B\right)=\left\{\begin{array}{cc}\hfill \frac{\mathrm{\mathbb{P}}\left( A\cap B\right)}{\mathrm{\mathbb{P}}(B)}\hfill & \hfill \mathrm{if}\;\mathrm{\mathbb{P}}(B)\ne 0,\hfill \\ {}\hfill 0,\hfill & \hfill \mathrm{if}\;\mathrm{\mathbb{P}}(B)=0,\hfill \end{array}\right.$$
plays an essential role.
Assuming for the moment that both ℙ(A) and ℙ(B) are greater than zero, one can write
$$\mathrm{\mathbb{P}}\left( A| B\right)\mathrm{\mathbb{P}}(B)=\mathrm{\mathbb{P}}\left( A\cap B\right),$$
and then by symmetry,
$$\mathrm{\mathbb{P}}\left( B| A\right)\mathrm{\mathbb{P}}(A)=\mathrm{\mathbb{P}}\left( A\cap B\right).$$

Since the right sides of the previous two equations are equal, their left sides must also be equal, and we have derived the following:

Theorem 6 (Bayes’s Rule)

$$\mathrm{\mathbb{P}}\left( B| A\right)=\frac{\mathrm{\mathbb{P}}\left( A| B\right)\mathrm{\mathbb{P}}(B)}{\mathrm{\mathbb{P}}(A)}.$$
(25)
A common application of the above is in the domain of statistical inference. Let {E i } for i = 1 , 2 , … denote a finite or infinitely countable partition of Ω, which may correspond to a set of mutually exclusive hypotheses. Letting B = E j denote an arbitrary hypothesis, (25) becomes
$$\begin{array}{cc}\hfill \mathrm{\mathbb{P}}\left({E}_j| A\right)\hfill & \hfill =\frac{\mathrm{\mathbb{P}}\left( A|{E}_j\right)\mathrm{\mathbb{P}}\left({E}_j\right)}{\mathrm{\mathbb{P}}(A)}\hfill \\ {}\hfill \hfill & \hfill =\frac{\mathrm{\mathbb{P}}\left( A|{E}_j\right)\mathrm{\mathbb{P}}\left({E}_j\right)}{\sum_i\mathrm{\mathbb{P}}\left( A|{E}_i\right)\mathrm{\mathbb{P}}\left({E}_i\right)},\hfill \end{array}$$
(26)

where the denominator in the last fraction represents the theorem of total probability (17), followed by an application of the definition of conditional probability.

Example 3

Suppose n identical urns are each filled with n balls, such that the i-th urn contains i amber balls and n − i red balls, for i = 1 , 2 ,  …  , n. An urn is selected (uniformly) at random, and from it, one ball is randomly drawn. The probability that the selected urn originally contained j amber balls (event E j ) given that the drawn ball was amber (event A) can be computed from (26). The first factor of the numerator, the likelihood, evaluates to
$$\mathrm{\mathbb{P}}\left( A|{E}_j\right)=\frac{j}{n}.$$
The second factor in the numerator, the prior, evaluates to
$$\mathrm{\mathbb{P}}\left({E}_j\right)=\frac{1}{n}.$$
The denominator of (26), the evidence, evaluates to
$$\mathrm{\mathbb{P}}(A)=\sum_{i=1}^n\mathrm{\mathbb{P}}\left( A|{E}_i\right)\mathrm{\mathbb{P}}\left({E}_i\right)=\sum_{i=1}^n\frac{i}{n}\cdot \frac{1}{n}=\frac{n+1}{2 n}.$$
After substituting these values into (26), we obtain the desired expression for the posterior probability:
$$\mathrm{\mathbb{P}}\left({E}_j| A\right)=\frac{\mathrm{\mathbb{P}}\left( A|{E}_j\right)\mathrm{\mathbb{P}}\left({E}_j\right)}{\mathrm{\mathbb{P}}(A)}=\frac{2 j}{n\left( n+1\right)}.$$

Thus, the discovery that a randomly drawn ball is amber dramatically shifts one’s degree of belief that the randomly chosen urn originally contained j amber balls, from 1/n to the nonuniform expression, $$j/\left(\begin{array}{c}\hfill n+1\hfill \\ {}\hfill 2\hfill \end{array}\right)$$, for j = 1 , 2 ,  …  , n.
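The posterior derived in Example 3 can be corroborated by simulation. The following Monte Carlo sketch (with n = 5 and a fixed seed, both arbitrary choices) estimates ℙ(E j | A) by repeated sampling and compares it with 2j/(n(n + 1)):

```python
import random
from collections import Counter

random.seed(0)
n, trials = 5, 200_000
amber_counts = Counter()                 # amber draws attributed to urn j
amber_total = 0

for _ in range(trials):
    j = random.randint(1, n)             # select an urn uniformly at random
    if random.randint(1, n) <= j:        # urn j yields amber with probability j/n
        amber_total += 1
        amber_counts[j] += 1

for j in range(1, n + 1):
    estimate = amber_counts[j] / amber_total
    exact = 2 * j / (n * (n + 1))        # posterior P(E_j | A) = 2j / (n(n+1))
    assert abs(estimate - exact) < 0.01
```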

Statistical Independence

Informally, we think of events as independent if they lack a causal relationship. For example, it is implausible to suggest that the outcome of a rolled die depends on its previous value in a sequence of rolls. Thus, we can assume that successive rolls are independent. In terms of our probabilistic framework, two events A, B ∈ ℰ are said to be independent if the joint probability factors into the product of the probabilities of the individual events, i.e., ℙ(A ∩ B) = ℙ(A)ℙ(B). In this case, the conditional probability also simplifies to
$$\mathrm{\mathbb{P}}\left( A| B\right)=\frac{\mathrm{\mathbb{P}}\left( A\cap B\right)}{\mathrm{\mathbb{P}}(B)}=\mathrm{\mathbb{P}}(A),$$
indicating that A is neither more nor less likely to occur on account of B. More generally, the events A 1, A 2,  … , A n are said to be independent if every possible joint probability equals the product of the probabilities of the constituent events. That is, every equation of the form
$$\mathrm{\mathbb{P}}\left({A}_{i_1}\cap {A}_{i_2}\cap \dots \cap {A}_{i_k}\right)=\mathrm{\mathbb{P}}\left({A}_{i_1}\right)\mathrm{\mathbb{P}}\left({A}_{i_2}\right)\dots \mathrm{\mathbb{P}}\left({A}_{i_k}\right)$$
(27)
is satisfied for each subsequence (i 1, i 2,  … , i k ) of k distinct integers chosen from the sequence 1 , 2 ,  …  , n, for every value of k from 2 to n. Thus, for example, events A, B, and C are said to be independent if the following four equations are satisfied:
$$\begin{array}{cc}\hfill \mathrm{\mathbb{P}}\left( A\cap B\right)\hfill & \hfill =\mathrm{\mathbb{P}}(A)\mathrm{\mathbb{P}}(B),\hfill \\ {}\hfill \mathrm{\mathbb{P}}\left( B\cap C\right)\hfill & \hfill =\mathrm{\mathbb{P}}(B)\mathrm{\mathbb{P}}(C),\hfill \\ {}\hfill \mathrm{\mathbb{P}}\left( A\cap C\right)\hfill & \hfill =\mathrm{\mathbb{P}}(A)\mathrm{\mathbb{P}}(C),\hfill \\ {}\hfill \mathrm{\mathbb{P}}\left( A\cap B\cap C\right)\hfill & \hfill =\mathrm{\mathbb{P}}(A)\mathrm{\mathbb{P}}(B)\mathrm{\mathbb{P}}(C).\hfill \end{array}$$
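The system of equations is not redundant: the pairwise conditions do not imply the triple condition. A standard illustration (an addition for this sketch, not drawn from the text above) uses two fair coins, with A = "first coin is heads," B = "second coin is heads," and C = "the coins agree":

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))    # four equally likely outcomes
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w[0] == "H"}
B = {w for w in omega if w[1] == "H"}
C = {w for w in omega if w[0] == w[1]}

# All three pairwise equations of (27) hold ...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ... yet the triple equation fails: 1/4 on the left, 1/8 on the right.
assert P(A & B & C) != P(A) * P(B) * P(C)
```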

Discrete Random Variables

A random variable (r.v.) provides a means of assigning numerical values to events defined within a given probability space (Ω, ℰ, ℙ). In many applications, random variables correspond to measured quantities in a random experiment that can vary from a binary feature (e.g., the existence of an edge between two particular nodes in a graph) to a more global aggregation (e.g., the total number of edges in the same graph). Formally, a random variable is a mapping from Ω to an arbitrary set of values, ℛ, called the range of X. In this case, we write X : Ω → ℛ. An r.v. is said to be discrete if ℛ is finite or countably infinite and continuous if ℛ has the power of the continuum, e.g., the real numbers in the unit interval [0, 1]. In this entry, we shall consider only discrete random variables, in which ℛ is either a finite or countably infinite subset of the reals.

Probability Mass Functions

The behavior of an r.v. X : Ω → ℛ is described by its probability mass function (p.m.f.), pX(x) = ℙ[X = x], where
$$\left[ X= x\right]\triangleq \left\{\omega \in \Omega : X\left(\omega \right)= x\right\}$$
represents the set of all outcomes in Ω that are mapped by X to the value x ∈ ℛ. This bold application of the probability measure ℙ to the set [X = x] ⊆ Ω reflects an implicit assumption that [X = x] ∈ ℰ. Moreover, since X is a function, x ≠ x′ ⇒ [X = x] ∩ [X = x′] = ∅. Thus, the collection of sets {[X = x] : x ∈ ℛ} defines a partition of Ω. Postulates P1 through P3 can then be invoked to show the following:
$$x\in \mathrm{\mathcal{R}}\Rightarrow {p}_X(x)\ge 0.$$
(28)
$$\sum_{x\in \mathrm{\mathcal{R}}}{p}_X(x)=1.$$
(29)

Equation (29) is called the normalization condition of the p.m.f.

Example 4 (The Sum of Two Fair Dice)

Continuing the experiment introduced in Example 2, let X : Ω → ℛ equal the sum of the two values obtained after each roll. Thus, ℛ = {2, 3, …, 12}. Note in this case that |ℛ| = 11, different from both |Ω| and |ℰ|. In the event that the dice are fair, we find
$${p}_X(x)=\frac{\left|\left[ X= x\right]\right|}{36}=\frac{6-\left| x-7\right|}{36},$$
for x ∈ {2, 3, …, 12}, which is plotted in Fig. 4. Fig. 4 The probability mass function for the sum of two fair, six-sided dice, pX(x), given in Example 4
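The closed form of Example 4 can be checked by direct enumeration of the 36 outcomes:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice

for x in range(2, 13):
    # |[X = x]| counts the outcomes (i, j) with i + j = x.
    count = sum(1 for (i, j) in omega if i + j == x)
    assert Fraction(count, 36) == Fraction(6 - abs(x - 7), 36)

# Normalization condition (29): the masses sum to one.
assert sum(Fraction(6 - abs(x - 7), 36) for x in range(2, 13)) == 1
```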

Functions of Random Variables

When a new random variable Y is defined as a deterministic function of an existing random variable X, the following theorem yields the p.m.f. of Y. In the following, let ϕ : ℝ → ℝ denote an arbitrary function. For the r.v. X : Ω → ℛ, we define ϕ(ℛ) ≜ {ϕ(x) : x ∈ ℛ}, and ϕ −1(y) ≜ {x ∈ ℛ : ϕ(x) = y}.

Theorem 7

Let X : Ω → ℛ denote an r.v. with p.m.f., pX, and let Y = ϕ(X). Then, Y : Ω → ϕ(ℛ), such that
$${p}_Y(y)=\mathrm{\mathbb{P}}\left(\left[ Y= y\right]\right)=\sum_{x\in {\phi}^{-1}(y)}{p}_X(x).$$
(30)

Proof

Validating the range of Y follows immediately from function composition. The p.m.f. of Y follows from its definition and P3:
$$\begin{array}{cc}\hfill {p}_Y(y)\hfill & \hfill =\mathrm{\mathbb{P}}\left[ Y= y\right],\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left\{\omega \in \Omega : Y\left(\omega \right)= y\right\}\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left\{\omega \in \Omega :\phi\;\left( X\left(\omega \right)\right)= y\right\}\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\underset{x\in {\phi}^{-1}(y)}{\cup}\left\{\omega \in \Omega : X\left(\omega \right)= x\right\}\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{x\in {\phi}^{-1}(y)}{p}_X(x).\hfill \end{array}$$

Expectations and Higher Moments

The behavior of an r.v. X : Ω → ℛ can be summarized by its expectation, or mean value, defined by
$$\mathbb{E}(X)=\sum_{x\in \mathrm{\mathcal{R}}} x\;{p}_X(x),$$
(31)

where pX is the probability mass function of X. This expression is also called the first moment of X.

In the event that a new random variable Y is defined as a function of an existing r.v., X, the following is useful.

Theorem 8

Let X : Ω → ℛ denote a random variable with p.m.f., pX, and let Y = ϕ(X) for an arbitrary function ϕ : ℝ → ℝ. Then,
$$\mathbb{E}\left[ Y\right]=\mathbb{E}\left[\phi (X)\right]=\sum_{x\in \mathrm{\mathcal{R}}}\phi (x){p}_X(x).$$

Proof

By (30) and (31),
$$\mathbb{E}\left[ Y\right]=\sum_{y\in \phi \left(\mathrm{\mathcal{R}}\right)} y\,{p}_Y(y)=\sum_{y\in \phi \left(\mathrm{\mathcal{R}}\right)} y\left(\sum_{x\in {\phi}^{-1}(y)}{p}_X(x)\right)=\sum_{y\in \phi \left(\mathrm{\mathcal{R}}\right)}\left(\sum_{x\in {\phi}^{-1}(y)}\phi (x)\,{p}_X(x)\right)=\sum_{x\in \mathrm{\mathcal{R}}}\phi (x)\,{p}_X(x).$$

Example 5 (kth Moment of X)

Letting ϕ(x) = x k for k ∈ ℕ, Theorem 8 implies
$$\mathbb{E}\left({X}^k\right)=\sum_{x\in \mathrm{\mathcal{R}}}{x}^k{p}_X(x).$$

The expectation satisfies the properties of a linear operator. Explicitly,

Theorem 9

Let a, b ∈ ℝ denote two constant values and X : Ω → ℛ. Then,
$$\mathbb{E}\left( aX+ b\right)= a\mathbb{E}(X)+ b.$$

Proof

Using Theorem 8 with ϕ(x) = ax + b,
$$\begin{array}{cc}\hfill \mathbb{E}\left( aX+ b\right)\hfill & \hfill =\sum_{x\in \mathrm{\mathcal{R}}}\left( ax+ b\right){p}_X(x),\hfill \\ {}\hfill \hfill & \hfill = a\sum_{x\in \mathrm{\mathcal{R}}} x\;{p}_X(x)+ b\sum_{x\in \mathrm{\mathcal{R}}}{p}_X(x),\hfill \\ {}\hfill \hfill & \hfill = a\mathbb{E}(X)+ b,\hfill \end{array}$$

where (29) was used to evaluate the final summation.

The variance, or second central moment, is defined by
$$\mathrm{Var}(X)\triangleq \mathbb{E}\left({\left( X-\mathbb{E}(X)\right)}^2\right).$$
(32)
The variance quantifies the average square deviation of the r.v. X from its mean value. It is usually easier to compute the variance in terms of the first and second moments. Using Theorem 8,
$$\begin{array}{cc}\hfill \mathrm{Var}(X)\hfill & \hfill =\mathbb{E}\left({X}^2-2 X\mathbb{E}(X)+\mathbb{E}{(X)}^2\right),\hfill \\ {}\hfill \hfill & \hfill =\mathbb{E}\left({X}^2\right)-{\left(\mathbb{E}(X)\right)}^2.\hfill \end{array}$$
(33)

In analogy with Theorem 9, one can simplify the variance of a linearly scaled random variable.

Theorem 10

Let a, b ∈ ℝ denote two constant values and X : Ω → ℛ. Then,
$$\mathrm{Var}\left( aX+ b\right)={a}^2\mathrm{Var}(X).$$

Proof

Using (33),
$$\begin{array}{cc}\hfill \mathrm{Var}\left( a X+ b\right)\hfill & \hfill =\mathbb{E}\left({\left( a X+ b\right)}^2\right)-{\left(\mathbb{E}\left( a X+ b\right)\right)}^2\hfill \\ {}\hfill \hfill & \hfill =\mathbb{E}\left({a}^2{X}^2+2 abX+{b}^2\right)-{\left( a\mathbb{E}(X)+ b\right)}^2,\hfill \end{array}$$
where Theorem 9 was applied in the last step. Consequently, by the linearity of expectation,
$$\mathrm{Var}\left( aX+ b\right)={a}^2\mathbb{E}\left({X}^2\right)+2 ab\mathbb{E}(X)+{b}^2-{a}^2{\left(\mathbb{E}(X)\right)}^2-2 ab\mathbb{E}(X)-{b}^2={a}^2\mathbb{E}\left({X}^2\right)-{a}^2{\left(\mathbb{E}(X)\right)}^2={a}^2\mathrm{Var}(X).$$
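Both the identity (33) and Theorem 10 are easy to confirm numerically for a concrete p.m.f.; a minimal Python sketch, using a fair six-sided die as an illustrative case:

```python
def mean(pmf):
    return sum(x * prob for x, prob in pmf.items())

def variance(pmf):
    # Eq. (33): Var(X) = E(X^2) - (E(X))^2
    m = mean(pmf)
    return sum(x * x * prob for x, prob in pmf.items()) - m * m

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die (illustrative)
a, b = 3, 7
shifted = {a * x + b: prob for x, prob in die.items()}  # p.m.f. of aX + b

print(variance(die))                             # 35/12 ≈ 2.9167
print(variance(shifted), a * a * variance(die))  # equal, per Theorem 10
```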

Generating Functions

The generating function of a discrete random variable X : Ω → ℕ0, with p.m.f. p X , is defined by the power series
$${g}_X(s)=\mathrm{E}\left({s}^X\right)=\sum_{i=0}^{\infty }{p}_X(i){s}^i.$$
(34)
In the event that the range of X is finite, the series will terminate. Otherwise, the expansion parameter s is restricted to the domain of convergence of the power series, which by (29) is guaranteed to include s ∈ [0, 1]. Explicitly, for 0 ≤ s ≤ 1,
$$\begin{array}{cc}\hfill 0\le {g}_X(s)\hfill & \hfill =\sum_{i=0}^{\infty }{p}_X(i){s}^i\le \sum_{i=0}^{\infty }{p}_X(i)\hfill \\ {}\hfill \hfill & \hfill ={g}_X(1)=1.\hfill \end{array}$$
(35)
Generating functions provide an alternate way to compute the moments of an r.v. and with favorable efficiency in the event that the series expansion in (34) corresponds to a closed-form analytic function. (Generating functions are also useful for modeling composite stochastic processes, including the branching processes described in section “Branching Processes.”) Explicitly, $$\mathbb{E}(X)$$ can be computed from the first derivative of (34):
$${g}_X^{\prime }(s)=\sum_{i=0}^{\infty } i\;{p}_X(i){s}^{i-1}.$$
(36)
Thus,
$$\mathbb{E}(X)=\sum_{i=0}^{\infty } i\;{p}_X(i)={g}^{\prime }(1).$$
(37)
The second moment of X is obtained from the second derivative of (34):
$${g}_X^{{\prime\prime} }(s)=\sum_{i=0}^{\infty } i\left( i-1\right){p}_X(i){s}^{i-2}.$$
(38)
Letting s = 1,
$$\begin{array}{cc}\hfill {g}_X^{{\prime\prime} }(1)\hfill & \hfill =\sum_{i=0}^{\infty } i\left( i-1\right){p}_X(i),\hfill \\ {}\hfill \hfill & \hfill =\mathbb{E}\left( X\left( X-1\right)\right)=\mathbb{E}\left({X}^2\right)-\mathbb{E}(X).\hfill \end{array}$$
Thus,
$$\mathbb{E}\left({X}^2\right)={g}_X^{{\prime\prime} }(1)+{g}_X^{\prime }(1).$$
Likewise, the variance of X can be expressed as,
$$\mathrm{Var}(X)={g}^{{\prime\prime} }(1)+{g}^{\prime }(1)-{\left({g}^{\prime }(1)\right)}^2.$$
(39)

Higher moments can be likewise obtained. Applications of (37) and (39) appear below.
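For an r.v. with finitely many values, gX is a polynomial and (37) and (39) reduce to sums over its coefficients; a short Python sketch (the two-point p.m.f. is an illustrative choice):

```python
def gf_moments(pmf):
    """Mean and variance via g'(1) and g''(1), per Eqs. (37) and (39).
    pmf[i] holds P(X = i) for i = 0, 1, ..."""
    g1 = sum(i * prob for i, prob in enumerate(pmf))            # g'(1)
    g2 = sum(i * (i - 1) * prob for i, prob in enumerate(pmf))  # g''(1)
    return g1, g2 + g1 - g1 * g1   # E(X) and Var(X)

# g_X(s) = 0.7 + 0.3 s, i.e., P(X = 0) = 0.7 and P(X = 1) = 0.3.
m, v = gf_moments([0.7, 0.3])
print(m, v)  # ≈ 0.3 and 0.21
```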

Wilf describes a generating function as “a clothesline on which we hang up a sequence of numbers for display” (Wilf 2006). In these applications, the numbers on display are the values of the p.m.f. For s = 0,
$${g}_X(0)=\sum_{i=0}^{\infty }{p}_X(i)\;{0}^i={p}_X(0).$$
(40)
Letting g (i)(s) denote the i th derivative of g(s),
$${p}_X(i)=\frac{1}{i!}\underset{s\to 0}{ \lim }{g}^{(i)}(s),$$

for i = 0 , 1 , 2 ,  … . Thus, the entire p.m.f. of X can be recovered directly from the generating function.


Univariate Models

The following (discrete) probability distributions are frequently used in the probabilistic analysis of social networks.

Uniform Random Variables

An r.v. X : Ω → {1, 2,  … , n} is called a uniform r.v., written X∼ Uniform(n), if its p.m.f. assumes the form
$${p}_X(x)=\frac{1}{n}.$$
In this case,
$$\mathrm{E}(X)=\sum_{i=1}^n i\cdot \frac{1}{n}=\frac{n+1}{2},$$
and
$$\mathrm{E}\left({X}^2\right)=\sum_{i=1}^n{i}^2\cdot \frac{1}{n}=\frac{\left( n+1\right)\left(2 n+1\right)}{6}.$$
Whence, by (33)
$$\mathrm{Var}(X)=\frac{n^2-1}{12}.$$
The generating function for a uniform distribution corresponds to a truncated geometric series:
$${g}_X(s)=\frac{1}{n}\sum_{i=1}^n{s}^i=\frac{s\left(1-{s}^n\right)}{n\left(1- s\right)}.$$

(The validity of the last equality can be shown by mathematical induction.)

Example 6

A uniform random variable models the outcome of a fair n-sided die, where each facet is labeled in sequence 1, 2 ,  …  , n. For the common cubical die, n = 6, and thus,
$$\mathrm{E}(X)=\frac{7}{2},\quad \mathrm{and}\quad \mathrm{Var}(X)=\frac{35}{12}.$$

Bernoulli Random Variables

An r.v. X : Ω → {0, 1} is called a Bernoulli r.v., written X∼ Bernoulli(p), if its p.m.f. assumes the form,
$${p}_X(x)={p}^x{\left(1- p\right)}^{1- x}=\left\{\begin{array}{cc}\hfill 1- p,\hfill & \hfill \mathrm{if}\; x=0,\hfill \\ {}\hfill p,\hfill & \hfill \mathrm{if}\; x=1,\hfill \end{array}\right.$$
for some fixed value of p ∈ [0, 1]. This p.m.f. equals the probability of obtaining tails (x = 0) or heads (x = 1), from a single coin toss using a coin that lands heads with probability p ∈ [0, 1]. By (34), the corresponding generating function is g X (s) = (1 − p) + ps. Using (37),
$$\mathbb{E}(X)={g}_X^{\prime }(1)= p.$$
Likewise, using (39),
$$\mathrm{Var}(X)={g}_X^{{\prime\prime} }(1)+{g}_X^{\prime }(1)-{\left({g}_X^{\prime }(1)\right)}^2= p\left(1- p\right).$$
(41)
A graph of this expression is plotted in Fig. 5. Note that the variance of a Bernoulli r.v. never exceeds 1/4, and that it vanishes if either p = 0 or p = 1, the two parameter values that enforce a certain outcome. Fig. 5 A graph of the variance of a Bernoulli random variable, Var(X) = p(1 − p) of (41), as a function of p

Example 7 (Bernoulli Trials)

One frequently encounters situations where a random experiment with two possible outcomes (e.g., success or failure) is repeated sequentially. In the event that the outcome of each trial is unaffected by the others, statistical independence can be assumed. Furthermore, each trial might be modeled by an r.v., X k ∼ Bernoulli(p), for k = 1 , 2 , 3 ,  … , where the event X k  = 1 indicates “success,” and X k  = 0, “failure.” By (27), it follows that the probability of observing a particular sequence of n outcomes, X 1 = x 1 , X 2 = x 2 ,  … , X n = x n , where x k  ∈ {0, 1} for k = 1 , 2 ,  …  , n, is given by
$$\begin{array}{cc}\hfill \mathrm{\mathbb{P}}\Big({X}_1\hfill & \hfill ={x}_1,{X}_2={x}_2,\dots, {X}_n={x}_n\Big)\hfill \\ {}\hfill \hfill & \hfill ={p}^z{\left(1- p\right)}^{n- z},\hfill \end{array}$$
(42)
where $$z={\sum}_{k=1}^n{x}_k$$ represents the number of trials in the sequence with a successful outcome. Thus, if a certain trial results in success (S) with probability p = 1/3 (and thus failure (F) with probability 1 − p = 2/3), the probability of observing the independent sequence SSFSF equals
$$\mathrm{\mathbb{P}}(SSFSF)={\left(\frac{1}{3}\right)}^3{\left(\frac{2}{3}\right)}^2=\frac{4}{243}\approx 0.0165.$$
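A direct evaluation of (42), sketched in Python (the helper name is an assumption):

```python
def sequence_probability(outcomes, p):
    """Probability of a specific i.i.d. Bernoulli sequence, Eq. (42)."""
    z = sum(outcomes)          # number of successes in the sequence
    n = len(outcomes)
    return p ** z * (1 - p) ** (n - z)

# The sequence S S F S F with success probability p = 1/3.
prob = sequence_probability([1, 1, 0, 1, 0], 1 / 3)
print(prob)  # 4/243 ≈ 0.0165
```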

Binomial Random Variables

An r.v., X : Ω → {0, 1,  … , n} is called a binomial r.v., written X~ Binomial (n, p), if its p.m.f. assumes the form,
$${p}_X(x)=\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill x\hfill \end{array}\right){p}^x{\left(1- p\right)}^{n- x}$$
(43)
for x = 0 , 1 ,  …  , n and for fixed p ∈ [0, 1] (see Fig. 6). This p.m.f. evaluates to the probability of obtaining exactly x heads from a sequence of n independent coin tosses using a coin that lands heads with probability p. Indeed, Eq. (43) is obtained directly from (42) after replacing z by x (the new independent variable) and multiplying by the binomial coefficient, $$\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill x\hfill \end{array}\right),$$ which enumerates the number of ways of constructing a sequence of n coin tosses with exactly x heads (see (12)). Fig. 6 A bar graph illustrating the binomial p.m.f. (43) for n = 10, with three different values of p : 0.3 (in red), 0.5 (in green), and 0.8 (in blue). In this instance, x ∈ {0, 1, 2,  … , 10}
By the binomial theorem,
$${\left( a+ b\right)}^n=\sum_{i=0}^n\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill i\hfill \end{array}\right){a}^i{b}^{n- i},$$
the generating function corresponding to (43) evaluates to
$${g}_X(s)=\sum_{i=0}^n\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill i\hfill \end{array}\right){(ps)}^i{\left(1- p\right)}^{n- i}={\left(1- p+ ps\right)}^n.$$
(44)
Consequently, by (37),
$$\mathbb{E}(X)={g}_X^{\prime }(1)= np,$$
(45)
and by (39)
$$\mathrm{Var}(X)= n\left( n-1\right){p}^2+ np-{\left( n p\right)}^2= n p\left(1- p\right).$$
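The moment formulas np and np(1 − p) can be verified directly against the p.m.f. (43); a Python sketch with illustrative parameters n = 10 and p = 0.3:

```python
from math import comb

def binomial_pmf(n, p, x):
    """Binomial p.m.f., Eq. (43)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
pmf = [binomial_pmf(n, p, x) for x in range(n + 1)]
mu = sum(x * prob for x, prob in enumerate(pmf))
sigma2 = sum(x * x * prob for x, prob in enumerate(pmf)) - mu ** 2
print(mu, sigma2)  # ≈ np = 3.0 and np(1-p) = 2.1
```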

Geometric Random Variables

An r.v. X : Ω → ℕ is called a geometric r.v., written X∼ Geometric(p), if its p.m.f. satisfies
$${p}_X(x)={\left(1- p\right)}^{x-1} p$$
(46)
for x ∈ ℕ and for fixed p ∈ (0, 1] (see Fig. 7). Note that 0 is excluded from the parameter range of p. This r.v. describes the number of successive trials required to observe the next head in an arbitrarily long sequence of coin tosses, using a coin that lands heads with probability p and tails with probability q = 1 − p. Since 0 is excluded from the range of X, we omit the initial term in (34). The generating function then evaluates to
$${g}_X(s)= ps\sum_{i=1}^{\infty }{(qs)}^{i-1}=\frac{ps}{1- qs},$$ Fig. 7 A bar graph illustrating the geometric p.m.f. (46) for three different values of p : 0.1 (in red), 0.2 (in green), and 0.4 (in blue). Only the values x ∈ {1, 2,  … , 10} are shown
where the summation formula for the geometric series,
$$\sum_{i=0}^{\infty }{\xi}^i=\frac{1}{1-\xi},$$
with |ξ| < 1, is employed. Using (37) and (39),
$$\mathbb{E}(X)=\frac{1}{p}\quad \mathrm{and}\quad \mathrm{Var}(X)=\frac{1- p}{p^2}.$$
Likewise, the summation formula for the truncated geometric series,
$$\sum_{i=0}^n{\xi}^i=\frac{1-{\xi}^{n+1}}{1-\xi}$$
(verified by mathematical induction), yields the cumulative distribution function for X~ Geometric(p):
$$\begin{array}{cc}\hfill {\mathrm{\mathbb{P}}}_X\left[ X\le x\right]\hfill & \hfill =\sum_{i=1}^x\mathrm{\mathbb{P}}\left[ X= i\right]\hfill \\ {}\hfill \hfill & \hfill =\sum_{i=1}^x{\left(1- p\right)}^{i-1} p\hfill \\ {}\hfill \hfill & \hfill = p\sum_{i=0}^{x-1}{\left(1- p\right)}^i\hfill \\ {}\hfill \hfill & \hfill =1-{\left(1- p\right)}^x.\hfill \end{array}$$
(47)

Example 8

A pair of fair, six-sided dice is rolled repeatedly until double-sixes appear. The number of trials required is thus described by a geometric random variable X∼ Geometric(1/36). The expected number of rolls is $$\mathbb{E}(X)=1/ p=36$$, and the variance is Var(X) = 35 × 36 = 1260. Using (47), it follows that ℙ(X ≤ 25) ≈ 0.505532.
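The quantities in Example 8 follow from one-line evaluations of 1/p, (1 − p)/p², and the c.d.f. (47); a Python sketch:

```python
p = 1 / 36                       # probability of double sixes per roll
expected_rolls = 1 / p           # E(X) = 36
var_rolls = (1 - p) / p ** 2     # Var(X) = 35 * 36 = 1260
cdf_25 = 1 - (1 - p) ** 25       # Eq. (47): P(X <= 25)
print(expected_rolls, var_rolls, cdf_25)  # 36.0, 1260.0, ≈ 0.5055
```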

Poisson Random Variables

An r.v. X : Ω → ℕ0 is called a Poisson r.v., written X∼ Poisson(λ), if its p.m.f. satisfies
$${p}_X(x)=\frac{\lambda^x{e}^{-\lambda}}{x!}$$
(48)
for a fixed λ ∈ [0, +∞) (see Fig. 8). This distribution is associated with the frequency of rare events during an infinitely long sequence of independent and identical trials. The generating function associated with (48) evaluates to
$${g}_X(s)={e}^{-\lambda}\sum_{i=0}^{\infty}\frac{{\left(\lambda s\right)}^i}{i!}={e}^{\lambda \left( s-1\right)}.$$
(49) Fig. 8 A bar graph illustrating the Poisson p.m.f. (48) for three different values of λ : 1 (in red), 2 (in green), and 4 (in blue). Only the values x ∈ {0, 1, 2,  … , 10} are shown
Then, by (37) and (39),
$$\mathrm{E}(X)=\lambda \quad \mathrm{and}\quad \mathrm{Var}(X)=\lambda .$$

Remark 1

Equation (48) is obtained from the binomial p.m.f. (43) in the double limit, n → ∞ and p → 0, with np → λ. Explicitly, letting p = λ/n in (43) yields
$$\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill x\hfill \end{array}\right){p}^x{\left(1- p\right)}^{n- x}=\frac{(n)_x}{x!}{\left(\frac{\lambda}{n}\right)}^x{\left(1-\frac{\lambda}{n}\right)}^{n- x}.$$
In the above,
$${(n)}_x= n\left( n-1\right)\left( n-2\right)\dots \left( n- x+1\right)$$
denotes the x th falling factorial of n. Note that as n →  ∞ , (n) x  ~ n x ; likewise, in the rightmost exponent, n − x ~ n. Using L’Hôpital’s rule, one obtains
$$\underset{n\to \infty }{ \lim}\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill x\hfill \end{array}\right){p}^x{\left(1- p\right)}^{n- x}=\underset{n\to \infty }{ \lim}\frac{n^x}{x!}\frac{\lambda^x}{n^x}{\left(1-\frac{\lambda}{n}\right)}^n=\frac{\lambda^x}{x!}{e}^{-\lambda}.$$
It is actually much easier to derive this limit using generating functions. Starting with the binomial g.f. (44), let p = p(n) = λ/n. Then, taking the limit n → ∞, one obtains
$$\begin{array}{cc}\hfill \underset{n\to \infty }{ \lim }{\left(1+ p(n)\left( s-1\right)\right)}^n\hfill & \hfill =\underset{n\to \infty }{ \lim }{\left(1+\lambda \frac{s-1}{n}\right)}^n,\hfill \\ {}\hfill \hfill & \hfill ={e}^{\lambda \left( s-1\right)},\hfill \end{array}$$

the Poisson g.f. (49).
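The convergence claimed in Remark 1 can be observed numerically by holding np = λ fixed while n grows; a Python sketch with λ = 2 and x = 3 (illustrative values):

```python
from math import comb, exp, factorial

lam, x = 2.0, 3
poisson = lam ** x * exp(-lam) / factorial(x)         # Eq. (48)

gaps = {}
for n in (10, 100, 1000):
    p = lam / n                                       # np = lam held fixed
    binom = comb(n, x) * p ** x * (1 - p) ** (n - x)  # Eq. (43)
    gaps[n] = abs(binom - poisson)
print(poisson, gaps)  # the gaps shrink as n grows
```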

Scale-Free Random Variables

An r.v. X : Ω → ℕ is said to follow a power law if its p.m.f. assumes the form
$${p}_X(x)={cx}^{-\gamma},$$
(50)
with γ ∈ ℝ , γ > 1 (see Fig. 9). Here, c is chosen to satisfy the normalization requirement, Eq. (29). Consequently, c = 1/ζ(γ) where
$$\zeta \left(\gamma \right)=\sum_{j=1}^{\infty }{j}^{-\gamma}$$ Fig. 9 A bar graph illustrating the power law p.m.f. (50) for three different values of γ: 1.5 (in red), 2 (in green), and 4 (in blue). Only the values x ∈ {1, 2,  … , 9} are shown
is Riemann’s zeta function. The lower bound on γ is enforced to ensure that ζ(γ) is finite. With this p.m.f., X is frequently referred to as a scale-free r.v., as the ratio
$$\frac{p_X(nj)}{p_X(j)}={n}^{-\gamma}$$
is independent of j for any positive integer n. If γ > 2, then the expectation of X is well defined:
$$\mathbb{E}(X)= c\sum_{i=1}^{\infty }{i}^{-\left(\gamma -1\right)}=\frac{\zeta \left(\gamma -1\right)}{\zeta \left(\gamma \right)}.$$
If γ > 3, then $$\mathbb{E}\left({X}^2\right)$$ and the variance of X are defined, and by a similar derivation,
$$\mathrm{Var}(X)=\frac{\zeta \left(\gamma \right)\zeta \left(\gamma -2\right)-\zeta {\left(\gamma -1\right)}^2}{\zeta {\left(\gamma \right)}^2}.$$
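Since ζ converges quickly for the exponents used here, the normalization and mean of a scale-free r.v. can be approximated by truncated sums; a Python sketch with the illustrative exponent γ = 3:

```python
def zeta(gamma, terms=200_000):
    # Truncated Riemann zeta sum; adequate here since gamma - 1 >= 2.
    return sum(j ** -gamma for j in range(1, terms + 1))

gamma = 3.0                               # illustrative exponent
c = 1 / zeta(gamma)                       # normalization in Eq. (50)
mean_sf = zeta(gamma - 1) / zeta(gamma)   # E(X), valid since gamma > 2
print(c, mean_sf)  # mean ≈ zeta(2)/zeta(3) ≈ 1.3684
```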

Multivariate Models

Most practical problems are described in terms of multiple random variables defined on the same probability space (Ω , ℰ , ℙ), e.g., X j  : Ω → ℛ j , for j = 1 , 2 ,  …  , n. The above can be expressed in terms of a random vector that maps Ω into the Cartesian product of the ranges, e.g., X : Ω → ℛ1 ×  ⋯  × ℛ n . As in the previous discussion, we assume that each of the ranges ℛ j is countable. Letting x = (x 1,  … , x n ) ∈ ℛ1 ×  ⋯  × ℛ n , the joint probability mass function is defined by
$$\begin{array}{cc}\hfill {p}_{\mathbf{X}}\left(\mathbf{x}\right)\hfill & \hfill ={p}_{X_1,{X}_2,\dots, {X}_n}\left({x}_1,{x}_2,\dots, {x}_n\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left\{\omega :\mathbf{X}\left(\omega \right)=\mathbf{x}\right\}\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left\{\omega :\left({X}_1\left(\omega \right)={x}_1\right)\wedge \cdots \wedge \left({X}_n\left(\omega \right)={x}_n\right)\right\}\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\cap \cdots \cap \left[{X}_n={x}_n\right]\right).\hfill \end{array}$$
By virtue of (14), the probability of the joint event is defined if each of the events
$$\left[{X}_j={x}_j\right]=\left\{\omega \in \Omega :{X}_j\left(\omega \right)={x}_j\right\}$$

belongs to ℰ for j = 1 , 2 ,  …  , n, for all x under consideration. It is useful to extend Theorem 9 to such multivariate models.

Theorem 11

For j = 1 ,  …  , n, let X j  : Ω → ℛ j , subject to a joint p.m.f. p x( x ). Letting c j ∈ ℝ for j = 1 ,  …  , n, denote arbitrary constants, it follows that
$$\mathbb{E}\left({c}_1{X}_1+{c}_2{X}_2+\cdots +{c}_n{X}_n\right)={c}_1\mathbb{E}\left({X}_1\right)+{c}_2\mathbb{E}\left({X}_2\right)+\cdots +{c}_n\mathbb{E}\left({X}_n\right).$$
(51)

Proof

First note that (51) is trivially satisfied by Theorem 9 if n = 1. For n = 2, we obtain,
$$\begin{array}{cc}\hfill \mathbb{E}\left({c}_1{X}_1+{c}_2{X}_2\right)\hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}\left({c}_1{x}_1+{c}_2{x}_2\right){p}_{\mathbf{X}}\left({x}_1,{x}_2\right),\hfill \\ {}\hfill \hfill & \hfill ={c}_1\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{x}_1\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{p}_{\mathbf{X}}\left({x}_1,{x}_2\right)+{c}_2\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{x}_2\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{p}_{\mathbf{X}}\left({x}_1,{x}_2\right).\hfill \end{array}$$
We now examine the inner sum in the first term:
$$\begin{array}{cc}\hfill \sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{p}_{\mathbf{X}}\left({x}_1,{x}_2\right)\hfill & \hfill =\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\cap \left[{X}_2={x}_2\right]\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\underset{x_2\in {\mathrm{\mathcal{R}}}_2}{\cup}\left(\left[{X}_1={x}_1\right]\cap \left[{X}_2={x}_2\right]\right)\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\cap \underset{x_2\in {\mathrm{\mathcal{R}}}_2}{\cup}\left[{X}_2={x}_2\right]\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\cap \Omega \right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\right),\hfill \\ {}\hfill \hfill & \hfill ={p}_{X_1}\left({x}_1\right).\hfill \end{array}$$
Likewise, one can obtain
$$\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{p}_{\mathbf{X}}\left({x}_1,{x}_2\right)={p}_{X_2}\left({x}_2\right).$$
Whence,
$$\mathbb{E}\left({c}_1{X}_1+{c}_2{X}_2\right)={c}_1\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{x}_1{p}_{X_1}\left({x}_1\right)+{c}_2\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{x}_2{p}_{X_2}\left({x}_2\right),={c}_1\mathbb{E}\left({X}_1\right)+{c}_2\mathbb{E}\left({X}_2\right),$$

establishing the theorem for n = 2. The cases for n > 2 are demonstrated by mathematical induction in a manner similar to the proof of Theorem 5.

In the above derivation, the theorem of total probability was effectively applied in reverse. In essence, Eq. (17) implies
$${p}_{X_1}\left({x}_1\right)=\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}\cdots \sum_{x_n\in {\mathrm{\mathcal{R}}}_n}{p}_{\mathbf{X}}\left({x}_1,{x}_2,\cdots, {x}_n\right),$$
(52)

with similar equations for p Xi (x i ) for i = 2 , 3 ,  …  , n. Equation (52) and its companions define the marginal p.m.f.s of the individual r.v.s.

Conditional Probability Mass Functions

Using the definition of conditional probability, one can define a p.m.f. for a subset of random variables, given that the values of a complementary subset of variables are fixed. For example, with X i  : Ω → ℛ i for i = 1 , 2, one defines the conditional p.m.f. of X 2 given the event [X 1 = x 1] by
$$\begin{array}{cc}\hfill {p}_{X_2\Big|{X}_1}\left({x}_2|{x}_1\right)\hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_2={x}_2\right]|\left[{X}_1={x}_1\right]\right),\hfill \\ {}\hfill \hfill & \hfill =\frac{\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\cap \left[{X}_2={x}_2\right]\right)}{\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\right)},\hfill \\ {}\hfill \hfill & \hfill =\frac{p_{X_1,{X}_2}\left({x}_1,{x}_2\right)}{p_{X_1}\left({x}_1\right)},\hfill \end{array}$$
(53)
where the denominator, the marginal p.m.f. of X 1 (defined in (52)), is assumed to be nonzero. Using (53), the joint p.m.f. can be expressed as the product of a conditional and a marginal p.m.f.:
$${p}_{X_1,{X}_2}\left({x}_1,{x}_2\right)={p}_{X_2\Big|{X}_1}\left({x}_2|{x}_1\right){p}_{X_1}\left({x}_1\right).$$
(54)
Likewise, one can define conditional p.m.f.s involving larger collections of random variables as ratios of the appropriate joint and marginal p.m.f.s, e.g.,
$$\begin{array}{l}{p}_{X_j,\dots, {X}_n\Big|{X}_1,\dots, {X}_{j-1}\;}\left({x}_j,\dots, {x}_n|{x}_1,\dots, {x}_{j-1}\right)\\ {}=\frac{p_{X_1,\dots, {X}_{j-1},{X}_j,\dots, {X}_n\;}\left({x}_1,\dots, {x}_{j-1},{x}_j,\dots, {x}_n\right)}{p_{X_1,\dots, {X}_{j-1}\;}\left({x}_1,\dots, {x}_{j-1}\right)}.\end{array}$$
(55)
Using recursion, the above formula yields
$$\begin{array}{l}{p}_{X_1\cdots {X}_n}\left({x}_1,\dots, {x}_n\right)=\\ {}{p}_{X_n\Big|{X}_1\dots {X}_{n-1}}\left({x}_n|{x}_1,\dots, {x}_{n-1}\right)\times \\ {}{p}_{X_{n-1}\Big|{X}_1\dots {X}_{n-2}}\left({x}_{n-1}|{x}_1,\dots, {x}_{n-2}\right)\times \dots \\ {}\dots \times {p}_{X_2\Big|{X}_1}\left({x}_2|{x}_1\right){p}_{X_1}\left({x}_1\right).\end{array}$$

Markov Chains

The random variables form a (first-order) Markov chain if each conditional probability above simplifies to
$$\begin{array}{l}{p}_{X_j\Big|{X}_1\dots {X}_{j-1}}\left({x}_j|{x}_1,\dots, {x}_{j-1}\right)\\ {} ={p}_{X_j\Big|{X}_{j-1}}\left({x}_j|{x}_{j-1}\right),\end{array}$$
for j = 2 , 3 ,  …  , n. In this event, the original joint probability distribution factors as
$$\begin{array}{l}{p}_{X_1\cdots {X}_n}\left({x}_1,\dots, {x}_n\right)={p}_{X_n\Big|{X}_{n-1}}\left({x}_n|{x}_{n-1}\right)\\ {} \times {p}_{X_{n-1}\Big|{X}_{n-2}}\left({x}_{n-1}|{x}_{n-2}\right)\times \dots \\ {}\dots \times {p}_{X_2\Big|{X}_1}\left({x}_2|{x}_1\right){p}_{X_1}\left({x}_1\right).\end{array}$$

Conditional Expectations

With X 1 = x 1 fixed, the conditional p.m.f. p(x 2| x 1) defined in (53) satisfies (28) and (29) and thus represents a valid p.m.f. This enables one to define the conditional expectation of X 2 given X 1 as
$$\mathbb{E}\left({X}_2|\left[{X}_1={x}_1\right]\right)=\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{x}_2\;{p}_{X_2\Big|{X}_1}\left({x}_2|{x}_1\right).$$
(56)
Using (55), one can create corresponding definitions for $$\mathbb{E}\left({X}_j,\dots, {X}_n|\left[{X}_1={x}_1\right]\cap \cdots \cap \left[{X}_{j-1}={x}_{j-1}\right]\right)$$, for various values of 1 < j < n, etc. Here, we note one particularly useful identity. In analogy with (54), an expectation of a function of several r.v.s can be composed. Letting ϕ : ℝ2 → ℝ denote an arbitrary function, one obtains
$$\begin{array}{cc}\hfill \mathbb{E}\left(\phi \left({X}_1,{X}_2\right)\right)\hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}\phi \left({x}_1,{x}_2\right){p}_{X_1,{X}_2}\left({x}_1,{x}_2\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}\phi \left({x}_1,{x}_2\right){p}_{X_2\Big|{X}_1}\left({x}_2|{x}_1\right){p}_{X_1}\left({x}_1\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}\mathbb{E}\left(\phi \left({X}_1,{X}_2\right)|\left[{X}_1={x}_1\right]\right){p}_{X_1}\left({x}_1\right),\hfill \\ {}\hfill \hfill & \hfill =\mathbb{E}\left(\mathbb{E}\left(\phi \left({X}_1,{X}_2\right)|{X}_1\right)\right).\hfill \end{array}$$

Note that different $$\mathbb{E}$$ operations appearing in the above are taken with respect to different probability mass functions.

Definition

The r.v.s, X j for j = 1 , 2 ,  …  , n, are said to be independent, if the events [X j  = x j ] ∈ ℰ, for j = 1 , 2 ,  …  , n, are independent in accordance with (27).

Thus, the joint p.m.f. of independent r.v.s can be factored into a product of the marginal probabilities. By (27),
$$\begin{array}{cc}\hfill {p}_{\mathbf{X}}\left(\mathbf{x}\right)\hfill & \hfill ={p}_{X_1,{X}_2,\dots, {X}_n}\left({x}_1,{x}_2,\dots, {x}_n\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\cap \left[{X}_2={x}_2\right]\cap \dots \cap \left[{X}_n={x}_n\right]\right),\hfill \\ {}\hfill \hfill & \hfill =\mathrm{\mathbb{P}}\left(\left[{X}_1={x}_1\right]\right)\mathrm{\mathbb{P}}\left(\left[{X}_2={x}_2\right]\right)\dots \mathrm{\mathbb{P}}\left(\left[{X}_n={x}_n\right]\right),\hfill \\ {}\hfill \hfill & \hfill ={p}_{X_1}\left({x}_1\right){p}_{X_2}\left({x}_2\right)\dots {p}_{X_n}\left({x}_n\right).\hfill \end{array}$$
Likewise, the expectation of a product of independent r.v.s equals the product of the individual expectations. For example, if X i  : Ω → ℛ i for i = 1 , 2 are independent, then
$$\begin{array}{cc}\hfill \mathbb{E}\left({X}_1{X}_2\right)\hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{x}_1{x}_2{p}_{X_1,{X}_2}\left({x}_1,{x}_2\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{x}_1{x}_2{p}_{X_1}\left({x}_1\right){p}_{X_2}\left({x}_2\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{x}_1{p}_{X_1}\left({x}_1\right)\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{x}_2\;{p}_{X_2}\left({x}_2\right),\hfill \\ {}\hfill \hfill & \hfill =\mathbb{E}\left({X}_1\right)\mathbb{E}\left({X}_2\right).\hfill \end{array}$$
As a consequence of the above, if X 1 … X n are independent, then for arbitrary real constants c 1 … c n ,
$$\begin{array}{cc}\hfill \mathrm{Var}\Big({c}_1{X}_1\hfill & \hfill +{c}_2{X}_2+\dots +{c}_n{X}_n\Big)={c}_1^2\;\mathrm{Var}\left({X}_1\right)\hfill \\ {}\hfill \hfill & \hfill +{c}_2^2\;\mathrm{Var}\left({X}_2\right)+\cdots +{c}_n^2\;\mathrm{Var}\left({X}_n\right).\hfill \end{array}$$
(57)

Independent random variables X j  : Ω → ℝ, governed by the same probability mass function, p X j (x) = p(x), for j = 1 , 2 , 3 ,  … , are said to be independent and identically distributed, or i.i.d. for short.

Sums of Random Variables

Frequently, one desires the p.m.f. of the sum of several random variables, viz., Z = X 1 +  ⋯  + X n . With n = 2,
$$\begin{array}{cc}\hfill {p}_Z(z)\hfill & \hfill =\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{p}_{X_1,{X}_2}\left({x}_1, z-{x}_1\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{x_2\in {\mathrm{\mathcal{R}}}_2}{p}_{X_1,{X}_2}\left( z-{x}_2,{x}_2\right).\hfill \end{array}$$
(Note that we assume p X1,X2(x 1, z − x 1) = 0 if z − x 1 ∉ ℛ2, with a similar assumption for z − x 2 ∉ ℛ1.) If the r.v.s are independent, then the above simplifies to a convolution product, e.g.,
$${p}_Z(z)=\sum_{x_1\in {\mathrm{\mathcal{R}}}_1}{p}_{X_1}\left({x}_1\right){p}_{X_2}\left( z-{x}_1\right).$$
Thus, the generating function of the sum is the product of the generating functions of the terms:
$${g}_Z(s)={g}_{X_1}(s){g}_{X_2}(s).$$
For computing probabilities associated with the sum of n independent and identically distributed (i.i.d.) random variables, this generalizes to
$${g}_Z(s)={g}_X{(s)}^n.$$
(58)

Example 9

Let X i ∼ Poisson( λ i ) with λ i  > 0 for i = 1 … n. Then, the generating function for Z = X 1 +  ⋯  + X n is given by
$${g}_Z(s)=\prod_{i=1}^n{g}_{X_i}(s)=\prod_{i=1}^n{e}^{\lambda_i\left( s-1\right)}={e}^{\lambda_z\left( s-1\right)},$$

where λ z  = λ 1 +  ⋯  + λ n . Thus, Z ∼ Poisson(λ z ).

Example 10

Let X i ∼ Bernoulli(p) with p ∈ [0, 1], for i = 1 ,  …  , n. Then, the generating function for Z = X 1 +  ⋯  + X n is given by,
$${g}_Z(s)=\prod_{i=1}^n{g}_{X_i}(s)={\left(1- p+ ps\right)}^n.$$

Consequently, Z ∼ Binomial( n , p).
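The claim of Example 10 can also be confirmed directly from the convolution product of the p.m.f.s; a Python sketch (n = 5 and p = 0.4 are illustrative parameters):

```python
from math import comb

def convolve(p1, p2):
    """p.m.f. of the sum of two independent r.v.s (convolution product)."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out

n, p = 5, 0.4
bern = [1 - p, p]          # Bernoulli(p) p.m.f. on {0, 1}
z = [1.0]                  # p.m.f. of the empty sum (Z = 0 surely)
for _ in range(n):
    z = convolve(z, bern)  # add one Bernoulli trial at a time

binom = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
print(z)  # matches the Binomial(5, 0.4) p.m.f.
```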

Compound Random Variables

If the number of r.v.s being summed is also random, then one obtains what is called a compound random variable. For example, suppose the X i  : Ω → ℕ0 are i.i.d., for i ∈ ℕ. Now, let N : Ω → ℕ0 be independent of the preceding r.v.s. The generating function for the sum Z = X 1 +  …  + X N is obtained following the prescription given by Feller (1968, p. 287):
$$\begin{array}{cc}\hfill {g}_Z(s)\hfill & \hfill =\mathbb{E}\left({s}^Z\right),\hfill \\ {}\hfill \hfill & \hfill =\mathbb{E}\left(\mathbb{E}\left({s}^{X_1+\dots +{X}_N}| N= n\right)\right),\hfill \\ {}\hfill \hfill & \hfill =\sum_{n=0}^{\infty}\mathbb{E}\left({s}^{X_1+\dots +{X}_n}\right){p}_N(n),\hfill \\ {}\hfill \hfill & \hfill =\sum_{n=0}^{\infty }{g}_X{(s)}^n{p}_N(n),\hfill \end{array}$$
(59)
$$={g}_N\left({g}_X(s)\right).$$
(60)

Example 11

Consider a tree of depth 2, in which the number of immediate descendants of the root is governed by N∼ Poisson(λ), and each level-one descendant generates leaf nodes according to a binomial X i ∼ Binomial(n , p), for i = 1 , 2 ,  …  , N. Then, the generating function that governs Z = X 1 +  …  + X N , the number of level-two nodes, is
$${g}_Z(s)={e}^{\lambda \left({\left(1- p+ ps\right)}^n-1\right)}.$$
Consequently, $$\mathbb{E}(Z)={g}_Z^{\prime }(1)=\lambda\;np$$. Since (1 − p) n represents the probability that a level-one node in this tree has no children, we let W = Y 1 +  ⋯  + Y N denote the number of level-one leaf nodes, where Y~ Bernoulli ((1 − p) n ). The generating function for W then simplifies to
$${g}_W(s)={e}^{\lambda {\left(1- p\right)}^n\left( s-1\right)}.$$

Thus, W∼ Poisson (λ(1 − p) n ), and the expected number of level-one leaf nodes is $$\mathbb{E}(W)={g}_W^{\prime }(1)=\lambda {\left(1- p\right)}^n$$.
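The mean $$\mathbb{E}(Z)=\lambda np$$ predicted by g Z can be checked by Monte Carlo simulation; a Python sketch under illustrative parameters (λ = 3, n = 4, p = 1/4; the Poisson sampler uses Knuth's classic multiplication method):

```python
from math import exp
import random

random.seed(7)  # fixed seed for reproducibility

def sample_poisson(lam):
    # Knuth's method: count uniforms until their product drops below e^-lam.
    threshold, k, prod = exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < threshold:
            return k
        k += 1

lam, n, p = 3.0, 4, 0.25    # illustrative parameters
trials = 100_000
total = 0
for _ in range(trials):
    N = sample_poisson(lam)  # number of level-one nodes
    # the N level-one nodes contribute Binomial(N * n, p) leaves in total
    total += sum(1 for _ in range(N * n) if random.random() < p)
print(total / trials)  # ≈ E(Z) = lam * n * p = 3.0
```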

Branching Processes

The previous subsection enables the analysis of branching processes (Feller 1968; Harris 1989) (also known as Galton-Watson processes), which have direct applicability to modeling connectivity in random graphs. Let X i  : Ω → ℕ0 for i ∈ ℕ0 describe the size of the i th generation of an asexually reproducing population. Here, X 0 = 1 corresponds to the root node of the hereditary tree. Thus, X 1 will equal the number of the root’s children, X 2 the number of its grandchildren, and so on (See Fig. 10). We assume that the X i form a Markov chain, and that the transition probabilities are invariant with respect to generation. Thus, for example,
$${p}_{X_{i+1}\Big|{X}_i}\left( k| j\right)={\pi}_{k, j}\ge 0,$$ Fig. 10 An instance of a particular branching process. Each vertex corresponds to a member of the population. The single (red) vertex denotes the original parent and the root of the tree (X 0 = 1). The offspring belonging to the first generation (X 1 = 4) is represented by green vertices. Their children in turn (X 2 = 6) are represented by blue vertices. Only X 3 = 3 great grandchildren of the root node (violet vertices) were produced, and subsequent generations are extinct: X k  = 0 for k ≥ 4
for all i , j , k ∈ ℕ0. Let g i (s) denote the generating function of X i . Thus, g 0(s) = s, and
$${g}_1(s)=\sum_{k=0}^{\infty }{p}_{X_1}(k){s}^k.$$

Theorem 12

The generating function corresponding to X i + 1 , for i ≥ 1, is the (i + 1)st iterate of g 1 . That is,
$${g}_{i+1}(s)={g}_i\left({g}_1(s)\right)={g}_1\left({g}_i(s)\right),$$
(61)

for i ∈ ℕ.

Proof

Suppose the size of the i th generation satisfies X i  = n. Then by (58) the generating function of the next generation would be g 1 (s) n . Then, with the aid of (59), the generating function for X i + 1 is derived as
$${g}_{i+1}(s)=\sum_{n=0}^{\infty }{g}_1{(s)}^n\mathrm{\mathbb{P}}\left[{X}_i= n\right]={g}_i\left({g}_1(s)\right).$$

The right equality in (61) follows from the associativity of function composition.

Galton and Watson were particularly interested in the question: will a given population continue to grow, or will it fall into extinction? To address this, note that by (40),
$${g}_k(0)=\mathrm{\mathbb{P}}\left[{X}_k=0\right]$$

corresponds to the probability of extinction at or before the k th generation. Thus, the probability of ultimate extinction is given by α ≜ lim k → ∞ g k (0).

Theorem 13 (Steffensen)

If $$\mu \triangleq \mathbb{E}\left({X}_1\right)\le 1$$ , then α = 1: the population falls into extinction. Otherwise, if μ > 1, then α < 1 and is determined by the unique fixed point of g 1,
$$\alpha ={g}_1\left(\alpha \right),$$
(62)

within the interval 0 < α < 1.

Proof

Let i ∈ ℕ0. Since the population never can recover from extinction, [X i = 0] ⊆ [X i + 1 = 0]. Thus, by Theorem 2 the sequence g i (0) satisfies a monotonicity requirement: g i (0) ≤ g i + 1(0) = g 1(g i (0)). The last equality stems from (61). In the limit i → ∞, both g i + 1(0) and g i (0) tend to α. Thus, in this limit, g i + 1(0) = g 1(g i (0)) yields α = g 1(α). Since α is a probability, we need only consider the interval 0 ≤ α ≤ 1.

First, we consider the case μ ≤ 1. By the mean value theorem of calculus, the differentiability of g 1 ensures that there exists a ξ ∈ (s, 1) such that
$${g}_1^{\prime}\left(\xi \right)\left(1- s\right)={g}_1(1)-{g}_1(s)=1-{g}_1(s).$$
(The right equality follows from (35).) Since the coefficients in the power series of $${g}_1^{\prime }(s)$$ are nonnegative (see (38)), $$\mu ={g}_1^{\prime }(1)\le 1$$ implies that $$0\le {g}_1^{\prime}\left(\xi \right)<1$$. Consequently, 1 − s > 1 − g 1(s), or g 1(s) > s. Thus, the only fixed point occurs at s = 1, whence α = 1 (Fig. 11a).

Fig. 11 Two hypothetical graphs of the generating function g 1 with the corresponding fixed points, α = g 1(α), indicated by the abscissas of the intersections of the curves y = g 1(s) (red) with y = s (blue). (a) g 1(s) = 0.7 + 0.3s results in μ = 0.3 ≤ 1 and extinction probability α = 1. (b) g 1(s) = e 2(s − 1) results in μ = 2 > 1 and α ≈ 0.203188 < 1

If μ > 1, then by a similar argument g 1(s) < s for s in a neighborhood of 1. Also note that g 1(0) > 0. Since $${g}_1^{\prime\prime}(s)>0$$, convexity guarantees that the equation s = g 1(s) has a single root in the interval 0 < s < 1. Finally, note that $$\mu ={g}_1^{\prime }(1)>1$$ ensures that lim k → ∞ g k (0) < 1: by (61), g k + 1(0) = g 1(g k (0)), but since g 1(s) < s in a neighborhood of s = 1, g k + 1(0) < g k (0) whenever g k (0) = 1 − ϵ (with ϵ > 0 sufficiently small). Since this would violate the monotonicity requirement, the fixed point at s = 1 is not achievable if μ > 1. Consequently, α < 1 for this case (Fig. 11b).
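Steffensen's criterion can also be checked by direct simulation. The Python sketch below (our own construction) runs Galton-Watson realizations with Poisson(2) offspring, so that μ = 2 > 1; the population cap used to declare survival is a heuristic cutoff, and the extinction frequency should land near the fixed point α ≈ 0.2032 of Fig. 11b:

```python
import math
import random

def poisson(c, rng):
    """Knuth's multiplication method; adequate for small means c."""
    L, k, p = math.exp(-c), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def goes_extinct(c, rng, cap=500):
    """One Galton-Watson realization; a population reaching `cap`
    is declared to survive (a heuristic cutoff)."""
    pop = 1
    while 0 < pop < cap:
        pop = sum(poisson(c, rng) for _ in range(pop))
    return pop == 0

rng = random.Random(1)
trials = 1000
alpha_hat = sum(goes_extinct(2.0, rng) for _ in range(trials)) / trials
# alpha_hat is close to the fixed point alpha(2) of g_1(s) = exp(2(s - 1))
```

With a supercritical mean, a surviving population grows quickly, so the cap introduces only a negligible bias.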

Probability Inequalities

The toolbox of every practicing probabilist contains a variety of inequalities for constructing useful bounds on certain probabilities.

Theorem 14 (Markov’s Inequality)

Let X : Ω → ℛ, where ℛ is a countable subset of the nonnegative reals, and let a > 0 be a real constant. Then,
$$\mathrm{\mathbb{P}}\left( X\ge a\right)\le \frac{\mathbb{E}(X)}{a}.$$
(63)

Proof

From the definition of expectation,
$$\mathbb{E}(X)=\sum_{x\in \mathrm{\mathcal{R}}} x\,{p}_X(x)\ge \sum_{x\in \mathrm{\mathcal{R}}\cap \left[ a,+\infty \right)} x\,{p}_X(x)\ge a\sum_{x\in \mathrm{\mathcal{R}}\cap \left[ a,+\infty \right)}{p}_X(x)= a\,\mathrm{\mathbb{P}}\left( X\ge a\right).$$

Dividing through by a completes the proof.
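As a quick empirical illustration (the uniform distribution on {0, …, 9} is an arbitrary choice of ours), the bound can be checked numerically in Python:

```python
import random

rng = random.Random(0)
samples = [rng.randint(0, 9) for _ in range(100_000)]  # nonnegative r.v., E(X) = 4.5

a = 7
empirical = sum(x >= a for x in samples) / len(samples)   # true tail: P(X >= 7) = 0.3
markov_bound = (sum(samples) / len(samples)) / a          # roughly E(X)/a = 4.5/7
# Markov's inequality guarantees empirical <= markov_bound, although loosely.
```

The slack here is typical: Markov's inequality uses only the mean, so it cannot be sharp for most distributions.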

Theorem 15 (Chebyshev’s Inequality)

Let X : Ω → ℝ denote an r.v., and let ϵ > 0. Then,
$$\mathrm{\mathbb{P}}\;\left(| X-\mathrm{E}(X)|\ge \upepsilon \right)\le \frac{\mathrm{Var}(X)}{\upepsilon^2}.$$
(64)

Proof

Let Y = (X − E(X))2 and let a = ϵ2. Since Y ≥ 0, the hypothesis of Markov’s inequality is satisfied. Consequently,
$$\mathrm{\mathbb{P}}\left( Y\ge {\upepsilon}^2\right)\le \frac{\mathbb{E}(Y)}{\upepsilon^2}.$$
Replacing Y by its definition,
$$\mathrm{\mathbb{P}}\left({\left( X-\mathbb{E}(X)\right)}^2\ge {\upepsilon}^2\right)\le \frac{\mathrm{Var}(X)}{\upepsilon^2},$$

where (32) was employed to simplify the right side. Applying the square root to the argument of ℙ yields an equivalent event and (64).
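An analogous empirical check in Python (the Binomial(100, 1/2) sample, with true variance 25, is our illustrative choice):

```python
import random

rng = random.Random(42)
n, trials = 100, 10_000
# Each sample is a Binomial(100, 1/2) draw: a sum of 100 fair Bernoulli trials.
samples = [sum(rng.random() < 0.5 for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials

eps = 10.0
empirical = sum(abs(x - mean) >= eps for x in samples) / trials
chebyshev_bound = var / eps ** 2   # roughly 25/100 = 0.25
# The true two-sided tail (~0.057) is far below the bound, which is typical:
# Chebyshev is loose but requires only a finite variance.
```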

Weak Law of Large Numbers

Chebyshev’s inequality provides a convenient demonstration of Bernoulli’s weak law of large numbers. Let X i : Ω → ℛ, for i ∈ ℕ, be a countably infinite sequence of i.i.d. r.v.s with finite mean μ = E(X 1) and finite variance σ 2 = Var(X 1) < ∞. Then, let Z n = (X 1 + ⋯ + X n )/n denote the average of the first n values in the sequence. From (51), E(Z n ) = μ. Likewise, from (57), Var(Z n ) = σ 2/n. Substitution into Chebyshev’s inequality (64) yields
$$\mathrm{\mathbb{P}}\left(|{Z}_n-\mu |\ge \upepsilon \right)\le \frac{\sigma^2}{n{\upepsilon}^2}.$$

Thus, for any ϵ > 0, the probability that the average value of the finite sequence X 1 ,  …  , X n deviates more than ϵ from the true mean μ can be enforced to be arbitrarily small by choosing n to be sufficiently large.
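The effect is easy to observe numerically. In the Python sketch below (fair Bernoulli trials, an illustrative choice of ours), the empirical frequency of a deviation |Z n − μ| ≥ ϵ shrinks as n grows:

```python
import random

rng = random.Random(7)

def deviation_frequency(n, eps=0.05, trials=2000):
    """Empirical P(|Z_n - mu| >= eps), where Z_n is the mean of n fair coin flips (mu = 1/2)."""
    hits = 0
    for _ in range(trials):
        z = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(z - 0.5) >= eps:
            hits += 1
    return hits / trials

freqs = [deviation_frequency(n) for n in (10, 100, 1000)]
# freqs decreases toward 0, consistent with the bound sigma^2 / (n eps^2)
```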

Chernoff’s Bound

Let S n  = X 1 +  ⋯  + X n denote a finite sum of i.i.d. random variables, and let ϵ > 0. Letting t > 0 represent an arbitrary positive number, the following inequalities can be shown to be equivalent:
$$\begin{array}{cc}\hfill {S}_n\ge \mathbb{E}\left({S}_n\right)+\upepsilon \hfill & \hfill \iff t\;{S}_n\ge t\;\left(\mathbb{E}\left({S}_n\right)+\upepsilon \right)\hfill \\ {}\hfill \hfill & \hfill \iff {e}^{t\;{S}_n}\ge {e}^{t\left(\mathbb{E}\left({S}_n\right)+\upepsilon \right)}\hfill \end{array}$$
Note that the monotonicity of the exponential function enforces the last equivalence. Consequently,
$$\mathrm{\mathbb{P}}\left({S}_n\ge \mathbb{E}\left({S}_n\right)+\upepsilon \right)=\mathrm{\mathbb{P}}\left({e}^{{t S}_n}\ge {e}^{t\left(\mathbb{E}\left({S}_n\right)+\upepsilon \right)}\right).$$
Using Markov’s inequality (63), one obtains
$$\begin{array}{c}\mathrm{\mathbb{P}}\left({S}_n\ge \mathbb{E}\left({S}_n\right)+\upepsilon \right)\le {e}^{- t\left(\mathbb{E}\left({S}_n\right)+\upepsilon \right)}\mathbb{E}\left({e}^{tS_n}\right),\\ {}\le {e}^{- t\left(\mathbb{E}\left({S}_n\right)+\upepsilon \right)}\prod_{i=1}^n\mathbb{E}\left({e}^{tX_i}\right).\end{array}$$
(65)
By a similar argument, the inequality
$$\mathrm{\mathbb{P}}\left({S}_n\le \mathbb{E}\left({S}_n\right)-\upepsilon \right)\le {e}^{t\left(\mathbb{E}\left({S}_n\right)-\upepsilon \right)}\prod_{i=1}^n\mathbb{E}\left({e}^{-{ t X}_i}\right)$$
(66)
is derived. Chernoff’s bound is then obtained by computing the positive values of t that minimize the right sides of Inequalities (65) and (66) (Chernoff 1952). For example, in the event that each X i ~ Bernoulli(p), $$\mathbb{E}\left({e}^{\pm {tX}_i}\right)=1- p+{pe}^{\pm t}$$. In addition, letting $$\lambda =\mathbb{E}\left({S}_n\right)= np$$, with Inequality (65), we obtain
$$\mathrm{\mathbb{P}}\left({S}_n\ge \lambda +\upepsilon \right)\le {e}^{- t\left(\lambda +\upepsilon \right)}{\left(1- p+{pe}^t\right)}^n.$$
Using calculus, it is not difficult to show that the right side assumes a minimum when e t  = (1 − p)(λ + ϵ)/p(n − λ − ϵ), assuming n > λ + ϵ. Thus,
$$\mathrm{\mathbb{P}}\left({S}_n\ge \lambda +\upepsilon \right)\le {\left(\frac{\lambda}{\lambda +\upepsilon}\right)}^{\lambda +\upepsilon}{\left(\frac{n-\lambda}{n-\lambda -\upepsilon}\right)}^{n-\lambda -\upepsilon}.$$
(67)
By a similar argument (assuming n > λ – ϵ), Inequality (66) yields
$$\mathrm{\mathbb{P}}\left({S}_n\le \lambda -\upepsilon \right)\le {\left(\frac{\lambda}{\lambda -\upepsilon}\right)}^{\lambda -\upepsilon}{\left(\frac{n-\lambda}{n-\lambda +\upepsilon}\right)}^{n-\lambda +\upepsilon}.$$
(68)
Inequalities (67) and (68) can be cast into more useful forms by loosening the bounds ever so slightly. Following the convex analysis of Janson et al. (2000, pp. 26–27), let ϕ(x) = (1 + x) log (1 + x) − x. Note that ϕ(0) = 0, ϕ′(x) = log (1 + x), and ϕ″(x) = 1/(1 + x). Consequently, for x > −1, ϕ(x) ≥ 0, with equality only at x = 0. Thus, Inequality (67) becomes
$$\mathrm{\mathbb{P}}\left({S}_n\ge \lambda +\upepsilon \right)\le \exp \left(-\lambda \phi \left(\frac{\upepsilon}{\lambda}\right)\right).$$
Since $${\phi}^{\prime\prime}(x)\ge \frac{1}{{\left(1+ x/3\right)}^3}={\left(\frac{x^2}{2\left(1+ x/3\right)}\right)}^{\prime \prime },$$ one shows ϕ(x) ≥ x 2/(2(1 + x/3)). Thus,
$$\mathrm{\mathbb{P}}\left({S}_n\ge \lambda +\upepsilon \right)\le \exp \left(-\frac{\upepsilon^2}{2\left(\lambda +\upepsilon /3\right)}\right).$$
(69)
In a similar manner (cf., Janson et al. 2000),
$$\begin{array}{l}\mathrm{\mathbb{P}}\left({S}_n\le \lambda -\upepsilon \right)\le \exp \left(-\lambda \phi \left(-\frac{\upepsilon}{\lambda}\right)\right)\\ {}\le \exp \left(-\frac{\upepsilon^2}{2\lambda}\right).\end{array}$$
(70)
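To see how tight the bound (69) is in practice, the following Python sketch (the parameters are of our choosing) compares it against the empirical upper tail of a binomial sum:

```python
import math
import random

rng = random.Random(3)
n, p = 200, 0.05            # S_n ~ Binomial(200, 0.05), lambda = np = 10
lam, eps = n * p, 10.0

# Upper-tail bound (69): P(S_n >= lam + eps) <= exp(-eps^2 / (2(lam + eps/3)))
chernoff = math.exp(-eps ** 2 / (2 * (lam + eps / 3)))

trials = 5000
tail = sum(
    sum(rng.random() < p for _ in range(n)) >= lam + eps
    for _ in range(trials)
) / trials
# The empirical tail respects the bound, with room to spare.
```

The exponential decay in ϵ is what makes Chernoff-type bounds so much sharper than Chebyshev's inequality for sums of independent variables.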

Example: Random Graphs

In this final section, we illustrate how probabilistic methods can be employed to illuminate a problem relevant to social networks: the onset of the giant component in a family of random graphs initially studied by Erdős and Rényi (1960). We begin with a description of the probability space, (Ω, ℰ, ℙ), associated with the model $$\mathbb{G}\left( n, p\right)$$: the set of simple graphs G(V, E) consisting of n = |V| vertices, in which each pair of distinct vertices u and υ is randomly connected by an edge with probability p. (Details about $$\mathbb{G}\left( n, p\right)$$ and related models can be found in Barrat et al. (2008), Bollobás (1985), Durrett (2007), Erdős and Rényi (1959), Gilbert (1959), Janson et al. (2000), Newman (2010), and Vega-Redondo (2007).) This model is conceptually distinct from the probability space known as $$\mathbb{G}\left( n, m\right)$$, for which Ω consists of the $$\left(\begin{array}{c}\hfill M\hfill \\ {}\hfill m\hfill \end{array}\right)$$ simple graphs having n vertices and m edges, where $$M=\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill 2\hfill \end{array}\right)$$. Although the sample space of $$\mathbb{G}\left( n, m\right)$$ is smaller, with a uniform probability measure, the random variables associated with the presence or absence of an edge lack the independence properties inherent to $$\mathbb{G}\left( n, p\right)$$. The sample space Ω of $$\mathbb{G}\left( n, p\right)$$ contains every possible simple graph on n labeled vertices. Since a simple graph contains at most $$M=\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill 2\hfill \end{array}\right)$$ edges, |Ω| = 2 M , and the set of events ℰ is tacitly assumed to be the power set of Ω. The probability measure is then defined by specifying the probability of each singleton event: for each ω ∈ Ω, let ℙ({ω}) = p m (1 − p) M − m , where m ∈ {0, 1, 2, · · ·, M} denotes the number of edges in the graph that ω represents.
Equivalently, one may construct each graph in $$\mathbb{G}\left( n, p\right)$$ according to a finite Bernoulli process. Let X i , j  : Ω → {0, 1}, for 1 ≤ i < j ≤ n, be an i.i.d. sequence of M random variables, where ℙ([X i , j  = k]) = p k (1 − p)1 − k , for k ∈ {0, 1}. An edge is then constructed between vertices i and j if and only if X i , j  = 1. Therefore, a graph selected at random from $$\mathbb{G}\left( n, p\right)$$ will contain Y : Ω → {0, 1, · · ·, M} edges, in accordance with the binomial p.m.f. $${p}_Y(m)=\left(\begin{array}{c}\hfill M\hfill \\ {}\hfill m\hfill \end{array}\right){p}^m{\left(1- p\right)}^{M- m}$$. The expected number of edges in the graph (in total) is then $$\mathbb{E}(Y)= Mp$$ (see (45)). Note that the same value is obtained if Y is defined as the sum,
$$Y=\sum_{i=1}^{n-1}\sum_{j= i+1}^n{X}_{i, j}$$

(see Example 10).
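This edge-by-edge Bernoulli construction translates directly into code. A minimal Python sketch (the function name is our own):

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """One draw from G(n, p): each of the M = C(n, 2) possible edges
    is included independently with probability p (a Bernoulli process)."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

rng = random.Random(0)
n, p = 100, 0.05
M = n * (n - 1) // 2        # 4950 potential edges

counts = [len(sample_gnp(n, p, rng)) for _ in range(200)]
mean_edges = sum(counts) / len(counts)
# mean_edges is close to E(Y) = M p = 247.5
```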

Additional insight is gained by examining how this model impacts an arbitrary vertex in the graph. Let D υ denote the degree of vertex υ ∈ V, that is, the number of its incident edges. Since there are n − 1 potential edges between υ and the remaining vertices in the graph, D υ ~ Binomial(n − 1, p) is described by the p.m.f., $$\mathrm{\mathbb{P}}\left(\left[{D}_{\upsilon}= d\right]\right)=\left(\begin{array}{c}\hfill n-1\hfill \\ {}\hfill d\hfill \end{array}\right){p}^d{\left(1- p\right)}^{n-1- d}$$. Following standard convention (Newman 2010), we let
$$c=\mathbb{E}\left({D}_{\upsilon}\right)=\left( n-1\right) p\approx np\quad \mathrm{if}\ n\gg 1,$$

which acts as an average branching factor. In the following, the Poisson limit (see the Remark in section “Poisson Random Variables”) with n → ∞, p → 0, such that np → c (a prescribed constant) will be useful. One technicality concerns how to implement the law of large numbers as the number of vertices in the graph n increases. The term asymptotically almost surely (a.a.s. for short) signifies the limit ℙ(E n ) → 1 as n → ∞, where E n is an event defined on a random structure (e.g., random graph) that depends on n.
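The binomial degree law, and the Poisson limit just mentioned, are easy to check by simulation (the parameters n = 500 and c = 3 are our illustrative choices):

```python
import random

rng = random.Random(11)
n, c = 500, 3.0
p = c / n

# The degree of a fixed vertex is a sum of n - 1 independent Bernoulli(p) trials.
trials = 5000
degrees = [sum(rng.random() < p for _ in range(n - 1)) for _ in range(trials)]

mean_degree = sum(degrees) / trials        # close to (n - 1)p = 2.994, i.e., roughly c
frac_isolated = degrees.count(0) / trials  # compare with the Poisson limit e^{-c} ~ 0.0498
```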

Analyzing the Giant Component

One of the more remarkable properties of $$\mathbb{G}\left( n, p\right)$$ is how the number and sizes of the various components in a random graph depend on the parameter c = np. A component is defined as a maximal subset of vertices that are topologically connected to one another via a network of edges. In particular, if c > 1, then for an increasing sequence of values of n, the random graph a.a.s. exhibits a “giant” component that grows as Θ(n). Here, we closely follow the presentation of Janson et al. (2000) in which the following theorem is derived.

Theorem 16

Let np = c, where c > 0 denotes a constant.
1. 1.

In the event that c < 1, (a.a.s.) the largest component in $$\mathbb{G}\left( n, p\right)$$ has no more than 3 log n/(1 − c)2 vertices.

2. 2.

If c > 1, then we define α(c) ∈ (0, 1) to be the smallest positive root of α = g 1(α) (see (62)), where g 1(s) = e c(s − 1) is the generating function of Poisson(c). In this case, a single giant component containing approximately (1 − α(c) + o(1))n vertices exists within $$\mathbb{G}\left( n, p\right)$$.

The statement in Part (b) suggests that a branching process might be involved. Indeed, this is the case, but the nomenclature adopted here differs slightly from our presentation in section “Branching Processes.” The process begins with the selection of an arbitrary vertex υ from the graph. Let X 0 = 1, as before. The initial vertex υ will be linked to a random subset of nodes in the graph according to the probability model. We let X 1 enumerate the cardinality of this subset $$\left\{{\upsilon}_1,{\upsilon}_2,\dots, {\upsilon}_{X_1}\right\}$$. Once these vertices have been enumerated, vertex υ is said to be saturated, or “dead.” Vertices that have been enumerated and are not saturated are said to be active. In the next step, X 2 is obtained by enumerating the subset of vertices adjacent to υ 1 that are neither active nor saturated. The new vertices are declared active as υ 1 is saturated. Subsequently, the neighbors of node υ 2 that are neither active nor saturated are enumerated by X 3, and so on, in the manner of a breadth-first search. The process continues until every vertex in the component has been counted and labeled as saturated. The size of the component containing υ is thus X 0 + X 1 + … + X k . Heuristically, if the branching factor is small (c < 1), then the branching process that describes the “growth” of the component quickly falls into extinction. Consequently, each component in such a graph will be small. On the other hand, if the branching factor is large (c > 1), then the branching process may grow initially at a persistent rate, resulting in a large component.
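The exploration just described is a breadth-first search and can be written out directly. In the Python sketch below (our own illustration), “active” vertices sit in a queue and become “saturated” when dequeued; running it on samples of $$\mathbb{G}\left( n, p\right)$$ already exhibits the dichotomy of Theorem 16:

```python
import random
from collections import deque

def sample_gnp(n, p, rng):
    """Adjacency lists for one draw from G(n, p)."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def component_sizes(adj):
    """Explore each component: a vertex is 'active' while queued,
    'saturated' once dequeued and its neighbors are enumerated."""
    unseen = set(range(len(adj)))
    sizes = []
    while unseen:
        root = unseen.pop()
        active, size = deque([root]), 1
        while active:
            u = active.popleft()            # u becomes saturated
            for w in adj[u]:
                if w in unseen:             # neither active nor saturated
                    unseen.remove(w)
                    active.append(w)
                    size += 1
        sizes.append(size)
    return sizes

rng = random.Random(5)
n = 2000
largest = {c: max(component_sizes(sample_gnp(n, c / n, rng))) for c in (0.5, 2.0)}
# largest[0.5] is O(log n); largest[2.0] is close to (1 - alpha(2)) n ~ 0.797 n
```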

Proof

Part (a): With c = np < 1, let A υ (k) denote the event that the above branching process that starts at vertex υ yields a component that contains at least k nodes, where we assume that k depends on n via k = k(n). Our goal is to place an upper bound on the probability,
$$\begin{array}{cc}\hfill \mathrm{\mathbb{P}}\left(\exists \upsilon \in V,{A}_{\upsilon}(k)\right)=\hfill & \hfill \mathrm{\mathbb{P}}\left(\underset{\upsilon \in V}{\cup }{A}_{\upsilon}(k)\right)\hfill \\ {}\hfill \hfill & \hfill \le n\mathrm{\mathbb{P}}\left({A}_{\upsilon}(k)\right).\hfill \end{array}$$
The inequality above follows from Boole’s inequality (19). Now, ℙ(A υ (k)) ≤ ℙ[X 1 + ⋯ + X k ≥ k − 1], because X 0 = 1. It is important to note that the branching process described above does not satisfy the stationarity requirement stipulated in section “Branching Processes”: with each new generation, it becomes more likely that new child nodes are either saturated or active. Nevertheless, an upper bound can be placed on the above probability by constructing a stochastic upper bound $${X}_i^{+}$$ on each random variable X i , so that $$\mathrm{\mathbb{P}}\left[{X}_i\ge x\right]\le \mathrm{\mathbb{P}}\left[{X}_i^{+}\ge x\right]$$ for every x ∈ {0, 1, … , n}. Moreover, it is desirable that the $${X}_i^{+}$$ be i.i.d. Both requirements are satisfied by letting $${X}_i^{+}\sim$$ Binomial(n, c/n) for i ∈ ℕ. Note also that $$\mathbb{E}\left({X}_i^{+}\right)= c.$$ With the assumption that k ≥ 3/(1 − c) 2 ,
$$\begin{array}{c} n\mathrm{\mathbb{P}}\left(\sum_{i=1}^k{X}_i\ge k-1\right)\le n\mathrm{\mathbb{P}}\left(\sum_{i=1}^k{X}_i^{+}\ge k-1\right),\\ {}= n\mathrm{\mathbb{P}}\left(\sum_{i=1}^k{X}_i^{+}\ge ck+\left(1- c\right) k-1\right),\\ {}\le n \exp \left(-\frac{{\left(\left(1- c\right) k-1\right)}^2}{2\left( ck+\left(1- c\right) k/3\right)}\right),\\ {}\le n \exp \left(-\frac{{\left(1- c\right)}^2 k}{2}\right).\end{array}$$

The third result is obtained from Chernoff’s bound (69), as modified by Janson et al. If now, k = k(n) > 3 log n/(1 − c)2, then the probability that there exists a component in G(n, p) with k or more nodes falls off as o (1). Thus, c < 1 implies every component is small, a.a.s.

Part (b): With c = np > 1, let $${k}_{-}=\frac{16 c}{{\left( c-1\right)}^2} \log n$$ and k + = n 2/3. Here, the strategy is to show that for every vertex υ, either the branching process falls into extinction before the k − generation or the branching process continues for at least k + generations. Moreover, if k satisfies k − ≤ k ≤ k +, then the process currently has at least (c − 1)k/2 active vertices. Since one visited vertex is labeled as saturated with each iteration, this latter case implies that the size of the component rooted at υ is at least k + (c − 1)k/2 = (c + 1)k/2. We demonstrate this claim by considering the complementary event B υ, k (B for “bad”), for which a branching process beginning with vertex υ has an insufficient number of visited vertices,
$${X}_0+{X}_1+\cdots +{X}_k<\frac{\left( c+1\right) k}{2},$$
for some k satisfying k − ≤ k ≤ k +. In symmetry with the proof of Part (a), we now seek a stochastic lower bound $${X}_i^{-}$$ for each X i that satisfies $$\mathrm{\mathbb{P}}\left[{X}_i\le x\right]\le \mathrm{\mathbb{P}}\left[{X}_i^{-}\le x\right]$$ for every x ∈ {0, 1, … , n}, such that the $${X}_i^{-}$$ are i.i.d. To wit, let $${X}_i^{-}\sim$$ Binomial(n − (c + 1)k +/2, c/n). As $$n\to \infty, \mathbb{E}\left({X}_i^{-}\right)\to c.$$ Thus,
$$\begin{array}{l}\mathrm{\mathbb{P}}\left(\exists \upsilon \in V,\exists k\in \left[{k}_{-},{k}_{+}\right],{B}_{\upsilon, k}\right)\\ {}\le n\sum_{k={k}_{-}}^{k_{+}}\mathrm{\mathbb{P}}\left(\sum_{i=1}^k{X}_i\le k-1+\frac{\left( c-1\right) k}{2}\right),\\ {}\le n\sum_{k={k}_{-}}^{k_{+}}\mathrm{\mathbb{P}}\left(\sum_{i=1}^k{X}_i^{-}\le k-1+\frac{\left( c-1\right) k}{2}\right),\\ {}\le n\sum_{k={k}_{-}}^{k_{+}} \exp \left(-\frac{{\left( c-1\right)}^2{k}^2}{9 c k}\right),\\ {}\le {nk}_{+} \exp \left(-\frac{{\left( c-1\right)}^2{k}_{-}}{9 c}\right)= o(1).\end{array}$$
(71)

In the above, Boole’s inequality (19) is initially applied, followed by Chernoff’s inequality (70), using λ = ck and ϵ = 1 + (c − 1)k/2.

The next step is to show that a.a.s. there can be at most one component that is larger than k +. Suppose a branching process starting with vertex υ produces a large component, which by the k = k + iteration has (by the previous argument) at least (c − 1)k +/2 active vertices. If there is indeed more than one component, then a branching process beginning with a different vertex υ′ would produce a second component, also with at least (c − 1)k +/2 active vertices by the k = k + iteration. The probability that the two components are in fact disjoint is less than or equal to
$$\begin{array}{l}{\left(1- p\right)}^{{\left(\left( c-1\right){k}_{+}/2\right)}^2}={\left(1-\frac{c}{n}\right)}^{{\left( c-1\right)}^2{n}^{4/3}/4}\\ {}\le \exp \left(-\frac{c{\left( c-1\right)}^2{n}^{1/3}}{4}\right)= o\left(1/{n}^2\right).\end{array}$$

Note that 1 − p represents the probability that a given pair of vertices is not connected by an edge, and here, there are at least ((c − 1)k +/2) 2 potential edges that link the two components. Thus, the two components merge, a.a.s., as n → ∞.

Consequently, the components in the random graph fall into two categories: (i) “small” components, each with size less than k − vertices, or (ii) a single “large” component, with size greater than k + vertices. In the following, we apply our description of branching processes in section “Branching Processes” to estimate the number of vertices that fall into the first category and consequently estimate the size of the single large component. The probability that a vertex υ belongs to a small component is given by the extinction probability α of the branching process. Though it is difficult to determine α exactly, one can construct upper and lower bounds: α − < α < α +. For the upper bound, α +, we construct the branching process where X 1 ~ Binomial(n − k −, c/n). For the lower bound, α −, we use the branching process defined by X 1 ~ Binomial(n, c/n). In the asymptotic limit, both processes tend to the Poisson distribution, with generating function g 1(s) = e c(s − 1) (see the Remark in section “Poisson Random Variables”). Thus, in the limit n → ∞, the extinction probability equals α = α(c), the root of the equation α = e c(α − 1) that satisfies α ∈ (0, 1). The expected number of vertices in small components is thus n(α(c) + o(1)) < n. The remaining vertices must belong to the giant component, which therefore consists of n(1 − α(c) + o(1)) vertices, proving the theorem.

Kurt Lewin is celebrated for stating “there is nothing so practical as a good theory” (Lewin 1997, p. 288). Since the emergence of a single giant component in large networks is nearly ubiquitous – Newman (2010, pp. 235–239) enumerates 27 such instances, including 10 from social contexts – one might well assert that results like Theorem 16 are especially worthy of Lewin’s adulation. (See also Molloy and Reed 1995, 1998). Although practical social networks are typically modeled with additional features, the presence of a giant component can often be understood with simplifying assumptions. As an illustration, Fig. 12 depicts the network of sexual and romantic relationships studied by Bearman et al. (2004). Following a survey of 832 adolescents enrolled in the same high school in the Midwestern United States, a graph of their relationships over an 18-month period between 1993 and 1995 was constructed. Here, each subject is represented by a vertex, colored by gender, and edges are constructed between vertices if one subject reported a romantic or sexual encounter with the other. A total of 573 students were involved in at least one sexual relationship. This model differs from $$\mathbb{G}\left( n, p\right)$$ in that each vertex is colored by gender and heterosexual edges are prominent. The study also identified significant homophily based on socioeconomic status, academic performance, grade level, drinking and smoking behavior, etc. Yet, despite these additional details, a single “giant” component emerges.

Fig. 12 After Fig. 2 of Bearman et al. (2004, p. 58). A number to the right of a component indicates its multiplicity. Thus, for example, 63 isolated heterosexual couples were identified in the study. In summary, 573 of 832 adolescents surveyed were involved in at least one relationship with another student, and 288 subjects were connected as a single large component

References

1. Barrat A, Barthélemy M, Vespignani A (2008) Dynamical processes on complex networks. Cambridge University Press, Cambridge
2. Bearman PS, Moody J, Stovel K (2004) Chains of affection: the structure of adolescent romantic and sexual networks. Am J Sociol 110(1):44–91
3. Bollobás B (1985) Random graphs. Academic Press, London
4. Chernoff H (1952) A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann Math Stat 23:493–507
5. Durrett R (2007) Random graph dynamics. Cambridge University Press, Cambridge
6. Erdős P, Rényi A (1959) On random graphs i. Publ Math Debr 6:290–297
7. Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61
8. Feller W (1968) An introduction to probability theory and its applications, vol I, 3rd edn. Wiley, New York
9. Gilbert EN (1959) Random graphs. Ann Math Stat 30:1141–1144
10. Grimmett G, Stirzaker D (2001) Probability and random processes, 3rd edn. Oxford University Press, Oxford
11. Harris TE (1989) The theory of branching processes. Dover, Mineola
12. Janson S, Łuczak T, Ruciński A (2000) Random graphs. Wiley, New York
13. Kolmogorov AN (1956) Foundations of probability, 2nd edn. Chelsea, New York
14. Lewin K (1997) Resolving social conflicts and field theory in social science. American Psychological Association, Washington, DC
15. Molloy M, Reed B (1995) A critical point for random graphs with a given degree sequence. Random Struct Algorithm 6(2–3):161–180
16. Molloy M, Reed B (1998) The size of the giant component of a random graph with a given degree sequence. Comb Probab Comput 7(3):295–305
17. Newman MEJ (2010) Networks: an introduction. Oxford University Press, Oxford
18. Vega-Redondo F (2007) Complex social networks. Cambridge University Press, Cambridge
19. Venkatesh SS (2013) The theory of probability. Cambridge University Press, Cambridge
20. Wilf HS (2006) Generatingfunctionology, 3rd edn. A K Peters, Wellesley