Probabilistic Analysis
Keywords
Random Graph · Sample Space · Simple Graph · Probability Mass Function · Mathematical Induction
Glossary
 Asymptotically Almost Surely (a.a.s.)

The limit ℙ(E _{ n }) → 1 as n → ∞, where {E _{ n }} denotes a sequence of events defined on a random structure (e.g., a random graph) that depends on n
 Event

A subset of the sample space
 \( \mathbb{G}\left( n, p\right) \)

The probability space of simple random graphs that contain n vertices and for which each of the \( \binom{n}{2} \) possible edges occurs with probability p ∈ [0, 1]
 Independent and Identically Distributed (i.i.d.)

The hypothesis that some given random variables are mutually independent, and each is described by the same probability mass function
 Probability Mass Function (p.m.f.)

A function that assigns a probability to the event that a random variable assumes a given value, e.g., p _{ X }(x) = ℙ({ω ∈ Ω : X(ω) = x})
 Probability Measure

(ℙ) A function that assigns a probability (a number between 0 and 1) to every event contained in ℰ
 Random Variable (r.v.)

A mapping X : Ω → ℛ ⊆ ℝ that assigns a numerical value to every element within the sample space
 Sample Space (Ω)

The complete set of mutually disjoint outcomes of a random experiment
 Set of Events

(ℰ) A set of subsets of the sample space that is algebraically closed under both complements and countable unions
 Simple Graph

An undirected graph described by a set of vertices V and a set of edges E, such that each edge connects a pair of distinct vertices, and no more than one edge connects any pair of vertices
 Statistical Independence

The property that the probability of every joint event equals the product of the corresponding probabilities of the individual events
Introduction
The subject of probabilistic analysis is vast. Thus, the following article presents only a synopsis of the foundations of probability theory, discrete random variables, generating functions, branching processes, and probability inequalities. The utility of these concepts is illustrated by several examples interspersed throughout the article. The capstone example in section “Example: Random Graphs” derives the conditions for the existence of a giant component in the family of random graphs, \( \mathbb{G}\left( n, p\right) \), analyzed by Erdős and Rényi (1960) and more recently by Janson et al. (2000). Although the proof of this theorem is rather technical, the diligent reader will discover that the frequent emergence of a single giant component in practical networks is an immediate consequence of elementary properties of probability. Furthermore, this proof illustrates how the analytic methods described in this entry are applied in a nontrivial context.
Comprehensive treatments of the theory of probability can be found in the books by Feller (1968), Grimmett and Stirzaker (2001), and Venkatesh (2013). Applications of probabilistic methods to social networks are treated by Newman (2010) and Vega-Redondo (2007).
Foundations
a   ¬a   ¬(¬a)
T   F    T
F   T    F
As the left and right columns in every row agree, the theorem of double negation is demonstrated.
a   b   a ∧ b   a ∨ b   a ⇒ b   a ≡ b
T   T   T       T       T       T
T   F   F       F       F       F
F   T   F       T       T       F
F   F   F       F       T       T
Equations (1) and (2) are known as the distributive laws; (3) and (4) are known as De Morgan’s laws; and (5) is the law of the contrapositive. Usually, the pairs of parentheses on the right sides of (3) through (5) are omitted as Boolean operations conventionally follow a prescribed order of precedence: negation, conjunction, disjunction, implication, and, finally, equivalence.
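Because each identity quantifies over only finitely many Boolean assignments, all five laws can be checked exhaustively. The following sketch (the helper name `implies` is ours, not from the text) verifies them by brute force:

```python
from itertools import product

def implies(p, q):
    # Material implication: p => q is equivalent to (not p) or q.
    return (not p) or q

# Check the distributive laws, De Morgan's laws, and the contrapositive
# over every assignment of truth values to a, b, c.
for a, b, c in product([True, False], repeat=3):
    assert (a and (b or c)) == ((a and b) or (a and c))   # distributive law
    assert (a or (b and c)) == ((a or b) and (a or c))    # distributive law
    assert (not (a and b)) == ((not a) or (not b))        # De Morgan
    assert (not (a or b)) == ((not a) and (not b))        # De Morgan
    assert implies(a, b) == implies(not b, not a)         # contrapositive

print("all identities verified")
```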
A set is a collection of distinct objects or elements, such as the natural numbers ℕ \( \triangleq \) {1, 2, 3, …}, the nonnegative integers ℕ_{0} \( \triangleq \) {0, 1, 2, …}, the (signed) integers ℤ \( \triangleq \) {…, −1, 0, 1, …}, and the reals ℝ. (N.B., the notation s \( \triangleq \) x specifies that the symbol s is being defined to represent the expression x). In the following, we will denote sets with uppercase letters, e.g., A, B, …. We will use the special symbol Ω to denote the universal set, that is, the set of all objects under consideration.
A predicate is a function that assigns a unique Boolean value {T, F} to each element of a set. Thus, one can define the predicate Even (x) to return T whenever x is an integer that is evenly divisible by 2, and F otherwise.
If x is a member of a set A, one writes x ∈ A, which should be interpreted as a predicate function that maps x to T if x is a member of A, and F otherwise. Likewise, the predicate x ∉ A is defined as ¬(x ∈ A), indicating that x is not a member of set A. For any set A, its cardinality, denoted by |A|, equals the number of elements it contains. Sets are often specified as the truth sets of predicates. Thus the set of even integers can be written as {x ∈ ℤ : Even(x)}.
Example 1
Furthermore, one readily verifies that \( \left|\mathcal{P}(A)\right| = 2^{|A|} = 2^3 = 8 \). From (12), \( \binom{3}{0} = \binom{3}{3} = 1 \), indicating that exactly one element in \( \mathcal{P}(A) \) has cardinality 0 (∅), and exactly one has cardinality 3 (viz., {1, 2, 3}). Likewise, since \( \binom{3}{1} = \binom{3}{2} = 3 \), three subsets of A are found to contain exactly one element ({1}, {2}, {3}), and another three subsets of A contain exactly two elements ({1, 2}, {1, 3}, {2, 3}).
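The counts in Example 1 can be confirmed by enumerating the power set directly; the sketch below assumes A = {1, 2, 3} as in the example:

```python
from itertools import combinations
from math import comb

A = {1, 2, 3}

# Enumerate the power set of A and tally subsets by cardinality,
# confirming |P(A)| = 2^|A| = 8 and the binomial counts from (12).
subsets = [set(s) for r in range(len(A) + 1) for s in combinations(A, r)]
assert len(subsets) == 2 ** len(A) == 8

counts = {r: sum(1 for s in subsets if len(s) == r) for r in range(len(A) + 1)}
assert counts == {0: comb(3, 0), 1: comb(3, 1), 2: comb(3, 2), 3: comb(3, 3)}
print(counts)  # {0: 1, 1: 3, 2: 3, 3: 1}
```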
Theory of Probability
Axioms of Probability
Most modern treatments of probability theory are based on Kolmogorov’s definition of a probability space, (Ω, ℰ, ℙ) (Kolmogorov 1956). In the following, we define each of its three components in sequence.
The Sample Space, Ω
The fundamental notion in probability is the sample space, also known as the set of elementary outcomes. Since the sample space acts as a universal set, we denote it by Ω. This set is by definition complete: it contains every possible outcome under consideration. In addition, the elements of the set are mutually exclusive, meaning that only one element within it can occur at once. For an experiment consisting of the roll of a standard six-sided die, Ω would equal the set {1, 2, 3, 4, 5, 6}. For a simple graph of n vertices and m undirected edges, where the n vertices are fixed but the m edges are selected at random, Ω would equal the set of all \( \binom{\binom{n}{2}}{m} \) possible configurations.
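The size of this last sample space, choosing m of the \( \binom{n}{2} \) possible edges, can be computed directly:

```python
from math import comb

# |Omega| for a random graph on n labelled vertices with exactly m
# undirected edges: choose m of the C(n, 2) possible edges.
def num_configurations(n, m):
    return comb(comb(n, 2), m)

# For n = 4 there are C(4, 2) = 6 possible edges, so choosing m = 2
# of them gives C(6, 2) = 15 configurations.
assert num_configurations(4, 2) == comb(6, 2) == 15
```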
The Set of Events, ℰ

A1. Ω ∈ ℰ,

A2. A ∈ ℰ ⇒ A ^{ c } ∈ ℰ,

A3. A, B ∈ ℰ ⇒ A ∪ B ∈ ℰ.
If ℰ satisfies A1–A3 above, it is said to form an algebra. In the event that Ω is countably infinite, i.e., if its elements can be placed in a one-to-one correspondence with the elements of ℕ, then it is desirable to adopt the additional axiom, A4. If A _{ i } ∈ ℰ, for all i ∈ ℕ, then ∪_{ i } A _{ i } ∈ ℰ. Any collection of events ℰ that satisfies Axioms A1–A4 is called a σ-algebra. In the event that Ω is either finite or countably infinite, one often chooses ℰ to be the power set of the sample space, \( \mathcal{P}\left(\Omega \right) \).
The Probability Measure, ℙ

P1. For any A ∈ ℰ, ℙ(A) ≥ 0;

P2. ℙ(Ω) = 1;

P3. For any finite or countably infinite sequence of mutually disjoint events A _{ i } ∈ ℰ (i.e., j ≠ k ⇒ A _{ j } ∩ A _{ k } = ∅), ℙ(∪_{ i } A _{ i }) = Σ_{ i } ℙ(A _{ i }).
Postulate P3 is known as complete additivity. As a consequence of these three postulates, 0 ≤ ℙ(A) ≤ 1, for any event A ∈ ℰ. If Ω is either finite or countably infinite, then ℰ may include all singleton events. In this case, the probability measure ℙ is uniquely defined by specifying the probabilities of the singleton events ℙ({ω}) for each ω ∈ Ω.
Together, the triple (Ω, ℰ, ℙ) is said to form a probability space. Henceforth, unless stated otherwise, we assume that the probability space (Ω, ℰ, ℙ) is well defined.
Example 2 (A Pair of Dice)
Some Useful Theorems
Theorem 1
If A ∈ ℰ, then ℙ(A ^{ c }) = 1 − ℙ(A).
Proof
By A2, A ^{ c } ∈ ℰ, and thus by P1, has a welldefined probability. From the definition of complement, A ∪ A ^{ c } = Ω and A ∩ A ^{ c } = ∅. Thus, by P2 and P3, ℙ(A) + ℙ(A ^{ c }) = ℙ(Ω) = 1, from which the theorem follows.
With A = Ω and P2, one obtains ℙ(∅) = 0.
Theorem 2 (Monotonicity)
If A, B ∈ ℰ, with A ⊆ B, then ℙ(A) ≤ ℙ(B).
Proof
The last inequality follows from ℙ(A ^{ c } ∩ B) ≥ 0, by P1.
Definition
 1.
E _{ i } ∩ E _{ j } = ∅ whenever i ≠ j
 2.
∪_{ i ∈ ℐ} E _{ i } = Ω.
Theorem 3 (Total Probability)
Proof
The last equality results from P3 as (A ∩ E _{ i }) ∩ (A ∩ E _{ j }) = ∅, whenever i ≠ j.
Theorem 4 (Inclusion/Exclusion)
Proof
The principle of inclusion/exclusion can be applied to obtain an upper bound on the probability of a finite union of events.
Theorem 5 (Boole’s Inequality)
Proof
In the above, (21) follows from the definition of the serial union; (22), from (20); (23) from the inductive hypothesis; and (24), from the definition of the summation operation.
Conditional Probability
Since the left sides of the previous two equations must therefore be equal, we have derived the following:
Theorem 6 (Bayes’s Rule)
where the denominator in the last fraction represents the theorem of total probability (17), followed by an application of the definition of conditional probability.
Example 3
Thus, the discovery that a randomly drawn ball is amber dramatically shifts one’s degree of belief that the randomly chosen urn originally contained j amber balls, from 1/n to the nonuniform expression, \( j/\left(\begin{array}{c}\hfill n+1\hfill \\ {}\hfill 2\hfill \end{array}\right) \), for j = 1 , 2 , … , n.
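The posterior in Example 3 can be reproduced with exact arithmetic. The sketch below assumes (to match the stated posterior) that urn j yields an amber ball with likelihood proportional to j, with a uniform 1/n prior over the urns:

```python
from fractions import Fraction
from math import comb

# Bayes's rule: posterior(j) = prior(j) * likelihood(j) / evidence,
# where the evidence is given by the theorem of total probability.
def posterior(n):
    prior = Fraction(1, n)
    likelihood = {j: Fraction(j, n) for j in range(1, n + 1)}   # assumed model
    evidence = sum(prior * likelihood[j] for j in range(1, n + 1))
    return {j: prior * likelihood[j] / evidence for j in range(1, n + 1)}

# The uniform prior 1/n shifts to the nonuniform posterior j / C(n+1, 2).
post = posterior(4)
assert all(post[j] == Fraction(j, comb(5, 2)) for j in range(1, 5))
```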
Statistical Independence
Discrete Random Variables
A random variable (r.v.) provides a means of assigning numerical values to events defined within a given probability space (Ω, ℰ, ℙ). In many applications, random variables correspond to measured quantities in a random experiment that can vary from a binary feature (e.g., the existence of an edge between two particular nodes in a graph) to a more global aggregate (e.g., the total number of edges in the same graph). Formally, a random variable is a mapping from Ω to an arbitrary set of values, ℛ, called the range of X. In this case, we write X : Ω → ℛ. An r.v. is said to be discrete if ℛ is finite or countably infinite and continuous if ℛ has the power of the continuum, e.g., the real numbers in the unit interval [0, 1]. In this entry, we shall consider only discrete random variables, in which ℛ is either a finite or countably infinite subset of the reals.
Probability Mass Functions
Equation (29) is called the normalization condition of the p.m.f.
Example 4 (The Sum of Two Fair Dice)
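The p.m.f. in this example can be tabulated directly; the sketch below (assuming, as the title indicates, two fair six-sided dice) sums the uniform probability 1/36 over all 36 outcomes:

```python
from fractions import Fraction
from itertools import product

# p.m.f. of the sum of two fair dice: accumulate 1/36 for each
# outcome (i, j) with i + j = s.
pmf = {}
for i, j in product(range(1, 7), repeat=2):
    pmf[i + j] = pmf.get(i + j, Fraction(0)) + Fraction(1, 36)

assert pmf[2] == pmf[12] == Fraction(1, 36)   # single outcome each
assert pmf[7] == Fraction(6, 36)              # six outcomes sum to 7
assert sum(pmf.values()) == 1                 # normalization condition
```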
Functions of Random Variables
On occasion when a new random variable Y is defined as a deterministic function of an existing random variable X, the following theorem yields the p.m.f. of Y. In the following, let ϕ : ℝ → ℝ denote an arbitrary function. For the r.v. X : Ω → ℛ, we define ϕ(ℛ) ≜ {ϕ(x) : x ∈ ℛ}, and ϕ ^{−1}(y) ≜ {x ∈ ℛ : ϕ(x) = y}.
Theorem 7
Proof
Expectations and Higher Moments
where p _{ X } is the probability mass function of X. This expression is also called the first moment of X.
In the event that a new random variable Y is defined as a function of an existing r.v., X, the following is useful.
Theorem 8
Proof
Example 5 (kth Moment of X)
The expectation satisfies the properties of a linear operator. Explicitly,
Theorem 9
Proof
where (29) was used to evaluate the final summation.
In analogy with Theorem 9, one can simplify the variance of a linearly scaled random variable.
Theorem 10
Proof
Generating Functions
Higher moments can be likewise obtained. Applications of (37) and (39) appear below.
for i = 0 , 1 , 2 , … . Thus, the entire p.m.f. of X can be recovered directly from the generating function.
The generating function for a uniform distribution corresponds to a truncated geometric series.
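As a concrete sketch (the coefficient-list representation and the helper `gf_product` are ours, not from the text), a discrete g.f. can be stored as a list where index i holds ℙ(X = i). Multiplying g.f.s is then a polynomial convolution, and the p.m.f. of a sum of independent r.v.s is recovered by reading off coefficients:

```python
from fractions import Fraction

# g.f. of a fair die: the truncated geometric series (s + ... + s^6)/6,
# stored as a coefficient list indexed by the value of the r.v.
die = [Fraction(0)] + [Fraction(1, 6)] * 6

def gf_product(p, q):
    # Polynomial (Cauchy) product: the g.f. of a sum of independent r.v.s.
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

two_dice = gf_product(die, die)
assert two_dice[7] == Fraction(6, 36)  # p.m.f. recovered from coefficients
assert sum(two_dice) == 1              # normalization preserved
```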
Univariate Models
The following (discrete) probability distributions are frequently used in the probabilistic analysis of social networks.
Uniform Random Variables
(The validity of the last equality can be shown by mathematical induction.)
Example 6
Bernoulli Random Variables
Example 7 (Bernoulli Trials)
Binomial Random Variables
Geometric Random Variables
Example 8
A pair of fair, six-sided dice is rolled repeatedly until double sixes appear. The number of trials required is thus described by a geometric random variable X ∼ Geometric(1/36). The expected number of rolls is \( \mathbb{E}(X)=1/ p=36 \), and the variance is Var(X) = 35 × 36 = 1260. Using (47), it follows that ℙ(X ≤ 25) ≈ 0.505532.
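These values follow from the standard Geometric(p) formulas, E(X) = 1/p, Var(X) = (1 − p)/p², and ℙ(X ≤ k) = 1 − (1 − p)^k, which the sketch below checks numerically:

```python
from math import isclose

p = 1 / 36                      # probability of double sixes on one roll
mean = 1 / p                    # expected number of rolls
var = (1 - p) / p ** 2          # variance of a geometric r.v.
cdf25 = 1 - (1 - p) ** 25       # P(X <= 25)

assert isclose(mean, 36)
assert isclose(var, 35 * 36)    # = 1260
assert isclose(cdf25, 0.505532, abs_tol=1e-5)
```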
Poisson Random Variables
Remark 1
the Poisson g.f. (49).
ScaleFree Random Variables
Multivariate Models
belongs to ℰ for j = 1 , 2 , … , n, for all x under consideration. It is useful to extend Theorem 9 to such multivariate models.
Theorem 11
Proof
establishing the theorem for n = 2. The cases for n > 2 are demonstrated by mathematical induction in a manner similar to the proof of Theorem 5.
with similar equations for p _{ Xi }(x _{ i }) for i = 2 , 3 , … , n. Equation (52) and its companions define the marginal p.m.f.s of the individual r.v.s.
Conditional Probability Mass Functions
Markov Chains
Conditional Expectations
Note that different \( \mathbb{E} \) operations appearing in the above are taken with respect to different probability mass functions.
Independent Random Variables
Definition
The r.v.s, X _{ j } for j = 1 , 2 , … , n, are said to be independent if the events [X _{ j } = x _{ j }] ∈ ℰ, for j = 1 , 2 , … , n, are independent in accordance with (27).
Independent random variables X _{ j } : Ω → ℝ, governed by the same probability mass function, p _{ X_j }(x) = p(x), for j = 1 , 2 , 3 , … , are said to be independent and identically distributed, or i.i.d. for short.
Sums of Random Variables
Example 9
where λ _{ Z } = λ _{1} + … + λ _{ n }. Thus, Z ∼ Poisson(λ _{ Z }).
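The closure of the Poisson family under summation can be checked numerically: convolving two Poisson p.m.f.s reproduces the Poisson p.m.f. with the summed rate. A small sketch:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam).
    return exp(-lam) * lam ** k / factorial(k)

def convolve(k, lam1, lam2):
    # p.m.f. of the sum of independent Poisson(lam1) and Poisson(lam2)
    # r.v.s, evaluated at k via the convolution sum.
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2)
               for j in range(k + 1))

# Poisson(2) + Poisson(3) has the same p.m.f. as Poisson(5).
for k in range(10):
    assert abs(convolve(k, 2.0, 3.0) - poisson_pmf(k, 5.0)) < 1e-12
```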
Example 10
Consequently, Z ∼ Binomial( n , p).
Compound Random Variables
Example 11
Thus, W ∼ Poisson(λ(1 − p)^{ n }), and the expected number of level-one leaf nodes is \( \mathbb{E}(W)={g}_W^{\prime }(1)=\lambda {\left(1- p\right)}^n \).
Branching Processes
Theorem 12
for i ∈ ℕ.
Proof
The right equality in (61) follows from the associativity of function composition.
corresponds to the probability of extinction at or before the k th generation. Thus, the probability of ultimate extinction is given by α ≜ lim_{ k → ∞} g _{ k }(0).
Theorem 13 (Steffensen)
within the interval 0 < α < 1.
Proof
Let i ∈ ℕ_{0}. Since the population never can recover from extinction, [X _{ i } = 0] ⊆ [X _{ i + 1} = 0]. Thus, by Theorem 2 the sequence g _{ i }(0) satisfies a monotonicity requirement: g _{ i }(0) ≤ g _{ i + 1}(0) = g _{1}(g _{ i }(0)). The last equality stems from (61). In the limit i → ∞, both g _{ i + 1}(0) and g _{ i }(0) tend to α. Thus, in this limit, g _{ i + 1}(0) = g _{1}(g _{ i }(0)) yields α = g _{1}(α). Since α is a probability, we are only concerned with the domain 0 < α < 1.
If μ > 1, then by a similar argument g _{1}(s) < s for s in a neighborhood of 1. Also note that g _{1}(0) > 0. Since \( {g}_1^{\prime \prime }(s)>0 \), convexity guarantees that the equation s = g _{1}(s) has a single root in the interval 0 < s < 1. Finally, note that the fact that \( \mu ={g}_1^{\prime }(1)>1 \) ensures that lim_{ k → ∞} g _{ k }(0) < 1. By (61), g _{ k + 1}(0) = g _{1}(g _{ k }(0)). But since g _{1}(s) < s in a neighborhood of s = 1, g _{ k + 1}(0) < g _{ k }(0) for g _{ k }(0) = 1 − ϵ (with ϵ > 0, sufficiently small). Since this violates the monotonicity requirement, the fixed point at s = 1 is not achievable if μ > 1. Consequently α < 1 for this case (Fig. 11b).
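The fixed-point iteration g _{ k + 1}(0) = g _{1}(g _{ k }(0)) used in this proof can be carried out numerically. The sketch below takes g _{1}(s) = e^{c(s − 1)}, the Poisson(c) generating function used later in the random-graph example, and exhibits both regimes of Steffensen's theorem:

```python
from math import exp

def extinction_probability(c, iterations=1000):
    # Iterate g_{k+1}(0) = g_1(g_k(0)) for g_1(s) = exp(c*(s - 1));
    # the limit is the extinction probability alpha.
    s = 0.0
    for _ in range(iterations):
        s = exp(c * (s - 1.0))
    return s

# Subcritical (mu = c < 1): extinction is certain, alpha -> 1.
assert extinction_probability(0.5) > 0.999

# Supercritical (mu = c > 1): alpha is the root of alpha = g_1(alpha) in (0, 1).
alpha = extinction_probability(2.0)
assert abs(alpha - exp(2.0 * (alpha - 1.0))) < 1e-12   # fixed point of g_1
assert 0 < alpha < 1
```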
Probability Inequalities
The toolbox of every practicing probabilist contains a variety of inequalities for constructing useful bounds on certain probabilities.
Markov’s Inequality
Theorem 14 (Markov’s Inequality)
Proof
Dividing through by α completes the proof.
Chebyshev’s Inequality
Theorem 15 (Chebyshev’s Inequality)
Proof
where (32) was employed to simplify the right side. Applying the square root to the argument of ℙ yields an equivalent event and (64).
Weak Law of Large Numbers
Thus, for any ϵ > 0, the probability that the average value of the finite sequence X _{1} , … , X _{ n } deviates more than ϵ from the true mean μ can be enforced to be arbitrarily small by choosing n to be sufficiently large.
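This behavior is easy to observe empirically. The Monte Carlo sketch below (our illustration; the ϵ, trial count, and seed are arbitrary choices) estimates the probability that the sample mean of n fair-coin flips deviates from μ = 0.5 by more than ϵ, for small and large n:

```python
import random

random.seed(0)  # fixed seed for reproducibility

def deviation_frequency(n, epsilon=0.05, trials=2000):
    # Fraction of trials in which the average of n Bernoulli(1/2)
    # samples deviates from the true mean 0.5 by more than epsilon.
    bad = 0
    for _ in range(trials):
        mean = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) > epsilon:
            bad += 1
    return bad / trials

# The deviation probability shrinks as n grows, as the weak law predicts.
assert deviation_frequency(1000) < deviation_frequency(10)
```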
Chernoff’s Bound
Example: Random Graphs
(see Example 10).
which acts as an average branching factor. In the following, the Poisson limit (see the Remark in section “Poisson Random Variables”) with n → ∞, p → 0, such that np → c (a prescribed constant) will be useful. One technicality concerns how to implement the law of large numbers as the number of vertices in the graph n increases. The term asymptotically almost surely (a.a.s. for short) signifies the limit ℙ(E _{ n }) → 1 as n → ∞, where E _{ n } is an event defined on a random structure (e.g., random graph) that depends on n.
Analyzing the Giant Component
One of the more remarkable properties of \( \mathbb{G}\left( n, p\right) \) is how the number and sizes of the various components in a random graph depend on the parameter c = np. A component is defined as a maximal subset of vertices that are topologically connected to one another via a network of edges. In particular, if c > 1, then for an increasing sequence of values of n, the random graph almost surely exhibits a “giant” component that grows as Θ(n). Here, we closely follow the presentation of Janson et al. (2000) in which the following theorem is derived.
Theorem 16
 1.
In the event that c < 1, (a.a.s.) the largest component in \( \mathbb{G}\left( n, p\right) \) has no more than 3 log n/(1 − c)^{2} vertices.
 2.
If c > 1, then we define α(c) ∈ (0, 1) to be the smallest positive root of α = g _{1}(α) (see (62)), where g _{1}(s) = e ^{ c(s − 1)} is the generating function of Poisson(c). In this case, a single giant component containing approximately (1 − α(c) + o(1))n vertices exists within \( \mathbb{G}\left( n, p\right) \).
The statement in Part (b) suggests that a branching process might be involved. Indeed, this is the case, but the nomenclature adopted here differs slightly from our presentation in section “Branching Processes.” The process begins with the selection of an arbitrary vertex υ from the graph. Let X _{0} = 1, as before. The initial vertex υ will be linked to a random subset of nodes in the graph according to the probability model. We let X _{1} enumerate the cardinality of this subset \( \left\{{\upsilon}_1,{\upsilon}_2,\dots, {\upsilon}_{X_1}\right\} \). Once these vertices have been enumerated, vertex υ is said to be saturated, or “dead.” Vertices that have been enumerated and are not saturated are said to be active. In the next step, X _{2} is obtained by enumerating the subset of vertices adjacent to υ _{1} that are neither active nor saturated. The new vertices are declared active as υ _{1} is saturated. Subsequently, the neighbors of node υ _{2} that are neither active nor saturated are enumerated by X _{3}, and so on, in the manner of a breadth-first search. The process continues until every vertex in the component has been counted and labeled as saturated. The size of the component containing υ is thus S _{ υ } = X _{0} + X _{1} + … + X _{ k }. Heuristically, if the branching factor is small (c < 1), then the branching process that describes the “growth” of the component quickly falls into extinction. Consequently, each component in such a graph will be small. On the other hand, if the branching factor is large (c > 1), then the branching process may grow initially at a persistent rate, resulting in a large component.
Proof
The third result is obtained from Chernoff’s bound (69), as modified by Janson et al. If now, k = k(n) > 3 log n/(1 − c)^{2}, then the probability that there exists a component in \( \mathbb{G}\left( n, p\right) \) with k or more nodes falls off as o(1). Thus, c < 1 implies every component is small, a.a.s.
In the above, Boole’s inequality (19) is initially applied, followed by Chernoff’s inequality (70), using λ = ck and ϵ = 1 + (c − 1)k/2.
Note that 1 − p represents the probability that a given pair of vertices is not connected by an edge, and here, there are at least ((c − 1)k + 2)^{2} potential edges that link the two components. Thus, the two components merge, a.a.s., as n → ∞.
Consequently, the components in the random graph fall into two categories: (i) “small” components, each with size less than k _{−} vertices, or (ii) a single “large” component, with size greater than k _{+} vertices. In the following, we apply our description of branching processes in section “Branching Processes” to estimate the number of vertices that fall into the first category and consequently estimate the size of the single large component. The probability that a vertex υ belongs to a small component is given by the extinction probability α of the branching process. Though it is difficult to determine α exactly, one can construct upper and lower bounds: α _{−} < α < α _{+}. For the upper bound, α _{+}, we construct the branching process where X _{1} ∼ Binomial(n − k _{−}, c/n). For the lower bound, α _{−}, we use the branching process defined by X _{1} ∼ Binomial(n, c/n). In the asymptotic limit, both processes tend to the Poisson distribution, with generating function g _{1}(s) = e ^{ c(s − 1)} (see the Remark in section “Poisson Random Variables”). Thus, in the limit n → ∞, the extinction probability equals α = α(c), the root of the equation α = e ^{ c(α − 1)} that satisfies α ∈ (0, 1). The expected number of small vertices is thus n(α(c) + o(1)) < n. The remaining vertices must belong to the giant component, which therefore consists of n(1 − α(c) + o(1)) vertices, proving the theorem.
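Theorem 16 can be illustrated by direct simulation. The sketch below (a Monte Carlo illustration with an arbitrary seed, not part of the proof) samples \( \mathbb{G}(n, p) \) with c = np = 2, measures the largest component by breadth-first search, and compares its size to the predicted (1 − α(c))n:

```python
import random
from math import exp

def largest_component_fraction(n, c, seed=0):
    # Sample G(n, p) with p = c/n and return (largest component size)/n.
    rng = random.Random(seed)
    p = c / n
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:       # each edge occurs independently
                adj[i].append(j)
                adj[j].append(i)
    seen, best = [False] * n, 0
    for s in range(n):                 # explore each component in turn
        if seen[s]:
            continue
        stack, size = [s], 0
        seen[s] = True
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, size)
    return best / n

c = 2.0
alpha = 0.0
for _ in range(1000):                  # fixed point alpha = exp(c*(alpha - 1))
    alpha = exp(c * (alpha - 1.0))

# The observed giant-component fraction tracks the predicted 1 - alpha(c).
frac = largest_component_fraction(2000, c)
assert abs(frac - (1.0 - alpha)) < 0.05
```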
Cross-References
References
 Barrat A, Barthélemy M, Vespignani A (2008) Dynamical processes on complex networks. Cambridge University Press, Cambridge
 Bearman PS, Moody J, Stovel K (2004) Chains of affection: the structure of adolescent romantic and sexual networks. Am J Sociol 110(1):44–91
 Bollobás B (1985) Random graphs. Academic Press, London
 Chernoff H (1952) A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann Math Stat 23:493–507
 Durrett R (2007) Random graph dynamics. Cambridge University Press, Cambridge
 Erdős P, Rényi A (1959) On random graphs I. Publ Math Debr 6:290–297
 Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61
 Feller W (1968) An introduction to probability theory and its applications, vol I, 3rd edn. Wiley, New York
 Gilbert EN (1959) Random graphs. Ann Math Stat 30:1141–1144
 Grimmett G, Stirzaker D (2001) Probability and random processes, 3rd edn. Oxford University Press, Oxford
 Harris TE (1989) The theory of branching processes. Dover, Mineola
 Janson S, Łuczak T, Ruciński A (2000) Random graphs. Wiley, New York
 Kolmogorov AN (1956) Foundations of the theory of probability, 2nd edn. Chelsea, New York
 Lewin K (1997) Resolving social conflicts and field theory in social science. American Psychological Association, Washington, DC
 Molloy M, Reed B (1995) A critical point for random graphs with a given degree sequence. Random Struct Algorithm 6(2–3):161–180
 Molloy M, Reed B (1998) The size of the giant component of a random graph with a given degree sequence. Comb Probab Comput 7(3):295–305
 Newman MEJ (2010) Networks: an introduction. Oxford University Press, Oxford
 Vega-Redondo F (2007) Complex social networks. Cambridge University Press, Cambridge
 Venkatesh SS (2013) The theory of probability. Cambridge University Press, Cambridge
 Wilf HS (2006) Generatingfunctionology, 3rd edn. A K Peters, Wellesley