1 Introduction

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain. This chapter revisits, in a more rigorous way, some of the quantities already explored for absorbing Markov chains (Chaps. 4, 5, and 6). It will also consider ergodic Markov chains (in which no absorbing states exist), and calculate the sensitivity of the stationary distribution and measures of the rate of convergence.

Perturbation (or sensitivity) analysis is a long-standing problem in the theory of Markov chains (Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986; Funderlic and Meyer 1986; Seneta 1988, 1993; Meyer 1994; Cho and Meyer 2000; Mitrophanov 2003, 2005; Mitrophanov et al. 2005; Kirkland et al. 2008). When Markov chains are applied as models of physical, biological, or social systems, they are often defined as functions of parameters that have substantive meaning.

2 Absorbing Chains

The transition matrix for a discrete-time absorbing chain can be written

$$\displaystyle \begin{aligned} {\mathbf{P}} = \left( \begin{array}{c|c} {\mathbf{U}} & {\mathbf{0}} \\ \hline {\mathbf{M}} & {\mathbf{I}} \end{array} \right) {} \end{aligned} $$
(11.1)

where U, of dimension s × s, is the transition matrix among the s transient states, and M, of dimension a × s, contains the probabilities of transition from the transient states to the a absorbing states. Because we are concerned here with absorption, but not with what happens afterwards, we ignore transitions among absorbing states; hence the identity matrix (a × a) in the lower right corner. The matrices U[θ] and M[θ] are functions of a vector θ of parameters; we assume that θ varies over some set on which the column sums of P are 1 and the spectral radius of U is strictly less than one.

2.1 Occupancy: Visits to Transient States

Let ν ij be the number of visits to transient state i, prior to absorption, by an individual starting in transient state j. The expectations of the ν ij are the entries of the fundamental matrix \({\mathbf {N}} = {\mathbf {N}}_1 = \left ( E(\nu _{ij}^{~}) \right )\):

$$\displaystyle \begin{aligned} {\mathbf{N}} = \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} {} \end{aligned} $$
(11.2)

(e.g., Kemeny and Snell 1960; Iosifescu 1980). Let \({\mathbf {N}}_k = \left ( E(\nu _{ij}^k) \right )\) be a matrix containing the kth moments about the origin of the ν ij. The first several of these matrices are (Iosifescu 1980, Thm. 3.1)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} {} \end{array} \end{aligned} $$
(11.3)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.4)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_3 &\displaystyle =&\displaystyle \left( 6 {\mathbf{N}}^{2}_{\mathrm{dg}} - 6 {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{I}} \right) {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.5)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_4 &\displaystyle =&\displaystyle \left( 24 {\mathbf{N}}^{3}_{\mathrm{dg}} -36 {\mathbf{N}}^{2}_{\mathrm{dg}} +14 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) {\mathbf{N}}_1.{} \end{array} \end{aligned} $$
(11.6)
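
A minimal numerical sketch of these moment formulas, using Python and a hypothetical 2 × 2 transient matrix U (illustrative values only, chosen so that the spectral radius of U is less than one):

import numpy as np

# Hypothetical transient matrix U (spectral radius < 1); values are illustrative only.
U = np.array([[0.2, 0.1],
              [0.5, 0.6]])
s = U.shape[0]
I = np.eye(s)

N1 = np.linalg.inv(I - U)          # fundamental matrix, Eq. (11.2)
Ndg = np.diag(np.diag(N1))         # diagonal matrix of the diagonal entries of N1

N2 = (2 * Ndg - I) @ N1                                        # Eq. (11.4)
N3 = (6 * Ndg @ Ndg - 6 * Ndg + I) @ N1                        # Eq. (11.5)
N4 = (24 * np.linalg.matrix_power(Ndg, 3)
      - 36 * Ndg @ Ndg + 14 * Ndg - I) @ N1                    # Eq. (11.6)

print(N1, N2, N3, N4, sep="\n\n")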

Theorem 11.2.1

Let N k be the matrix of kth moments of the ν ij, as given by (11.3) , (11.4) , (11.5) , and (11.6) . The sensitivities of N k, for k = 1, …, 4, are

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d {vec} \, {\mathbf{U}} {} \end{array} \end{aligned} $$
(11.7)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_1 + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d {vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{array} \end{aligned} $$
(11.8)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_3 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 6 {\mathbf{N}}^{2}_{\mathrm{dg}} - 6 {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_1 \\ &\displaystyle &\displaystyle + \left[ 6 \left( {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{I}} + 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right) - 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{array} \end{aligned} $$
(11.9)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_4 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 24 {\mathbf{N}}^{3}_{\mathrm{dg}} - 36 {\mathbf{N}}^{2}_{\mathrm{dg}} + 14 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_1 \\ &\displaystyle &\displaystyle + \left\{ 24 \left[ \left( {\mathbf{N}}^{2}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{I}} + \left( {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}^{2}_{\mathrm{dg}} \right] \right. \\ &\displaystyle &\displaystyle \left. - 36 \left[ \left( {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{I}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right] + 14 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \right\} d {vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{array} \end{aligned} $$
(11.10)

where (see Sect. 2.8 )

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{N}}_{\mathrm{dg}} &\displaystyle =&\displaystyle {\mathbf{I}} \circ d {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.11)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mathrm{vec} \, {\mathbf{N}}_{\mathrm{dg}} &\displaystyle =&\displaystyle \mathcal{D}\,(\mathrm{vec} \, {\mathbf{I}} ) d \mathrm{vec} \, {\mathbf{N}}_1. {} \end{array} \end{aligned} $$
(11.12)

Proof

The result (11.7) is derived in Caswell (2006, Section 3.1). For k > 1, and considering N k as a function of N 1 and N dg, the total differential of N k is

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_k = {\partial \mbox{vec} \, {\mathbf{N}}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_1} d \mbox{vec} \, {\mathbf{N}}_1 + {\partial \mbox{vec} \, {\mathbf{N}}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}} d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}. {} \end{aligned} $$
(11.13)

The two terms of (11.13) are the partial differentials of vec N k, obtained by taking differentials treating only N 1 or only N dg as variable, respectively. Denote these partial differentials by \(\partial _{\mbox{ {${\mathbf {N}}_1$}}}\) and \(\partial _{\mbox{ {${\mathbf {N}}_{\mathrm {dg}}$}}}\). Differentiating N 2 in (11.4) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 {\mathbf{N}}_{\mathrm{dg}} \left(d {\mathbf{N}}_1 \right) - d {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.14)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_{\mathrm{dg}}$}}} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left( d {\mathbf{N}}_{\mathrm{dg}} \right) {\mathbf{N}}_1. \end{array} \end{aligned} $$
(11.15)

Applying the vec operator gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d \mbox{vec} \, {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.16)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_{\mathrm{dg}}$}}} \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}, \end{array} \end{aligned} $$
(11.17)

and (11.13) becomes

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_2 = \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d \mbox{vec} \, {\mathbf{N}}_1 + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{aligned} $$
(11.18)

which is (11.8). The derivations of dvec N 3 and dvec N 4 follow the same sequence of steps. The details are given in Appendix A. □

The derivatives of N 2, N 3, and N 4 can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).
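
The result (11.7) can also be checked numerically. The following sketch, using the same hypothetical U as above and the column-stacking (column-major) vec convention, compares the matrix calculus Jacobian with a finite-difference approximation:

import numpy as np

def fundamental(U):
    return np.linalg.inv(np.eye(U.shape[0]) - U)

# Hypothetical transient matrix (illustrative values only)
U = np.array([[0.2, 0.1],
              [0.5, 0.6]])
s = U.shape[0]
N1 = fundamental(U)

# Matrix-calculus Jacobian d vec N1 / d vec' U, cf. Eq. (11.7)
J = np.kron(N1.T, N1)

# Finite-difference approximation, perturbing one element of U at a time
h = 1e-7
J_fd = np.zeros((s * s, s * s))
for j in range(s):
    for i in range(s):
        dU = np.zeros_like(U)
        dU[i, j] = h
        J_fd[:, j * s + i] = (fundamental(U + dU) - N1).flatten(order="F") / h

print(np.max(np.abs(J - J_fd)))    # should be on the order of 1e-6 or smaller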

2.2 Time to Absorption

Let η j be the time to absorption starting in transient state j and let \(\boldsymbol{\eta}_k = E \left( \eta_1^k, \ldots, \eta_s^k \right)^{\mathsf{T}}\). The first several of these moments are (Iosifescu 1980, Thm. 3.2)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_1^{\mathsf{T}} &\displaystyle =&\displaystyle {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.19)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 2 {\mathbf{N}}_1 - {\mathbf{I}} \right) \end{array} \end{aligned} $$
(11.20)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_3^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 6 {\mathbf{N}}_1^2 - 6 {\mathbf{N}}_1 + {\mathbf{I}} \right) {} \end{array} \end{aligned} $$
(11.21)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_4^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 24 {\mathbf{N}}_1^3 - 36 {\mathbf{N}}_1^2 + 14 {\mathbf{N}}_1 - {\mathbf{I}} \right). {} \end{array} \end{aligned} $$
(11.22)
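
A sketch of these moment vectors for the same hypothetical U; the variance of the time to absorption, for example, follows from the first two moments:

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])
s = U.shape[0]
I = np.eye(s)
ones = np.ones(s)

N1 = np.linalg.inv(I - U)
eta1 = ones @ N1                                        # Eq. (11.19)
eta2 = eta1 @ (2 * N1 - I)                              # Eq. (11.20)
eta3 = eta1 @ (6 * N1 @ N1 - 6 * N1 + I)                # Eq. (11.21)
eta4 = eta1 @ (24 * np.linalg.matrix_power(N1, 3)
               - 36 * N1 @ N1 + 14 * N1 - I)            # Eq. (11.22)

var_eta = eta2 - eta1**2         # variance of time to absorption, by starting state
print(eta1, var_eta)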

Theorem 11.2.2

Let η k be the vector of the kth moments of the η i . The sensitivities of these moment vectors are

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.23)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 + 2 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.24)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_3 &\displaystyle =&\displaystyle \left[ 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 - 6 {\mathbf{N}}_1^{\mathsf{T}} + {\mathbf{I}} \right] d \boldsymbol{\eta}_1 \\ &\displaystyle &\displaystyle + \left[ 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) + 6 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right) - 6 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.25)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_4 &\displaystyle =&\displaystyle \left[ 24 \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^3 - 36 \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 + 14 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right] d \boldsymbol{\eta}_1 \\ &\displaystyle &\displaystyle + \left\{ 24 \left[ \left( {\mathbf{N}}_1^2 \right)^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1^2 \right] \right. \\ &\displaystyle &\displaystyle \left. - 36 \left[ {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right] + 14 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \right\} d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.26)

where dvec N 1 is given by (11.7).

Proof

The derivative of η 1 is obtained (Caswell 2006) by differentiating to get \(d \boldsymbol {\eta }_1^{\mathsf {T}} = {\mathbf {1}}^{\mathsf {T}} \left ( d {\mathbf {N}}_1 \right )\) and then applying the vec operator. For the higher moments, consider the η k to be functions of η 1 and N 1, and write the total differential

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}_k = {\partial \boldsymbol{\eta}_k \over \partial \boldsymbol{\eta}_1^{\mathsf{T}}} \; d \boldsymbol{\eta}_1 + {\partial \boldsymbol{\eta}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_1} \; d \mbox{vec} \, {\mathbf{N}}_1. {} \end{aligned} $$
(11.27)

The partial differentials of η 2 with respect to η 1 and N 1 are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {$\boldsymbol{\eta}_1$}}} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) \left( 2 {\mathbf{N}}_1-{\mathbf{I}} \right) \end{array} \end{aligned} $$
(11.28)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle 2 \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{N}}_1 \right). \end{array} \end{aligned} $$
(11.29)

Applying the vec operator gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {$\boldsymbol{\eta}_1$}}} \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 \end{array} \end{aligned} $$
(11.30)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle 2 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.31)

which combine according to (11.27) to yield (11.24). The derivations of d η 3 and d η 4 follow the same sequence of steps; the details are shown in Appendix A. □

2.3 Number of States Visited Before Absorption

Let ξ i ≥ 1 be the number of distinct transient states visited before absorption by an individual starting in transient state i, and let ξ 1 = E(ξ). Then

$$\displaystyle \begin{aligned} \boldsymbol{\xi}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} {\mathbf{N}}_1 {} \end{aligned} $$
(11.32)

(Iosifescu 1980, Sect. 3.2.5), where \({\mathbf {N}}_{\mathrm {dg}}^{-1} = \left ( {\mathbf {N}}_{\mathrm {dg}} \right )^{-1}\).
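
A brief sketch of (11.32), again for the hypothetical U used above:

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])
N1 = np.linalg.inv(np.eye(2) - U)
Ndg_inv = np.linalg.inv(np.diag(np.diag(N1)))
xi1 = np.ones(2) @ Ndg_inv @ N1    # Eq. (11.32): expected number of distinct states visited
print(xi1)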

Theorem 11.2.3

Let ξ 1 = E(ξ). The sensitivity of ξ 1 is

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = \left[ - \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \right] d {vec} \, {\mathbf{N}}_1, {} \end{aligned} $$
(11.33)

where dvec N 1 is given by (11.7).

Proof

Differentiating (11.32) yields

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} \left( d {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) {\mathbf{N}}_1 + {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} d {\mathbf{N}}_1. \end{aligned} $$
(11.34)

Applying the vec operator yields

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}^{-1} + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) d \mbox{vec} \, {\mathbf{N}}_1. \end{aligned} $$
(11.35)

Applying (2.82) to \(d \mbox{vec} \, {\mathbf {N}}_{\mathrm {dg}}^{-1}\) and using (11.12) for dvec N dg gives

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = - \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) d \mbox{vec} \, {\mathbf{N}}_1 + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) d \mbox{vec} \, {\mathbf{N}}_1 \end{aligned} $$
(11.36)

which simplifies to (11.33). □

2.4 Multiple Absorbing States and Probabilities of Absorption

When the chain includes a > 1 absorbing states, the entry m ij of the a × s submatrix M in (11.1) is the probability of transition from transient state j to absorbing state i. The result of the competing risks of absorption is a set of probabilities \(b_{ij} = P \left [ \mbox{absorption in }i \left | \mbox{starting in }j \right . \right ]\) for i = 1, …, a and j = 1, …, s. The matrix \({\mathbf {B}} = \left ( b_{ij} \right ) = {\mathbf {M}} {\mathbf {N}}_1\) (Iosifescu 1980, Thm. 3.3).

Theorem 11.2.4

Let B = MN 1be the matrix of absorption probabilities. Then

$$\displaystyle \begin{aligned} d {vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d {vec} \, {\mathbf{M}} + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{B}} \right) d {vec} \, {\mathbf{U}}. {} \end{aligned} $$
(11.37)

Proof

Differentiating B yields

$$\displaystyle \begin{aligned} d {\mathbf{B}} = \left( d {\mathbf{M}} \right) {\mathbf{N}}_1 + {\mathbf{M}} \left( d {\mathbf{N}}_1 \right). \end{aligned} $$
(11.38)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{I}} \otimes {\mathbf{M}} \right) d \mbox{vec} \, {\mathbf{N}}_1. \end{aligned} $$
(11.39)

Substituting (11.7) for dvec N 1 and simplifying gives (11.37). □

Column j of B is the probability distribution of the eventual absorption state for an individual starting in transient state j. Usually a few of those starting states are of particular interest (e.g., states corresponding to “birth” or to the start of some process). Let B(:, j) = Be j denote column j of B, where e j is the jth unit vector of length s. Thus the derivative of B(:, j) is

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(:,j) = \left( {\mathbf{e}}_j^{\mathsf{T}} \otimes {\mathbf{I}}_a \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned} $$
(11.40)

where dvec B is given by (11.37). Similarly, row i of B is \({\mathbf {B}}(i,:)={\mathbf {e}}_i^{\mathsf {T}} {\mathbf {B}}\) and

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(i,:) = \left( {\mathbf{I}}_s \otimes {\mathbf{e}}_i^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned} $$
(11.41)

where e i is the ith unit vector of length a.
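
A sketch of B and of the two Jacobians appearing in (11.37), for hypothetical matrices U and M (a = 2 absorbing states; the values are illustrative only, with the columns of the full transition matrix summing to 1):

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])
M = np.array([[0.1, 0.2],        # hypothetical absorption probabilities:
              [0.2, 0.1]])       # columns of the full matrix [U; M] sum to 1
s = U.shape[0]
a = M.shape[0]

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1                                    # absorption probabilities

dB_dM = np.kron(N1.T, np.eye(a))              # d vec B / d vec' M, cf. Eq. (11.37)
dB_dU = np.kron(N1.T, B)                      # d vec B / d vec' U, cf. Eq. (11.37)

print(B.sum(axis=0))                          # check: each column of B sums to 1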

2.5 The Quasistationary Distribution

The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let w and v be the right and left eigenvectors associated with the dominant eigenvalue of U, normalized so that ∥w∥ = ∥v∥ = 1. Darroch and Seneta (1965) defined two quasistationary distributions in terms of w and v. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to

$$\displaystyle \begin{aligned} {\mathbf{q}}_a = {\mathbf{w}} {} \end{aligned} $$
(11.42)

The limiting probability distribution of the state of an individual, given that absorption has not happened and will not happen for a long time, is

$$\displaystyle \begin{aligned} {\mathbf{q}}_b = \frac{{\mathbf{w}} \circ {\mathbf{v}}}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} {} \end{aligned} $$
(11.43)

Horvitz and Tuljapurkar (2008) pointed out that the convergence to the quasistationary distribution implies that, in a stage-classified model, mortality eventually becomes independent of age.
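
A sketch of the two quasistationary distributions for the hypothetical U used in the earlier sketches:

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])

# Right and left eigenvectors of U associated with its dominant eigenvalue
lam, W = np.linalg.eig(U)
k = int(np.argmax(lam.real))
w = np.abs(W[:, k].real)
w = w / w.sum()                  # right eigenvector, scaled to sum to 1

lamL, V = np.linalg.eig(U.T)
kL = int(np.argmax(lamL.real))
v = np.abs(V[:, kL].real)
v = v / v.sum()                  # left eigenvector of U

q_a = w                          # Eq. (11.42)
q_b = (w * v) / (w @ v)          # Eq. (11.43)
print(q_a, q_b)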

Lemma 1

Let the dominant eigenvalue of U , guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy 0 < λ < 1, and let w and v be the right and left eigenvectors corresponding to λ, scaled so that w Tv = 1. Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{w}} &\displaystyle =&\displaystyle \left( \lambda {\mathbf{I}}_s - {\mathbf{U}} + {\mathbf{w}} {\mathbf{v}}^{\mathsf{T}} \right)^{-1} \left[ {\mathbf{w}}^{\mathsf{T}} \otimes \left( {\mathbf{I}}_s - {\mathbf{w}} {\mathbf{v}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{U}} {} \end{array} \end{aligned} $$
(11.44)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{v}} &\displaystyle =&\displaystyle \left( \lambda {\mathbf{I}}_s - {\mathbf{U}}^{\mathsf{T}} + {\mathbf{v}} {\mathbf{w}}^{\mathsf{T}} \right)^{-1} \left[ {\mathbf{v}}^{\mathsf{T}} \otimes \left( {\mathbf{I}}_s - {\mathbf{v}} {\mathbf{w}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{U}}^{\mathsf{T}} {} \end{array} \end{aligned} $$
(11.45)

Proof

Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating v as the right eigenvector of U T. □

Theorem 11.2.5

The derivative of the quasistationary distribution q a is given by (11.44). The derivative of the quasistationary distribution q b is

$$\displaystyle \begin{aligned} d {\mathbf{q}}_b = \frac{1}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \left\{ \left[ \mathcal{D}\,({\mathbf{v}}) - {\mathbf{q}}_b {\mathbf{v}}^{\mathsf{T}} \right] d {\mathbf{w}} + \left[ \mathcal{D}\,({\mathbf{w}}) - {\mathbf{q}}_b {\mathbf{w}}^{\mathsf{T}} \right] d {\mathbf{v}} \right\} {} \end{aligned} $$
(11.46)

where d w and d v are given by (11.44) and (11.45) respectively.

Proof

The derivative of q a follows from its definition as the scaled right eigenvector of U. For q b, differentiating (11.43) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{q}}_b &\displaystyle =&\displaystyle \frac{d \left( {\mathbf{w}} \circ {\mathbf{v}} \right)}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} - \frac{{\mathbf{w}} \circ {\mathbf{v}}}{\left( {\mathbf{w}}^{\mathsf{T}} {\mathbf{v}} \right)^2} \, d \left( {\mathbf{w}}^{\mathsf{T}} {\mathbf{v}} \right) \end{array} \end{aligned} $$
(11.47)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \frac{\left( d {\mathbf{w}} \right) \circ {\mathbf{v}} + {\mathbf{w}} \circ \left( d {\mathbf{v}} \right)}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} - {\mathbf{q}}_b \, \frac{{\mathbf{v}}^{\mathsf{T}} \left( d {\mathbf{w}} \right) + {\mathbf{w}}^{\mathsf{T}} \left( d {\mathbf{v}} \right)}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \end{array} \end{aligned} $$
(11.48)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d {\mathbf{q}}_b = \frac{1}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \left[ \mathcal{D}\,({\mathbf{v}}) \, d {\mathbf{w}} + \mathcal{D}\,({\mathbf{w}}) \, d {\mathbf{v}} - {\mathbf{q}}_b {\mathbf{v}}^{\mathsf{T}} d {\mathbf{w}} - {\mathbf{q}}_b {\mathbf{w}}^{\mathsf{T}} d {\mathbf{v}} \right] \end{aligned} $$
(11.49)

which simplifies to give (11.46). □

3 Life Lost Due to Mortality

The approach here makes it easy to compute the sensitivity of a variety of dependent variables calculated from the Markov chain. As an example of this flexibility, consider a recently developed demographic index, the number of years of life lost due to mortality (Vaupel and Canudas Romo 2003).

The transient states of the chain are age classes, absorption corresponds to death, and absorbing states correspond to age at death. Let μ i be the mortality rate and \(p_i=\exp (-\mu _i)\) the survival probability at age i. The matrix U has the p i on the subdiagonal and zeros elsewhere. The matrix M has 1 − p i on the diagonal and zeros elsewhere. Let f = B(:, 1) be the distribution of age at death and η 1 the vector of expected longevity as a function of age.

A death at age i represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the ith entry of η 1, and the expected number of years lost over the distribution of age at death is \(\eta ^\dagger = \boldsymbol {\eta }_1^{\mathsf {T}} {\mathbf {f}}\). This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age x, f would be a delta function at x and further life expectancy at age x would be zero; their product would give η † = 0. Declines in this disparity have accompanied the increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how η † responds to changes in mortality.

Differentiating η † gives

$$\displaystyle \begin{aligned} d \eta^\dagger = \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) {\mathbf{B}} {\mathbf{e}}_1 + \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{B}} \right) {\mathbf{e}}_1. \end{aligned} $$
(11.50)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d \eta^\dagger = {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} d \boldsymbol{\eta}_1 + \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}}. \end{aligned} $$
(11.51)

Substituting (11.23) for d η 1 and (11.37) for dvec B gives

$$\displaystyle \begin{aligned} d \eta^\dagger = {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1 + \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{B}} \right) d \mbox{vec} \, {\mathbf{U}} \right]. \end{aligned} $$
(11.52)

Simplifying and writing derivatives in terms of μ gives

(11.53)

Because mortality rates vary over several orders of magnitude with age, it is useful to present the results as elasticities,

$$\displaystyle \begin{aligned} {\epsilon \eta^\dagger \over \epsilon \boldsymbol{\mu}^{\mathsf{T}}} = \frac{1}{\eta^\dagger}\; {d \eta^\dagger \over d \boldsymbol{\mu}^{\mathsf{T}}} \; \mathcal{D}\,(\boldsymbol{\mu}). \end{aligned} $$
(11.54)
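
A sketch of this calculation for a short hypothetical mortality schedule (five age classes, with a large final mortality rate so that the last age class is nearly lethal); here the elasticities are approximated by finite differences rather than by (11.53):

import numpy as np

def eta_dagger(mu):
    """Years of life lost, eta-dagger, for mortality rates mu (one per age class)."""
    s = len(mu)
    p = np.exp(-mu)                      # survival probabilities
    U = np.diag(p[:-1], -1)              # p_i on the subdiagonal, zeros elsewhere
    M = np.diag(1 - p)                   # death at age i -> absorbing state i
    N1 = np.linalg.inv(np.eye(s) - U)
    eta1 = np.ones(s) @ N1               # remaining life expectancy by age
    f = (M @ N1)[:, 0]                   # distribution of age at death, from age class 1
    return eta1 @ f

# Hypothetical mortality schedule; the last rate is large so that the final
# age class is (nearly) lethal and the chain loses almost no probability.
mu = np.array([0.05, 0.02, 0.05, 0.20, 3.00])
ed = eta_dagger(mu)

h = 1e-6
elas = np.array([(eta_dagger(mu + h * np.eye(len(mu))[i]) - ed) / h * mu[i] / ed
                 for i in range(len(mu))])
print(ed, elas)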

Figure 11.1 shows these elasticities for two populations chosen to have very different life expectancies: India in 1961, with female life expectancy of 45 years and η † = 23.9 years, and Japan in 2006, with female life expectancy of 86 years and η † = 10.1 years (Human Mortality Database 2016). In both cases, elasticities are positive from birth to some age (≈50 for India, ≈85 for Japan) and negative thereafter. This implies that reductions in infant and early life mortality would reduce η †, whereas reductions in old age mortality would increase η †. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

Fig. 11.1
figure 1

The elasticity of mean years of life lost due to mortality, η †, to changes in age-specific mortality, calculated from the female life tables of India in 1961 and of Japan in 2006. (Data obtained from the Human Mortality Database 2016)

4 Ergodic Chains

Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix P of dimension s × s. The stationary distribution π is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue λ 1 = 1 of P. The fundamental matrix of the chain is \({\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}\) (Kemeny and Snell 1960).

We are interested only in perturbations that preserve the column-stochasticity of P; i.e., for which P remains a stochastic matrix. Such perturbations are easily defined when the p ij depend explicitly on a parameter vector θ. However, when the parameters of interest are the p ij themselves, an implicit parameterization must be defined to preserve the stochastic nature of P under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.

Previous studies of perturbations of ergodic chains focus almost completely on perturbations of the stationary distribution, and are divided between those focusing on sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986) and studies focusing on perturbation bounds and condition numbers (Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties with respect to parameter perturbations, but taking advantage of the matrix calculus approach. We do not consider perturbation bounds here.

4.1 The Stationary Distribution

Theorem 11.4.1

Let π be the stationary distribution, satisfying P π = π and 1 Tπ = 1. The sensitivity of π is

$$\displaystyle \begin{aligned} d \boldsymbol{\pi} = \left[ \boldsymbol{\pi}^{\mathsf{T}} \otimes \left( {\mathbf{Z}} - \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{P}} {} \end{aligned} $$
(11.55)

where Z is the fundamental matrix of the chain.

Proof

The vector π is the right eigenvector of P, scaled to sum to 1. Applying Lemma 1, and noting that λ = 1 and 1 TP = 1 T, gives \(d \boldsymbol {\pi } = {\mathbf {Z}} \left [ \boldsymbol {\pi }^{\mathsf {T}} \otimes \left ( {\mathbf {I}}_s - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right ) \right ] d \mbox{vec} \, {\mathbf {P}}\). Noting that Z π = π and simplifying the Kronecker products yields (11.55). □

Based on an analysis of eigenvector sensitivity (Meyer and Stewart 1982), Golub and Meyer (1986) derived an expression for the derivative of π to a change in a single element of P using the group generalized inverse \(\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\#\) of I −P. Since \(\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\# = {\mathbf {Z}} - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}}\) (Golub and Meyer 1986), expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of π using only the chain rule. If g(π) is a vector- or scalar-valued function of π, then

$$\displaystyle \begin{aligned} d g(\boldsymbol{\pi}) = {d g \over d \boldsymbol{\pi}^{\mathsf{T}} } \; {d \boldsymbol{\pi} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; d \mbox{vec} \, {\mathbf{P}} \end{aligned} $$
(11.56)

Some examples will appear in Sect. 11.5.
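
A sketch of Theorem 11.4.1 for a hypothetical 3-state column-stochastic matrix, with a finite-difference check of a single entry; the perturbation here is a raw change in one p ij, without the compensation introduced in Sect. 11.4.5:

import numpy as np

def stationary(P):
    lam, W = np.linalg.eig(P)
    w = np.abs(W[:, np.argmax(lam.real)].real)
    return w / w.sum()

# Hypothetical column-stochastic transition matrix
P = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]

pi = stationary(P)
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))    # fundamental matrix

# d pi / d vec' P  (Theorem 11.4.1), an s x s^2 matrix
J = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))

# Finite-difference check for the entry in row 1, column 2 (zero-based (0, 1))
h = 1e-7
Ph = P.copy()
Ph[0, 1] += h
fd = (stationary(Ph) - pi) / h
print(np.allclose(fd, J[:, 1 * s + 0], atol=1e-5))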

4.2 The Fundamental Matrix

The fundamental matrix \({\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}\) plays a role in ergodic chains similar to that played by N 1 in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.

Theorem 11.4.2

The sensitivity of the fundamental matrix is

$$\displaystyle \begin{aligned} d {vec} \, {\mathbf{Z}} = \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) \left[ {\mathbf{I}}_{s^2} - \left( \mathbf{1} \boldsymbol{\pi}^{\mathsf{T}} \right) \otimes \left( {\mathbf{Z}} - \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{P}}. {} \end{aligned} $$
(11.57)

Proof

From (2.82),

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{Z}} &\displaystyle =&\displaystyle - \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) d \mbox{vec} \left( {\mathbf{I}} - {\mathbf{P}} + \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \end{array} \end{aligned} $$
(11.58)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) \left[ d \mbox{vec} \, {\mathbf{P}} - \left( \mathbf{1} \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} \right]. \end{array} \end{aligned} $$
(11.59)

Substituting (11.55) for d π and simplifying gives (11.57). □

4.3 The First Passage Time Matrix

Let \({\mathbf {R}} = \left ( r_{ij}^{~} \right )\) be the matrix of mean first passage times from j to i, given by Iosifescu (1980, Thm. 4.7):

$$\displaystyle \begin{aligned} {\mathbf{R}} = \mathcal{D}\,(\boldsymbol{\pi})^{-1} \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right). {} \end{aligned} $$
(11.60)

Again, this is the transpose of the expression obtained when P is row-stochastic.

Theorem 11.4.3

The sensitivity of R is

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{R}} &\displaystyle =&\displaystyle - \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] \left[ \mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,({vec} \, {\mathbf{I}}_s) \left( \mathbf{1} \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} \\ &\displaystyle &\displaystyle + \left[ \left( \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}_s) - \left( {\mathbf{I}}_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \right] d {vec} \, {\mathbf{Z}} {} \end{array} \end{aligned} $$
(11.61)

where d π is given by (11.55) and dvec Z is given by (11.57) .

Proof

Differentiating (11.60) gives

$$\displaystyle \begin{aligned} d {\mathbf{R}} = \left[ d \, \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right) + \mathcal{D}\,(\boldsymbol{\pi})^{-1} \left[ - d {\mathbf{Z}} + \left( d {\mathbf{Z}}_{\mathrm{dg}} \right) \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right]. \end{aligned} $$
(11.62)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{R}} = \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] d \mbox{vec} \left[ \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] - \left[ {\mathbf{I}}_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] d \mbox{vec} \, {\mathbf{Z}} + \left[ \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] d \mbox{vec} \, {\mathbf{Z}}_{\mathrm{dg}}. \end{aligned} $$
(11.63)

Using (2.82) for \(d \mbox{vec} \, \left [ \mathcal {D}\, (\boldsymbol {\pi } )^{-1} \right ]\), (2.69) for \(d \mbox{vec} \, \mathcal {D}\,(\boldsymbol {\pi })\), and (11.12) for dvec Z dg yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{R}} &\displaystyle =&\displaystyle - \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] \left[ \mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) \left( \mathbf{1} \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} \\ &\displaystyle &\displaystyle - \left[ {\mathbf{I}}_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] d \mbox{vec} \, {\mathbf{Z}} + \left[ \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) \, d \mbox{vec} \, {\mathbf{Z}} \end{array} \end{aligned} $$
(11.64)

which simplifies to give (11.61). □

4.4 Mixing Time and the Kemeny Constant

The mixing time K of a chain is the mean time required to get from a specified state to a state chosen at random from the stationary distribution π. Remarkably, K is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is sometimes called Kemeny’s constant; it is a measure of the rate of convergence to stationarity and is given by K = trace(Z) (Hunter 2006). In addition to being a quantity of interest in itself, the rate of convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).

Theorem 11.4.4

The sensitivity of K is

$$\displaystyle \begin{aligned} dK = \left( {vec} \, {\mathbf{I}}_s \right)^{\mathsf{T}} d {vec} \, {\mathbf{Z}}. {} \end{aligned} $$
(11.65)

Proof

Differentiating K = trace(Z) gives

$$\displaystyle \begin{aligned} dK = {\mathbf{1}}^{\mathsf{T}} \left( {\mathbf{I}} \circ d {\mathbf{Z}} \right) \mathbf{1}. \end{aligned} $$
(11.66)

Applying the vec operator gives

$$\displaystyle \begin{aligned} dK = \left( {\mathbf{1}}^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}} ) d \mbox{vec} \, {\mathbf{Z}} \end{aligned} $$
(11.67)

which simplifies to (11.65). □
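
A sketch of K and of its raw sensitivity to the entries of P, combining (11.65) with dvec Z from Theorem 11.4.2, for the same hypothetical P:

import numpy as np

P = np.array([[0.6, 0.2, 0.1],    # hypothetical column-stochastic matrix
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

K = np.trace(Z)                                            # Kemeny's constant

# d vec Z / d vec' P (Theorem 11.4.2), then dK = (vec I)' d vec Z (Eq. 11.65)
dZ_dP = np.kron(Z.T, Z) @ (np.eye(s * s)
        - np.kron(np.outer(np.ones(s), pi), Z - np.outer(pi, np.ones(s))))
dK_dP = np.eye(s).flatten(order="F") @ dZ_dP               # 1 x s^2 row vector

print(K)
print(dK_dP.reshape(s, s, order="F"))                      # entry (i, j) = dK/dp_ij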

4.5 Implicit Parameters and Compensation

Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of dvec P. However, perturbation of any element, say p kj, to p kj + θ kj, must be compensated for by adjustments of the other elements in column j so that the column sum remains equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use in applications: additive and proportional. Additive compensation adjusts all the elements of the column by an equal amount, distributing the perturbation θ kj additively over column j. Proportional compensation distributes θ kj in proportion to the values of the p ij, for i ≠ k. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within P.

To develop the compensation formulae, let us start by considering a probability vector p, of dimension s × 1, with p i ≥ 0 and ∑ip i = 1. Let θ i be the perturbation of p i, and write

$$\displaystyle \begin{aligned} {\mathbf{p}} (\boldsymbol{\theta}) = {\mathbf{p}} (0) + {\mathbf{A}} \boldsymbol{\theta} {} \end{aligned} $$
(11.68)

for some matrix A to be determined. If y is a function of p, then

$$\displaystyle \begin{aligned} dy = {d y \over d {\mathbf{p}}^{\mathsf{T}}} \; {d {\mathbf{p}} \over d \boldsymbol{\theta}^{\mathsf{T}}} \; d \boldsymbol{\theta} \end{aligned} $$
(11.69)

evaluated at θ = 0.

Additive compensation

For the case of additive compensation, we write

$$\displaystyle \begin{aligned} \begin{array}{rcl} p_1(\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_1(0) + \theta_1 - \frac{\theta_2}{s-1} - \cdots - \frac{\theta_s}{s-1} \\ p_2 (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_2(0) -\frac{\theta_1}{s-1} + \theta_2 - \cdots - \frac{\theta_s}{s-1} \\ &\displaystyle \vdots&\displaystyle {} \\ p_s (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_s(0) -\frac{\theta_1}{s-1} -\frac{\theta_2}{s-1}- \cdots + \theta_s \end{array} \end{aligned} $$
(11.70)

The perturbation θ 1 is added to p 1 and compensated for by subtracting θ 1∕(s − 1) from all other entries of p; clearly ∑ip i(θ) = 1 for any perturbation vector θ.

The system of Eqs. (11.70) can be written

$$\displaystyle \begin{aligned} {\mathbf{p}}(\boldsymbol{\theta}) = {\mathbf{p}}(0) + \left( {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} \right) \boldsymbol{\theta}. \end{aligned} $$
(11.71)

Defining E to be a matrix of ones, the matrix C can be written as the Toeplitz matrix C = E − I, with zeros on the diagonal and ones elsewhere. Thus the matrix A in (11.68) is

$$\displaystyle \begin{aligned} {\mathbf{A}} = {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} {} \end{aligned} $$
(11.72)

Proportional compensation

For proportional compensation, assume that p i < 1 for all i. The vector p(θ) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} p_1(\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_1(0) + \theta_1 - \frac{p_1 \theta_2}{1-p_2} - \cdots - \frac{p_1 \theta_s}{1-p_s} \\ p_2 (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_2(0) -\frac{p_2 \theta_1}{1-p_1} + \theta_2 - \cdots - \frac{p_2 \theta_s}{1-p_s} \\ &\displaystyle \vdots&\displaystyle {} \\ p_s (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_s(0) -\frac{p_s \theta_1}{1-p_1} -\frac{p_s \theta_2}{1-p_2}- \cdots + \theta_s \end{array} \end{aligned} $$
(11.73)

The perturbation θ 1 is added to p 1 and compensated for by subtracting θ 1 p i∕(1 − p 1) from the ith entry of p, for i ≠ 1. Again, ∑ip i(θ) = 1 for any perturbation vector θ.

Equation (11.73) can be written

$$\displaystyle \begin{aligned} {\mathbf{p}}(\boldsymbol{\theta}) = {\mathbf{p}}(0) + \left[ {\mathbf{I}} - \mathcal{D}\,({\mathbf{p}}) \; {\mathbf{C}} \; \mathcal{D}\,(\mathbf{1} - {\mathbf{p}})^{-1} \right] \boldsymbol{\theta}, \end{aligned} $$
(11.74)

so that the matrix A in (11.68) is

$$\displaystyle \begin{aligned} {\mathbf{A}} = {\mathbf{I}} - \mathcal{D}\,({\mathbf{p}}) \; {\mathbf{C}} \; \mathcal{D}\,(\mathbf{1} - {\mathbf{p}})^{-1} {} \end{aligned} $$
(11.75)

The transition matrix

We have derived compensation formulae for a single probability vector p. Now consider perturbation of a probability matrix P, each column of which is a probability vector. Define a perturbation matrix Θ where θ ij is the perturbation of p ij. Perturbations of column j are to be compensated by a matrix A j, so that

$$\displaystyle \begin{aligned} {\mathbf{P}}(\boldsymbol{\Theta}) = {\mathbf{P}}(0) + \left( \begin{array}{ccc} {\mathbf{A}}_1 \boldsymbol{\Theta}(:,1) & \cdots & {\mathbf{A}}_s \boldsymbol{\Theta}(:,s) \end{array} \right) {} \end{aligned} $$
(11.76)

where A i compensates for the changes in column i of P. Applying the vec operator to (11.76) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{vec} \, {\mathbf{P}}(\boldsymbol{\Theta}) &=& \mbox{vec} \, {\mathbf{P}}(0) + \left(\begin{array}{ccc} {\mathbf{A}}_1 & & \\ & \ddots & \\ && {\mathbf{A}}_s \end{array}\right) \mbox{vec} \, \boldsymbol{\Theta} \end{array} \end{aligned} $$
(11.77)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &=& \mbox{vec} \, {\mathbf{P}}(0) + \sum_{i=1}^s \left( {\mathbf{E}}_{ii} \otimes {\mathbf{A}}_i \right) \mbox{vec} \, \boldsymbol{\Theta}. {} \end{array} \end{aligned} $$
(11.78)

The terms in the summation in (11.78) are recognizable as the vec of the product A iΘE ii; thus

$$\displaystyle \begin{aligned} {\mathbf{P}}(\boldsymbol{\Theta}) = {\mathbf{P}}(0) + \sum_{i=1}^s {\mathbf{A}}_i \boldsymbol{\Theta} {\mathbf{E}}_{ii} {} \end{aligned} $$
(11.79)

where E ii is a matrix with a 1 in the (i, i) entry and zeros elsewhere.

Theorem 11.4.5

Let P be a column-stochastic s × s transition matrix. Let Θ be a matrix of perturbations, where θ ij is applied to p ij, and the other entries of Θ compensate for the perturbation. Let C = E − I . If compensation is additive, then

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{P}}(\boldsymbol{\Theta}) &\displaystyle =&\displaystyle {\mathbf{P}}(0) + \left( {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} \right) \boldsymbol{\Theta} {} \end{array} \end{aligned} $$
(11.80)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {d {vec} \, {\mathbf{P}} \over d {vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} &\displaystyle =&\displaystyle \left[ {\mathbf{I}}_{s^2} - \frac{1}{s-1} \left( {\mathbf{I}}_s \otimes {\mathbf{C}} \right) \right]. {} \end{array} \end{aligned} $$
(11.81)

If compensation is proportional, then

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{P}}(\boldsymbol{\Theta}) &\displaystyle =&\displaystyle {\mathbf{P}}(0) + \sum_{i=1}^s \left\{ {\mathbf{I}} - \mathcal{D}\, \left[ {\mathbf{P}}(:,i) \right] \; {\mathbf{C}}\; \mathcal{D}\, \left[ \mathbf{1} - {\mathbf{P}}(:,i) \right]^{-1} \right\} \boldsymbol{\Theta} {\mathbf{E}}_{ii} \qquad {} \end{array} \end{aligned} $$
(11.82)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {d {vec} \, {\mathbf{P}} \over d {vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} &\displaystyle =&\displaystyle {\mathbf{I}}_{s^2} - \sum_{i=1}^s {\mathbf{E}}_{ii} \otimes \left\{ \mathcal{D}\, \left[ {\mathbf{P}}(:,i) \right] \; {\mathbf{C}} \; \mathcal{D}\, \left[ \mathbf{1} - {\mathbf{P}}(:,i) \right]^{-1} \right\}. {} \end{array} \end{aligned} $$
(11.83)

Proof

P( Θ) is given by (11.79). If compensation is additive, A i is given by (11.72) for all i. Substituting into (11.79) gives (11.80). Differentiating (11.80) and applying the vec operator gives (11.81).

If compensation is proportional, substituting (11.75) for A i in (11.79) gives (11.82). Differentiating yields

$$\displaystyle \begin{aligned} d {\mathbf{P}} = \left( d \boldsymbol{\Theta} \right) \sum_{i=1}^s {\mathbf{E}}_{ii} - \sum_{i=1}^s \mathcal{D}\,[ {\mathbf{P}}(:,i) ] \; {\mathbf{C}} \; \mathcal{D}\,[ \mathbf{1} - {\mathbf{P}}(:,i) ]^{-1} (d \boldsymbol{\Theta}) {\mathbf{E}}_{ii}. \end{aligned} $$
(11.84)

Using the vec operator gives (11.83). □

Perturbations of P subject to compensation are given by perturbations of Θ. Thus for any function y(P) we can write

$$\displaystyle \begin{aligned} \left. {d y \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} = {d y \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} \end{aligned} $$
(11.85)

where dvec P∕dvec TΘ is given (for additive and proportional compensation) by Theorem 11.4.5. The slight notational complexity is worthwhile because it clarifies how Theorem 11.4.5 is used in practice.
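
A sketch of the two compensation Jacobians of Theorem 11.4.5 for the hypothetical P used above, chained with the sensitivity of π from Theorem 11.4.1:

import numpy as np

P = np.array([[0.6, 0.2, 0.1],    # hypothetical column-stochastic matrix
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]
C = np.ones((s, s)) - np.eye(s)   # C = E - I

# Additive compensation, Eq. (11.81)
J_add = np.eye(s * s) - np.kron(np.eye(s), C) / (s - 1)

# Proportional compensation, Eq. (11.83)
J_prop = np.eye(s * s)
for i in range(s):
    Eii = np.zeros((s, s))
    Eii[i, i] = 1.0
    Ai_part = np.diag(P[:, i]) @ C @ np.diag(1.0 / (1.0 - P[:, i]))
    J_prop -= np.kron(Eii, Ai_part)

# Example: sensitivity of pi to compensated perturbations, as in Eq. (11.85)
lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))
dpi_dP = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))
dpi_dTheta = dpi_dP @ J_prop
print(dpi_dTheta.shape)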

5 Species Succession in a Marine Community

Markov chains are used by ecologists as models of species replacement (succession) in ecological communities (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton 2010). In these models, the state of a point on a landscape is given by the species occupying that point. The entry p ij of P is the probability that species j is replaced by species i between t and t + 1. If a community consists of a large number of points independently subject to the transition probabilities in P, the stationary distribution π will give the relative frequencies of species in the community at equilibrium.

Hill et al. (2004) used a Markov chain to describe a community of encrusting organisms occupying rock surfaces at 30–35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an additional state (“bare rock”) for unoccupied substrate. The matrix P was estimated from longitudinal data (Hill et al. 2002, 2004) and is given, along with a list of species names, in Appendix B. We will use the results of this chapter to analyze the sensitivity of species diversity and the Kemeny constant to the processes of colonization and replacement that determine P.

5.1 Biotic Diversity

The stationary distribution π, with the species numbered in order of decreasing abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The two dominant species are an encrusting sponge (called Hymedesmia) and a bryozoan (Crisia).

Fig. 11.2
figure 2

The stationary distribution for the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

The entropy of this stationary distribution, \(H(\boldsymbol {\pi }) = -\boldsymbol {\pi }^{\mathsf {T}} (\log \boldsymbol {\pi })\), where the logarithm is applied elementwise, is used as an index of biodiversity; it is maximal when all species are equally abundant and goes to 0 in a community dominated by a single species. The sensitivity of H is

$$\displaystyle \begin{aligned} d H = - \left( \log \; \boldsymbol{\pi}^{\mathsf{T}} + {\mathbf{1}}^{\mathsf{T}} \right) d \boldsymbol{\pi} {} \end{aligned} $$
(11.86)

Most ecologists, however, would not include bare substrate in a measure of biodiversity, so we define instead a “biotic diversity” \(H_b(\boldsymbol {\pi }) = H \left ( \boldsymbol {\pi }_b \right )\) where

$$\displaystyle \begin{aligned} \boldsymbol{\pi}_b = \frac{{\mathbf{G}} \boldsymbol{\pi}}{\|{\mathbf{G}} \boldsymbol{\pi} \|}. {} \end{aligned} $$
(11.87)

The matrix G, of dimension 14 × 15, is a 0–1 matrix that selects rows 1–14 of π. Because π is positive, ∥G π∥ = 1 TG π. Differentiating π b gives

$$\displaystyle \begin{aligned} d \boldsymbol{\pi}_b = \left( \frac{{\mathbf{G}}}{{\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi}} - \frac{{\mathbf{G}} \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}}}{\left( {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi} \right)^2} \right) d \boldsymbol{\pi} \end{aligned} $$
(11.88)

which simplifies to

$$\displaystyle \begin{aligned} d \boldsymbol{\pi}_b = \left( \frac{{\mathbf{G}} - \boldsymbol{\pi}_b {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}}}{{\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi}} \right) d \boldsymbol{\pi} {} \end{aligned} $$
(11.89)

This model contains no explicit parameters; perturbations of the transition probabilities themselves are of interest, and a compensation pattern is needed. Because the relative magnitudes of the entries in a column of P reflect the relative abilities of species to capture or to hold space, proportional compensation is appropriate here: it preserves these relative abilities.

The sensitivity and elasticity of the biotic diversity H b to changes in the matrix P, subject to proportional compensation, are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left. {d H_b \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} &\displaystyle =&\displaystyle \underbrace{{d H_b \over d \boldsymbol{\pi}_b^{\mathsf{T}}}}_1 \; \underbrace{{d \boldsymbol{\pi}_b \over d \boldsymbol{\pi}^{\mathsf{T}}}}_2 \; \underbrace{{d \boldsymbol{\pi} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}}}_3 \; \underbrace{{d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}}}_4 {} \end{array} \end{aligned} $$
(11.90)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left. {\epsilon H_b \over \epsilon \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} &\displaystyle =&\displaystyle \frac{1}{H_b} \; {d H_b \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; \mathcal{D}\,(\mbox{vec} \, {\mathbf{P}}) {} \end{array} \end{aligned} $$
(11.91)

Term 1 on the right hand side of (11.90) is the derivative of H b with respect to π b, and is given by (11.86). Term 2 is the derivative of the biotic diversity vector π b with respect to the full stationary distribution π, given by (11.89). Term 3 is the derivative of the stationary distribution π with respect to the transition matrix P, given by (11.55). Finally, Term 4 is the derivative of the matrix P with respect to the perturbations Θ, taking into account the compensation structure in (11.83).
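
Because the 15 × 15 matrix of Appendix B is not reproduced in this section, the following sketch applies the chain rule (11.90) and the elasticity (11.91) to a hypothetical 4-state chain in which the last state plays the role of bare rock:

import numpy as np

# Hypothetical column-stochastic matrix; state 4 (last) plays the role of bare rock
P = np.array([[0.5, 0.2, 0.1, 0.3],
              [0.2, 0.5, 0.2, 0.3],
              [0.1, 0.1, 0.5, 0.2],
              [0.2, 0.2, 0.2, 0.2]])
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

G = np.eye(s)[:-1, :]                        # selects the biotic states
denom = np.ones(s - 1) @ (G @ pi)            # = 1' G pi
pib = G @ pi / denom                         # biotic stationary distribution, Eq. (11.87)
Hb = -pib @ np.log(pib)                      # biotic diversity

term1 = -(np.log(pib) + 1.0)                                  # dHb / dpib', Eq. (11.86)
term2 = (G - np.outer(pib, np.ones(s - 1) @ G)) / denom       # dpib / dpi', Eq. (11.89)
term3 = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))   # dpi / dvec' P

# Term 4: proportional compensation, Eq. (11.83)
C = np.ones((s, s)) - np.eye(s)
term4 = np.eye(s * s)
for i in range(s):
    Eii = np.zeros((s, s))
    Eii[i, i] = 1.0
    term4 -= np.kron(Eii, np.diag(P[:, i]) @ C @ np.diag(1.0 / (1.0 - P[:, i])))

dHb = term1 @ term2 @ term3 @ term4          # compensated sensitivity, Eq. (11.90)
elas = dHb * P.flatten(order="F") / Hb       # elasticity, Eq. (11.91)
print(Hb)
print(elas.reshape(s, s, order="F"))         # entry (i, j): elasticity of Hb to p_ij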

The sensitivity and elasticity vectors (11.90) and (11.91) are of dimension 1 × s 2 = 1 × 225. To reduce the number of independent perturbations, we consider subsets of the p ij: disturbance (in which a species is replaced by bare rock), colonization of unoccupied space, replacement of one species by another, and persistence of a species in its location, where

$$\displaystyle \begin{aligned} \begin{array}{rcl} P[\mbox{disturbance of sp. }i ] &\displaystyle =&\displaystyle p_{si} {} \\ {} P[\mbox{colonization by sp. }i ] &\displaystyle =&\displaystyle p_{is} \\ {} P[\mbox{persistence of sp. }i ] &\displaystyle =&\displaystyle p_{ii} \\ {} P[\mbox{replacement of sp. }i ] &\displaystyle =&\displaystyle \sum_{k \neq i,s} p_{ki} \\ {} P[\mbox{replacement by sp. }i ] &\displaystyle =&\displaystyle \sum_{j \neq i, s} p_{ij}. {} \end{array} \end{aligned} $$

Extracting the corresponding elements of \({\epsilon H_b \over \epsilon \mbox{vec} \,^{\mathsf {T}} {\mathbf {P}}}\) gives the elasticities to these classes of probabilities. Figure 11.3 shows that the dominant species (1 and 2) have impacts that are larger than, and opposite in sign to, those of the remaining species. Biodiversity would be enhanced by increasing the disturbance of, or the replacement of, species 1 and 2, and reduced by increasing the rates of colonization by, persistence of, or replacement by species 1 and 2.

Fig. 11.3
figure 3

The elasticity of the biotic diversity H b(π) calculated over the biotic states of the stationary distribution of the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

5.2 The Kemeny Constant and Ecological Mixing

Ecologists have used several measures of the rate of convergence of communities modelled by Markov chains, including the damping ratio and Dobrushin’s coefficient of ergodicity (Hill et al. 2004). The Kemeny constant K is an interesting addition to this list; it gives the expected time to get from any initial state to a state selected at random from the stationary distribution (Hunter 2006). Once that state is reached, the behavior of the chain and the stationary process are indistinguishable.

The sensitivity of K, subject to compensation, is

$$\displaystyle \begin{aligned} \left. {d K \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} = {d K \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{Z}}} \; {d \mbox{vec} \, {\mathbf{Z}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} \end{aligned} $$
(11.92)

where the three terms on the right hand side are given by (11.65), (11.57), and (11.83), respectively.

Figure 11.4 shows the sensitivities dK∕dvec TP, subject to proportional compensation, and aggregated as in Fig. 11.3. Unlike the case with H b, the two dominant species do not stand out from the others. Increases in the rates of replacement will speed up convergence, and increases in persistence will slow convergence. The disturbance of, colonization by, persistence of, and replacement of species 6 (a sea anemone, Urticina crassicornis) have particularly large impacts on K. Examination of row 6 and column 6 of P (Appendix B) shows that U. crassicornis has the highest probability of persistence (p 66 = 0.86), and one of the lowest rates of disturbance, in the community. While it is far from dominant (Fig. 11.2), it has a major impact on the rate of mixing.

Fig. 11.4
figure 4

The sensitivity of the Kemeny constant K of the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

6 Discussion

Given that many properties of finite state Markov chains can be expressed as simple matrix expressions, matrix calculus is an attractive approach to finding the sensitivity and elasticity to parameter perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains, and to dependent variables other than the stationary distribution. The perturbation of ergodic chains has often been studied using generalized inverses, following the influential studies of Meyer and colleagues (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 1986). Matrix calculus provides a complementary approach; the sensitivity of the stationary distribution π obtained here agrees with the result obtained by Golub and Meyer (1986) using the group generalized inverse.

The examples shown here are typical of cases where absorbing or ergodic Markov chains are used in population biology and ecology. In each example, the dependent variables of interest are functions several steps removed from the chain itself. The ease with which one can differentiate such functions is a particularly attractive property of the matrix calculus approach.