1 Introduction

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain. This chapter revisits, in a more rigorous way, some of the quantities already explored for absorbing Markov chains (Chaps. 4, 5, and 6). It will also consider ergodic Markov chains (in which no absorbing states exist), and calculate the sensitivity of the stationary distribution and measures of the rate of convergence.

Perturbation (or sensitivity) analysis is a long-standing problem in the theory of Markov chains (Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986; Funderlic and Meyer 1986; Seneta 1988, 1993; Meyer 1994; Cho and Meyer 2000; Mitrophanov 2003, 2005; Mitrophanov et al. 2005; Kirkland et al. 2008). When Markov chains are applied as models of physical, biological, or social systems, they are often defined as functions of parameters that have substantive meaning.

2 Absorbing Chains

The transition matrix for a discrete-time absorbing chain can be written

$$\displaystyle \begin{aligned} {\mathbf{P}} = \left( \begin{array}{c|c} {\mathbf{U}} & {\mathbf{0}} \\ \hline {\mathbf{M}} & {\mathbf{I}} \end{array} \right) {} \end{aligned} $$
(11.1)

where U, of dimension s × s, is the transition matrix among the s transient states, and M, of dimension a × s, contains the probabilities of transition from the transient states to the a absorbing states. Because we are concerned here with absorption, but not with what happens afterwards, we ignore transitions among absorbing states; hence the identity matrix (a × a) in the lower right corner. The matrices U[θ] and M[θ] are functions of a vector θ of parameters; we assume that θ varies over some set on which the column sums of P are 1 and the spectral radius of U is strictly less than one.

2.1 Occupancy: Visits to Transient States

Let ν ij be the number of visits to transient state i, prior to absorption, by an individual starting in transient state j. The expectations of the ν ij are the entries of the fundamental matrix \({\mathbf {N}} = {\mathbf {N}}_1 = \left ( E(\nu _{ij}^{~}) \right )\):

$$\displaystyle \begin{aligned} {\mathbf{N}} = \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} {} \end{aligned} $$
(11.2)

(e.g., Kemeny and Snell 1960; Iosifescu 1980). Let \({\mathbf {N}}_k = \left ( E(\nu _{ij}^k) \right )\) be a matrix containing the kth moments about the origin of the ν ij. The first several of these matrices are (Iosifescu 1980, Thm. 3.1)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} {} \end{array} \end{aligned} $$
(11.3)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.4)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_3 &\displaystyle =&\displaystyle \left( 6 {\mathbf{N}}^{2}_{\mathrm{dg}} - 6 {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{I}} \right) {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.5)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_4 &\displaystyle =&\displaystyle \left( 24 {\mathbf{N}}^{3}_{\mathrm{dg}} -36 {\mathbf{N}}^{2}_{\mathrm{dg}} +14 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) {\mathbf{N}}_1.{} \end{array} \end{aligned} $$
(11.6)
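
A minimal numerical sketch of these moment formulas, using Python and a hypothetical 2 × 2 transient matrix U (illustrative values only, chosen so that the spectral radius of U is less than one):

import numpy as np

# Hypothetical transient matrix U (spectral radius < 1); values are illustrative only.
U = np.array([[0.2, 0.1],
              [0.5, 0.6]])
s = U.shape[0]
I = np.eye(s)

N1 = np.linalg.inv(I - U)          # fundamental matrix, Eq. (11.2)
Ndg = np.diag(np.diag(N1))         # diagonal matrix of the diagonal entries of N1

N2 = (2 * Ndg - I) @ N1                                        # Eq. (11.4)
N3 = (6 * Ndg @ Ndg - 6 * Ndg + I) @ N1                        # Eq. (11.5)
N4 = (24 * np.linalg.matrix_power(Ndg, 3)
      - 36 * Ndg @ Ndg + 14 * Ndg - I) @ N1                    # Eq. (11.6)

print(N1, N2, N3, N4, sep="\n\n")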

Theorem 11.2.1

Let N k be the matrix of kth moments of the ν ij, as given by (11.3) , (11.4) , (11.5) , and (11.6) . The sensitivities of N k, for k = 1, …, 4, are

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d {vec} \, {\mathbf{U}} {} \end{array} \end{aligned} $$
(11.7)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_1 + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d {vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{array} \end{aligned} $$
(11.8)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_3 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 6 {\mathbf{N}}^{2}_{\mathrm{dg}} - 6 {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_1 \\ &\displaystyle &\displaystyle + \left[ 6 \left( {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{I}} + 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right) - 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{array} \end{aligned} $$
(11.9)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_4 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 24 {\mathbf{N}}^{3}_{\mathrm{dg}} - 36 {\mathbf{N}}^{2}_{\mathrm{dg}} + 14 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d {vec} \, {\mathbf{N}}_1 \\ &\displaystyle &\displaystyle + \left\{ 24 \left[ \left( {\mathbf{N}}^{2}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{I}} + \left( {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}^{2}_{\mathrm{dg}} \right] \right. \\ &\displaystyle &\displaystyle \left. - 36 \left[ \left( {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 \right)^{\mathsf{T}} \otimes {\mathbf{I}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right] + 14 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \right\} d {vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{array} \end{aligned} $$
(11.10)

where (see Sect. 2.8 )

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{N}}_{\mathrm{dg}} &\displaystyle =&\displaystyle {\mathbf{I}} \circ d {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.11)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mathrm{vec} \, {\mathbf{N}}_{\mathrm{dg}} &\displaystyle =&\displaystyle \mathcal{D}\,(\mathrm{vec} \, {\mathbf{I}} ) d \mathrm{vec} \, {\mathbf{N}}_1. {} \end{array} \end{aligned} $$
(11.12)

Proof

The result (11.7) is derived in Caswell (2006, Section 3.1). For k > 1, and considering N k as a function of N 1 and N dg, the total differential of N k is

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_k = {\partial \mbox{vec} \, {\mathbf{N}}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_1} d \mbox{vec} \, {\mathbf{N}}_1 + {\partial \mbox{vec} \, {\mathbf{N}}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}} d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}. {} \end{aligned} $$
(11.13)

The two terms of (11.13) are the partial differentials of vec N k, obtained by taking differentials treating only N 1 or only N dg as variable, respectively. Denote these partial differentials by \(\partial _{\mbox{ {${\mathbf {N}}_1$}}}\) and \(\partial _{\mbox{ {${\mathbf {N}}_{\mathrm {dg}}$}}}\). Differentiating N 2 in (11.4) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 {\mathbf{N}}_{\mathrm{dg}} \left(d {\mathbf{N}}_1 \right) - d {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.14)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_{\mathrm{dg}}$}}} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left( d {\mathbf{N}}_{\mathrm{dg}} \right) {\mathbf{N}}_1. \end{array} \end{aligned} $$
(11.15)

Applying the vec operator gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d \mbox{vec} \, {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.16)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_{\mathrm{dg}}$}}} \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}, \end{array} \end{aligned} $$
(11.17)

and (11.13) becomes

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_2 = \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d \mbox{vec} \, {\mathbf{N}}_1 + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}} {} \end{aligned} $$
(11.18)

which is (11.8). The derivations of dvec N 3 and dvec N 4 follow the same sequence of steps. The details are given in Appendix A. □

The derivatives of N 2, N 3, and N 4 can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).
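
The result (11.7) can also be checked numerically. The following sketch, using the same hypothetical U as above and the column-stacking (column-major) vec convention, compares the matrix calculus Jacobian with a finite-difference approximation:

import numpy as np

def fundamental(U):
    return np.linalg.inv(np.eye(U.shape[0]) - U)

# Hypothetical transient matrix (illustrative values only)
U = np.array([[0.2, 0.1],
              [0.5, 0.6]])
s = U.shape[0]
N1 = fundamental(U)

# Matrix-calculus Jacobian d vec N1 / d vec' U, cf. Eq. (11.7)
J = np.kron(N1.T, N1)

# Finite-difference approximation, perturbing one element of U at a time
h = 1e-7
J_fd = np.zeros((s * s, s * s))
for j in range(s):
    for i in range(s):
        dU = np.zeros_like(U)
        dU[i, j] = h
        J_fd[:, j * s + i] = (fundamental(U + dU) - N1).flatten(order="F") / h

print(np.max(np.abs(J - J_fd)))    # should be on the order of 1e-6 or smaller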

2.2 Time to Absorption

Let η j be the time to absorption starting in transient state j and let \(\boldsymbol{\eta}_k = E \left( \eta_1^k, \ldots, \eta_s^k \right)^{\mathsf{T}}\). The first several of these moments are (Iosifescu 1980, Thm. 3.2)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_1^{\mathsf{T}} &\displaystyle =&\displaystyle {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.19)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 2 {\mathbf{N}}_1 - {\mathbf{I}} \right) \end{array} \end{aligned} $$
(11.20)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_3^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 6 {\mathbf{N}}_1^2 - 6 {\mathbf{N}}_1 + {\mathbf{I}} \right) {} \end{array} \end{aligned} $$
(11.21)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_4^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 24 {\mathbf{N}}_1^3 - 36 {\mathbf{N}}_1^2 + 14 {\mathbf{N}}_1 - {\mathbf{I}} \right). {} \end{array} \end{aligned} $$
(11.22)
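
A sketch of these moment vectors for the same hypothetical U; the variance of the time to absorption, for example, follows from the first two moments:

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])
s = U.shape[0]
I = np.eye(s)
ones = np.ones(s)

N1 = np.linalg.inv(I - U)
eta1 = ones @ N1                                        # Eq. (11.19)
eta2 = eta1 @ (2 * N1 - I)                              # Eq. (11.20)
eta3 = eta1 @ (6 * N1 @ N1 - 6 * N1 + I)                # Eq. (11.21)
eta4 = eta1 @ (24 * np.linalg.matrix_power(N1, 3)
               - 36 * N1 @ N1 + 14 * N1 - I)            # Eq. (11.22)

var_eta = eta2 - eta1**2         # variance of time to absorption, by starting state
print(eta1, var_eta)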

Theorem 11.2.2

Let η k be the vector of the kth moments of the η i . The sensitivities of these moment vectors are

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.23)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 + 2 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.24)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_3 &\displaystyle =&\displaystyle \left[ 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 - 6 {\mathbf{N}}_1^{\mathsf{T}} + {\mathbf{I}} \right] d \boldsymbol{\eta}_1 \\ &\displaystyle &\displaystyle + \left[ 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) + 6 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right) - 6 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.25)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_4 &\displaystyle =&\displaystyle \left[ 24 \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^3 - 36 \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 + 14 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right] d \boldsymbol{\eta}_1 \\ &\displaystyle &\displaystyle + \left\{ 24 \left[ \left( {\mathbf{N}}_1^2 \right)^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1^2 \right] \right. \\ &\displaystyle &\displaystyle \left. - 36 \left[ {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right] + 14 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \right\} d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned} $$
(11.26)

where dvec N 1 is given by (11.7).

Proof

The derivative of η 1 is obtained (Caswell 2006) by differentiating to get \(d \boldsymbol {\eta }_1^{\mathsf {T}} = {\mathbf {1}}^{\mathsf {T}} \left ( d {\mathbf {N}}_1 \right )\) and then applying the vec operator. For the higher moments, consider the η k to be functions of η 1 and N 1, and write the total differential

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}_k = {\partial \boldsymbol{\eta}_k \over \partial \boldsymbol{\eta}_1^{\mathsf{T}}} \; d \boldsymbol{\eta}_1 + {\partial \boldsymbol{\eta}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_1} \; d \mbox{vec} \, {\mathbf{N}}_1. {} \end{aligned} $$
(11.27)

The partial differentials of η 2 with respect to η 1 and N 1 are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {$\boldsymbol{\eta}_1$}}} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) \left( 2 {\mathbf{N}}_1-{\mathbf{I}} \right) \end{array} \end{aligned} $$
(11.28)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle 2 \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{N}}_1 \right). \end{array} \end{aligned} $$
(11.29)

Applying the vec operator gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {$\boldsymbol{\eta}_1$}}} \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 \end{array} \end{aligned} $$
(11.30)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {${\mathbf{N}}_1$}}} \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle 2 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1 \end{array} \end{aligned} $$
(11.31)

which combine according to (11.27) to yield (11.24). The derivations of d η 3 and d η 4 follow the same sequence of steps; the details are shown in Appendix A. □

2.3 Number of States Visited Before Absorption

Let ξ i ≥ 1 be the number of distinct transient states visited before absorption by an individual starting in transient state i, and let ξ 1 = E(ξ). Then

$$\displaystyle \begin{aligned} \boldsymbol{\xi}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} {\mathbf{N}}_1 {} \end{aligned} $$
(11.32)

(Iosifescu 1980, Sect. 3.2.5), where \({\mathbf {N}}_{\mathrm {dg}}^{-1} = \left ( {\mathbf {N}}_{\mathrm {dg}} \right )^{-1}\).
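
A brief sketch of (11.32), again for the hypothetical U used above:

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])
N1 = np.linalg.inv(np.eye(2) - U)
Ndg_inv = np.linalg.inv(np.diag(np.diag(N1)))
xi1 = np.ones(2) @ Ndg_inv @ N1    # Eq. (11.32): expected number of distinct states visited
print(xi1)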

Theorem 11.2.3

Let ξ 1 = E(ξ). The sensitivity of ξ 1 is

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = \left[ - \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \right] d {vec} \, {\mathbf{N}}_1, {} \end{aligned} $$
(11.33)

where dvec N 1 is given by (11.7).

Proof

Differentiating (11.32) yields

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} \left( d {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) {\mathbf{N}}_1 + {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} d {\mathbf{N}}_1. \end{aligned} $$
(11.34)

Applying the vec operator yields

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}^{-1} + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) d \mbox{vec} \, {\mathbf{N}}_1. \end{aligned} $$
(11.35)

Applying (2.82) to \(d \mbox{vec} \, {\mathbf {N}}_{\mathrm {dg}}^{-1}\) and using (11.12) for dvec N dg gives

$$\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = - \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) d \mbox{vec} \, {\mathbf{N}}_1 + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) d \mbox{vec} \, {\mathbf{N}}_1 \end{aligned} $$
(11.36)

which simplifies to (11.33). □

2.4 Multiple Absorbing States and Probabilities of Absorption

When the chain includes a > 1 absorbing states, the entry m ij of the a × s submatrix M in (11.1) is the probability of transition from transient state j to absorbing state i. The result of the competing risks of absorption is a set of probabilities \(b_{ij} = P \left [ \mbox{absorption in }i \left | \mbox{starting in }j \right . \right ]\) for i = 1, …, a and j = 1, …, s. The matrix \({\mathbf {B}} = \left ( b_{ij} \right ) = {\mathbf {M}} {\mathbf {N}}_1\) (Iosifescu 1980, Thm. 3.3).

Theorem 11.2.4

Let B = MN 1be the matrix of absorption probabilities. Then

$$\displaystyle \begin{aligned} d {vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d {vec} \, {\mathbf{M}} + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{B}} \right) d {vec} \, {\mathbf{U}}. {} \end{aligned} $$
(11.37)

Proof

Differentiating B yields

$$\displaystyle \begin{aligned} d {\mathbf{B}} = \left( d {\mathbf{M}} \right) {\mathbf{N}}_1 + {\mathbf{M}} \left( d {\mathbf{N}}_1 \right). \end{aligned} $$
(11.38)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{I}} \otimes {\mathbf{M}} \right) d \mbox{vec} \, {\mathbf{N}}_1. \end{aligned} $$
(11.39)

Substituting (11.7) for dvec N 1 and simplifying gives (11.37). □

Column j of B is the probability distribution of the eventual absorption state for an individual starting in transient state j. Usually a few of those starting states are of particular interest (e.g., states corresponding to “birth” or to the start of some process). Let B(:, j) = Be j denote column j of B, where e j is the jth unit vector of length s. Thus the derivative of B(:, j) is

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(:,j) = \left( {\mathbf{e}}_j^{\mathsf{T}} \otimes {\mathbf{I}}_a \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned} $$
(11.40)

where dvec B is given by (11.37). Similarly, row i of B is \({\mathbf {B}}(i,:)={\mathbf {e}}_i^{\mathsf {T}} {\mathbf {B}}\) and

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(i,:) = \left( {\mathbf{I}}_s \otimes {\mathbf{e}}_i^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned} $$
(11.41)

where e i is the ith unit vector of length a.
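
A sketch of B and of the two Jacobians appearing in (11.37), for hypothetical matrices U and M (a = 2 absorbing states; the values are illustrative only, with the columns of the full transition matrix summing to 1):

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])
M = np.array([[0.1, 0.2],        # hypothetical absorption probabilities:
              [0.2, 0.1]])       # columns of the full matrix [U; M] sum to 1
s = U.shape[0]
a = M.shape[0]

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1                                    # absorption probabilities

dB_dM = np.kron(N1.T, np.eye(a))              # d vec B / d vec' M, cf. Eq. (11.37)
dB_dU = np.kron(N1.T, B)                      # d vec B / d vec' U, cf. Eq. (11.37)

print(B.sum(axis=0))                          # check: each column of B sums to 1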

2.5 The Quasistationary Distribution

The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let w and v be the right and left eigenvectors associated with the dominant eigenvalue of U, normalized so that ∥w∥ = ∥v∥ = 1. Darroch and Seneta (1965) defined two quasistationary distributions in terms of w and v. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to

$$\displaystyle \begin{aligned} {\mathbf{q}}_a = {\mathbf{w}} {} \end{aligned} $$
(11.42)

The limiting probability distribution of the state of an individual, given that absorption has not happened and will not happen for a long time, is

$$\displaystyle \begin{aligned} {\mathbf{q}}_b = \frac{{\mathbf{w}} \circ {\mathbf{v}}}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} {} \end{aligned} $$
(11.43)

Horvitz and Tuljapurkar (2008) pointed out that the convergence to the quasistationary distribution implies that, in a stage-classified model, mortality eventually becomes independent of age.
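
A sketch of the two quasistationary distributions for the hypothetical U used in the earlier sketches:

import numpy as np

U = np.array([[0.2, 0.1],        # hypothetical transient matrix
              [0.5, 0.6]])

# Right and left eigenvectors of U associated with its dominant eigenvalue
lam, W = np.linalg.eig(U)
k = int(np.argmax(lam.real))
w = np.abs(W[:, k].real)
w = w / w.sum()                  # right eigenvector, scaled to sum to 1

lamL, V = np.linalg.eig(U.T)
kL = int(np.argmax(lamL.real))
v = np.abs(V[:, kL].real)
v = v / v.sum()                  # left eigenvector of U

q_a = w                          # Eq. (11.42)
q_b = (w * v) / (w @ v)          # Eq. (11.43)
print(q_a, q_b)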

Lemma 1

Let the dominant eigenvalue of U , guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy 0 < λ < 1, and let w and v be the right and left eigenvectors corresponding to λ, scaled so that w Tv = 1. Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{w}} &\displaystyle =&\displaystyle \left( \lambda {\mathbf{I}}_s - {\mathbf{U}} + {\mathbf{w}} {\mathbf{v}}^{\mathsf{T}} \right)^{-1} \left[ {\mathbf{w}}^{\mathsf{T}} \otimes \left( {\mathbf{I}}_s - {\mathbf{w}} {\mathbf{v}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{U}} {} \end{array} \end{aligned} $$
(11.44)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{v}} &\displaystyle =&\displaystyle \left( \lambda {\mathbf{I}}_s - {\mathbf{U}}^{\mathsf{T}} + {\mathbf{v}} {\mathbf{w}}^{\mathsf{T}} \right)^{-1} \left[ {\mathbf{v}}^{\mathsf{T}} \otimes \left( {\mathbf{I}}_s - {\mathbf{v}} {\mathbf{w}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{U}}^{\mathsf{T}} {} \end{array} \end{aligned} $$
(11.45)

Proof

Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating v as the right eigenvector of U T. □

Theorem 11.2.5

The derivative of the quasistationary distribution q a is given by (11.44). The derivative of the quasistationary distribution q b is

$$\displaystyle \begin{aligned} d {\mathbf{q}}_b = \frac{1}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \left\{ \left[ \mathcal{D}\,({\mathbf{v}}) - {\mathbf{q}}_b {\mathbf{v}}^{\mathsf{T}} \right] d {\mathbf{w}} + \left[ \mathcal{D}\,({\mathbf{w}}) - {\mathbf{q}}_b {\mathbf{w}}^{\mathsf{T}} \right] d {\mathbf{v}} \right\} {} \end{aligned} $$
(11.46)

where d w and d v are given by (11.44) and (11.45) respectively.

Proof

The derivative of q a follows from its definition as the scaled right eigenvector of U. For q b, differentiating (11.43) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{q}}_b &\displaystyle =&\displaystyle \frac{d \left( {\mathbf{w}} \circ {\mathbf{v}} \right)}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} - \frac{{\mathbf{w}} \circ {\mathbf{v}}}{\left( {\mathbf{w}}^{\mathsf{T}} {\mathbf{v}} \right)^2} \, d \left( {\mathbf{w}}^{\mathsf{T}} {\mathbf{v}} \right) \end{array} \end{aligned} $$
(11.47)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \frac{\left( d {\mathbf{w}} \right) \circ {\mathbf{v}} + {\mathbf{w}} \circ \left( d {\mathbf{v}} \right)}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} - {\mathbf{q}}_b \, \frac{{\mathbf{v}}^{\mathsf{T}} \left( d {\mathbf{w}} \right) + {\mathbf{w}}^{\mathsf{T}} \left( d {\mathbf{v}} \right)}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \end{array} \end{aligned} $$
(11.48)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d {\mathbf{q}}_b = \frac{1}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \left[ \mathcal{D}\,({\mathbf{v}}) \, d {\mathbf{w}} + \mathcal{D}\,({\mathbf{w}}) \, d {\mathbf{v}} - {\mathbf{q}}_b {\mathbf{v}}^{\mathsf{T}} d {\mathbf{w}} - {\mathbf{q}}_b {\mathbf{w}}^{\mathsf{T}} d {\mathbf{v}} \right] \end{aligned} $$
(11.49)

which simplifies to give (11.46). □

3 Life Lost Due to Mortality

The approach here makes it easy to compute the sensitivity of a variety of dependent variables calculated from the Markov chain. As an example of this flexibility, consider a recently developed demographic index, the number of years of life lost due to mortality (Vaupel and Canudas Romo 2003).

The transient states of the chain are age classes, absorption corresponds to death, and absorbing states correspond to age at death. Let μ i be the mortality rate and \(p_i=\exp (-\mu _i)\) the survival probability at age i. The matrix U has the p i on the subdiagonal and zeros elsewhere. The matrix M has 1 − p i on the diagonal and zeros elsewhere. Let f = B(:, 1) be the distribution of age at death and η 1 the vector of expected longevity as a function of age.

A death at age i represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the ith entry of η 1, and the expected number of years lost over the distribution of age at death is \(\eta ^\dagger = \boldsymbol {\eta }_1^{\mathsf {T}} {\mathbf {f}}\). This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age x, f would be a delta function at x and further life expectancy at age x would be zero; their product would give η † = 0. Declines in this disparity have accompanied the increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how η † responds to changes in mortality.

Differentiating η † gives

$$\displaystyle \begin{aligned} d \eta^\dagger = \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) {\mathbf{B}} {\mathbf{e}}_1 + \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{B}} \right) {\mathbf{e}}_1. \end{aligned} $$
(11.50)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d \eta^\dagger = {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} d \boldsymbol{\eta}_1 + \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}}. \end{aligned} $$
(11.51)

Substituting (11.23) for d η 1 and (11.37) for dvec B gives

$$\displaystyle \begin{aligned} d \eta^\dagger = {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1 + \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{B}} \right) d \mbox{vec} \, {\mathbf{U}} \right]. \end{aligned} $$
(11.52)

Simplifying and writing derivatives in terms of μ gives

(11.53)

Because mortality rates vary over several orders of magnitude with age, it is useful to present the results as elasticities,

$$\displaystyle \begin{aligned} {\epsilon \eta^\dagger \over \epsilon \boldsymbol{\mu}^{\mathsf{T}}} = \frac{1}{\eta^\dagger}\; {d \eta^\dagger \over d \boldsymbol{\mu}^{\mathsf{T}}} \; \mathcal{D}\,(\boldsymbol{\mu}). \end{aligned} $$
(11.54)
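
A sketch of this calculation for a short hypothetical mortality schedule (five age classes, with a large final mortality rate so that the last age class is nearly lethal); here the elasticities are approximated by finite differences rather than by (11.53):

import numpy as np

def eta_dagger(mu):
    """Years of life lost, eta-dagger, for mortality rates mu (one per age class)."""
    s = len(mu)
    p = np.exp(-mu)                      # survival probabilities
    U = np.diag(p[:-1], -1)              # p_i on the subdiagonal, zeros elsewhere
    M = np.diag(1 - p)                   # death at age i -> absorbing state i
    N1 = np.linalg.inv(np.eye(s) - U)
    eta1 = np.ones(s) @ N1               # remaining life expectancy by age
    f = (M @ N1)[:, 0]                   # distribution of age at death, from age class 1
    return eta1 @ f

# Hypothetical mortality schedule; the last rate is large so that the final
# age class is (nearly) lethal and the chain loses almost no probability.
mu = np.array([0.05, 0.02, 0.05, 0.20, 3.00])
ed = eta_dagger(mu)

h = 1e-6
elas = np.array([(eta_dagger(mu + h * np.eye(len(mu))[i]) - ed) / h * mu[i] / ed
                 for i in range(len(mu))])
print(ed, elas)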

Figure 11.1 shows these elasticities for two populations chosen to have very different life expectancies: India in 1961, with female life expectancy of 45 years and η † = 23.9 years, and Japan in 2006, with female life expectancy of 86 years and η † = 10.1 years (Human Mortality Database 2016). In both cases, elasticities are positive from birth to some age (≈50 for India, ≈85 for Japan) and negative thereafter. This implies that reductions in infant and early life mortality would reduce η †, whereas reductions in old age mortality would increase η †. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

Fig. 11.1
figure 1

The elasticity of mean years of life lost due to mortality, η †, to changes in age-specific mortality, calculated from the female life tables of India in 1961 and of Japan in 2006. (Data obtained from the Human Mortality Database 2016)

4 Ergodic Chains

Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix P of dimension s × s. The stationary distribution π is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue λ 1 = 1 of P. The fundamental matrix of the chain is \({\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}\) (Kemeny and Snell 1960).

We are interested only in perturbations that preserve the column-stochasticity of P; i.e., for which P remains a stochastic matrix. Such perturbations are easily defined when the p ij depend explicitly on a parameter vector θ. However, when the parameters of interest are the p ij themselves, an implicit parameterization must be defined to preserve the stochastic nature of P under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.

Previous studies of perturbations of ergodic chains focus almost completely on perturbations of the stationary distribution, and are divided between those focusing on sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986) and studies focusing on perturbation bounds and condition numbers (Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties with respect to parameter perturbations, but taking advantage of the matrix calculus approach. We do not consider perturbation bounds here.

4.1 The Stationary Distribution

Theorem 11.4.1

Let π be the stationary distribution, satisfying P π = π and 1 Tπ = 1. The sensitivity of π is

$$\displaystyle \begin{aligned} d \boldsymbol{\pi} = \left[ \boldsymbol{\pi}^{\mathsf{T}} \otimes \left( {\mathbf{Z}} - \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{P}} {} \end{aligned} $$
(11.55)

where Z is the fundamental matrix of the chain.

Proof

The vector π is the right eigenvector of P, scaled to sum to 1. Applying Lemma 1, and noting that λ = 1 and 1 TP = 1 T, gives \(d \boldsymbol {\pi } = {\mathbf {Z}} \left [ \boldsymbol {\pi }^{\mathsf {T}} \otimes \left ( {\mathbf {I}}_s - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right ) \right ] d \mbox{vec} \, {\mathbf {P}}\). Noting that Z π = π and simplifying the Kronecker products yields (11.55). □

Based on an analysis of eigenvector sensitivity (Meyer and Stewart 1982), Golub and Meyer (1986) derived an expression for the derivative of π to a change in a single element of P using the group generalized inverse \(\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\#\) of I −P. Since \(\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\# = {\mathbf {Z}} - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}}\) (Golub and Meyer 1986), expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of π using only the chain rule. If g(π) is a vector- or scalar-valued function of π, then

$$\displaystyle \begin{aligned} d g(\boldsymbol{\pi}) = {d g \over d \boldsymbol{\pi}^{\mathsf{T}} } \; {d \boldsymbol{\pi} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; d \mbox{vec} \, {\mathbf{P}} \end{aligned} $$
(11.56)

Some examples will appear in Sect. 11.5.
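
A sketch of Theorem 11.4.1 for a hypothetical 3-state column-stochastic matrix, with a finite-difference check of a single entry; the perturbation here is a raw change in one p ij, without the compensation introduced in Sect. 11.4.5:

import numpy as np

def stationary(P):
    lam, W = np.linalg.eig(P)
    w = np.abs(W[:, np.argmax(lam.real)].real)
    return w / w.sum()

# Hypothetical column-stochastic transition matrix
P = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]

pi = stationary(P)
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))    # fundamental matrix

# d pi / d vec' P  (Theorem 11.4.1), an s x s^2 matrix
J = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))

# Finite-difference check for the entry in row 1, column 2 (zero-based (0, 1))
h = 1e-7
Ph = P.copy()
Ph[0, 1] += h
fd = (stationary(Ph) - pi) / h
print(np.allclose(fd, J[:, 1 * s + 0], atol=1e-5))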

4.2 The Fundamental Matrix

The fundamental matrix \({\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}\) plays a role in ergodic chains similar to that played by N 1 in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.

Theorem 11.4.2

The sensitivity of the fundamental matrix is

$$\displaystyle \begin{aligned} d {vec} \, {\mathbf{Z}} = \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) \left[ {\mathbf{I}}_{s^2} - \left( \mathbf{1} \boldsymbol{\pi}^{\mathsf{T}} \right) \otimes \left( {\mathbf{Z}} - \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{P}}. {} \end{aligned} $$
(11.57)

Proof

From (2.82),

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{Z}} &\displaystyle =&\displaystyle - \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) d \mbox{vec} \left( {\mathbf{I}} - {\mathbf{P}} + \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \end{array} \end{aligned} $$
(11.58)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) \left[ d \mbox{vec} \, {\mathbf{P}} - \left( \mathbf{1} \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} \right]. \end{array} \end{aligned} $$
(11.59)

Substituting (11.55) for d π and simplifying gives (11.57). □

4.3 The First Passage Time Matrix

Let \({\mathbf {R}} = \left ( r_{ij}^{~} \right )\) be the matrix of mean first passage times from j to i, given by Iosifescu (1980, Thm. 4.7):

$$\displaystyle \begin{aligned} {\mathbf{R}} = \mathcal{D}\,(\boldsymbol{\pi})^{-1} \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right). {} \end{aligned} $$
(11.60)

Again, this is the transpose of the expression obtained when P is row-stochastic.

Theorem 11.4.3

The sensitivity of R is

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{R}} &\displaystyle =&\displaystyle - \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] \left[ \mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,({vec} \, {\mathbf{I}}_s) \left( \mathbf{1} \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} \\ &\displaystyle &\displaystyle + \left[ \left( \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}_s) - \left( {\mathbf{I}}_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \right] d {vec} \, {\mathbf{Z}} {} \end{array} \end{aligned} $$
(11.61)

where d π is given by (11.55) and dvec Z is given by (11.57) .

Proof

Differentiating (11.60) gives

$$\displaystyle \begin{aligned} d {\mathbf{R}} = \left[ d \, \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right) + \mathcal{D}\,(\boldsymbol{\pi})^{-1} \left[ - d {\mathbf{Z}} + \left( d {\mathbf{Z}}_{\mathrm{dg}} \right) \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right]. \end{aligned} $$
(11.62)

Applying the vec operator gives

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{R}} = \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] d \mbox{vec} \left[ \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] - \left[ {\mathbf{I}}_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] d \mbox{vec} \, {\mathbf{Z}} + \left[ \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] d \mbox{vec} \, {\mathbf{Z}}_{\mathrm{dg}}. \end{aligned} $$
(11.63)

Using (2.82) for \(d \mbox{vec} \, \left [ \mathcal {D}\, (\boldsymbol {\pi } )^{-1} \right ]\), (2.69) for \(d \mbox{vec} \, \mathcal {D}\,(\boldsymbol {\pi })\), and (11.12) for dvec Z dg yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{R}} &\displaystyle =&\displaystyle - \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] \left[ \mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) \left( \mathbf{1} \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} \\ &\displaystyle &\displaystyle - \left[ {\mathbf{I}}_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] d \mbox{vec} \, {\mathbf{Z}} + \left[ \mathbf{1} {\mathbf{1}}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) \, d \mbox{vec} \, {\mathbf{Z}} \end{array} \end{aligned} $$
(11.64)

which simplifies to give (11.61). □

4.4 Mixing Time and the Kemeny Constant

The mixing time K of a chain is the mean time required to get from a specified state to a state chosen at random from the stationary distribution π. Remarkably, K is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is sometimes called Kemeny’s constant; it is a measure of the rate of convergence to stationarity and is given by K = trace(Z) (Hunter 2006). In addition to being a quantity of interest in itself, the rate of convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).

Theorem 11.4.4

The sensitivity of K is

$$\displaystyle \begin{aligned} dK = \left( {vec} \, {\mathbf{I}}_s \right)^{\mathsf{T}} d {vec} \, {\mathbf{Z}}. {} \end{aligned} $$
(11.65)

Proof

Differentiating K = trace(Z) gives

$$\displaystyle \begin{aligned} dK = {\mathbf{1}}^{\mathsf{T}} \left( {\mathbf{I}} \circ d {\mathbf{Z}} \right) \mathbf{1}. \end{aligned} $$
(11.66)

Applying the vec operator gives

$$\displaystyle \begin{aligned} dK = \left( {\mathbf{1}}^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}} ) d \mbox{vec} \, {\mathbf{Z}} \end{aligned} $$
(11.67)

which simplifies to (11.65). □
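
A sketch of K and of its raw sensitivity to the entries of P, combining (11.65) with dvec Z from Theorem 11.4.2, for the same hypothetical P:

import numpy as np

P = np.array([[0.6, 0.2, 0.1],    # hypothetical column-stochastic matrix
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

K = np.trace(Z)                                            # Kemeny's constant

# d vec Z / d vec' P (Theorem 11.4.2), then dK = (vec I)' d vec Z (Eq. 11.65)
dZ_dP = np.kron(Z.T, Z) @ (np.eye(s * s)
        - np.kron(np.outer(np.ones(s), pi), Z - np.outer(pi, np.ones(s))))
dK_dP = np.eye(s).flatten(order="F") @ dZ_dP               # 1 x s^2 row vector

print(K)
print(dK_dP.reshape(s, s, order="F"))                      # entry (i, j) = dK/dp_ij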

4.5 Implicit Parameters and Compensation

Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of dvec P. However, perturbation of any element, say p kj, to p kj + θ kj, must be compensated for by adjustments of the other elements in column j so that the column sum remains equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use in applications: additive and proportional. Additive compensation adjusts all the elements of the column by an equal amount, distributing the perturbation θ kj additively over column j. Proportional compensation distributes θ kj in proportion to the values of the p ij, for i ≠ k. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within P.

To develop the compensation formulae, let us start by considering a probability vector p, of dimension s × 1, with p i ≥ 0 and ∑ip i = 1. Let θ i be the perturbation of p i, and write

$$\displaystyle \begin{aligned} {\mathbf{p}} (\boldsymbol{\theta}) = {\mathbf{p}} (0) + {\mathbf{A}} \boldsymbol{\theta} {} \end{aligned} $$
(11.68)

for some matrix A to be determined. If y is a function of p, then

$$\displaystyle \begin{aligned} dy = {d y \over d {\mathbf{p}}^{\mathsf{T}}} \; {d {\mathbf{p}} \over d \boldsymbol{\theta}^{\mathsf{T}}} \; d \boldsymbol{\theta} \end{aligned} $$
(11.69)

evaluated at θ = 0.

Additive compensation

For the case of additive compensation, we write

$$\displaystyle \begin{aligned} \begin{array}{rcl} p_1(\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_1(0) + \theta_1 - \frac{\theta_2}{s-1} - \cdots - \frac{\theta_s}{s-1} \\ p_2 (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_2(0) -\frac{\theta_1}{s-1} + \theta_2 - \cdots - \frac{\theta_s}{s-1} \\ &\displaystyle \vdots&\displaystyle {} \\ p_s (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_s(0) -\frac{\theta_1}{s-1} -\frac{\theta_2}{s-1}- \cdots + \theta_s \end{array} \end{aligned} $$
(11.70)

The perturbation θ 1 is added to p 1 and compensated for by subtracting θ 1∕(s − 1) from all other entries of p; clearly ∑ip i(θ) = 1 for any perturbation vector θ.

The system of Eqs. (11.70) can be written

$$\displaystyle \begin{aligned} {\mathbf{p}}(\boldsymbol{\theta}) = {\mathbf{p}}(0) + \left( {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} \right) \boldsymbol{\theta}. \end{aligned} $$
(11.71)

Defining E to be a matrix of ones, the matrix C can be written as the Toeplitz matrix C = E − I, with zeros on the diagonal and ones elsewhere. Thus the matrix A in (11.68) is

$$\displaystyle \begin{aligned} {\mathbf{A}} = {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} {} \end{aligned} $$
(11.72)

Proportional compensation

For proportional compensation, assume that p i < 1 for all i. The vector p(θ) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} p_1(\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_1(0) + \theta_1 - \frac{p_1 \theta_2}{1-p_2} - \cdots - \frac{p_1 \theta_s}{1-p_s} \\ p_2 (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_2(0) -\frac{p_2 \theta_1}{1-p_1} + \theta_2 - \cdots - \frac{p_2 \theta_s}{1-p_s} \\ &\displaystyle \vdots&\displaystyle {} \\ p_s (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_s(0) -\frac{p_s \theta_1}{1-p_1} -\frac{p_s \theta_2}{1-p_2}- \cdots + \theta_s \end{array} \end{aligned} $$
(11.73)

The perturbation θ 1 is added to p 1 and compensated for by subtracting θ 1 p i∕(1 − p 1) from the ith entry of p, for i ≠ 1. Again, ∑ip i(θ) = 1 for any perturbation vector θ.

Equation (11.73) can be written

$$\displaystyle \begin{aligned} {\mathbf{p}}(\boldsymbol{\theta}) = {\mathbf{p}}(0) + \left[ {\mathbf{I}} - \mathcal{D}\,({\mathbf{p}}) \; {\mathbf{C}} \; \mathcal{D}\,(\mathbf{1} - {\mathbf{p}})^{-1} \right] \boldsymbol{\theta}, \end{aligned} $$
(11.74)

so that the matrix A in (11.68) is

$$\displaystyle \begin{aligned} {\mathbf{A}} = {\mathbf{I}} - \mathcal{D}\,({\mathbf{p}}) \; {\mathbf{C}} \; \mathcal{D}\,(\mathbf{1} - {\mathbf{p}})^{-1} {} \end{aligned} $$
(11.75)

The transition matrix

We have derived compensation formulae for a single probability vector p. Now consider perturbation of a probability matrix P, each column of which is a probability vector. Define a perturbation matrix Θ where θ ij is the perturbation of p ij. Perturbations of column j are to be compensated by a matrix A j, so that

$$\displaystyle \begin{aligned} {\mathbf{P}}(\boldsymbol{\Theta}) = {\mathbf{P}}(0) + \left( \begin{array}{ccc} {\mathbf{A}}_1 \boldsymbol{\Theta}(:,1) & \cdots & {\mathbf{A}}_s \boldsymbol{\Theta}(:,s) \end{array} \right) {} \end{aligned} $$
(11.76)

where A i compensates for the changes in column i of P. Applying the vec operator to (11.76) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{vec} \, {\mathbf{P}}(\boldsymbol{\Theta}) &=& \mbox{vec} \, {\mathbf{P}}(0) + \left(\begin{array}{ccc} {\mathbf{A}}_1 & & \\ & \ddots & \\ && {\mathbf{A}}_s \end{array}\right) \mbox{vec} \, \boldsymbol{\Theta} \end{array} \end{aligned} $$
(11.77)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &=& \mbox{vec} \, {\mathbf{P}}(0) + \sum_{i=1}^s \left( {\mathbf{E}}_{ii} \otimes {\mathbf{A}}_i \right) \mbox{vec} \, \boldsymbol{\Theta}. {} \end{array} \end{aligned} $$
(11.78)

The terms in the summation in (11.78) are recognizable as the vec of the product A iΘE ii; thus

$$\displaystyle \begin{aligned} {\mathbf{P}}(\boldsymbol{\Theta}) = {\mathbf{P}}(0) + \sum_{i=1}^s {\mathbf{A}}_i \boldsymbol{\Theta} {\mathbf{E}}_{ii} {} \end{aligned} $$
(11.79)

where E ii is a matrix with a 1 in the (i, i) entry and zeros elsewhere.

Theorem 11.4.5

Let P be a column-stochastic s × s transition matrix. Let Θ be a matrix of perturbations, where θ ij is applied to p ij, and the other entries of Θ compensate for the perturbation. Let C = E − I . If compensation is additive, then

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{P}}(\boldsymbol{\Theta}) &\displaystyle =&\displaystyle {\mathbf{P}}(0) + \left( {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} \right) \boldsymbol{\Theta} {} \end{array} \end{aligned} $$
(11.80)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {d {vec} \, {\mathbf{P}} \over d {vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} &\displaystyle =&\displaystyle \left[ {\mathbf{I}}_{s^2} - \frac{1}{s-1} \left( {\mathbf{I}}_s \otimes {\mathbf{C}} \right) \right]. {} \end{array} \end{aligned} $$
(11.81)

If compensation is proportional, then

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{P}}(\boldsymbol{\Theta}) &\displaystyle =&\displaystyle {\mathbf{P}}(0) + \sum_{i=1}^s \left\{ {\mathbf{I}} - \mathcal{D}\, \left[ {\mathbf{P}}(:,i) \right] \; {\mathbf{C}}\; \mathcal{D}\, \left[ \mathbf{1} - {\mathbf{P}}(:,i) \right]^{-1} \right\} \boldsymbol{\Theta} {\mathbf{E}}_{ii} \qquad {} \end{array} \end{aligned} $$
(11.82)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {d {vec} \, {\mathbf{P}} \over d {vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} &\displaystyle =&\displaystyle {\mathbf{I}}_{s^2} - \sum_{i=1}^s {\mathbf{E}}_{ii} \otimes \left\{ \mathcal{D}\, \left[ {\mathbf{P}}(:,i) \right] \; {\mathbf{C}} \; \mathcal{D}\, \left[ \mathbf{1} - {\mathbf{P}}(:,i) \right]^{-1} \right\}. {} \end{array} \end{aligned} $$
(11.83)

Proof

P( Θ) is given by (11.79). If compensation is additive, A i is given by (11.72) for all i. Substituting into (11.79) gives (11.80). Differentiating (11.80) and applying the vec operator gives (11.81).

If compensation is proportional, substituting (11.75) for A i in (11.79) gives (11.82). Differentiating yields

$$\displaystyle \begin{aligned} d {\mathbf{P}} = \left( d \boldsymbol{\Theta} \right) \sum_{i=1}^s {\mathbf{E}}_{ii} - \sum_{i=1}^s \mathcal{D}\,[ {\mathbf{P}}(:,i) ] \; {\mathbf{C}} \; \mathcal{D}\,[ \mathbf{1} - {\mathbf{P}}(:,i) ]^{-1} (d \boldsymbol{\Theta}) {\mathbf{E}}_{ii}. \end{aligned} $$
(11.84)

Using the vec operator gives (11.83). □

Perturbations of P subject to compensation are given by perturbations of Θ. Thus for any function y(P) we can write

$$\displaystyle \begin{aligned} \left. {d y \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} = {d y \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} \end{aligned} $$
(11.85)

where dvec P∕dvec TΘ is given (for additive and proportional compensation) by Theorem 11.4.5. The slight notational complexity is worthwhile because it clarifies how Theorem 11.4.5 is used in practice.
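
A sketch of the two compensation Jacobians of Theorem 11.4.5 for the hypothetical P used above, chained with the sensitivity of π from Theorem 11.4.1:

import numpy as np

P = np.array([[0.6, 0.2, 0.1],    # hypothetical column-stochastic matrix
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]
C = np.ones((s, s)) - np.eye(s)   # C = E - I

# Additive compensation, Eq. (11.81)
J_add = np.eye(s * s) - np.kron(np.eye(s), C) / (s - 1)

# Proportional compensation, Eq. (11.83)
J_prop = np.eye(s * s)
for i in range(s):
    Eii = np.zeros((s, s))
    Eii[i, i] = 1.0
    Ai_part = np.diag(P[:, i]) @ C @ np.diag(1.0 / (1.0 - P[:, i]))
    J_prop -= np.kron(Eii, Ai_part)

# Example: sensitivity of pi to compensated perturbations, as in Eq. (11.85)
lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))
dpi_dP = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))
dpi_dTheta = dpi_dP @ J_prop
print(dpi_dTheta.shape)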

5 Species Succession in a Marine Community

Markov chains are used by ecologists as models of species replacement (succession) in ecological communities (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton 2010). In these models, the state of a point on a landscape is given by the species occupying that point. The entry p ij of P is the probability that species j is replaced by species i between t and t + 1. If a community consists of a large number of points independently subject to the transition probabilities in P, the stationary distribution π will give the relative frequencies of species in the community at equilibrium.

Hill et al. (2004) used a Markov chain to describe a community of encrusting organisms occupying rock surfaces at 30–35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an additional state (“bare rock”) for unoccupied substrate. The matrix P was estimated from longitudinal data (Hill et al. 2002, 2004) and is given, along with a list of species names, in Appendix B. We will use the results of this chapter to analyze the sensitivity of species diversity and the Kemeny constant to the processes of colonization and replacement that determine P.

5.1 Biotic Diversity

The stationary distribution π, with the species numbered in order of decreasing abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The two dominant species are an encrusting sponge (called Hymedesmia) and a bryozoan (Crisia).

Fig. 11.2
figure 2

The stationary distribution for the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

The entropy of this stationary distribution, \(H(\boldsymbol {\pi }) = -\boldsymbol {\pi }^{\mathsf {T}} (\log \boldsymbol {\pi })\), where the logarithm is applied elementwise, is used as an index of biodiversity; it is maximal when all species are equally abundant and goes to 0 in a community dominated by a single species. The sensitivity of H is

$$\displaystyle \begin{aligned} d H = - \left( \log \; \boldsymbol{\pi}^{\mathsf{T}} + {\mathbf{1}}^{\mathsf{T}} \right) d \boldsymbol{\pi} {} \end{aligned} $$
(11.86)

Most ecologists, however, would not include bare substrate in a measure of biodiversity, so we define instead a “biotic diversity” \(H_b(\boldsymbol {\pi }) = H \left ( \boldsymbol {\pi }_b \right )\) where

$$\displaystyle \begin{aligned} \boldsymbol{\pi}_b = \frac{{\mathbf{G}} \boldsymbol{\pi}}{\|{\mathbf{G}} \boldsymbol{\pi} \|}. {} \end{aligned} $$
(11.87)

The matrix G, of dimension 14 × 15, is a 0–1 matrix that selects rows 1–14 of π. Because π is positive, ∥G π∥ = 1 TG π. Differentiating π b gives

$$\displaystyle \begin{aligned} d \boldsymbol{\pi}_b = \left( \frac{{\mathbf{G}}}{{\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi}} - \frac{{\mathbf{G}} \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}}}{\left( {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi} \right)^2} \right) d \boldsymbol{\pi} \end{aligned} $$
(11.88)

which simplifies to

$$\displaystyle \begin{aligned} d \boldsymbol{\pi}_b = \left( \frac{{\mathbf{G}} - \boldsymbol{\pi}_b {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}}}{{\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi}} \right) d \boldsymbol{\pi} {} \end{aligned} $$
(11.89)

This model contains no explicit parameters; perturbations of the transition probabilities themselves are of interest, and a compensation pattern is needed. Because the relative magnitudes of the entries in a column of P reflect the relative abilities of species to capture or to hold space, proportional compensation is appropriate here: it preserves these relative abilities.

The sensitivity and elasticity of the biotic diversity H b to changes in the matrix P, subject to proportional compensation, are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left. {d H_b \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} &\displaystyle =&\displaystyle \underbrace{{d H_b \over d \boldsymbol{\pi}_b^{\mathsf{T}}}}_1 \; \underbrace{{d \boldsymbol{\pi}_b \over d \boldsymbol{\pi}^{\mathsf{T}}}}_2 \; \underbrace{{d \boldsymbol{\pi} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}}}_3 \; \underbrace{{d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}}}_4 {} \end{array} \end{aligned} $$
(11.90)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left. {\epsilon H_b \over \epsilon \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} &\displaystyle =&\displaystyle \frac{1}{H_b} \; {d H_b \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; \mathcal{D}\,(\mbox{vec} \, {\mathbf{P}}) {} \end{array} \end{aligned} $$
(11.91)

Term 1 on the right hand side of (11.90) is the derivative of H b with respect to π b, and is given by (11.86). Term 2 is the derivative of the biotic diversity vector π b with respect to the full stationary distribution π, given by (11.89). Term 3 is the derivative of the stationary distribution π with respect to the transition matrix P, given by (11.55). Finally, Term 4 is the derivative of the matrix P with respect to the perturbations Θ, taking into account the compensation structure in (11.83).
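
Because the 15 × 15 matrix of Appendix B is not reproduced in this section, the following sketch applies the chain rule (11.90) and the elasticity (11.91) to a hypothetical 4-state chain in which the last state plays the role of bare rock:

import numpy as np

# Hypothetical column-stochastic matrix; state 4 (last) plays the role of bare rock
P = np.array([[0.5, 0.2, 0.1, 0.3],
              [0.2, 0.5, 0.2, 0.3],
              [0.1, 0.1, 0.5, 0.2],
              [0.2, 0.2, 0.2, 0.2]])
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

G = np.eye(s)[:-1, :]                        # selects the biotic states
denom = np.ones(s - 1) @ (G @ pi)            # = 1' G pi
pib = G @ pi / denom                         # biotic stationary distribution, Eq. (11.87)
Hb = -pib @ np.log(pib)                      # biotic diversity

term1 = -(np.log(pib) + 1.0)                                  # dHb / dpib', Eq. (11.86)
term2 = (G - np.outer(pib, np.ones(s - 1) @ G)) / denom       # dpib / dpi', Eq. (11.89)
term3 = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))   # dpi / dvec' P

# Term 4: proportional compensation, Eq. (11.83)
C = np.ones((s, s)) - np.eye(s)
term4 = np.eye(s * s)
for i in range(s):
    Eii = np.zeros((s, s))
    Eii[i, i] = 1.0
    term4 -= np.kron(Eii, np.diag(P[:, i]) @ C @ np.diag(1.0 / (1.0 - P[:, i])))

dHb = term1 @ term2 @ term3 @ term4          # compensated sensitivity, Eq. (11.90)
elas = dHb * P.flatten(order="F") / Hb       # elasticity, Eq. (11.91)
print(Hb)
print(elas.reshape(s, s, order="F"))         # entry (i, j): elasticity of Hb to p_ij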

The sensitivity and elasticity vectors (11.90) and (11.91) are of dimension 1 × s 2 = 1 × 225. To reduce the number of independent perturbations, we consider subsets of the p ij: disturbance (in which a species is replaced by bare rock), colonization of unoccupied space, replacement of one species by another, and persistence of a species in its location, where

$$\displaystyle \begin{aligned} \begin{array}{rcl} P[\mbox{disturbance of sp. }i ] &\displaystyle =&\displaystyle p_{si} {} \\ {} P[\mbox{colonization by sp. }i ] &\displaystyle =&\displaystyle p_{is} \\ {} P[\mbox{persistence of sp. }i ] &\displaystyle =&\displaystyle p_{ii} \\ {} P[\mbox{replacement of sp. }i ] &\displaystyle =&\displaystyle \sum_{k \neq i,s} p_{ki} \\ {} P[\mbox{replacement by sp. }i ] &\displaystyle =&\displaystyle \sum_{j \neq i, s} p_{ij}. {} \end{array} \end{aligned} $$

Extracting the corresponding elements of \({\epsilon H_b \over \epsilon \mbox{vec} \,^{\mathsf {T}} {\mathbf {P}}}\) gives the elasticities to these classes of probabilities. Figure 11.3 shows that the dominant species (1 and 2) have impacts that are larger than, and opposite in sign to, those of the remaining species. Biodiversity would be enhanced by increasing the disturbance of, or the replacement of, species 1 and 2, and reduced by increasing the rates of colonization by, persistence of, or replacement by species 1 and 2.

Fig. 11.3
figure 3

The elasticity of the biotic diversity H b(π) calculated over the biotic states of the stationary distribution of the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

5.2 The Kemeny Constant and Ecological Mixing

Ecologists have used several measures of the rate of convergence of communities modelled by Markov chains, including the damping ratio and Dobrushin’s coefficient of ergodicity (Hill et al. 2004). The Kemeny constant K is an interesting addition to this list; it gives the expected time to get from any initial state to a state selected at random from the stationary distribution (Hunter 2006). Once that state is reached, the behavior of the chain and the stationary process are indistinguishable.

The sensitivity of K, subject to compensation, is

$$\displaystyle \begin{aligned} \left. {d K \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} = {d K \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{Z}}} \; {d \mbox{vec} \, {\mathbf{Z}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} \end{aligned} $$
(11.92)

where the three terms on the right hand side are given by (11.65), (11.57), and (11.83), respectively.

Figure 11.4 shows the sensitivities dK∕dvec TP, subject to proportional compensation, and aggregated as in Fig. 11.3. Unlike the case with H b, the two dominant species do not stand out from the others. Increases in the rates of replacement will speed up convergence, and increases in persistence will slow convergence. The disturbance of, colonization by, persistence of, and replacement of species 6 (a sea anemone, Urticina crassicornis) have particularly large impacts on K. Examination of row 6 and column 6 of P (Appendix B) shows that U. crassicornis has the highest probability of persistence (p 66 = 0.86), and one of the lowest rates of disturbance, in the community. While it is far from dominant (Fig. 11.2), it has a major impact on the rate of mixing.

Fig. 11.4
figure 4

The sensitivity of the Kemeny constant K of the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

6 Discussion

Given that many properties of finite state Markov chains can be expressed as simple matrix expressions, matrix calculus is an attractive approach to finding the sensitivity and elasticity to parameter perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains, and to dependent variables other than the stationary distribution. The perturbation of ergodic chains has often been studied using generalized inverses, following the influential studies of Meyer and colleagues (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 1986). Matrix calculus provides a complementary approach; the sensitivity of the stationary distribution π obtained here agrees with the result obtained by Golub and Meyer (1986) using the group generalized inverse.

The examples shown here are typical of cases where absorbing or ergodic Markov chains are used in population biology and ecology. In each example, the dependent variables of interest are functions several steps removed from the chain itself. The ease with which one can differentiate such functions is a particularly attractive property of the matrix calculus approach.