# Sensitivity Analysis of Discrete Markov Chains

## Abstract

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain.

## 11.1 Introduction

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain. This chapter revisits, in a more rigorous way, some of the quantities already explored for absorbing Markov chains (Chaps. 4, 5, and 6). It will also consider ergodic Markov chains (in which no absorbing states exist), and calculate the sensitivity of the stationary distribution and measures of the rate of convergence.

Perturbation (or sensitivity) analysis is a long-standing problem in the theory of Markov chains (Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986; Funderlic and Meyer 1986; Seneta 1988, 1993; Meyer 1994; Cho and Meyer 2000; Mitrophanov 2003, 2005; Mitrophanov et al. 2005; Kirkland et al. 2008). When Markov chains are applied as models of physical, biological, or social systems, they are often defined as functions of parameters that have substantive meaning.

## 11.2 Absorbing Chains

**U**, of dimension *s* × *s*, is the transition matrix among the *s* transient states, and **M**, of dimension *a* × *s*, contains probabilities of transition from the transient states to the *a* absorbing states. Assume that the spectral radius of **U** is strictly less than 1. Because we are concerned here with absorption, but not with what happens after, we ignore transitions among absorbing states; hence the identity matrix (*a* × *a*) in the lower right corner. The matrices **U**[**θ**] and **M**[**θ**] are functions of a vector **θ** of parameters. We assume that **θ** varies over some set in which the column sums of **P** are 1 and the spectral radius of **U** is strictly less than one.

### 11.2.1 Occupancy: Visits to Transient States

Let *ν*_{ij} be the number of visits to transient state *i*, prior to absorption, by an individual starting in transient state *j*. The expectations of the *ν*_{ij} are entries of the fundamental matrix \({\mathbf {N}} = {\mathbf {N}}_1 = \left ( E(\nu _{ij}) \right )\). Let **N**_{k} denote the matrix of *k*th moments about the origin of the *ν*_{ij}. The first several of these matrices are (Iosifescu 1980, Thm. 3.1)

### Theorem 11.2.1

*Let* **N**_{k} *be the matrix of kth moments of the ν*_{ij}*, as given by* (11.3)*,* (11.4)*,* (11.5)*, and* (11.6)*. The sensitivities of* **N**_{k}*, for k* = 1, …, 4*, are*

*where (see Sect.* 2.8*)*

### Proof

For *k* > 1, considering **N**_{k} as a function of **N**_{1} and **N**_{dg}, the total differential of **N**_{k} is the sum of the partial differentials of **N**_{k}, obtained by taking differentials treating only **N**_{1} or only **N**_{dg} as variables, respectively. Denote these partial differentials as \(\partial _{{\mathbf {N}}_1}\) and \(\partial _{{\mathbf {N}}_{\mathrm {dg}}}\). Differentiating **N**_{2} in (11.4) gives

The derivations of *d*vec **N**_{3} and *d*vec **N**_{4} follow the same sequence of steps. The details are given in Appendix A. □

The derivatives of **N**_{2}, **N**_{3}, and **N**_{4} can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).
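These moment calculations are easy to exercise numerically. The sketch below uses an invented 2 × 2 transient matrix **U** (not from the text) and the second-moment relation **N**₂ = (2**N**_{dg} − **I**)**N**₁, which is my column-stochastic rendering of (11.4); check it against the text before relying on it.

```python
import numpy as np

# Hypothetical transient matrix (column-stochastic convention:
# U[i, j] = P(next state i | current state j); column sums < 1).
U = np.array([[0.3, 0.2],
              [0.4, 0.5]])
s = U.shape[0]

N1 = np.linalg.inv(np.eye(s) - U)   # fundamental matrix: E[visits to i | start in j]
Ndg = np.diag(np.diag(N1))          # diagonal part of N1

# Second moments and variance of the number of visits.
N2 = (2 * Ndg - np.eye(s)) @ N1
var_visits = N2 - N1 * N1           # elementwise: Var(nu_ij) = E[nu^2] - (E[nu])^2
```

For a single transient state with return probability *u*, `N1` reduces to 1∕(1 − *u*) and `var_visits` to *u*∕(1 − *u*)², the familiar geometric-distribution values.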

### 11.2.2 Time to Absorption

Let *η*_{j} be the time to absorption starting in transient state *j*, and let \(\boldsymbol {\eta }_k = E \left ( \eta _1^k, \cdots ,\eta _s^k \right )^{\mathsf {T}}\). The first several of these moments are (Iosifescu 1980, Thm. 3.2)

### Theorem 11.2.2

*Let* **η**_{k} *be the vector of the kth moments of the η*_{i}*. The sensitivities of these moment vectors are*

*where d*vec **N**_{1} *is given by* (11.7).

### Proof

The result for **η**_{1} is obtained (Caswell 2006) by differentiating to get \(d \boldsymbol {\eta }_1^{\mathsf {T}} = {\mathbf {1}}^{\mathsf {T}} \left ( d {\mathbf {N}}_1 \right )\) and then applying the vec operator. For the higher moments, consider the **η**_{k} to be functions of **η**_{1} and **N**_{1}, and write the total differential. The partial differentials of **η**_{2} with respect to **η**_{1} and **N**_{1} are

The derivations of *d***η**_{3} and *d***η**_{4} follow the same sequence of steps; the details are shown in Appendix A. □
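A numerical sketch of the first two moments of time to absorption, with an invented **U**; the relations η₁ᵀ = **1**ᵀ**N**₁ and η₂ᵀ = η₁ᵀ(2**N**₁ − **I**) are my column-stochastic rendering of the recursions above, so verify them against the text.

```python
import numpy as np

# Invented transient matrix, column-stochastic convention.
U = np.array([[0.3, 0.2],
              [0.4, 0.5]])
s = U.shape[0]
N1 = np.linalg.inv(np.eye(s) - U)

eta1 = N1.T @ np.ones(s)                  # eta1^T = 1^T N1: expected time to absorption
eta2 = (2 * N1 - np.eye(s)).T @ eta1      # eta2^T = eta1^T (2 N1 - I): second moments
var_eta = eta2 - eta1**2                  # variance of time to absorption
```

In the scalar case (one transient state with retention probability *u*), these reduce to the mean 1∕(1 − *u*) and second moment (1 + *u*)∕(1 − *u*)² of a geometric waiting time, which is a quick sanity check on the direction of the matrix products.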

### 11.2.3 Number of States Visited Before Absorption

Let *ξ*_{i} ≥ 1 be the number of distinct transient states visited before absorption, and let **ξ**_{1} = *E*(**ξ**). Then

### Theorem 11.2.3

*Let* **ξ**_{1} = *E*(**ξ**)*. The sensitivity of* **ξ**_{1} *is*

*where d*vec **N**_{1} *is given by* (11.7).

### Proof

Differentiating **ξ**_{1} and substituting *d*vec **N**_{dg} gives the result. □

### 11.2.4 Multiple Absorbing States and Probabilities of Absorption

When the chain includes *a* > 1 absorbing states, the entry *m*_{ij} of the *a* × *s* submatrix **M** in (11.1) is the probability of transition from transient state *j* to absorbing state *i*. The result of the competing risks of absorption is a set of probabilities \(b_{ij} = P \left [ \mbox{absorption in }i \left | \mbox{starting in }j \right . \right ]\) for *i* = 1, …, *a* and *j* = 1, …, *s*. The matrix of these probabilities is \({\mathbf {B}} = \left ( b_{ij} \right ) = {\mathbf {M}} {\mathbf {N}}_1\) (Iosifescu 1980, Thm. 3.3).
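A sketch with invented **U** and **M**, chosen so that the columns of the full matrix **P** in (11.1) sum to 1; the columns of **B** then sum to 1, since absorption is certain:

```python
import numpy as np

# Invented chain: s = 2 transient and a = 2 absorbing states.
U = np.array([[0.3, 0.2],
              [0.4, 0.5]])
M = np.array([[0.2, 0.1],   # M[i, j] = P(transient j -> absorbing i)
              [0.1, 0.2]])

N1 = np.linalg.inv(np.eye(2) - U)
B = M @ N1                  # B[i, j] = P(eventual absorption in i | start in j)
```

The column sums of `B` equal 1 because **1**ᵀ**B** = **1**ᵀ**M** **N**₁ = **1**ᵀ(**I** − **U**)**N**₁ = **1**ᵀ.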

### Theorem 11.2.4

*Let* **B** = **MN**_{1} *be the matrix of absorption probabilities. Then*

### Proof

Differentiating **B** yields

Substituting *d*vec **N**_{1} and simplifying gives (11.37). □

Column *j* of **B** is the probability distribution of the eventual absorption state for an individual starting in transient state *j*. Usually a few of those starting states are of particular interest (e.g., states corresponding to “birth” or to the start of some process). Let **B**(:, *j*) = **Be**_{j} denote column *j* of **B**, where **e**_{j} is the *j*th unit vector of length *s*. Thus the derivative of **B**(:, *j*) is

where *d*vec **B** is given by (11.37). Similarly, row *i* of **B** is \({\mathbf {B}}(i,:)={\mathbf {e}}_i^{\mathsf {T}} {\mathbf {B}}\), and its derivative follows in the same way, where **e**_{i} is the *i*th unit vector of length *a*.

### 11.2.5 The Quasistationary Distribution

Let **w** and **v** be the right and left eigenvectors associated with the dominant eigenvalue of **U**, normalized so that ∥**w**∥ = ∥**v**∥ = 1. Darroch and Seneta (1965) defined two quasistationary distributions in terms of **w** and **v**. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to

### Lemma 1

*Let the dominant eigenvalue of* **U**, *guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy* 0 < *λ* < 1*, and let* **w** *and* **v** *be the right and left eigenvectors corresponding to λ, scaled so that* **w**^{T}**v** = 1*. Then*

### Proof

Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating **v** as the right eigenvector of **U**^{T}. □

### Theorem 11.2.5

*The derivative of the quasistationary distribution* **q**_{a} *is given by* (11.44)*. The derivative of the quasistationary distribution* **q**_{b} *is*

*where d***w** *and d***v** *are given by* (11.44) *and* (11.45)*, respectively.*
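Both Darroch-Seneta distributions can be computed directly from the dominant eigenvectors of **U**. In this sketch the matrix is invented, and the forms **q**_{a} ∝ **w** and **q**_{b} ∝ **w** ∘ **v** (elementwise product, each normalized to sum to 1) are my reading of the definitions above:

```python
import numpy as np

# Invented transient matrix with spectral radius 0.9 < 1.
U = np.array([[0.6, 0.2],
              [0.3, 0.7]])

lam, W = np.linalg.eig(U)
w = np.abs(W[:, np.argmax(lam.real)].real)        # dominant right eigenvector

lamL, V = np.linalg.eig(U.T)
v = np.abs(V[:, np.argmax(lamL.real)].real)       # dominant left eigenvector

q_a = w / w.sum()                 # limiting distribution given non-absorption
q_b = (w * v) / (w * v).sum()     # conditional on absorption far in the future
```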

## 11.3 Life Lost Due to Mortality

The approach here makes it easy to compute the sensitivity of a variety of dependent variables calculated from the Markov chain. As an example of this flexibility, consider a recently developed demographic index, the number of years of life lost due to mortality (Vaupel and Canudas Romo 2003).

The transient states of the chains are age classes, absorption corresponds to death, and absorbing states correspond to age at death. Let *μ*_{i} be the mortality rate and \(p_i=\exp (-\mu _i)\) the survival probability at age *i*. The matrix **U** has the *p*_{i} on the subdiagonal and zeros elsewhere. The matrix **M** has 1 − *p*_{i} on the diagonal and zeros elsewhere. Let **f** = **B**(:, 1) be the distribution of age at death and **η**_{1} the vector of expected longevity as a function of age.

A death at age *i* represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the *i*th entry of **η**_{1}, and the expected number of years lost over the distribution of age at death is \(\eta ^\dagger = \boldsymbol {\eta }_1^{\mathsf {T}} {\mathbf {f}}\). This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age *x*, **f** would be a delta function at *x* and further life expectancy at age *x* would be zero; their product would give *η*^{†} = 0. Declines in disparity have accompanied increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how *η*^{†} responds to changes in mortality.
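A sketch of the computation of *η*^{†} for an invented three-age-class mortality schedule (the μ values are illustrative, not from the Human Mortality Database); the last age class is forced to die so that **P** remains stochastic:

```python
import numpy as np

# Invented mortality schedule for 3 age classes.
mu = np.array([0.05, 0.10, 0.50])
p = np.exp(-mu)                      # survival probabilities p_i = exp(-mu_i)
s = len(mu)

U = np.zeros((s, s))
U[1:, :s-1] = np.diag(p[:-1])        # p_i on the subdiagonal: survive and age
M = np.diag(1 - p)                   # death at age i -> absorbing state i
M[-1, -1] = 1.0                      # everyone in the last class eventually dies

N1 = np.linalg.inv(np.eye(s) - U)
eta1 = N1.T @ np.ones(s)             # expected remaining lifetime, by age class
f = (M @ N1)[:, 0]                   # distribution of age at death, from birth
eta_dagger = eta1 @ f                # expected years of life lost at death
```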

Differentiating *η*^{†} gives

Substituting the expression for *d***η**_{1} and (11.37) for *d*vec **B** gives

As examples, consider India, with *η*^{†} = 23.9 years, and Japan in 2006, with a female life expectancy of 86 years and *η*^{†} = 10.1 years (Human Mortality Database 2016). In both cases, elasticities are positive from birth to some age (≈ 50 for India, ≈ 85 for Japan) and negative thereafter. This implies that reductions in infant and early life mortality would reduce *η*^{†}, whereas reductions in old age mortality would increase *η*^{†}. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

## 11.4 Ergodic Chains

Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix **P** of dimension *s* × *s*. The stationary distribution **π** is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue *λ*_{1} = 1 of **P**. The fundamental matrix of the chain is \({\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}\) (Kemeny and Snell 1960).
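A sketch computing **π** and **Z** for an invented 3-state column-stochastic chain:

```python
import numpy as np

# Invented irreducible, primitive, column-stochastic transition matrix.
P = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()          # stationary distribution: P pi = pi, 1^T pi = 1

# Fundamental matrix Z = (I - P + pi 1^T)^{-1}
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))
```

Note that **Zπ** = **π** and **1**ᵀ**Z** = **1**ᵀ, two identities used repeatedly in the proofs below.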

We are interested only in perturbations that preserve the column-stochasticity of **P**; i.e., for which **P** remains a stochastic matrix. Such perturbations are easily defined when the *p*_{ij} depend explicitly on a parameter vector **θ**. However, when the parameters of interest are the *p*_{ij} themselves, an implicit parameterization must be defined to preserve the stochastic nature of **P** under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.

Previous studies of perturbations of ergodic chains focus almost completely on perturbations of the stationary distribution, and are divided between those focusing on sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986) and studies focusing on perturbation bounds and condition numbers (Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties with respect to parameter perturbations, but taking advantage of the matrix calculus approach. We do not consider perturbation bounds here.

### 11.4.1 The Stationary Distribution

### Theorem 11.4.1

*Let* **π** *be the stationary distribution, satisfying* **Pπ** = **π** *and* **1**^{T}**π** = 1*. The sensitivity of* **π** *is*

*where* **Z** *is the fundamental matrix of the chain.*

### Proof

The vector **π** is the right eigenvector of **P**, scaled to sum to 1. Applying Lemma 1, and noting that *λ* = 1 and **1**^{T}**P** = **1**^{T}, gives \(d \boldsymbol {\pi } = {\mathbf {Z}} \left [ \boldsymbol {\pi }^{\mathsf {T}} \otimes \left ( {\mathbf {I}}_s - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right ) \right ] d \mbox{vec} \, {\mathbf {P}}\). Noting that **Zπ** = **π** and simplifying the Kronecker products yields (11.55). □

The sensitivity of **π** to a change in a single element of **P** can also be written using the group generalized inverse \(\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\#\) of **I** − **P**. Since \(\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\# = {\mathbf {Z}} - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}}\) (Golub and Meyer 1986), expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of **π** using only the chain rule. If *g*(**π**) is a vector- or scalar-valued function of **π**, then

**π**### 11.4.2 The Fundamental Matrix

The fundamental matrix \({\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}\) plays a role in ergodic chains similar to that played by **N**_{1} in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.

### Theorem 11.4.2

*The sensitivity of the fundamental matrix is*

### 11.4.3 The First Passage Time Matrix

The matrix of mean first passage times contains, in its (*i*, *j*) entry, the mean time of first passage from *j* to *i*, given by Iosifescu (1980, Thm. 4.7). Note that Iosifescu gives the result for the case in which **P** is row-stochastic.

### Theorem 11.4.3

### Proof

Differentiating and substituting *d*vec **Z**_{dg} yields the result. □

### 11.4.4 Mixing Time and the Kemeny Constant

The mixing time *K* of a chain is the mean time required to get from a specified state to a state chosen at random from the stationary distribution **π**. Remarkably, *K* is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is sometimes called Kemeny's constant. It is a measure of the rate of convergence to stationarity, and is given by *K* = trace(**Z**) (Hunter 2006). In addition to being a quantity of interest in itself, the rate of convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).

### Theorem 11.4.4

*The sensitivity of K is*

### Proof

Differentiating *K* = trace(**Z**) gives
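A sketch computing *K* = trace(**Z**) for an invented chain, cross-checked against the eigenvalue form *K* = 1 + Σ_{i≥2} 1∕(1 − λ_{i}), which follows from the spectrum of **Z**:

```python
import numpy as np

P = np.array([[0.5, 0.2, 0.3],    # invented column-stochastic matrix
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()

Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))
K = np.trace(Z)                           # Kemeny's constant

# Cross-check: the eigenvalues of Z are 1 and 1/(1 - lambda_i) for i >= 2.
sub = sorted(lam, key=lambda z: -z.real)[1:]   # drop lambda_1 = 1
K_eig = 1 + sum(1 / (1 - z) for z in sub)
```

For this matrix the subdominant eigenvalues are 0.4 and 0.3, so *K* = 1 + 1∕0.6 + 1∕0.7 ≈ 4.095.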

### 11.4.5 Implicit Parameters and Compensation

Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of *d*vec **P**. However, perturbation of any element, say *p*_{kj}, to *p*_{kj} + *θ*_{kj}, must be compensated for by adjustments of the other elements in column *j* so that the column sum remains equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use in applications: additive and proportional. Additive compensation adjusts all the elements of the column by an equal amount, distributing the perturbation *θ*_{kj} additively over column *j*. Proportional compensation distributes *θ*_{kj} in proportion to the values of the *p*_{ij}, for *i* ≠ *k*. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within **P**.

Consider first a probability vector **p**, of dimension *s* × 1, with *p*_{i} ≥ 0 and ∑_{i}*p*_{i} = 1. Let *θ*_{i} be the perturbation of *p*_{i}, and write

for some matrix **A** to be determined. If *y* is a function of **p**, then

evaluated at **θ** = **0**.

### Additive compensation

Here *θ*_{1} is added to *p*_{1} and compensated for by subtracting *θ*_{1}∕(*s* − 1) from all other entries of **p**; clearly ∑_{i}*p*_{i}(**θ**) = 1 for any perturbation vector **θ**.

Defining **E** to be a matrix of ones, the matrix **C** can be written (as a so-called Toeplitz matrix) as **C** = **E** − **I**, with zeros on the diagonal and ones elsewhere. Thus the matrix **A** in (11.68) is

### Proportional compensation

Suppose that *p*_{i} < 1 for all *i*. The vector **p**(**θ**) is

Here *θ*_{1} is added to *p*_{1} and compensated for by subtracting *θ*_{1}*p*_{i}∕(1 − *p*_{1}) from the *i*th entry of **p**, for *i* ≠ 1. Again, ∑_{i}*p*_{i}(**θ**) = 1 for any perturbation vector **θ**.
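Both compensation schemes are easy to check on a single probability vector; the vector and perturbation below are invented. Note that proportional compensation also preserves the ratios among the unperturbed entries:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])    # invented probability vector
s = len(p)
theta1 = 0.02                    # perturbation applied to p_1

# Additive compensation: spread -theta1 equally over the other entries.
p_add = p.copy()
p_add[0] += theta1
p_add[1:] -= theta1 / (s - 1)

# Proportional compensation: spread -theta1 over the other entries
# in proportion to their values p_i / (1 - p_1).
p_prop = p.copy()
p_prop[0] += theta1
p_prop[1:] -= theta1 * p[1:] / (1 - p[0])
```

Both `p_add` and `p_prop` still sum to 1, and `p_prop[1] / p_prop[2]` equals `p[1] / p[2]`.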

**θ**### The transition matrix

So far we have considered a single probability vector **p**. Now consider perturbation of a probability matrix **P**, each column of which is a probability vector. Define a perturbation matrix **Θ**, where *θ*_{ij} is the perturbation of *p*_{ij}. Perturbations of column *j* are to be compensated by a matrix **A**_{j}, so that

where **A**_{i} compensates for the changes in column *i* of **P**. Applying the vec operator to (11.76) gives

where **E**_{ii} is a matrix with a 1 in the (*i*, *i*) entry and zeros elsewhere.

### Theorem 11.4.5

*Let* **P** *be a column-stochastic s* × *s transition matrix. Let* **Θ** *be a matrix of perturbations, where θ*_{ij} *is applied to p*_{ij}*, and the other entries of* **Θ** *compensate for the perturbation. Let* **C** = **E** − **I***. If compensation is additive, then*

*If compensation is proportional, then*

### Proof

The matrix **P**(**Θ**) is given by (11.79). If compensation is additive, **A**_{i} is given by (11.72) for all *i*. Substituting into (11.79) gives (11.80). Differentiating (11.80) and applying the vec operator gives (11.81). If compensation is proportional, substituting **A**_{i} in (11.79) gives (11.82). Differentiating yields the result. □

Perturbations of **P** subject to compensation are given by perturbations of **Θ**. Thus for any function *y*(**P**) we can write

where *d*vec **P**∕*d*vec^{T} **Θ** is given (for additive and proportional compensation) by Theorem 11.4.5. The slight notational complexity is worthwhile for clarifying how to use Theorem 11.4.5 in practice.
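As a check on the proportional case, the Jacobian *d*vec **P**∕*d*vec^{T} **Θ** at **Θ** = **0** can be assembled entrywise: a unit perturbation of *p*_{kj} changes *p*_{ij} (*i* ≠ *k*) by −*p*_{ij}∕(1 − *p*_{kj}). This entrywise construction is my reading of the compensation rule, not a transcription of the theorem's formula; every column of the implied d**P** should sum to zero:

```python
import numpy as np

P = np.array([[0.5, 0.2, 0.3],    # invented column-stochastic matrix
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
s = P.shape[0]

# J[row, col] = d p_ij / d theta_kj, with vec stacking columns:
# p_ij sits at index j*s + i, theta_kj at index j*s + k.
J = np.zeros((s * s, s * s))
for j in range(s):                # column of P being perturbed
    for k in range(s):            # entry receiving the perturbation theta_kj
        col = j * s + k
        for i in range(s):
            row = j * s + i
            J[row, col] = 1.0 if i == k else -P[i, j] / (1 - P[k, j])

# First-order stochasticity: (I_s kron 1^T) J = 0, i.e. each column of
# dP sums to zero for any perturbation dTheta.
colsum_sens = np.kron(np.eye(s), np.ones((1, s))) @ J
```

This Jacobian is the quantity that the chain rule above composes with *dy*∕*d*vec^{T} **P** for any dependent variable *y*.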

## 11.5 Species Succession in a Marine Community

Markov chains are used by ecologists as models of species replacement (succession) in ecological communities; (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton 2010). In these models, the state of a point on a landscape is given by the species occupying that point. The entry *p*_{ij} of **P** is the probability that species *j* is replaced by species *i* between *t* and *t* + 1. If a community consists of a large number of points independently subject to the transition probabilities in **P**, the stationary distribution * π* will give the relative frequencies of species in the community at equilibrium.

Hill et al. (2004) used a Markov chain to describe a community of encrusting organisms occupying rock surfaces at 30–35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an additional state (“bare rock”) for unoccupied substrate. The matrix **P** was estimated from longitudinal data (Hill et al. 2002, 2004) and is given, along with a list of species names, in Appendix B. We will use the results of this chapter to analyze the sensitivity of species diversity and the Kemeny constant to the processes of colonization and replacement that determine **P**.

### 11.5.1 Biotic Diversity

The stationary distribution **π**, with the species numbered in order of decreasing abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The two dominant species are an encrusting sponge (*Hymedesmia*) and a bryozoan (*Crisia*).

The entropy *H* is

where **G**, of dimension 14 × 15, is a 0–1 matrix that selects rows 1–14 of **π**. Because **π** is positive, ∥**Gπ**∥ = **1**^{T}**Gπ**. Differentiating **π**_{b} gives

This model contains no explicit parameters; perturbations of the transition probabilities themselves are of interest and a compensation pattern is needed. Because the relative magnitudes of the entries in a column of **P** reflect the relative abilities of species to capture or to hold space, proportional compensation is appropriate in this case because it preserves these relative abilities.
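A sketch of the biotic diversity computation with an invented 4-state stationary distribution (3 species plus bare rock as the last state, rather than the chapter's 14 + 1); **G** selects the species rows and **π**_{b} is renormalized before the entropy is taken:

```python
import numpy as np

pi = np.array([0.4, 0.25, 0.15, 0.2])          # invented; last state = bare rock
G = np.hstack([np.eye(3), np.zeros((3, 1))])   # selects the 3 species states

pi_b = G @ pi / (np.ones(3) @ G @ pi)          # renormalized biotic distribution
H_b = -pi_b @ np.log(pi_b)                     # biotic diversity (Shannon entropy)
```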

The sensitivities of *H*_{b} to changes in the matrix **P**, subject to proportional compensation, are

Term 1 is the derivative of *H*_{b} with respect to **π**_{b}, and is given by (11.86). Term 2 is the derivative of the biotic diversity vector **π**_{b} with respect to the full diversity vector **π**, given by (11.89). Term 3 is the derivative of the diversity vector **π** with respect to the transition matrix **P**, given by (11.55). Finally, Term 4 is the derivative of the matrix **P** taking into account the compensation structure in (11.83).

The resulting sensitivity is a vector of dimension 1 × *s*^{2} = 1 × 225. To reduce the number of independent perturbations, we consider subsets of the *p*_{ij}: disturbance (in which a species is replaced by bare rock), colonization of unoccupied space, replacement of one species by another, and persistence of a species in its location.

### 11.5.2 The Kemeny Constant and Ecological Mixing

Ecologists have used several measures of the rate of convergence of communities modelled by Markov chains, including the damping ratio and Dobrushin’s coefficient of ergodicity (Hill et al. 2004). The Kemeny constant *K* is an interesting addition to this list; it gives the expected time to get from any initial state to a state selected at random from the stationary distribution (Hunter 2006). Once reaching that state, the behavior of the chain and the stationary process are indistinguishable.

The sensitivity of *K*, subject to compensation, is

The derivatives *dK*∕*d*vec^{T} **P**, subject to proportional compensation, were aggregated as in Fig. 11.3. Unlike the case with *H*_{b}, the two dominant species do not stand out from the others. Increases in the rates of replacement will speed up convergence, and increases in persistence will slow convergence. The disturbance of, colonization by, persistence of, and replacement of species 6 (a sea anemone, *Urticina crassicornis*) have particularly large impacts on *K*. Examination of row 6 and column 6 of **P** (Appendix B) shows that *U. crassicornis* has the highest probability of persistence (*p*_{66} = 0.86), and one of the lowest rates of disturbance, in the community. While it is far from dominant (Fig. 11.2), it has a major impact on the rate of mixing.

## 11.6 Discussion

Given that many properties of finite state Markov chains can be expressed as simple matrix expressions, matrix calculus is an attractive approach to finding the sensitivity and elasticity to parameter perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains, and to dependent variables other than the stationary distribution. The perturbation of ergodic chains is often studied using generalized inverses, since the influential studies of Meyer (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 1986). Matrix calculus provides a complementary approach; the sensitivity of the stationary distribution * π* obtained here agrees with the result obtained by Golub and Meyer (1986) using the group generalized inverse.

The examples shown here are typical of cases where absorbing or ergodic Markov chains are used in population biology and ecology. In each example, the dependent variables of interest are functions several steps removed from the chain itself. The ease with which one can differentiate such functions is a particularly attractive property of the matrix calculus approach.

## References

- Caswell, H. 2001. Matrix Population Models: Construction, Analysis, and Interpretation. 2nd edition. Sinauer Associates, Sunderland, MA.
- Caswell, H. 2006. Applications of Markov chains in demography. Pages 319–334 *in* MAM2006: Markov Anniversary Meeting. Boson Books, Raleigh, North Carolina.
- Caswell, H. 2008. Perturbation analysis of nonlinear matrix population models. Demographic Research **18**:59–116.
- Caswell, H. 2009. Stage, age and individual stochasticity in demography. Oikos **118**:1763–1782.
- Caswell, H. 2011. Perturbation analysis of continuous-time absorbing Markov chains. Numerical Linear Algebra with Applications **18**:901–917.
- Cho, G. E., and C. D. Meyer. 2000. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra and its Applications **335**:137–150.
- Conlisk, J. 1985. Comparative statics for Markov chains. Journal of Economic Dynamics and Control **9**:139–151.
- Darroch, J. N., and E. Seneta. 1965. On quasi-stationary distributions in absorbing discrete-time finite Markov chains. Journal of Applied Probability **2**:88–100.
- Edwards, R. D., and S. Tuljapurkar. 2005. Inequality in life spans and a new perspective on mortality convergence across industrialized countries. Population and Development Review **31**:645–674.
- Funderlic, R. E., and C. D. Meyer, Jr. 1986. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra and its Applications **76**:1–17.
- Golub, G. H., and C. D. Meyer, Jr. 1986. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM Journal on Algebraic and Discrete Methods **7**:273–281.
- Grinstead, C. M., and J. L. Snell. 2003. Introduction to Probability. Second edition. American Mathematical Society.
- Hill, M. F., J. D. Witman, and H. Caswell. 2002. Spatio-temporal variation in Markov chain models of subtidal community succession. Ecology Letters **5**:665–675.
- Hill, M. F., J. D. Witman, and H. Caswell. 2004. Markov chain analysis of succession in a rocky subtidal community. The American Naturalist **164**:E46–E61.
- Horn, H. S. 1975. Markovian properties of forest succession. Pages 196–211 *in* M. L. Cody and J. M. Diamond, editors. Ecology and Evolution of Communities. Harvard University Press, Cambridge, MA.
- Horvitz, C. C., and S. Tuljapurkar. 2008. Stage dynamics, period survival, and mortality plateaus. American Naturalist **172**:203–215.
- Human Mortality Database. 2016. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). www.mortality.org.
- Hunter, J. J. 2005. Stationary distributions and mean first passage times of perturbed Markov chains. Linear Algebra and its Applications **410**:217–243.
- Hunter, J. J. 2006. Mixing times with applications to perturbed Markov chains. Linear Algebra and its Applications **417**:108–123.
- Iosifescu, M. 1980. Finite Markov Processes and Their Applications. Wiley, New York, New York.
- Kemeny, J. G. 1981. Generalization of a fundamental matrix. Linear Algebra and its Applications **38**:193–206.
- Kemeny, J. G., and J. L. Snell. 1960. Finite Markov Chains. Van Nostrand, Princeton, New Jersey.
- Kirkland, S. 2003. Conditioning properties of the stationary distribution for a Markov chain. Electronic Journal of Linear Algebra **10**:1–15.
- Kirkland, S. J., M. M. Neumann, and N.-S. Sze. 2008. On optimal condition numbers for Markov chains. Numerische Mathematik **110**:521–537.
- Meyer, C. D. 1975. The role of the group generalized inverse in the theory of finite Markov chains. SIAM Review **17**:443–464.
- Meyer, C. D. 1994. Sensitivity of the stationary distribution of a Markov chain. SIAM Journal on Matrix Analysis and Applications **15**:715–728.
- Meyer, C. D., and G. W. Stewart. 1982. Derivatives and perturbations of eigenvectors. SIAM Journal on Numerical Analysis **25**:679–691.
- Mitrophanov, A. Y. 2003. Stability and exponential convergence of continuous-time Markov chains. Journal of Applied Probability **40**:970–979.
- Mitrophanov, A. Y. 2005. Sensitivity and convergence of uniformly ergodic Markov chains. Journal of Applied Probability **42**:1003–1014.
- Mitrophanov, A. Y., A. Lomsadze, and M. Borodovsky. 2005. Sensitivity of hidden Markov models. Journal of Applied Probability **42**:632–642.
- Nelis, L. C., and J. T. Wootton. 2010. Treatment-based Markov chain models clarify mechanisms of invasion in an invaded grassland community. Proceedings B, The Royal Society of London **277**:539.
- Schweitzer, P. J. 1968. Perturbation theory and finite Markov chains. Journal of Applied Probability **5**:401–413.
- Seneta, E. 1988. Perturbation of the stationary distribution measured by ergodicity coefficients. Advances in Applied Probability **20**:228–230.
- Seneta, E. 1993. Sensitivity of finite Markov chains under perturbation. Statistics and Probability Letters **17**:163–168.
- Vaupel, J. W., and V. Canudas Romo. 2003. Decomposing change in life expectancy: a bouquet of formulas in honor of Nathan Keyfitz's 90th birthday. Demography **40**:201–216.
- Wilmoth, J. R., and S. Horiuchi. 1999. Rectangularization revisited: variability of age at death within human populations. Demography **36**:475–495.
- Zhang, Z., and J. W. Vaupel. 2009. The age separating early deaths from late deaths. Demographic Research **20**:721–730.

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.