Abstract
When Markov chains are used as mathematical models of natural or social phenomena, the transition intensities or probabilities are usually defined in terms of parameters that are relevant to the scientific question at hand. Sensitivity analysis of such models is important because it quantifies the dependence of the model behavior on the parameters.
Chapter 12 is modified, by permission of John Wiley and Sons, from: Caswell, H. 2012. Perturbation analysis of continuous-time absorbing Markov chains. Numerical Linear Algebra with Applications 18:901–917. © John Wiley and Sons.
1 Introduction
When Markov chains are used as mathematical models of natural or social phenomena, the transition intensities or probabilities are usually defined in terms of parameters that are relevant to the scientific question at hand. Sensitivity analysis of such models is important because it quantifies the dependence of the model behavior on the parameters. This chapter presents sensitivity results for finite-state, continuous-time absorbing Markov chains, paralleling the approach for discrete-time chains in Chap. 11. In absorbing chains, interest focuses on behavior prior to absorption (time spent in transient states and time to absorption) and on the probabilities of absorption in each absorbing state. Here we will derive formulae for the sensitivity and the elasticity (i.e., proportional sensitivity) of the moments of the time to absorption, the time spent in each transient state, and the number of visits to each transient state.
The most basic difference between discrete-time and continuous-time Markov chains is that the former are defined by transition probabilities, while the latter are defined by transition rates. This leads to differences in the structure of the matrices, but there is a nice parallelism in the results.
Perturbation analysis of Markov chains has a long history (Schweitzer 1968; Meyer 1975). Most of the literature, however, is devoted to discrete-time chains, and most of that focuses on ergodic chains and the perturbation analysis of the stationary distribution; e.g. Funderlic and Meyer (1986), Golub and Meyer (1986), Hunter (2005), Cho and Meyer (2000), and Seneta (1993). Much less attention has been paid to continuous-time chains. Perturbation expansions have been developed for the stationary distribution of ergodic continuous-time chains, with application to queueing models (Altman et al. 2004), and sensitivity results and perturbation bounds presented for transient solutions (Ramesh and Trivedi 1993; Mitrophanov 2004). The operations research literature contains many studies of the sensitivity of performance measures calculated over realizations of a continuous-time ergodic Markov chain; e.g., Cao (1989), Glasserman (1992), and Cao et al. (1996). The results to be presented here complement and extend the existing literature on perturbation analysis of Markov chains, by focusing on the statistical properties of the solutions of absorbing continuous-time chains, by introducing the use of matrix calculus, and (as a consequence of that technique) extending the range of parameters whose effects can be evaluated.
1.1 Absorbing Markov Chains
I consider a finite-state, homogeneous, continuous-time Markov chain with intensity matrix \(\mathbf{Q}\), where \(q_{ij}\) is the rate of transition from state j to state i. The intensity matrix satisfies \(q_{ij} \ge 0\) for i ≠ j and \(q_{jj} = -\sum_{i \ne j} q_{ij}\), so that each column sums to zero. Note that \(\mathbf{Q}\) is written in column-to-row orientation, and operates on column vectors. An absorbing chain contains at least one absorbing class of states. Numbering the states so that the transient states appear before the absorbing states leads to the intensity matrix
\[ \mathbf{Q} = \left( \begin{array}{cc} \mathbf{U} & \mathbf{0} \\ \mathbf{M} & \mathbf{0} \end{array} \right) \qquad (12.1) \]
The matrix U contains rates of transitions among the transient states, and M contains the rates of transition from transient to absorbing states.
I assume that U and M are differentiable functions of a vector θ of parameters, and that Q[θ] remains an intensity matrix for sufficiently small perturbations of θ. This includes as a special case the situation where the elements of θ are simply some or all of the q ij, i ≠ j. The goal of the perturbation analysis is to obtain the derivatives of properties of the chain with respect to θ.
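The block structure of the intensity matrix can be checked numerically. The following is a minimal sketch, assuming a hypothetical chain with three transient and two absorbing states; the rates and the helper names `build_intensity_matrix` and `is_intensity_matrix` are illustrative and do not come from the chapter.

```python
import numpy as np

def build_intensity_matrix(U, M):
    """Assemble Q = [[U, 0], [M, 0]] in column-to-row orientation."""
    U = np.asarray(U, dtype=float)
    M = np.asarray(M, dtype=float)
    s, a = U.shape[0], M.shape[0]
    Q = np.zeros((s + a, s + a))
    Q[:s, :s] = U   # transitions among transient states
    Q[s:, :s] = M   # transitions from transient to absorbing states
    return Q

def is_intensity_matrix(Q, tol=1e-12):
    """Off-diagonal entries non-negative and every column sums to zero."""
    off_diagonal = Q - np.diag(np.diag(Q))
    return bool(np.all(off_diagonal >= -tol)
                and np.allclose(Q.sum(axis=0), 0.0, atol=tol))

# Hypothetical rates (per unit time), chosen only for illustration.
U = np.array([[-0.5,  0.0,  0.0],
              [ 0.3, -0.4,  0.0],
              [ 0.0,  0.2, -0.6]])
M = np.array([[0.0, 0.0, 0.4],
              [0.2, 0.2, 0.2]])
Q = build_intensity_matrix(U, M)
print(is_intensity_matrix(Q))
```

Because the matrix operates on column vectors, it is the columns, not the rows, that must sum to zero.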
2 Occupancy Time in Transient States
Let s be the number of transient states, and let \(\nu_{ij}\) be the time spent in transient state i by an individual starting in transient state j. Define \(\mathbf{N}_k = \left( E \left( \nu_{ij}^k \right) \right)\) as the matrix whose entries are the kth moments of the \(\nu_{ij}\), and \(\mathbf{N}_{\mathrm{dg}} = \left( \mathbf{N}_1 \right)_{\mathrm{dg}}\), the matrix with the diagonal of \(\mathbf{N}_1\) on its diagonal and zeros elsewhere. The matrix \(\mathbf{N}_1\) of expectations is the fundamental matrix of the chain. The first several moments of occupancy times are given by the entries of the matrices
and, in general, by
(Iosifescu 1980, Thm. 8.7).
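A minimal numerical sketch of the occupancy-time moments follows. It assumes the closed forms \(\mathbf{N}_1 = -\mathbf{U}^{-1}\) and \(\mathbf{N}_{k+1} = (k+1)\, \mathbf{N}_{\mathrm{dg}} \mathbf{N}_k\), equivalently \(\mathbf{N}_k = k!\, \mathbf{N}_{\mathrm{dg}}^{k-1} \mathbf{N}_1\), which hold because the total time spent in a state, once it is reached, is exponentially distributed. The two-state matrix `U` and the function name `occupancy_moments` are hypothetical.

```python
import numpy as np

def occupancy_moments(U, k_max=4):
    """Return [N_1, ..., N_kmax]: moments of time spent in transient states."""
    U = np.asarray(U, dtype=float)
    N1 = -np.linalg.inv(U)            # fundamental matrix
    Ndg = np.diag(np.diag(N1))        # diagonal part of N_1
    moments = [N1]
    for k in range(1, k_max):
        # Recurrence N_{k+1} = (k + 1) N_dg N_k
        moments.append((k + 1) * Ndg @ moments[-1])
    return moments

# Hypothetical two-state progression chain (column-to-row orientation).
U = np.array([[-0.4,  0.0],
              [ 0.3, -0.3]])
N1, N2, N3, N4 = occupancy_moments(U)
print(N1)
```

Entry (i, j) of each matrix refers to time spent in state i by an individual starting in state j.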
The differentials of the moments (12.2), (12.3), (12.4), and (12.5) are
where I = I s throughout. A recursive relation for all the moments is
The variance, standard deviation, and coefficient of variation of the \(\nu_{ij}\) are important in applications; they are
where the square root is taken elementwise. Their derivatives are
(suppressing the arguments of V, SD, and CV). Because \(\mathbf{N}_1\) usually contains zeros, \(\mathcal{D}\,(\mbox{vec}\,\mathbf{N}_1)^{-1}\) must be restricted to the non-zero entries; the coefficient of variation is undefined if the mean is zero.
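The restriction to non-zero means can be made concrete in a short sketch. It assumes the usual definitions \(V = \mathbf{N}_2 - \mathbf{N}_1 \circ \mathbf{N}_1\), SD as the elementwise square root of V, and CV as SD divided elementwise by \(\mathbf{N}_1\); the matrix `U`, the tolerance, and the name `occupancy_variability` are hypothetical choices for illustration.

```python
import numpy as np

def occupancy_variability(U, tol=1e-12):
    """Variance, SD and CV of occupancy times; CV is NaN where the mean is zero."""
    U = np.asarray(U, dtype=float)
    N1 = -np.linalg.inv(U)
    N2 = 2.0 * np.diag(np.diag(N1)) @ N1     # second moments
    V = N2 - N1 * N1                         # elementwise (Hadamard) square
    SD = np.sqrt(np.clip(V, 0.0, None))      # guard against tiny negative round-off
    CV = np.full_like(N1, np.nan)
    reachable = N1 > tol                     # entries with a non-zero mean
    CV[reachable] = SD[reachable] / N1[reachable]
    return V, SD, CV

# Hypothetical two-state progression chain; state 1 cannot be re-entered from state 2.
U = np.array([[-0.4,  0.0],
              [ 0.3, -0.3]])
V, SD, CV = occupancy_variability(U)
print(CV)
```

Entries of CV that correspond to states that can never be visited are left as NaN rather than set to zero, which keeps an undefined quantity from being mistaken for a small one.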
Derivation
The fundamental matrix is \(\mathbf{N}_1 = -\mathbf{U}^{-1}\). Applying (2.82) yields (12.7). The derivatives of the higher moments are obtained by differentiating \(\mathbf{N}_2\)–\(\mathbf{N}_4\) in (12.3), (12.4), and (12.5). For example, the differential of \(\mathbf{N}_4\) is
using the fact that \(\mathbf{N}_{\mathrm{dg}}\) commutes with itself and with \(d\mathbf{N}_{\mathrm{dg}}\). Applying the vec operator gives
Substituting (11.12) for \(d \mbox{vec}\, \mathbf{N}_{\mathrm{dg}}\) and (12.7) for \(d \mbox{vec}\, \mathbf{N}_1\) gives (12.10). Results (12.8) and (12.9) are obtained in similar fashion.
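The first step of this derivation can be checked numerically. Since \(\mathbf{N}_1 = -\mathbf{U}^{-1}\), the differential is \(d\mathbf{N}_1 = \mathbf{N}_1 (d\mathbf{U}) \mathbf{N}_1\), so \(d \mbox{vec}\, \mathbf{N}_1 = (\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_1)\, d \mbox{vec}\, \mathbf{U}\). The sketch below compares that expression with central finite differences. The parameterization `U_of`, with a hypothetical parameter vector of two progression rates and one mortality rate, is an assumption made for illustration only.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major), matching the vec operator."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

def U_of(theta):
    """Hypothetical two-state progression chain, theta = (lam1, lam2, mu)."""
    lam1, lam2, mu = theta
    return np.array([[-(lam1 + mu), 0.0],
                     [lam1, -(lam2 + mu)]])

def dvecU_dtheta():
    """Jacobian d vec U / d theta^T for U_of (columns: lam1, lam2, mu)."""
    dU = [np.array([[-1.0, 0.0], [1.0, 0.0]]),
          np.array([[0.0, 0.0], [0.0, -1.0]]),
          -np.eye(2)]
    return np.column_stack([vec(D) for D in dU])

def dvecN1_dtheta(theta):
    """Analytic sensitivity of vec N_1 via the Kronecker-product formula."""
    N1 = -np.linalg.inv(U_of(theta))
    return np.kron(N1.T, N1) @ dvecU_dtheta()

def finite_difference(f, theta, h=1e-6):
    """Central-difference Jacobian of a vector-valued function f."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        cols.append((f(theta + step) - f(theta - step)) / (2 * h))
    return np.column_stack(cols)

theta = np.array([0.3, 0.2, 0.1])
analytic = dvecN1_dtheta(theta)
numeric = finite_difference(lambda t: vec(-np.linalg.inv(U_of(t))), theta)
print(np.max(np.abs(analytic - numeric)))
```

Rows of the result follow the column-major ordering of vec, so the first s rows describe column 1 of \(\mathbf{N}_1\), that is, an individual starting in state 1.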
Differentiating the recurrence relationship (12.6) gives
Apply the vec operator,
and substitute (11.12) for \(d \mbox{vec}\, \mathbf{N}_{\mathrm{dg}}\) to obtain (12.11).
The derivative of V in (12.15) comes from differentiating (12.12),
applying the vec operator,
and then using (12.7) and (12.8). The derivative of \(SD \left( \nu_{ij} \right)\) in (12.16) follows from (2.83). The derivative of \(CV \left( \nu_{ij} \right)\) in (12.17) is obtained using (2.84), with \(\mathbf{x} = \mbox{vec}\, SD\) and \(\mathbf{y} = \mbox{vec}\, \mathbf{N}_1\).
3 Longevity: Time to Absorption
Let \(\eta_j\) be the time to absorption for an individual currently in transient state j. The vectors of the kth moments of the time to absorption, \(\boldsymbol{\eta}_k\), satisfy
and in general
(Iosifescu 1980, Thm. 8.6)
The variance, standard deviation, and coefficient of variation of the time to absorption are
with the square root taken elementwise.
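As a numerical illustration, the sketch below computes the mean, variance, standard deviation, and coefficient of variation of the time to absorption. It assumes the moment formulas \(\boldsymbol{\eta}_1^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} \mathbf{N}_1\) and \(\boldsymbol{\eta}_2^{\mathsf{T}} = 2\, \boldsymbol{\eta}_1^{\mathsf{T}} \mathbf{N}_1\), the standard phase-type results for a chain in column-to-row orientation. The matrix `U` and the function name `absorption_time_stats` are hypothetical.

```python
import numpy as np

def absorption_time_stats(U):
    """Mean, variance, SD and CV of time to absorption, by starting state."""
    U = np.asarray(U, dtype=float)
    N1 = -np.linalg.inv(U)
    ones = np.ones(U.shape[0])
    eta1 = ones @ N1              # first moment: 1^T N_1
    eta2 = 2.0 * eta1 @ N1        # second moment: 2 eta_1^T N_1
    V = eta2 - eta1 ** 2          # elementwise variance
    SD = np.sqrt(V)
    CV = SD / eta1
    return eta1, V, SD, CV

# Hypothetical two-state progression chain.
U = np.array([[-0.4,  0.0],
              [ 0.3, -0.3]])
eta1, V, SD, CV = absorption_time_stats(U)
print(eta1, CV)
```

In this hypothetical chain an individual in state 2 leaves directly to absorption, so its time to absorption is exponential and its CV equals 1; starting in state 1 gives a CV below 1 because longevity is then a sum of sojourn times.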
The derivatives of the moments in (12.24), (12.25), (12.26), and (12.27) are given by
and, recursively,
The derivatives of the variance, standard deviation, and coefficient of variation of the time to absorption are (suppressing the arguments)
Derivation
Differentiating (12.24) for the expected time to absorption gives
Applying the vec operator, substituting (12.7) for \(d \mbox{vec}\, \mathbf{N}_1\), and simplifying gives (12.32). The derivatives of the higher moments are obtained in the same way; e.g., for \(\boldsymbol{\eta}_4\),
Applying the vec operator yields
Substituting (12.7) for \(d \mbox{vec}\, \mathbf{N}_1\) and simplifying using Eqs. (12.24), (12.25), and (12.26) gives (12.35). The derivatives of the second and third moments, (12.33) and (12.34), are obtained in similar fashion.
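The sensitivity of life expectancy can be sketched numerically. Differentiating \(\boldsymbol{\eta}_1^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} \mathbf{N}_1\) and using \(d\mathbf{N}_1 = \mathbf{N}_1 (d\mathbf{U}) \mathbf{N}_1\) gives \(d\boldsymbol{\eta}_1 = (\mathbf{N}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}})\, d \mbox{vec}\, \mathbf{U}\), which is the form assumed here. The chain `U_of`, its parameter vector, and the constant `DVECU` are hypothetical and serve only as an example.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major), matching the vec operator."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

def U_of(theta):
    """Hypothetical two-state progression chain, theta = (lam1, lam2, mu)."""
    lam1, lam2, mu = theta
    return np.array([[-(lam1 + mu), 0.0],
                     [lam1, -(lam2 + mu)]])

# Jacobian d vec U / d theta^T for U_of (columns: lam1, lam2, mu).
DVECU = np.column_stack([
    vec([[-1.0, 0.0], [1.0, 0.0]]),
    vec([[0.0, 0.0], [0.0, -1.0]]),
    vec(-np.eye(2)),
])

def life_expectancy(theta):
    """Mean time to absorption for each starting state."""
    N1 = -np.linalg.inv(U_of(theta))
    return np.ones(N1.shape[0]) @ N1

def life_expectancy_sensitivity(theta):
    """d eta_1 / d theta^T = (N_1^T kron eta_1^T) d vec U / d theta^T."""
    N1 = -np.linalg.inv(U_of(theta))
    eta1 = np.ones(N1.shape[0]) @ N1
    return np.kron(N1.T, eta1[None, :]) @ DVECU

theta = np.array([0.3, 0.2, 0.1])
eta1 = life_expectancy(theta)
sens = life_expectancy_sensitivity(theta)
# Elasticity: proportional change in eta_1 per proportional change in theta.
elas = np.diag(1.0 / eta1) @ sens @ np.diag(theta)
print(elas)
```

In this example every entry of `U_of` is linear in the rates, so multiplying all rates by a constant divides every life expectancy by that constant; the elasticities in each row therefore sum to −1, which gives a convenient check on the calculation.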
The recursive formula (12.36) is obtained by differentiating (12.28)
Apply the vec operator,
substitute (12.7) for \(d \mbox{vec}\, \mathbf{N}_1\), and simplify, to obtain (12.36).
Differentiating (12.29) for the variance yields
Applying the vec operator gives
Substituting (12.32) for \(d\boldsymbol{\eta}_1\) and (12.33) for \(d\boldsymbol{\eta}_2\) gives the result (12.37). The derivatives of the standard deviation, in (12.38), and the coefficient of variation, in (12.39), are obtained by differentiating (12.30) and (12.31) and applying (2.83) and (2.84).
4 Multiple Absorbing States and Probabilities of Absorption
Consider a chain that includes a > 1 absorbing states. The entry \(m_{ij}\) of the a × s submatrix M in (12.1) is the rate of transition from transient state j to absorbing state i. The probabilities of absorption are defined as
The a × s matrix \({\mathbf {B}} = \left (\begin {array}{c} b_{ij} \end {array}\right )\) is
(Iosifescu 1980, Section 8.5.6). Column j of B is the probability distribution of the eventual absorption state for an individual starting in transient state j. Usually a few starting states are of particular interest (e.g., states corresponding to “birth”). Let \(\mathbf{B}(:, j) = \mathbf{B} \mathbf{e}_j\) denote column j of B, where \(\mathbf{e}_j\) is the jth unit vector of length s. Then
Similarly, row i of B is \({\mathbf {B}}(i,:)={\mathbf {e}}_i^{\mathsf {T}} {\mathbf {B}}\) and
where \(\mathbf{e}_i\) is the ith unit vector of length a. The derivative of B in (12.49) and (12.50) is
Derivations
Differentiating (12.48) yields
Applying the vec operator and simplifying gives
Substituting (12.7) for \(d \mbox{vec}\, \mathbf{N}_1\) and simplifying gives (12.51).
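A numerical sketch of these steps follows. It assumes \(\mathbf{B} = \mathbf{M} \mathbf{N}_1\), so that \(d\mathbf{B} = (d\mathbf{M}) \mathbf{N}_1 + \mathbf{B} (d\mathbf{U}) \mathbf{N}_1\) and hence \(d \mbox{vec}\, \mathbf{B} = (\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{I}_a)\, d \mbox{vec}\, \mathbf{M} + (\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{B})\, d \mbox{vec}\, \mathbf{U}\). The two-state chain with two causes of death and all numerical rates are hypothetical.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major), matching the vec operator."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

# Hypothetical parameters: two progression rates and one background mortality rate.
theta = np.array([0.3, 0.2, 0.1])
lam1, lam2, mu = theta
U = np.array([[-(lam1 + mu), 0.0],
              [lam1, -(lam2 + mu)]])
M = np.array([[0.0, lam2],      # absorbing state 1: death from the disease
              [mu, mu]])        # absorbing state 2: death from other causes
a, s = M.shape

N1 = -np.linalg.inv(U)
B = M @ N1                      # absorption probabilities, one column per start state

# Jacobians of vec U and vec M with respect to (lam1, lam2, mu).
dvecU = np.column_stack([vec([[-1.0, 0.0], [1.0, 0.0]]),
                         vec([[0.0, 0.0], [0.0, -1.0]]),
                         vec(-np.eye(2))])
dvecM = np.column_stack([vec(np.zeros((2, 2))),
                         vec([[0.0, 1.0], [0.0, 0.0]]),
                         vec([[0.0, 0.0], [1.0, 1.0]])])

dvecB = np.kron(N1.T, np.eye(a)) @ dvecM + np.kron(N1.T, B) @ dvecU

# Row 1 of B (probability of death from the disease) occupies every a-th entry of vec B.
sens_row1 = dvecB[0::a, :]
elas_row1 = np.diag(1.0 / B[0, :]) @ sens_row1 @ np.diag(theta)
print(B)
print(elas_row1)
```

Each column of `B` sums to one, and because every rate in this example enters Q linearly, the rows of the elasticity matrix sum to zero: a change of time scale leaves the absorption probabilities unchanged.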
5 The Embedded Chain: Discrete Transitions Within a Continuous Process
If a continuous-time chain is observed only at the moments when it changes state, the result is a discrete-time process called the embedded Markov chain, or the jump chain, associated with Q (Iosifescu 1980, Section 8.3.2). The transition matrix of this embedded chain can be written
where
The embedded chain provides information on the number of visits to each transient state, rather than the time spent in each transient state. The expected numbers of such visits are given by the fundamental matrix
The sensitivity analysis of the embedded chain follows directly from the discrete-time results in previous chapters (Chaps. 4 and 5).
In particular, the differential of \(\widehat{\mathbf{N}}_1\) is (Caswell 2006)
However, this derivative is unlikely to be the sensitivity we are looking for. The continuous-time chain is likely to be parameterized in terms of the rate matrices U and M, rather than the probability matrices \(\widehat {{\mathbf {U}}}\) and \(\widehat {{\mathbf {M}}}\). To express the perturbation analysis of \(\widehat {{\mathbf {P}}}\) in terms of the parameters of Q requires the derivatives of the embedded chain with respect to the continuous chain; i.e.,
These derivatives are
Using (12.59) and (12.61), one can write
Derivation
Differentiate \(\widehat {{\mathbf {U}}}\) in (12.55),
apply the vec operator, and use (2.82) and (11.12) for \(d \mbox{vec} \, {\mathbf {U}}_{\mathrm {dg}}^{-1}\). The result is
which simplifies to give (12.59). Similarly, differentiating \(\widehat {{\mathbf {M}}}\) in (12.56) and applying the vec operator gives
Using (2.82) and (11.12) for \(d \mbox{vec} \, {\mathbf {U}}_{\mathrm {dg}}^{-1}\) and simplifying gives (12.61).
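The construction of the embedded chain can be sketched numerically. The forms used here, \(\widehat{\mathbf{U}} = \mathbf{I} - \mathbf{U} \mathbf{U}_{\mathrm{dg}}^{-1}\) and \(\widehat{\mathbf{M}} = -\mathbf{M} \mathbf{U}_{\mathrm{dg}}^{-1}\), are an assumption: they amount to dividing the off-diagonal rates in each column by the total rate of leaving that state, and may be written differently elsewhere. The rates and the function name `embedded_chain` are hypothetical.

```python
import numpy as np

def embedded_chain(U, M):
    """Jump-chain matrices and the expected number of visits to transient states."""
    U = np.asarray(U, dtype=float)
    M = np.asarray(M, dtype=float)
    s = U.shape[0]
    Udg_inv = np.diag(1.0 / np.diag(U))        # requires a non-zero exit rate in every state
    U_hat = np.eye(s) - U @ Udg_inv            # transition probabilities among transient states
    M_hat = -M @ Udg_inv                       # transition probabilities into absorbing states
    N_hat = np.linalg.inv(np.eye(s) - U_hat)   # expected number of visits
    return U_hat, M_hat, N_hat

# Hypothetical two-state progression chain with two absorbing states.
U = np.array([[-0.4,  0.0],
              [ 0.3, -0.3]])
M = np.array([[0.0, 0.2],
              [0.1, 0.1]])
U_hat, M_hat, N_hat = embedded_chain(U, M)
print(N_hat)
```

Under these assumed forms \(\mathbf{I} - \widehat{\mathbf{U}} = \mathbf{U} \mathbf{U}_{\mathrm{dg}}^{-1}\), so the expected number of visits equals \(-\mathbf{U}_{\mathrm{dg}} \mathbf{N}_1\): the expected time in a state divided by the mean length of a single stay there.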
6 An Example: A Model of Disease Progression
An important area of application of continuous-time Markov chains is the modelling of transitions among disease states. In this context, the time to absorption is longevity, and the time spent in various transient states has implications for the quality of life during the disease. Fix and Neyman (1951) introduced the idea and proposed a 4-state model for cancer, with two transient states (under treatment or not) and two absorbing states (death from cancer or from other causes). Kay (1986) proposed a model with k disease states and an absorbing state representing death. There is now a large literature on such models and their estimation. Recently, studies have proliferated that use Markov chain models of disease transmission to explore the cost-effectiveness of screening and treatment procedures (e.g., Kuo et al. 1999; Chen et al. 1999; Wu et al. 2006; Sonnenberg and Beck 1993).
Sensitivity analysis reveals how these demographic properties respond to changes in parameters. As an example, I consider a model for the progression of colorectal cancer (CRC) that was developed to study the cost-effectiveness of a new CRC screening technique based on DNA testing of stool samples (Wu et al. 2006). The model includes 7 transient states (normal, small and large adenoma, early and late preclinical CRC, and early and late clinical CRC) and 2 absorbing states (death from CRC and death from other causes); see Fig. 12.1. Parameters were estimated from the literature and from clinical studies in Taiwan.
This model, which describes the so-called natural history of the disease, was embedded in a larger decision model to compare the cost-effectiveness of screening strategies. The intensity matrix (12.1) corresponding to Fig. 12.1 is
The \(\lambda_i\) are transition rates; μ is the mortality rate from other causes of death. The incidence rate of small adenoma (\(\lambda_1\)) and the mortality rate due to other causes of death (μ) are age-dependent. Here I have analyzed values for age 70, based on figures in Wu et al. (2006). This leads to a parameter vector (all rates are per year):
6.1 Sensitivity Results
The fundamental matrix (12.2) is
Thus, given these rates, a 70-year-old individual in the normal state would expect to spend 27 years in stage 1, and only 0.9 and 0.3 years in stages 6 and 7 (early and late clinical CRC; see Note 1). Individuals in more advanced stages can expect to spend progressively longer periods in stages 6 and 7 (compare across rows 6 and 7 of \(\mathbf{N}_1\)).
The standard deviations (12.13) of the times spent in the transient states are
Clearly, considerable variation can be expected in the times spent in the various states; the standard deviation equals or exceeds the mean in every case.
Considering the sensitivity analysis of the time spent in transient states, focus on the fate of a normal (state 1) individual. The expected times spent in each state by such an individual are given by \(\mathbf{N}_1(:, 1)\). From (12.7) and (2.55) the sensitivity and elasticity of \(\mathbf{N}_1(:, 1)\) are
These elasticities imply that a 1% increase in \(\lambda_1\) will (to first order) cause about a 0.4% decrease in the mean time spent in the normal state and a 0.6% increase in the mean time spent in each other state. A 1% increase in \(\lambda_4\) (the rate of transition between early and late preclinical CRC) creates a 0.6% decrease in the time spent in stages 4 and 6 (the early CRC stages) and a 0.4% increase in the time spent in stages 5 and 7 (the late CRC stages). An increase in the mortality rate μ due to other causes of death reduces the time spent in any of the transient states.
The elasticity of the variance in the time spent in the transient states by an individual in state 1 is
The sign pattern is the same as that of the elasticities of the mean times in (12.68), so we conclude that any parameter change that increases the mean time spent in a transient state will also increase the variance in that time. The elasticities of the variance are comparable to those of the mean (cf. (12.68) and (12.69)), showing that the mean and the variance respond with roughly equal proportional changes.
Longevity is measured by the time to absorption, and is a primary concern in analyses of screening or treatment protocols. The vectors of the mean, standard deviation, and coefficient of variation of longevity are
The sensitivity and elasticity of expected longevity (life expectancy) with respect to θ are
Almost all the nonzero elements are negative, because increasing any of the rates leading towards clinical CRC reduces life expectancy, as does increasing the mortality rate due to other causes of death. The exceptions are the sensitivities and elasticities of \(\boldsymbol{\eta}_1\) to \(\lambda_5\) (in column 5 of these matrices), which are positive because \(\lambda_5\) delays the onset of clinical CRC (cf. Fig. 12.1).
The elasticities of \(E(\eta_1)\), the life expectancy of a normal individual, to a change in θ, appear in the first row of (12.71). The largest of these (except for the last column, representing mortality from other causes of death) are to changes in \(\lambda_1\), \(\lambda_2\), and \(\lambda_3\), the rates of transition from normal to small adenoma, small to large adenoma, and large adenoma to preclinical CRC. The rates \(\lambda_2\) and \(\lambda_3\) have large effects on \(E(\eta_2)\), and \(\lambda_3\) has a large effect on \(E(\eta_3)\). These transitions are targets of screening and early treatment; this analysis quantifies the effect that such interventions could have.
The sensitivity and elasticity of the standard deviation of longevity are
and
These have the same sign pattern as the sensitivity of \(\boldsymbol{\eta}_1\), indicating that any increase in life expectancy will be accompanied by an increase in the variance of longevity. The coefficient of variation takes this joint change into account; from (12.39),
Most of these elasticities are small, suggesting that the mean and standard deviation respond roughly proportionally, so that the CV does not change much.
The matrix B in (12.48), giving the ultimate probability of death from CRC (row 1) or other causes of death (row 2) is
Focusing on the probability of death due to CRC, the sensitivity and elasticity, from (12.50), are
The probability of death from CRC could be reduced by increasing the mortality rate due to other causes (last column), although this is not an attractive treatment option. A more useful interpretation of the last column is as an indication of the increase in death from CRC that would result from reducing other causes of death.
For normal individuals, the risk of death from CRC is most elastic to changes in \(\lambda_2\), \(\lambda_3\), and \(\lambda_4\) (row 1). The row sums of the elasticity matrix, which correspond to the effects of a proportional change in all rates, are zero because a change of time scale does not affect the probability of absorption.
6.2 Sensitivity of the Embedded Chain
The transition matrix \(\widehat {{\mathbf {P}}}\) in (12.76) for the embedded chain is
The fundamental matrix \(\widehat {{\mathbf {N}}}_1\) from (12.57) is
In this continuous-time chain, states cannot be re-entered (cf. Fig. 12.1). Because a state can be visited at most once, the mean number of visits is also the probability of ever entering the state. Thus the probabilities that a normal individual will ever suffer early or late clinical CRC are \(\widehat{\mathbf{N}}_1(6,1)=0.1\) and \(\widehat{\mathbf{N}}_1(7,1) = 0.07\), respectively. These probabilities increase for individuals in successively later stages; for an individual with large adenoma the probabilities are \(\widehat{\mathbf{N}}_1(6,3)=0.2\) and \(\widehat{\mathbf{N}}_1(7,3)=0.3\), respectively.
Focusing sensitivity analysis on individuals in the normal state (state 1), the sensitivities and elasticities of the number of visits are
and
The sensitivities and elasticities of the probability of contracting clinical CRC are given by the last two rows. These probabilities are highly elastic to \(\lambda_1\), \(\lambda_2\), and \(\lambda_3\). The elasticities to μ indicate that every 1% reduction in mortality due to other causes will cause about a 1.5% increase in the probability of experiencing clinical CRC.
7 Discussion
The results of this chapter have been presented in terms of differentials of, or derivatives with respect to, a general vector θ of parameters. The nature of these parameters and their relation to Q, U, or M can be very general. At its simplest, θ could consist of some subset of the elements of Q. This is the case in the CRC example (Sect. 12.6), in which the parameters are transition rates \(\lambda_i\) and mortality rates \(\mu_i\). More generally, the transition rates might themselves be written as functions of other variables. For example, in Van Den Hout and Matthews (2009a,b) the rates are written as \(q_{ij}=\exp \left( \boldsymbol{\beta}_{ij}^{\mathsf{T}} \mathbf{z} \right)\), i ≠ j, where \(\mathbf{z}\) is a vector of covariates (e.g., age, medical care) and \(\boldsymbol{\beta}_{ij}\) is a vector of coefficients to be estimated. The results presented here can be applied directly to such cases, and indeed to even more complicated functional dependencies, using the chain rule. Thus, focusing on parametric dependence is not only scientifically valuable (these are, after all, the relationships of interest in applications of Markov chains) but also extremely general.
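The chain-rule step can be illustrated with a minimal sketch, assuming log-linear rates \(\theta_k = \exp(\boldsymbol{\beta}_k^{\mathsf{T}} \mathbf{z})\), so that \(d\theta_k / d\boldsymbol{\beta}_k^{\mathsf{T}} = \theta_k \mathbf{z}^{\mathsf{T}}\). The sensitivity of life expectancy to the coefficients is then the product of its sensitivity to the rates and this Jacobian. The two-state chain, the coefficient values, and the covariate vector are all hypothetical.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major), matching the vec operator."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

def rates(beta, z):
    """theta_k = exp(beta_k . z); beta holds one row of coefficients per rate."""
    return np.exp(beta @ z)

def U_of(theta):
    """Hypothetical two-state progression chain, theta = (lam1, lam2, mu)."""
    lam1, lam2, mu = theta
    return np.array([[-(lam1 + mu), 0.0],
                     [lam1, -(lam2 + mu)]])

# Jacobian d vec U / d theta^T for U_of (columns: lam1, lam2, mu).
DVECU = np.column_stack([vec([[-1.0, 0.0], [1.0, 0.0]]),
                         vec([[0.0, 0.0], [0.0, -1.0]]),
                         vec(-np.eye(2))])

def life_expectancy(theta):
    N1 = -np.linalg.inv(U_of(theta))
    return np.ones(N1.shape[0]) @ N1

def d_eta_d_theta(theta):
    """Sensitivity of life expectancy to the rates."""
    N1 = -np.linalg.inv(U_of(theta))
    eta1 = np.ones(N1.shape[0]) @ N1
    return np.kron(N1.T, eta1[None, :]) @ DVECU

def d_eta_d_beta(beta, z):
    """Chain rule: (d eta / d theta^T)(d theta / d beta^T), beta stacked row by row."""
    theta = rates(beta, z)
    dtheta_dbeta = np.kron(np.diag(theta), z[None, :])   # block k holds theta_k z^T
    return d_eta_d_theta(theta) @ dtheta_dbeta

# Hypothetical coefficients (intercept and one covariate effect per rate).
beta = np.array([[np.log(0.3), 0.1],
                 [np.log(0.2), 0.0],
                 [np.log(0.1), 0.2]])
z = np.array([1.0, 0.5])
J = d_eta_d_beta(beta, z)
print(J.shape)
```

Only the final Jacobian depends on the functional form of the rates, so a different link function would change `dtheta_dbeta` and leave the rest of the calculation untouched.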
Epidemic models are often written as continuous-time Markov chains, specified in terms of rates of movement among infection states. Gómez-Corral and López-García (2018) extended the methods of this chapter to a model in which individuals are classified by two state variables (a level-dependent quasi-birth-death process). The model may be considered a continuous-time analog of the age×stage models of Chap. 6 (Caswell 2012; Caswell and Salguero-Gómez 2013; Caswell et al. 2018). Their approach takes advantage of the block structure of the intensity matrix for such processes. They have also applied the approach to receptor-ligand complexes within cells (López-García et al. 2018). As far removed from demography as molecules may seem, the concepts of i-state transitions, of inferring population behavior from individual trajectories, and of sensitivity analysis still apply. That’s a good thing.
Notes
1. This calculation holds the mortality rate fixed at its value at age 70; in reality it increases with age. Wu et al. (2006) included age variation by providing values of \(\lambda_1\) (the rate of progression from normal to small adenoma) specific to 5-year intervals from 50 to 70 years of age; all other parameters were age-invariant.
References
Altman, E., K. Avrachenkov, and R. Núñez-Queija. 2004. Perturbation analysis for denumerable Markov chains with application to queueing models. Advances in Applied Probability 36:839–853.
Cao, X. 1989. Estimates of performance sensitivity of a stochastic system. IEEE Transactions on Information Theory 35:1058–1068.
Cao, X., X. Yuan, and L. Qiu. 1996. A single sample path-based performance sensitivity formula for Markov chains. IEEE Transactions on Automatic Control 41:1814–1817.
Caswell, H. 2006. Applications of Markov chains in demography. Pages 319–334 in MAM2006: Markov Anniversary Meeting. Boson Books, Raleigh, North Carolina.
Caswell, H. 2012. Matrix models and sensitivity analysis of populations classified by age and stage: a vec-permutation matrix approach. Theoretical Ecology 5:403–417.
Caswell, H., C. de Vries, N. Hartemink, G. Roth, and S. F. van Daalen. 2018. Age×stage-classified demographic analysis: a comprehensive approach. Ecological Monographs 88:560–584.
Caswell, H., and R. Salguero-Gómez. 2013. Age, stage and senescence in plants. Journal of Ecology 101:585–595.
Chen, T.-H., M.-F. Yen, S.-S. Lai, K. S-L, W. C-Y, W. J-M, T. C. Prevost, and D. S. W. 1999. Evaluation of a selective screening for colorectal carcinoma: the Taiwan Multicenter Cancer Screening (TAMCAS) Project. Cancer 86:1116–1128.
Cho, G. E., and C. D. Meyer. 2000. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra and its Applications 335:137–150.
Fix, E., and J. A. Neyman. 1951. A simple stochastic model of recovery, relapse, death and loss of patients. Human Biology 23:205–241.
Funderlic, R. E., and C. D. Meyer, Jr. 1986. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra and its Applications 76:1–17.
Glasserman, P. 1992. Derivative estimates from simulation of continuous-time Markov chains. Operations Research 40:292–308.
Golub, G. H., and C. D. Meyer, Jr. 1986. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM Journal on Algebraic and Discrete Methods 7:273–281.
Gómez-Corral, A., and M. López-García. 2018. Perturbation analysis in finite LD-QBD processes and applications to epidemic models. Numerical Linear Algebra with Applications page e2160.
Hunter, J. J. 2005. Stationary distributions and mean first passage times of perturbed Markov chains. Linear Algebra and its Applications 410:217–243.
Iosifescu, M. 1980. Finite Markov Processes and Their Applications. Wiley, New York, New York.
Kay, R. A. 1986. Markov model for analysing cancer markers and disease states in survival studies. Biometrics 42:855–865.
Kuo, H. S., H. J. Chang, P. Chou, L. Teng, and T. H. H. Chan. 1999. A Markov chain model to assess the efficacy of screening for non-insulin dependent diabetes mellitus (NIDDM). International Journal of Epidemiology 28:233–240.
López-García, M., M. Nowicka, C. Bendtsen, G. Lythe, S. Ponnambalam, and C. Molina-París. 2018. Quantifying the phosphorylation timescales of receptor–ligand complexes: a Markovian matrix-analytic approach. Open Biology 8:180126.
Meyer, C. D. 1975. The role of the group generalized inverse in the theory of finite Markov chains. SIAM Review 17:443–464.
Mitrophanov, A. Y. 2004. The spectral gap and perturbation bounds for reversible continuous-time Markov chains. Journal of Applied Probability 41:1219–1222.
Ramesh, A. V., and K. Trivedi. 1993. On the sensitivity of transient solutions of Markov models. Pages 122–134 in Proceedings of the 1993 ACM SIGMETRICS Conference on measurement and modeling of computer systems.
Schweitzer, P. J. 1968. Perturbation theory and finite Markov chains. Journal of Applied Probability 5:401–413.
Seneta, E. 1993. Sensitivity of finite Markov chains under perturbation. Statistics and Probability Letters 17:163–168.
Sonnenberg, F. A., and R. Beck. 1993. Markov models in medical decision making: a practical guide. Medical Decision Making 13:322–338.
Van Den Hout, A., and F. E. Matthews. 2009a. Estimating dementia-free life expectancy for Parkinson’s patients using Bayesian inference and microsimulation. Biostatistics 10:729–743.
Van Den Hout, A., and F. E. Matthews. 2009b. A piecewise-constant Markov model and the effects of study design on the estimation of life expectancies in health and ill health. Statistical Methods in Medical Research 18:145–162.
Wu, G.-M., Y.-M. Wang, M.-F. Yen, J.-M. Wong, H.-C. Lai, J. Warwick, and C. TH-H. 2006. Cost-effectiveness analysis of colorectal cancer screening with stool DNA testing in intermediate-incidence countries. BMC Cancer 6:136.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2019 The Author(s)
About this chapter
Cite this chapter
Caswell, H. (2019). Sensitivity Analysis of Continuous Markov Chains. In: Sensitivity Analysis: Matrix Methods in Demography and Ecology. Demographic Research Monographs. Springer, Cham. https://doi.org/10.1007/978-3-030-10534-1_12
DOI: https://doi.org/10.1007/978-3-030-10534-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10533-4
Online ISBN: 978-3-030-10534-1