# Intra-eruption forecasting

## Abstract

Forecasting eruption onsets has received much attention, in both the short and long term. However, an eruption is not easily reduced to an instant in time, and forecasting what happens after eruption onset has received little attention. Any useful definition of an eruption has to allow for activity over scales ranging from days to decades, and can do so only by allowing for multiple eruptive phases. These phases can be defined by having different styles (e.g. effusive and/or explosive) of activity and/or quiescent periods between them. A vital question then presents itself: given what we have seen so far of the eruption, what is likely to happen next? We have recoded a global database of multiple-phase eruptions provided by the Smithsonian Institution’s Global Volcanism Program and the USGS into eight major styles of activity. The resulting database contains c. 700 multi-phase eruptions, with each eruption having up to 50 non-quiescent phases. The resulting record of transitions between states is relatively dense, and a probability tree that models 8^{50} possible phase sequences is infeasible. Thus, we use (semi-)Markov chain models in order to assess the probability of transitioning from one phase of activity to another, as a function of the recent eruption activity. Markov chains describe the path from state to state i.e. from one style of activity to another, under the assumption that only the present state determines the probability of the next state, but the definition of ‘state’ can be extended. The ‘order’ of a Markov chain is the number of previous consecutive phases that are considered to define the current state controlling the next transition, and thus higher order Markov chains can account for a greater degree of memory. A semi-Markov chain is one in which the duration in a given state is not necessarily memoryless. 
We show how a second-order semi-Markov chain can be used to calculate likelihoods for the next style of activity during an eruption, conditional on the type and elapsed duration of the current phase, and the type and duration of the one preceding it. We find that solely effusive behaviour is unlikely to precede violent explosions, and that Plinian eruptions only become more likely during a sequence once a major eruption occurs. A quantitative method for forecasting intra-eruptive activity supports long-term and short-term decision making. To further refine the model, we discuss possible future developments to differentiate between volcanoes, and to incorporate monitoring data in real time to update forecasts.

## Keywords

Multi-phase eruptions, Markov chains, Eruption forecasting, Eruption chronology

## Introduction

The development of probabilistic eruption forecasting (Marzocchi and Bebbington 2012) has concentrated largely on forecasting the next initial onset time using the previous reposes and/or monitoring data, supplemented with analogues where necessary (Jenkins et al. 2012; Whelley et al. 2015). Models for long-term forecasts (Wickman 1966; Klein 1982; Bebbington and Lai 1996; Green et al. 2013) are based on past eruption records (Decker 1986) and appear to have been reasonably accurate (Bebbington 2013). Short-term forecasts instead use precursory seismicity (Voight 1988; Kilburn and Voight 1998; Bell et al. 2011; Boue et al. 2016), or multiple monitoring data (Marzocchi et al. 2008; Sobradelo et al. 2014). A more recent innovation has been the attempt to forecast the size of the future eruption (Burt et al. 1994; Marzocchi and Zaccarelli 2006; Marzocchi et al. 2010; Bebbington 2014). Much less attention has been focussed on forecasting how activity evolves once the eruption begins, although new multiparameter investigations of volcano-tectonic seismic signatures exhibit some promise (McCausland et al. 2019). However, volcanic eruptions are complex, cascading events with durations up to years in length, which are not easily reducible to an instant in time. This can lead to problems when using statistical models from the family of point processes (Bebbington 2008; Sheldrake et al. 2016). Once the eruption has begun, much emergency management focus is placed on when or if a major phase of activity will begin and when the end of the eruption will be. Hence, forecasting intra-eruptive activity is critically important. The question of what can be forecast in such circumstances, and how, has not previously been addressed.

While it is technically possible to express the progression of an eruption as an event tree (Newhall and Hoblitt 2002; Wright et al. 2019), facilitating the incorporation of monitoring data (Marzocchi et al. 2008), the number of branches grows rapidly with the number of phases. It is more efficient to use a Markov chain, with all the associated mathematical power, to model the indefinite number of possible phases. An introduction to Markov chains is provided by Guttorp (1995); semi-Markov models are described by Limnios and Oprisan (2001).

An exploratory analysis, which did not incorporate a forecastable model, of the multi-phase makeup of large explosive (VEI ≥ 4) eruptions was performed by Jenkins et al. (2007), using unpublished data held by the Smithsonian Institution’s Global Volcanism Program (GVP). A phase (or ‘stage’ in the terminology of Jenkins et al. 2007) reflects a discrete time period that is characterised by one predominant style of activity. The authors analysed 604 phases from 125 events and found that more than half of their events were multi-phase, the latter exhibiting a median of six stages, including quiescence. The study found that multi-phase eruptions were more likely to be characterised by declining explosivity and increasing effusivity throughout an event, with effusive phases typically of longer duration and interspersed with fewer quiescent phases than the explosive activity. This paper builds on the exploratory work of Jenkins et al. (2007), by constructing a model for forecasting the phase progression of an eruption in progress, using a much larger database coded in a substantially different manner.

The rest of the paper is organized as follows. We first describe the data, and how it is classified and coded for the model. The section following outlines the stochastic model used, basically a quiescence/duration augmented semi-Markov chain with time-varying transition probabilities. Results then include an example validating the model against expert opinion elicited during the 2012 Te Maari and 1991 Pinatubo eruption sequences. The paper concludes with a discussion of possible uses, improvements (particularly the incorporation of monitoring data and better use of analogue volcanoes) and limitations.

## Data

The unpublished dataset on intra-eruptive activity made available by the GVP to Jenkins et al. (2007) has been greatly expanded in the years since their study. The new data, still unpublished, generously made available to us by Ed Venzke (Smithsonian Institution) and Sarah Ogburn (USGS), comprised a total of 781 multi-phase eruptions from AD79 to 2015. The dataset included information on the eruption VEI, start and end date (where available) and, critically, the eruption description. While these eruptions had already been decomposed into phases, these did not always reflect the eruption description, and so we recoded the entire dataset into discrete styles of activity in a consistent manner using the eruption descriptions provided by the GVP record. Where we could identify a more precise record from published sources, including the GVP bulletins, we used these sources to supplement the GVP eruption description in more accurately assigning the phase states and durations.

### Coding states of activity

Table 1. State definitions

State | Description | Notes |
---|---|---|
1 | Effusive | Solely effusive activity, including lava extrusions (domes), lava effusions (flow), fountains and spatter cones. Satellite thermal anomalies were assumed to be effusive, unless stated otherwise. |
2 | Effusive and explosive | Activity that specifically describes an effusive and explosive component, but that cannot be further divided into discrete phases, either because activity was contemporaneous e.g. lava dome growth and phreatic explosion, or because descriptions weren’t detailed enough. Effusive activity includes all of those in State 1; explosive activity includes phreatic through magmatic activity. |
3 | Continuously explosive | Explosive activity described as continuous or Strombolian. |
4 | Intermittently explosive | Explosive activity described as intermittent or Vulcanian, or with a date range exceeding 2 days without mention of a major explosion. |
5 | Minor explosive eruption | c. < 10-km column height. Where the dates of individual explosive eruptions were given, but the size was unclear, minor eruptions were assumed. |
6 | Major explosive eruption | c. 10–20-km column height. Many VEI 3 and most VEI 4 eruptions will have at least one. |
7 | Plinian explosive eruption | c. > 20-km column height. Most VEI 5 and all VEI 6+ eruptions will have at least one. |
8 | Deformation | No eruptive activity, but explicit mention of deformation. |
9 | Quiescence | More than 1 day between states 1–8 |

A new phase is deemed to start when the activity style changes. This required some recoding of the GVP data, inserting additional phases. Our definition is purely temporal, with phases not overlapping in time, and so separate phases in the GVP data referring to different locations overlapping in time were rearranged to overlap in location rather than time. The definition of a phase in the GVP seems to have been variable, depending in part on the source, and in some cases we collapsed multiple phases into one. In addition, a number of date typos were corrected. Several eruptions included quiescent phases longer than 3 months. The GVP arbitrarily treats two eruptions separated by more than 3 months of surface quiet as distinct eruptions (Siebert et al. 2010). Conversely, if work shows two phases with an intervening quiescence longer than 3 months to be non-distinct, they are considered part of the same eruption. Thus, quiescent phases longer than 3 months exist in our dataset and we have not separated such events out into two consecutive eruptions.

In general, explosions were coded as continuous (State 3) if noted as ‘continuous’ or ‘Strombolian’, and as intermittent (State 4) if described as such or as ‘Vulcanian’, or if a date range was given. Where eruptions on individual days were noted, they were coded as minor eruptions (State 5). Effusive and explosive activity (State 2; State 1 being effusive only) is assigned where the description specifically indicates it as such. States 5–7 are for discrete events with c. < 2-day duration, as opposed to State 4, which is generally assigned to longer duration events where specific dates are not given. The remaining phase states (8 and 9) are as described in Table 1.
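The coding rules for discrete explosive events can be sketched in a few lines of Python. This is only an illustration of Table 1: the names `State` and `code_discrete_explosion` are hypothetical, and the handling of the boundary values (exactly 10 km and 20 km) is our assumption, since the thresholds in Table 1 are approximate.

```python
from enum import IntEnum


class State(IntEnum):
    """The nine activity states of Table 1 (member names are illustrative)."""
    EFFUSIVE = 1
    EFFUSIVE_AND_EXPLOSIVE = 2
    CONTINUOUS_EXPLOSIVE = 3
    INTERMITTENT_EXPLOSIVE = 4
    MINOR_ERUPTION = 5
    MAJOR_ERUPTION = 6
    PLINIAN_ERUPTION = 7
    DEFORMATION = 8
    QUIESCENCE = 9


def code_discrete_explosion(column_height_km=None):
    """Assign States 5-7 to a dated explosive event from its column height.

    An unknown size defaults to a minor eruption, as stated in Table 1.
    Treating exactly 20 km as 'major' is an assumption of this sketch.
    """
    if column_height_km is None or column_height_km < 10:
        return State.MINOR_ERUPTION
    if column_height_km <= 20:
        return State.MAJOR_ERUPTION
    return State.PLINIAN_ERUPTION
```

For instance, a 20-km-high eruption column is coded as a major eruption under these assumptions, and an explosion of unreported size as a minor eruption.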

Time is measured in days, start and end times being given in whole days only. A ‘quiescent’ phase (State 9) was inserted between any eruptive phases separated by more than one day. To account for partial days, especially when the start and end dates were the same, 0.5 day was added to the duration of all non-quiescent states for numerical calculation purposes. In many cases, partial details of end dates (and, less usually, start dates) were missing. These were assigned randomly within the implicit limits of the data and the other phases in the eruption. For example, an end date described as ‘June’ would have the day chosen randomly from a discrete uniform distribution on {1, 2, … , 30}, unless a subsequent phase started later in the month. Phases that were described as occurring without a break were coded as such, allowing appropriately for randomness in the case where details of both the end and subsequent start dates were missing. In other cases, there may or may not have been a quiescent phase between the end of one eruptive phase and the beginning of the next. For these entries, we allocated end and/or start dates randomly, preserving the order of phases and allowing for quiescent phases or not, depending upon the simulated dates. Thus, any run of the model may produce slightly different numbers of phases and durations based upon the date assignations, but the effect on model probabilities is negligible.
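The date handling described above can be sketched as follows. The function names are hypothetical, and the day-count conventions are assumptions of this sketch: the duration is taken as the end date minus the start date plus 0.5 day, and the quiescence length as the number of whole days strictly between two eruptive phases.

```python
import calendar
import random
from datetime import date


def sample_missing_day(year, month, latest_day=None, rng=random):
    """Draw a missing day uniformly from the month, capped by a later phase start."""
    days_in_month = calendar.monthrange(year, month)[1]
    upper = days_in_month if latest_day is None else min(days_in_month, latest_day)
    return date(year, month, rng.randint(1, upper))


def phase_duration(start, end):
    """Duration of a non-quiescent phase in days, with the 0.5-day adjustment."""
    return (end - start).days + 0.5


def quiescence_days(end, next_start):
    """Length of the inserted quiescent phase; 0 if the gap is one day or less."""
    gap = (next_start - end).days
    return gap - 1 if gap > 1 else 0


# An end date known only as 'June', with the next phase starting on June 12.
rng = random.Random(0)
end_date = sample_missing_day(2002, 6, latest_day=12, rng=rng)
```

With these conventions, two single-day eruptions on August 15 and October 27, 1999 are separated by a 72-day quiescence, while phases on consecutive days have no quiescence between them.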

As an example, the GVP description of the Sheveluch eruption reads: ‘Ash eruptions were recorded at Sheveluch on August 15, October 27, November 1 and November 24. On December 27, a possible gas-and-ash plume was reported. Intermittent gas-and-ash explosions continued into 2002. A short-lived explosive eruption was observed at 11:35 on 29 August sending an ash-rich plume to an estimated altitude of 10 km. The ash cloud drifted SE, and was recorded by geostationary weather-satellite imagery moving E across the Bering Sea. Following a strong tremor on May 7, 2002, a new lava dome was first seen on May 12 between the 1980 dome and the NW crater wall. A major explosion on May 22 destroyed the new dome and the west part of the old dome and produced a 20-km-high eruption column. Dome growth, gas-and-ash emissions and occasional pyroclastic flows were recorded through 2005. Dome growth and occasional thermal anomalies were reported in 2006, although no ash eruptions were reported until December 4. Dome growth, intermittent ash eruptions and occasional pyroclastic flows continued into 2008. On March 31, 2007, a mudflow covered an approximately 900-m-long section of road, in an area 20 km from Sheveluch. A large explosive eruption on October 28, 2010 produced an ash plume to 12-km altitude and pyroclastic flows that travelled 15 km; half of the lava dome was destroyed. Dome growth and intermittent explosive activity continued into 2011.’ From this, we extract the state sequence in Table 2.

Table 2. State sequence of Sheveluch 1999, coded from the GVP eruption bulletins. Eff = Effusive, Exp = Explosion(s), Cts = Continuous, Int = Intermittent, E = Eruption

Start date | Description | State | Phase duration (d) | Subsequent quiescence (d) |
---|---|---|---|---|
Aug 15, 1999 | Ash eruption | Minor E | 1 | 72 |
Oct 27, 1999 | Ash eruption | Minor E | 1 | 4 |
Nov 1, 1999 | Ash eruption | Minor E | 1 | 22 |
Nov 24, 1999 | Ash eruption | Minor E | 1 | 32 |
Dec 27, 1999 | Ash eruption | Minor E | 1 | 0 |
Dec 28, 1999 | Intermittent explosions | Int Exp | 734–865 | 0–131 |
May 5, 2002 | Dome growth | Effusive | 10 | 0 |
May 22, 2002 | Major explosion | Major E | 1 | 0 |
May 23, 2002 | Dome growth | Effusive | 88 | 0 |
Aug 29, 2002 | Ash eruption | Minor E | 1 | 0 |
Aug 30, 2002 | Dome growth + intermittent explosions | Eff+Exp | 853–1218 | 0–730 |
??? ??, 2005 | Dome growth | Effusive | ?–330 | ? |
Apr 12, 2006 | Dome growth + intermittent explosions | Eff+Exp | 394–758 | 665–1030 |
Oct 28, 2010 | Large explosive eruption | Major E | 1 | 0 |
Oct 29, 2010 | Dome growth + intermittent explosions | Eff+Exp | 65+ | |

From the initial eruption dataset, we omitted eruptions with insufficient description to enable coding. This included purely submarine eruptions, and most eruptions with VEI 0. The final dataset contains 698 eruptions from 187 volcanoes. There are 2785 eruptive phases (including only eight deformation phases), and between 4302 and 4312 total phases depending on the random occurrences of intervening quiescence.

Table 3. Summary of the numbers of observed phase transitions (rows: phase type; columns: subsequent phase type)

Phase type | Eff | Eff+Exp | Cts Exp | Int Exp | Minor E | Major E | Plinian E | Deform. | Quiescence | End |
---|---|---|---|---|---|---|---|---|---|---|
Start | 83 | 72 | 31 | 232 | 230 | 37 | 9 | 4 | 0 | 0 |
Eff | 8 | 15 | 4 | 8 | 5 | 5 | 0 | 0 | 318 | 113 |
Eff+Exp | 7 | 5 | 8 | 16 | 3 | 11 | 1 | 0 | 254 | 102 |
Cts Exp | 5 | 6 | 0 | 8 | 0 | 4 | 0 | 0 | 84 | 29 |
Int Exp | 20 | 19 | 4 | 4 | 5 | 15 | 4 | 0 | 511 | 230 |
Minor E | 13 | 2 | 1 | 8 | 0 | 0 | 1 | 0 | 539 | 210 |
Major E | 13 | 10 | 3 | 26 | 0 | 1 | 3 | 0 | 75 | 12 |
Plinian E | 1 | 1 | 1 | 9 | 0 | 1 | 0 | 0 | 14 | 0 |
Deform. | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 6 | 0 |
Quiescence | 326 | 277 | 83 | 501 | 530 | 69 | 9 | 6 | 0 | 0 |

## Markov chain analysis

The central idea behind using a Markov chain instead of a tree structure (Newhall and Hoblitt 2002) within which the eruption can only traverse in one direction is that the analysis allows for repeated visits to the same state. A Markov chain thus requires fewer transition probabilities to be estimated. Markov chains have been widely used in volcanic hazard analysis, to model shifts in location (Cronin et al. 2001; Eliasson et al. 2006), the quiescence-unrest-eruption cycle (Wickman 1976; Carta et al. 1981; Bebbington 2007), the unrest-eruption sequence (Aspinall et al. 2006) and volcanic regimes (Bebbington 2007). A Markov chain is characterised by having a ‘state’ *X*(*k*) ∈ {0, 1, … , *m*}, where *k* = 1, 2, … indexes time, governed by ‘transition probabilities’ \(p_{ij} = \Pr \left (X(k+1) = j | X(k) = i \right )\) which are independent of time (*k*), and of the previous states: *X*(1),*X*(2), … , *X*(*k* − 1). This independence is referred to as the ‘Markov property’.

The transition probabilities are estimated from the observed numbers of transitions *n*_{ij} from state *i* to state *j* (Table 3). If the number of departures from *i* is \(n_{i\cdot } = {\sum }_{j} n_{ij}\), then the maximum likelihood estimate of the transition probability from state *i* to state *j* is

\[ \hat{p}_{ij} = \frac{n_{ij}}{n_{i\cdot}}. \qquad (1) \]

For example, *n*_{Eff⋅} = 476 (the sum of the second row in Table 3) and hence the estimated probability of transitioning from an effusive to quiescent state is \(\hat{p}_{\text{Eff,Qui}}\) = *n*_{Eff,Qui}/*n*_{Eff⋅} = 318/476 = 0.67; that is, two thirds of effusive phases are followed by quiescence. These estimated probabilities make up the transition matrix \(P = \left (\hat {p}_{ij}\right )\). Because we have an absorbing state (eruption end), we can decompose this as

\[ P = \begin{pmatrix} Q & R \\ 0 & 1 \end{pmatrix}, \qquad (2) \]

where *Q* is the transition matrix among the transient states (including eruption start), *R* is a vector giving the probability that a given state is the last before the end, and 0 is a vector of zeros (it is not possible to transition out of the [End] state).
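As a minimal sketch of this estimate, the following Python snippet row-normalises two rows of Table 3. The counts are copied from the table; the variable and function names are our own, and the remaining rows would be handled identically.

```python
# Column order of Table 3 (subsequent phase type).
COLUMNS = ["Eff", "Eff+Exp", "Cts Exp", "Int Exp", "Minor E",
           "Major E", "Plinian E", "Deform.", "Quiescence", "End"]

# Observed transition counts n_ij for two rows of Table 3.
COUNTS = {
    "Start": [83, 72, 31, 232, 230, 37, 9, 4, 0, 0],
    "Eff": [8, 15, 4, 8, 5, 5, 0, 0, 318, 113],
}


def transition_probabilities(counts_row):
    """Maximum likelihood estimate: each count divided by the row total."""
    total = sum(counts_row)
    return [n / total for n in counts_row]


P_HAT = {state: dict(zip(COLUMNS, transition_probabilities(row)))
         for state, row in COUNTS.items()}

print(round(P_HAT["Eff"]["Quiescence"], 2))  # 318/476
```

The same calculation on the [Start] row gives the probabilities of each style of activity opening an eruption.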

Several interesting estimates are immediately available. The expected number of future visits to state *j*, given that one is in state *i*, is the (*i*, *j*) element of *N* = (*I* − *Q*)^{− 1}, where *I* is the identity matrix. The expected number of phases before absorption, given that we are in state *i*, is then the *i*th element of *E* = *N*1, where 1 is a vector of ones. Finally, the probability of a future visit to state *j*, given that one is in state *i*, is the (*i*, *j*) element of *W* = (*N* − *I*)(*N* ⊗ *I*)^{− 1}, where ⊗ denotes the Hadamard product (element-by-element multiplication), so that *N* ⊗ *I* is the diagonal part of *N*.
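These three quantities can be illustrated on a toy chain. The two-state matrix `Q` below is hypothetical (it is not estimated from the data), and the small Gauss-Jordan inverse is included only to keep the sketch free of external libraries.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fundamental_quantities(Q):
    """Return N = (I - Q)^-1, E = N 1 and W = (N - I)(diag N)^-1."""
    n = len(Q)
    N = invert([[float(i == j) - Q[i][j] for j in range(n)] for i in range(n)])
    E = [sum(row) for row in N]
    W = [[(N[i][j] - float(i == j)) / N[j][j] for j in range(n)]
         for i in range(n)]
    return N, E, W


# Hypothetical transient states A and B; the remaining mass in each row ends the eruption.
Q = [[0.0, 0.5],
     [0.25, 0.0]]
N, E, W = fundamental_quantities(Q)
```

In this toy chain, the probability of ever visiting B from A is 0.5 (the single direct transition), and the probability of returning to A is 0.5 × 0.25 = 0.125, which is what *W* reproduces.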

### Including durations

If we include quiescence as a state, then the Markov property (probability of next state depends only on current state) means that the majority of transitions into an eruptive state will be from quiescence, in effect making the next eruptive state completely random (or at least independent of all previous eruptive states). The obvious way to incorporate information on the previous eruption phase is to use a second-order Markov chain, which treats every two-state pair as a separate state. If the current state is (*i*,*j*), then the next state must be (*j*,*k*). This would have 9 × 8 = 72 states (the same phase cannot occur twice). However, as consecutive states overlap, only 36 of the 72 states can be reached in one transition. This number of states would require estimating 72 × 36 = 2592 transition probabilities, which is prohibitive for the data available, so we need to be more creative. The way we have chosen to deal with this is not to have quiescence as a state; instead, the quiescence length becomes a ‘mark’ on the eruptive state, augmenting it as was done by Bebbington (2007). In fact, there will be two such marks: the quiescence length before the eruptive state (‘pre-quiescence’, denoted by *Q*_{0}) and the quiescence length following the eruptive state (‘post-quiescence’, denoted by *Q*_{1}).
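One way to picture the marking scheme is as a transformation of a coded phase list, in which quiescent entries are removed and their lengths attached to the neighbouring eruptive phases. The sketch below is an assumption about how this could be organised; `MarkedPhase` and `mark_with_quiescence` are hypothetical names, and the example durations are the whole-day values from the first rows of Table 2.

```python
from dataclasses import dataclass


@dataclass
class MarkedPhase:
    state: str
    duration: float
    pre_quiescence: float = 0.0   # Q0: quiescence before this eruptive phase
    post_quiescence: float = 0.0  # Q1: quiescence after this eruptive phase


def mark_with_quiescence(phases):
    """Convert (state, duration) pairs into eruptive phases carrying quiescence marks."""
    marked = []
    pending = 0.0  # quiescence accumulated since the last eruptive phase
    for state, duration in phases:
        if state == "Quiescence":
            pending += duration
            if marked:
                marked[-1].post_quiescence += duration
        else:
            marked.append(MarkedPhase(state, duration, pre_quiescence=pending))
            pending = 0.0
    return marked


sequence = [("Minor E", 1), ("Quiescence", 72), ("Minor E", 1),
            ("Quiescence", 4), ("Minor E", 1)]
marked = mark_with_quiescence(sequence)
```

Each quiescence then appears twice, as the post-quiescence of one eruptive phase and the pre-quiescence of the next, which is what allows the chain on eruptive states to retain the timing information.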

Our data give us information of the form \(\Pr (D_{i} = d | i \rightarrow j)\) where *D*_{i} is the duration in state *i*, and \(i \rightarrow j\) indicates that the subsequent transition is from state *i* to state *j*. If quiescence is not treated as a separate state, we can augment this by including the quiescence before or after state *i*. We thus have (in obvious notation) \(\Pr (D_{i} = d, Q_{0} = q | i \rightarrow j)\) if we include quiescence before state *i*, or \(\Pr (D_{i} = d, Q_{1} = q | i \rightarrow j)\) if we use the quiescence between states *i* and *j*.

Although the phase duration and quiescence times are recorded in discrete days, we will model them as continuous random variables. It appears that the log-normal distribution (Bebbington and Lai 1996) is suitable for the purpose, as the phase durations in Fig. 2 can be represented as a mixture of log-normal distributions, where on the log scale they appear as a mixture of normals. A formal test is not feasible due to this mixing, and the fact that some state pairs only exhibit one or two transitions. Hence, we make the assumption based on mathematical and computational tractability, noting that it does not conflict with the appearance of the data.

If *Z* has a log-normal distribution with parameters *μ* and *σ*, then log *Z* has a normal distribution with mean *μ* and variance *σ*^{2}. The mean of *Z* is \(\exp \left (\mu + \sigma ^{2}/2\right )\) and the variance of *Z* is \(\exp \left (2\mu + \sigma ^{2}\right )\left (\exp \left (\sigma ^{2}\right ) - 1 \right )\). If we let *f*_{ij}(*d*) be the probability density for \(D_{i} = d | i \rightarrow j\), then

\[ f_{ij}(d) = \frac{1}{d\,\sigma_{ij}\sqrt{2\pi}} \exp \left( -\frac{(\log d - \mu_{ij})^{2}}{2\sigma_{ij}^{2}} \right), \qquad (3) \]

where *μ*_{ij} and *σ*_{ij} are estimated as the mean and standard deviation of the relevant observed (and log-transformed) phase durations. If there are no transitions between states *i* and *j*, *p*_{ij} = 0 and so we do not need to estimate the duration distribution. Some duration distributions are exemplified by very few (often one) such transitions. In the case of only one transition, we cannot simultaneously estimate both *μ* and *σ*. In the case of very few observations, they may be too clustered for numerical stability, with the resulting distribution approaching a point mass, which is very unrealistic. Hence, we estimate *μ* (and *σ* if possible) from the data, but ensure that *σ* is not unrealistically small by setting the minimum coefficient of variation as 1/*n*, where *n* is the number of observed transitions. As the coefficient of variation (standard deviation divided by the mean) of a log-normal distribution is \(\sqrt {\exp (\sigma ^{2})-1}\), this means setting \(\sigma = \sqrt {\log (1 + 1/n^{2})}\) as a minimum.
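The estimation rule with its lower bound on *σ* can be sketched as below. The function name is hypothetical, and the use of the sample (*n* − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
from statistics import mean, stdev


def fit_lognormal(durations):
    """Estimate (mu, sigma) from durations, flooring sigma so that the CV is at least 1/n."""
    logs = [math.log(d) for d in durations]
    n = len(logs)
    mu = mean(logs)
    sigma_min = math.sqrt(math.log(1.0 + 1.0 / n ** 2))
    sigma = stdev(logs) if n > 1 else 0.0  # sample standard deviation (assumed)
    return mu, max(sigma, sigma_min)


def coefficient_of_variation(sigma):
    """Coefficient of variation of a log-normal distribution."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


# A state pair observed only once: sigma cannot be estimated, so the floor applies.
mu_one, sigma_one = fit_lognormal([10.5])
# Two identical observations: the sample sigma is zero, so the floor applies again.
mu_two, sigma_two = fit_lognormal([5.0, 5.0])
```

With a single observation the floor gives a coefficient of variation of 1, and with two identical observations it gives 0.5, so the fitted density never collapses to a point mass.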

The densities *g*_{ij}(*q*) for the event ‘\(Q_{0} = q | i \rightarrow j\)’ and *h*_{ij}(*q*) for the event ‘\(Q_{1} = q | i \rightarrow j\)’ are handled similarly to *f*, with the addition of an atom at 0 representing the possibility of no intervening quiescence, i.e. each is a mixture of a point mass at zero, written using \(\delta _{\{0\}}(q)\), and a log-normal density for *q* > 0. Here *δ*_{A}(*q*) is the mathematical delta function with the properties that *δ*_{A}(*q*) = 0 if *q* ∉ *A*, and \(\int \delta _{A}(q)\, \mathrm{d}q = 1\).

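To check for dependence, the rank correlation between phase duration and quiescence length was tested for each state pair. Taking the *P* values from all the state-pair tests as a sample, the null hypothesis (no correlation) is that the *P* values should be uniformly distributed on (0,1). Such a meta-analysis of the rank correlations between phase durations and quiescence length (Fig. 3) shows that there is no evidence of dependence between the durations and either the previous or subsequent quiescence length. Using the resulting assumed independence between phase durations and quiescence lengths, we have

\[ \Pr (D_{i} = d, Q_{0} = q | i \rightarrow j) = f_{ij}(d)\, g_{ij}(q), \qquad (5) \]

and similarly for *Q*_{1}, with *h* replacing *g*. These are the joint probabilities i.e. assuming that we know *d* and *q*. In reality, we will know one of these, and know that the other will be at least as long as that observed to the present. It is then easy to calculate the conditional probabilities

\[ \Pr (i \rightarrow j | D_{i} > d, Q_{0} = q) = \frac{p_{ij}\, g_{ij}(q) \int_{d}^{\infty} f_{ij}(u)\, \mathrm{d}u}{\sum_{k} p_{ik}\, g_{ik}(q) \int_{d}^{\infty} f_{ik}(u)\, \mathrm{d}u} \qquad (6) \]

and

\[ \Pr (i \rightarrow j | D_{i} = d, Q_{1} > q) = \frac{p_{ij}\, f_{ij}(d) \int_{q}^{\infty} h_{ij}(u)\, \mathrm{d}u}{\sum_{k} p_{ik}\, f_{ik}(d) \int_{q}^{\infty} h_{ik}(u)\, \mathrm{d}u} \qquad (7) \]

that the next eruptive state will be *j*, given that the previous eruptive state was *i*, and the observed previous and current durations (one eruptive phase, the other quiescence). Equation (6) can be used for day-to-day forecasting during an eruptive phase; similarly, (7) can be applied during a quiescence.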

### Validation

We compare the modelled probabilities for state transitions through time with those previously established by expert judgement for the 2012 VEI 2 Te Maari eruption from Mt Tongariro, New Zealand (Jolly et al. 2014), and the 1991 VEI 6 eruption of Pinatubo, Philippines (Newhall 2000). This direct comparison with expert forecasts, which were based on monitoring data and additional insights into the volcano and eruption, offers a form of validation for our model estimates.

## Results

*further* Plinian phase. Finally, the probability of visiting a future eruptive state at any point in the sequence, conditional on the currently (or last, if in quiescence) occupied eruptive state, is given by the matrix *W*, in which the current state is the row and each possible future state is the column. This outcome has information relevant to living with persistent volcanism; that is: effusive behaviour, continuous or intermittent explosions and minor eruptions do not elevate the probability of a major or Plinian eruption above the ‘at start’ values. In other words, Plinian eruptions do not become more likely during a sequence, unless a major eruption occurs.

The 2012 Te Maari sequence comprised an eruption (of order 10^{5} m^{3} of tephra ejected to a height of 6 km) on August 6, 2012, followed by a smaller eruption on November 21, 2012. In our model, these are both minor eruptions, separated by a quiescent period. Between the two eruptions, there were conflicting monitoring signals from seismic and gas measurements, which prompted the Science Advisory Panel to use expert elicitation to estimate probabilities at several points during the sequence (Jolly et al. 2014). Figure 7 shows the median of the experts’ probabilities (Jolly et al. 2014) that there will be no further eruption in the next 30 days against the conditional probability derived from the model that the next state will be [Eruption End]. The expert-derived median probabilities from Jolly et al. (2014) start lower and increase more rapidly following the eruptions than for our model, at least partially reflecting the difference between forecasting 30 days of quiet versus eruption end (c. > 90 day quiet). Neither approach forecasts the second minor eruption, although the model avoids the tendency of the experts’ probability estimate to quickly saturate at one, in spite of (or perhaps because of) their access to monitoring data and familiarity with the volcano. Nevertheless, the similarity in trends suggests that the model is providing rational, evidence-based forecasts that align with those of experts who may have additional insights on the volcano and eruption in question. A factor that may have weighed with the experts, that is not represented within our generic model, is that this was a phreatic eruption of almost insignificant size.

## Discussion

A second use for the model is in providing an evidence base for assigning prior probabilities to each of the nodes in an event tree e.g. (Neri et al. 2008; Garcia-Aristizabal et al. 2013; Newhall and Pallister 2015). For example, taking a very generic approach where we use all of the phase database rather than a subsection of analogue volcanoes, our model would forecast that the most likely initial phase of an eruption is a minor explosive eruption or intermittent explosions, both with a probability of c. 0.33. Our model can then be used to dynamically update the event tree conditional on current activity or lack of it. Should the eruption begin with, for example, 50 days of effusive and explosive activity followed by quiescence, then the probabilities associated with future eruption states i.e. separate nodes on the event tree can be derived from Figure 4. Such prior probabilities can then be updated within a Bayesian framework using indicators such as monitoring data.

The current database contains only those eruptions identified as multi-phase. To support the pre-eruption development of an event tree or probabilistic forecasts, we will need to include also those eruptions that are demonstrably single-phase. Without these, the probability of an eruption continuing, and hence its duration, may be overestimated and the starting phases incorrectly weighted. A preliminary analysis (Ang Pei Shan, Nanyang Technological University, personal communication, 2018) of the GVP eruption descriptions available online suggests that approximately 35 to 40% of recorded historical eruptions can be considered multi-phase. If the methodology is to be extended to non-historical eruptions (i.e. those observed solely in the geological record), attention to under-reporting issues will be required.

An obvious avenue for future development is the incorporation of real-time monitoring data into the model to allow updating of the probabilities. In order to make the state transition probabilities conditional on monitoring signals, the latter need to be transformed into a tractable form. One option, exemplified by the methodology in event trees (Newhall and Hoblitt 2002), is to discretize the possible signal levels via the establishment of thresholds, and apply fuzzy logic (Marzocchi et al. 2008) to effectively smooth the probabilities. Alternatively, a normal copula formulation (i.e. a multivariate probability distribution) in a Bayesian belief network (Hincks et al. 2018) could be explored. A less computationally intensive possibility would be to use a generalized linear model with the monitoring data as the independent variables. None of these is simple to develop in a volcano-independent manner. The state duration distributions could potentially also be modified using a proportional hazards formulation (Faenza et al. 2003; Green et al. 2013), given enough monitoring data.

To check the influence of the two volcanoes contributing the most phases, Vesuvius and Etna, we removed their eruptions and re-estimated *P*. This leaves 2413 eruptive phases in 631 eruptions from 185 volcanoes, for which the transition matrix was recalculated. While some of the probabilities change considerably in relative terms, for example, the transitions [Continuous explosion] → [Effusive] (0.147 to 0.066), [Minor eruption] → [Plinian eruption] (0.0026 to 0.0013) and [Continuous explosion] → [Major eruption] (0.056 to 0.088), the changes in absolute terms are small (only one probability changes by more than 0.05, this being the decrease in [Continuous explosion] → [Effusive] noted above).
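The distinction between relative and absolute change can be made explicit with a short calculation on the three quoted transitions. The probabilities are those given in the text; the variable names and the 0.05 threshold constant are our own framing of the comparison.

```python
# (full-data probability, probability after removing the two volcanoes)
QUOTED = {
    "Continuous explosion -> Effusive": (0.147, 0.066),
    "Minor eruption -> Plinian eruption": (0.0026, 0.0013),
    "Continuous explosion -> Major eruption": (0.056, 0.088),
}

THRESHOLD = 0.05  # absolute change regarded as noticeable

changes = {
    name: {"absolute": after - before, "relative": (after - before) / before}
    for name, (before, after) in QUOTED.items()
}

exceeding = [name for name, c in changes.items() if abs(c["absolute"]) > THRESHOLD]

for name, c in changes.items():
    print(f"{name}: {c['absolute']:+.4f} absolute, {c['relative']:+.0%} relative")
```

All three transitions change by roughly half in relative terms, yet only the first exceeds the 0.05 absolute threshold.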

Eruptive phases per volcano

Volcano | No. of phases | No. of eruptions | Earliest eruption |
---|---|---|---|
Vesuvius | 197 | 24 | 79 |
Etna | 175 | 43 | 1537 |
Piton de la Fournaise | 150 | 42 | 1766 |
Asamayama | 108 | 26 | 1645 |
Asosan | 107 | 28 | 1335 |
Krakatau | 80 | 15 | 1883 |
Klyuchevskoy | 66 | 16 | 1789 |
Merapi | 59 | 12 | 1865 |
⋮ | ⋮ | ⋮ | ⋮ |

As a final check, we applied the model to a smaller subset of eruptions (starting from 1950 onwards). Comparing the resulting transition probability matrix with that in Eq. 2, we observed only two noticeable differences. First, the transition probability from [Continuous explosion] to [Effusive] drops. Part of this can be ascribed to the deletion of all the eruptions from Vesuvius (where the last eruption was in 1944). In the full data set, there are only 20 such transitions, so this appears to be a small-sample effect. Second, the [Plinian] transition probabilities become increasingly grainy due to the reduced number of what is naturally the rarest eruptive phase type. As most Plinian phases occur as the initial phase of an eruption, we do not consider using the full data for intra-eruption forecasting unreasonable.

Our coded database of 698 eruptions from 187 volcanoes between AD79 and 2015 represents a fraction of the GVP Holocene record (currently host to information regarding more than 9800 eruptions (c. 2900 with text descriptions) from 871 volcanoes). With such a large database of phases currently, and the potential to significantly expand, future work should look to subdivide the data by volcano or eruption analogues, and recalculate the transition matrices for specific contexts. For example, if the volcanoes are stratified according to type and dominant composition, do we find that the records from andesitic stratovolcanoes are more informative about the behaviour of an andesitic stratovolcano than, say, rhyolitic calderas? The aim here would be to determine the weights in an ensemble-type forecast (Marzocchi et al. 2012), which will quantify the role of analogue volcanoes (Rodado et al. 2011). We see some variation in phase durations when we subset the current phases database by volcano type or tectonic setting, although the database is dominated by andesitic stratovolcanoes. In order to assess the utility and robustness of the intra-eruptive forecasting for individual volcanoes, we need to compile and analyse as much phase data as possible. Expanding the database presented here is a significant task currently underway.

## Conclusions

Forecasting activity after eruption onset is of prime importance to emergency managers, particularly the questions of when or if major explosive activity will begin and when the activity or eruption will end. To address this, we built upon the previous empirical study of intra-eruptive activity by Jenkins et al. (2007) by coding a greatly expanded eruption database for use with a semi-Markov chain model for forecasting the probability of transitions between phases. The unpublished eruption database provided by GVP and USGS was coded for the distinct styles of activity, and their durations, exhibited within 698 multi-phase eruptions from 187 volcanoes around the world. The resulting database of nearly 3000 eruptive phases (plus a further c. 1500 quiescent phases) was used to identify the probabilities of transitioning between different phases. Applying first- and second-order semi-Markov chain models to the data allowed us to condition the transition probabilities on the previous and current phase duration, as well as the duration of pre- and post-quiescence. Comparing our model probabilities with observed activity and expert-derived probabilities during eruptions at Tongariro and Pinatubo shows that the model provides similar trends and values to those of experts, who have access to monitoring data and are very familiar with the volcano. This suggests that the model is providing rational, evidence-based forecasts and has potential as a supporting tool in intra-eruptive forecasting prior to or during an eruption. Future work could look to expand the database and re-assess the model across subsets of volcanoes and/or eruptions. Incorporating monitoring data into the model and its output probabilities is also an interesting avenue for future study.

## Notes

### Acknowledgments

Sarah Ogburn and Ed Venzke kindly provided the GVP phase data. We thank the two anonymous reviewers and the editors for their suggestions.

### Funding

This work was funded by the New Zealand Natural Hazards Research Platform Grant 2015-MAU-PC-01. This work comprises Earth Observatory of Singapore contribution no. 239. This research is supported by the National Research Foundation Singapore and the Singapore Ministry of Education under the Research Centres of Excellence initiative.

## References

- Aspinall WP, Carniel R, Jaquet O, Woo G, Hincks T (2006) Using hidden multi-state Markov models with multi-parameter volcanic data to provide empirical evidence for alert level decision-support. J Volcanol Geotherm Res 153:112–124
- Bebbington M, Lai C (1996) On nonhomogeneous models for volcanic eruptions. Math Geol 28:585–600
- Bebbington MS (2007) Identifying volcanic regimes using hidden Markov models. Geophys J Int 171:921–942
- Bebbington M (2008) Incorporating the eruptive history in a stochastic model for volcanic eruptions. J Volcanol Geotherm Res 175:325–333
- Bebbington MS (2013) Assessing probabilistic forecasts of volcanic eruption onsets. Bull Volcanol 75:783
- Bebbington MS (2014) Long-term forecasting of volcanic explosivity. Geophys J Int 197:1500–1515
- Bebbington M, Zitikis R (2016) Dynamic uncertainty in cost-benefit analysis of evacuation prior to a volcanic eruption. Math Geosci 48:123–148
- Bell AF, Naylor M, Heap MJ, Main IG (2011) Forecasting volcanic eruptions and other material failure phenomena: an evaluation of the failure forecast method. Geophys Res Lett 38:L15304
- Boue A, Lesage P, Cortes G, Valette B, Reyes-Davila G, Arambula-Mendoza R, Budi-Santoso A (2016) Performance of the ‘material Failure Forecast Method’ in real-time situations: A Bayesian approach applied on effusive and explosive eruptions. J Volcanol Geotherm Res 327:622–633
- Burt ML, Wadge G, Scott WA (1994) Simple stochastic modelling of the eruption history of a basaltic volcano: Nyamuragira, Zaire. Bull Volcanol 56:87–97
- Carta S, Figari R, Sartoris G, Sassi R, Scandone R (1981) A statistical model for Vesuvius and its volcanological implications. Bull Volcanol 44:129–151
- Cronin SJ, Bebbington M, Lai CD (2001) A probabilistic assessment of eruption recurrence on Taveuni volcano, Fiji. Bull Volcanol 63:274–288
- Decker RW (1986) Forecasting volcanic eruptions. Ann Rev Earth Planet Sci 14:267–291
- Eliasson J, Larsen G, Gudmundsson MT, Sigmundsson F (2006) Probabilistic model for eruptions and associated flood events in the Katla caldera, Iceland. Comput Geosci 10:179–200
- Faenza L, Marzocchi W, Boschi E (2003) A non-parametric hazard model to characterize the spatio-temporal occurrence of large earthquakes: an application to the Italian catalogue. Geophys J Int 155:521–531
- Garcia-Aristizabal A, Selva J, Fujita E (2013) Integration of stochastic models for long-term eruption forecasting into a Bayesian event tree scheme: A basis method to estimate the probability of volcanic unrest. Bull Volcanol 75:689
- Green RM, Bebbington MS, Cronin SJ, Jones G (2013) Geochemical precursors for eruption repose length. Geophys J Int 193:855–873
- Guttorp P (1995) Stochastic modeling of scientific data. CRC Press, Boca Raton, pp 384
- GVP Bulletin (2011) http://volcano.si.edu/volcano.cfm?vn=300270&vtab=Bulletin
- Hincks T, Aspinall W, Cooke R, Gernon T (2018) Oklahoma’s induced seismicity strongly linked to wastewater injection depth. Science 359:1251–1255
- Jenkins SF, Magill CR, McAneney J (2007) Multi-stage volcanic events: a statistical investigation. J Volcanol Geotherm Res 161:275–288
- Jenkins S, Magill C, McAneney J, Blong R (2012) Regional ash fall hazard I: a probabilistic assessment methodology. Bull Volcanol 74:1699–1712
- Jolly GE, Keys HJR, Procter JN, Deligne NI (2014) Overview of the co-ordinated risk-based approach to science and management response and recovery for the 2012 eruptions of Tongariro volcano, New Zealand. J Volcanol Geotherm Res 286:184–207
- Kilburn CRJ, Voight B (1998) Slow rock fracture as eruption precursor at Soufriere Hills volcano, Montserrat. Geophys Res Lett 25:3665–3668
- Klein FW (1982) Patterns of historical eruptions at Hawaiian volcanoes. J Volcanol Geotherm Res 12:1–35
- Limnios N, Oprisan G (2001) Semi-Markov Processes and Reliability. Birkhauser, Cambridge, pp 222
- McCausland WA, Gunawan H, White RA, Indrastuti N, Patria C, Suparman Y, Putra A, Triastuty H, Hendrasto M (2019) Using a process-based model of pre-eruptive patterns to forecast evolving eruptive styles at Sinabung Volcano, Indonesia. J Volcanol Geotherm Res in press, https://doi.org/10.1016/j.jvolgeores.2017.04.004
- Marzocchi W, Woo G (2007) Probabilistic eruption forecasting and the call for an evacuation. Geophys Res Lett 34:L22310
- Marzocchi W, Zaccarelli L (2006) A quantitative model for the time-size distribution of eruptions. J Geophys Res 111:B04204
- Marzocchi W, Sandri L, Selva J (2008) BET_EF: A probabilistic tool for long- and short-term eruption forecasting. Bull Volcanol 70:623–632
- Marzocchi W, Woo G (2009) Principles of volcanic risk metrics: Theory and the case study of Mount Vesuvius and Campi Flegrei, Italy. J Geophys Res 114:B03213
- Marzocchi W, Sandri L, Selva J (2010) BET_VH: A probabilistic tool for long-term volcanic hazard assessment. Bull Volcanol 72:705–716
- Marzocchi W, Zechar JD, Jordan TH (2012) Bayesian forecast evaluation and ensemble earthquake forecasting. Bull Seismol Soc Am 102:2574–2584
- Marzocchi W, Bebbington MS (2012) Probabilistic eruption forecasting at short and long time scales. Bull Volcanol 74:1777–1805
- Miller RG (1981) Simultaneous Statistical Inference, 2nd edn. Springer, New York, p 299
- Neri A, Aspinall WP, Cioni R, Bertagnini A, Baxter PJ, Zuccaro G, Andronico D, Barsotti S, Cole PD, Esposti Ongaro T, Hincks TK, Macedonio G, Papale P, Rosi M, Santacroce R, Woo G (2008) Developing an event tree for probabilistic hazard and risk assessment at Vesuvius. J Volcanol Geotherm Res 178:397–415
- Newhall CG, Daag AS, Delfin FG Jr, Hoblitt RP, McGeehin J, Pallister J, Rubin M, Tamayo RA Jr, Tubianosa B, Umbal JV (1996) Eruptive history of Mount Pinatubo. In: Newhall CG, Punongbayan RS (eds) Fire and mud. PHIVOLCS and University of Washington Press, pp 165–195
- Newhall C (2000) Volcano warnings. In: Sigurdsson H, Houghton B, McNutt S, Rymer H, Stix J (eds) Encyclopedia of Volcanoes. Academic Press, New York, pp 1185–1197
- Newhall CG, Hoblitt RP (2002) Constructing event trees for volcanic crises. Bull Volcanol 64:3–20
- Newhall CG, Pallister JS (2015) Using multiple data sets to populate probabilistic volcanic event trees. In: Papale P (ed) Volcanic Hazards, Risks, and Disasters. Elsevier, Amsterdam, pp 203–232
- Rodado A, Bebbington M, Noble A, Cronin S, Jolly G (2011) On selection of analogue volcanoes. Math Geosci 43:505–519
- Sheldrake TE, Sparks RSJ, Cashman KV, Wadge G, Aspinall WP (2016) Similarities and differences in the historical records of lava dome-building volcanoes: Implications for understanding magmatic processes and eruption forecasting. Earth Sci Rev 160:240–263
- Siebert L, Simkin T, Kimberly P (2010) Volcanoes of the world. Third Edition. University of California Press in association with Smithsonian Institution, pp 551
- Sobradelo R, Bartolini S, Marti J (2014) HASSET: A Probability event tree tool to evaluate future volcanic scenarios using Bayesian inference. Bull Volcanol 76:1–15
- Voight B (1988) A method for prediction of volcanic eruptions. Nature 332:125–130
- Whelley P, Newhall C, Bradley K (2015) The frequency of explosive volcanic eruptions in Southeast Asia. Bull Volcanol 77:1–11
- Wickman FE (1966) Repose period patterns of volcanoes I. volcanic eruptions regarded as random phenomena. Arkiv Miner Geol 4:291–301
- Wickman FE (1976) Markov models of repose-period patterns of volcanoes. In: Merriam DF (ed) Random processes in geology. Springer, New York, pp 135–161
- Wright HMN, Pallister JS, McCausland WA, Griswold JP, Andreastuti S, Budianto A, Primulyana S, Gunawan H (2019) Construction of probabilistic event trees for eruption forecasting at Sinabung volcano, Indonesia 2013–14. J Volcanol Geotherm Res, in press https://doi.org/10.1016/j.jvolgeores.2018.02.003
- Woo G (2008) Probabilistic criteria for volcano evacuation decisions. Nat Hazards 45:87–97

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.