1 Introduction

In our previous work, we have already established some important variations in seismicity determined by tectonic setting. Bird et al. (2002) confirmed that mid-ocean spreading ridges have a very low corner magnitude (\(m_c \approx 5.8\)) and low seismic coupling, which declines further with increasing relative plate velocity. Oceanic transform faults also showed an order-of-magnitude decline in coupling with velocity, and a corner magnitude that declines from ∼7.1 to ∼6.3 with increasing velocity. Bird and Kagan (2004) extended the study to all shallow earthquakes and showed additional variations in coupling and corner magnitude among subduction zones, continental collision zones, continental transforms, and continental rifts. Given this variability, it would not be surprising if different tectonic zones had different earthquake-clustering and/or -triggering behaviors. Boettcher and Jordan (2004) and McGuire et al. (2005) have already found that earthquakes on oceanic transform faults have more foreshocks and fewer aftershocks. Therefore, our first hypothesis is that interesting variations will be discovered if the empirical clustering parameters in the earthquake branching model (Kagan, 1991) are redetermined in tectonically defined zones. (Previous global studies, which did not distinguish between tectonic settings, presumably gave results dominated by the behavior of subduction zones.) However, we do not propose to divide as finely as Bird and Kagan (2004), because their code for classifying earthquakes into subcatalogs by “plate boundary class” is complex and not easy to incorporate into forecasts testable by independent agencies. Instead, we will define geographically contiguous areas of related tectonic style, and define the union of all areas with the same style as a “tectonic zone” (see Fig. 1, below).

Fig. 1 Global tectonic zones

These zones are similar in concept to the Flinn–Engdahl zones (Flinn et al., 1974; Young et al., 1996), but they are much larger, and they are based on the detailed classification of plate-boundary steps (mean length 43 km) in the PB2002 model of Bird (2003). The spatial contiguity of each area within each zone offers several important advantages:

  • Epicenter location errors become relatively unimportant.

  • No complex algorithm is needed to decide which actual earthquakes belong to a zone.

  • Testing can include earthquakes below the Harvard CMT threshold (∼5.6) in cases where focal mechanisms are not required.

We start by investigating how different branching models approximate the behavior of global earthquake catalogs, then consider the proposition that different tectonic regimes may have different clustering behaviors. Global catalogs are free of spatial boundary effects and considerably more homogeneous than regional catalogs. Moreover, regional seismicity may be dominated by aftershock sequences of a few strong events, like the m7.5 1952 Kern County and the m7.3 1992 Landers earthquakes in southern California. Explosions and earthquakes caused by volcanic and geothermal activity are more likely to contaminate earthquake records in local and regional catalogs.

On the other hand, it is important to analyze regional catalogs as well. If we see that model parameter values are similar for worldwide and regional catalogs (Kagan, 1991), then we may conclude that the models are relatively robust and therefore suitable for earthquake forecasting. It is also important to investigate various parameterizations of the branching models, especially the spatial and temporal fitting of seismicity patterns (Kagan, 1991) in order to find the best algorithms. Kagan and Jackson (2010) used the results of the present study to develop a technique for calculating and evaluating long- and short-term earthquake rate forecasts in practically any seismically active region of the Earth.

Recently, many researchers have applied ETAS (Epidemic Type Aftershock Sequence) branching models (Ogata, 1988, 1998; Ogata and Zhuang, 2006; Zhuang et al., 2005; Ogata et al., 2003; Console and Murru, 2001; Console et al., 2003) to catalogs from Japan, California, and Italy. Existing ETAS programs do not compute distances between earthquake locations using spherical geometry. Moreover, even the regional studies made no attempt to compare results for different tectonic areas or catalogs. Lombardi and Marzocchi (2007) and Marzocchi and Lombardi (2008) have applied the ETAS model to two global catalogs of large earthquakes.

This work is dedicated to the late Frank Evison, a superb exemplar of careful attention to detail and a passionate advocate of rigorous testing in earthquake prediction research.

2 Defining Tectonic Zones

2.1 Objective

Our goal is to divide the Earth's surface into a modest number of zones of different tectonic style, defined by objective criteria, which previous research (e.g., Bird and Kagan, 2004) has shown to contain interesting variations in seismicity parameters such as seismic coupling and/or corner magnitude. It is reasonable to anticipate that these zones might have different branching behaviors, and our definition of zones is designed with practical matters in mind, so that such testing can be conducted relatively easily. Zones are defined here as surface areas into which epicenters (and/or epicentroids) of shallow earthquakes may fall. The precise depth of an earthquake will not be a criterion, because it is usually poorly known (within the 0–70-km-depth range) unless there is local station control. Focal mechanism will not be a criterion either, because it is not always available for the smaller earthquakes (m < 5.6) that make up large portions of the aftershock swarms we wish to include in this study. The number of zones we have defined is small (five), so that enough earthquakes fall into each zone within a few decades to allow reliable optimization of branching models. Therefore, no zone has a local geographic name like “Aleutian”; instead, each zone has a generic name like “Trench.” We allow one tectonic zone to be the union of many non-contiguous patches. The preceding choices make it impossible to separate strike-slip from normal-faulting earthquakes on mid-ocean spreading ridges or in continental rift zones. Also, along many trenches it will be impossible to separate subduction-related earthquakes from back-arc-spreading earthquakes. Therefore, the tectonic zones are not equivalent to the seven plate-boundary classes of Bird (2003), and they have been given new, distinct names that will not be confused with plate-boundary classes.

We propose the following short list of tectonic zones (with identifying integers for compact representation in computer files):

  • (4) Trench (including incipient subduction, and earthquakes in outer rise or upper plate);

  • (3) Fast-spreading ridges (oceanic crust, spreading rate ≥40 mm/a; includes transforms);

  • (2) Slow-spreading ridges (oceanic crust, spreading rate <40 mm/a; includes transforms);

  • (1) Active continent (including continental parts of all orogens of PB2002, plus continental plate boundaries of PB2002); and

  • (0) Plate-interior (the rest of the Earth’s surface).

Fast-spreading versus slow-spreading ridges are divided at 40 mm/a because (a) Bird and Kagan (2004) found a change in OTF (Oceanic Transform Fault) earthquakes at ∼39 mm/a, and (b) there is a change in ridge morphology around this rate, from central-graben to peaked (Macdonald, 1986). “Active continent” is not subdivided by style of faulting, as this would require focal mechanisms (not always available) and obscure the basic definition that a zone is the union of a set of predefined geographic regions.
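The integer codes listed above and the 40 mm/a split between the two ridge zones can be written down compactly. The Python sketch below is illustrative only: the class and function names are hypothetical, and it assumes the spreading rate at a ridge point has already been computed (for instance from the PB2002 Euler poles); the full zone-assignment rules are those described in the Appendix.

```python
from enum import IntEnum


class TectonicZone(IntEnum):
    """Integer codes of the five tectonic zones listed in Sect. 2.1."""
    PLATE_INTERIOR = 0
    ACTIVE_CONTINENT = 1
    SLOW_SPREADING_RIDGE = 2
    FAST_SPREADING_RIDGE = 3
    TRENCH = 4


# Spreading rate (mm/a) separating slow- from fast-spreading ridges.
RIDGE_RATE_SPLIT_MM_PER_A = 40.0


def ridge_zone(spreading_rate_mm_per_a: float) -> TectonicZone:
    """Classify an oceanic-ridge point (transforms included) as zone 2 or 3.

    Hypothetical helper: assumes the spreading rate has already been
    computed for the point, e.g. from plate-model Euler poles.
    """
    if spreading_rate_mm_per_a < 0.0:
        raise ValueError("spreading rate must be non-negative")
    if spreading_rate_mm_per_a >= RIDGE_RATE_SPLIT_MM_PER_A:
        return TectonicZone.FAST_SPREADING_RIDGE  # >= 40 mm/a
    return TectonicZone.SLOW_SPREADING_RIDGE      # < 40 mm/a
```

With this sketch, `ridge_zone(25.0)` gives zone 2 and `ridge_zone(40.0)` gives zone 3, since a rate exactly at the threshold counts as fast spreading (≥40 mm/a).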

2.2 General Method

Our tectonic zones are defined by objective rules implemented in a computer program, because this approach is more reproducible, easier to explain, easier to revise, and less subject to procedural errors. Our program assigns tectonic-zone integers to grid points rather than drawing curves along the boundaries of tectonic zones. It loops over latitudes and longitudes in 0.1° steps, creating a grid of integers that identify the tectonic zone at each point.

The output is a relatively compact representation (1801 latitudes × 3601 longitudes × 1 to 2 bytes ≈ 6.5 to 13 MB). The necessary data sets are available in digital form: elevation from ETOPO5 (Anonymous, 1988); age of seafloor from Mueller et al. (1997); and plate boundaries and Euler poles from Bird (2003). The Appendix describes the implementation of the zone definitions in detail. The resulting global tectonic-zone classification is shown in Fig. 1 and is available at http://bemlar.ism.ac.jp/wiki/index.php/Bird%27s_Zones.
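The grid layout described above can be illustrated with a short Python sketch that maps an epicenter to the nearest grid node and looks up its zone integer, which is also one way a catalog could be split into zone subcatalogs. The node ordering (row 0 at 90°S, column 0 at 180°W, with longitudes wrapped so that 180°E maps to column 0) and all function names are assumptions for illustration, not the layout of the published file.

```python
from typing import List, Tuple

# Grid spacing and extent from Sect. 2.2: 0.1 degree steps, poles and
# both 180-degree meridians included.
STEP_DEG = 0.1
N_LAT = 1801  # -90.0, -89.9, ..., +90.0
N_LON = 3601  # -180.0, -179.9, ..., +180.0


def grid_index(lat_deg: float, lon_deg: float) -> Tuple[int, int]:
    """Return (row, col) of the grid node nearest to an epicenter."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude out of range")
    lon = (lon_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    row = round((lat_deg + 90.0) / STEP_DEG)
    col = round((lon + 180.0) / STEP_DEG)
    return row, col


def zone_of(grid: List[List[int]], lat_deg: float, lon_deg: float) -> int:
    """Look up the tectonic-zone integer of an epicenter in a zone grid."""
    row, col = grid_index(lat_deg, lon_deg)
    return grid[row][col]


def grid_size_bytes(bytes_per_node: int) -> int:
    """Storage needed for the full grid at a given integer width."""
    return N_LAT * N_LON * bytes_per_node
```

`grid_size_bytes(1)` gives 6,485,401 bytes (about 6.2 MiB) and `grid_size_bytes(2)` twice that (about 12.4 MiB), consistent with the storage estimate above.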

3 Earthquake Catalogs

We applied the likelihood technique to several available global and regional earthquake catalogs.

We studied earthquake distributions and clustering in the global CMT catalog of moment tensor inversions compiled by Ekström et al. (2005). The present catalog contains more than 28,000 earthquake entries for the period 1977/1/1 to 2007/12/31; earthquake size is characterized by the scalar seismic moment M.

The PDE worldwide catalog is published by the USGS (U.S. Geological Survey, 2008); the catalog available at the time this article was written ended on January 1, 2008. The catalog measures earthquake size using several magnitude scales and provides the body-wave (\(m_b\)) and surface-wave (\(M_S\)) magnitudes for most moderate and large events since 1965 and 1968, respectively. Recently, the moment magnitude (\(m_w\)) estimate has been added.

Determining a single measure of earthquake size for the PDE catalog entails a certain difficulty: for example, Kagan (1991) calculates a weighted average of several magnitudes to use in the likelihood search. Kagan (2003) also analyzes systematic and random errors for various magnitudes in the PDE catalog. Over time, many different magnitudes have been listed in the PDE catalog, and their relationships are difficult to establish. Therefore, in this work we chose a palliative solution: we use the maximum of the magnitudes listed for each earthquake. This solution is easier to carry out, and the results are easily reproducible. For moderate earthquakes, the \(m_b\) or \(M_S\) magnitude is usually selected; for larger recent earthquakes, the maximum magnitude is most likely \(m_w\).
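The maximum-magnitude rule can be expressed in a few lines. The Python sketch below assumes each PDE event has already been parsed into a mapping from a magnitude-scale label to its value; the labels and the function name are hypothetical, and reading the PDE record format is not shown.

```python
from typing import Mapping, Optional


def pde_size(magnitudes: Mapping[str, Optional[float]]) -> float:
    """Return the largest magnitude listed for one PDE event.

    `magnitudes` maps a scale label (e.g. "mb", "MS", "Mw") to its value;
    scales not reported for the event may be given as None.
    """
    values = [m for m in magnitudes.values() if m is not None]
    if not values:
        raise ValueError("event has no reported magnitude")
    return max(values)
```

For a moderate event with only \(m_b\) and \(M_S\) reported, the sketch returns the larger of the two; when \(m_w\) is present for a large recent event, it is usually the value selected.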

The ANSS (Advanced National Seismic System, 2008) composite catalog is a worldwide earthquake catalog created by merging the master earthquake catalogs of contributing ANSS institutions and then removing duplicate solutions for the same event.

The CalTech (CIT) data set (Hileman et al., 1973; Hutton and Jones, 1993) was the first instrumental local catalog to include small earthquakes (m ≥ 3), beginning in 1932. In recent years even smaller earthquakes have been included in the catalog.

4 Corner Magnitudes in the Tectonic Zones

Corner magnitude (\(m_c\)) is, by definition, associated with corner moment (\(M_c\)). We use the moment-magnitude conversion formula

$$ m_w = \frac{2}{3} (\log_{10}M - C), $$
(1)

(Hanks and Kanamori, 1979; Hanks, 1992), where M is measured in newton-meters (Nm); the calculations in our programs, however, are made with the seismic moment tensor, which is the appropriate measure of earthquake size. To keep the available computer programs unchanged, in this section we take C = 9.05 (Bird and Kagan, 2004, Eq. 1), whereas in Sects. 5 and 6 C is taken to be 9.0. These changes in C and in the moment threshold values explain the differences in the earthquake numbers for the CMT catalog shown in Tables 1 and 2 below.
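Eq. (1) and its inverse are simple to code, and doing so makes the effect of the two values of C explicit. In this Python sketch the function names are hypothetical; C defaults to 9.0 (the value used in Sects. 5 and 6) and can be set to 9.05 for the calculations of this section.

```python
import math


def moment_to_magnitude(moment_nm: float, c: float = 9.0) -> float:
    """Eq. (1): moment magnitude from scalar seismic moment M in Nm."""
    if moment_nm <= 0.0:
        raise ValueError("seismic moment must be positive")
    return (2.0 / 3.0) * (math.log10(moment_nm) - c)


def magnitude_to_moment(m_w: float, c: float = 9.0) -> float:
    """Inverse of Eq. (1): scalar seismic moment (Nm) for a given m_w."""
    return 10.0 ** (1.5 * m_w + c)
```

With C = 9.05, `moment_to_magnitude(10**17.45, c=9.05)` returns approximately 5.6 (the Table 1 threshold); with C = 9.0, `moment_to_magnitude(10**17.325)` returns approximately 5.55 (the Table 2 threshold), and `magnitude_to_moment(4.0)` returns \(10^{15}\) Nm. For a fixed moment, the two choices of C shift the magnitude by (2/3) × 0.05 ≈ 0.033.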

Table 1 Parameters of the tapered Gutenberg–Richter frequency/magnitude relations of the tectonic zones, with 95%-confidence ranges, \(M \ge 10^{17.45} = 2.818 \times 10^{17}\,\hbox{Nm}\) (C = 9.05 in Eq. 1).
Table 2 Parameters for various subdivisions of the CMT catalog, 1982/01/01–2007/03/31, \(m_w \ge 5.55\) or \(M \ge 10^{17.325}= 2.113 \times 10^{17}\,\hbox{Nm}\) (C = 9.0 in Eq. 1).

Corner moment and asymptotic spectral slope (β) are the two parameters of the tapered Gutenberg–Richter model for the cumulative frequency/moment relation:

$$ G(M, M_t, \beta, M_c)= (M_t/M)^{\beta}\exp [(M_t - M )/M_c]\quad \hbox{for}\quad M_t \le M \le \infty, $$
(2)

where G is the fraction of earthquakes (by event count) in the catalog with moment exceeding M, and \(M_t\) is the threshold moment for completeness of the catalog (Jackson and Kagan, 1999; Kagan and Jackson, 2000; Bird and Kagan, 2004). The corner moment (\(M_c\)) can be thought of as the earthquake size that is rarely exceeded (e.g., about once per century for subduction earthquakes); in maximum-likelihood fitting of (2) to actual large (sub)catalogs, it is typically not very different from the second-largest magnitude. The form of (2) is chosen for simplicity and acceptable fit, but some kind of upper magnitude limit is required to keep seismic moment rates finite. Bird and Kagan (2004) reported corner magnitudes for seven types of plate boundaries, ranging from a low of \(5.86^{+0.19}_{-0.16}\) for normal-faulting earthquakes on Ocean Spreading Ridges (OSR) to a high of \(9.58^{+0.48}_{-0.46}\) for shallow events in Subduction zones (SUB). (Ranges are 95%-confidence.) The spread of corner magnitudes among tectonic zones is smaller, because each tectonic zone (except zone 0) merges the seismicity of different plate-boundary types as defined by Bird and Kagan (2004). At the low end, the corner magnitude for zones 2 and 3 (slow- and fast-spreading ridges) is higher than that of OSR/normal because of the inclusion of OTF seismicity. At the high end, the corner magnitude of zone 4 (Trenches) is lower than that of SUB because of dilution by Oceanic Convergent Boundary (OCB) and oceanic convergent orogen seismicity, and by that of some overlying backarc OSRs.
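Eq. (2) translates directly into code. The Python sketch below (function and argument names are hypothetical; moments in Nm) returns the fraction G and keeps the pure power-law factor and the exponential taper separate, so the role of the corner moment is visible.

```python
import math


def tapered_gr_survival(moment, moment_t, beta, moment_c):
    """Fraction G of events with seismic moment above `moment`, Eq. (2).

    moment_t: completeness threshold M_t; beta: asymptotic slope;
    moment_c: corner moment M_c. All moments in Nm.
    """
    if moment < moment_t:
        raise ValueError("Eq. (2) applies only for moment >= moment_t")
    power_law = (moment_t / moment) ** beta           # pure Gutenberg-Richter part
    taper = math.exp((moment_t - moment) / moment_c)  # exponential roll-off near M_c
    return power_law * taper
```

At the threshold, G = 1; when the corner moment is far above the moments considered, the taper is close to 1 and G reduces to the pure power law \((M_t/M)^\beta\).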

To determine the corner magnitudes of the tectonic zones, we divided the shallow (≤70-km-depth) events with \(m_w \ge 5.6\) (scalar seismic moment \(M \ge 10^{17.45}\) Nm \(= 2.818 \times 10^{17}\) Nm; see Eq. 1) from the Global CMT catalog (1982/01/01–2008/03/31) into five zone subcatalogs using the gridded zone assignments of Fig. 1. In Fig. 2 we compare the tapered Gutenberg–Richter model (2) with the actual cumulative distribution for two tectonic zones: ‘Active continent’ (1) and ‘Slow-spreading ridges’ (2).

Fig. 2 Comparison of the tapered Gutenberg–Richter model for the frequency/magnitude distribution with the actual cumulative distribution. Earthquakes with m > 7.5 are identified. In each frequency/magnitude plot, three tapered Gutenberg–Richter model curves are shown. Each has the optimal spectral slope β from Fig. 3 below. The three variant curves show the minimum, best-estimate, and maximum corner magnitude (except that a value of \(m_c = 10\) is substituted for an unbounded corner magnitude in the lower plot). (a) Upper plot, tectonic zone 1: Active continent. (b) Lower plot, tectonic zone 2: Slow-spreading ridges.

Then each subcatalog was analyzed with the program BetaCorner.f90 (published by Bird and Kagan, 2004), which contours the likelihood surface in 2-D (β, \(M_c\)) space to determine both maximum-likelihood estimates and 95%-confidence ranges. In Fig. 3 we display two maps of the likelihood surface for the same tectonic zones as in Fig. 2. For the ‘Active continent’ the 95%-confidence range of the corner moment (\(M_c\)) is bounded, whereas for the ‘Slow-spreading ridges’ its upper limit is ∞. The results for all zones are shown in Table 1 (as noted above in connection with Eq. 1, the moment threshold and the corner magnitude in Table 1 are calculated with C = 9.05, whereas Table 2 uses C = 9.0).
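BetaCorner.f90 itself is not reproduced here. As a rough illustration of the same idea, the Python sketch below evaluates the tapered Gutenberg–Richter log-likelihood on a grid of (β, log10 \(M_c\)) values and keeps the nodes within three natural-log units of the peak, the criterion used for the 95%-confidence range in Fig. 3. It assumes the density obtained by differentiating Eq. (2); the grid limits, the synthetic-data generator, and all names are hypothetical.

```python
import math
import random


def tgr_log_likelihood(moments, moment_t, beta, moment_c):
    """Log-likelihood of moments >= moment_t under the tapered G-R law.

    Density: f(M) = -dG/dM = (beta/M + 1/Mc) * G(M), with G from Eq. (2).
    """
    total = 0.0
    for m in moments:
        total += (math.log(beta / m + 1.0 / moment_c)
                  + beta * math.log(moment_t / m)
                  + (moment_t - m) / moment_c)
    return total


def likelihood_grid(moments, moment_t, betas, log10_corners, drop=3.0):
    """Evaluate the likelihood on a (beta, log10 Mc) grid.

    Returns the grid maximum and the nodes within `drop` natural-log
    units of it (cf. the 95%-confidence contour of Fig. 3).
    """
    surface = {
        (b, lc): tgr_log_likelihood(moments, moment_t, b, 10.0 ** lc)
        for b in betas
        for lc in log10_corners
    }
    best = max(surface, key=surface.get)
    peak = surface[best]
    region = [node for node, ll in surface.items() if ll >= peak - drop]
    return best, region


def sample_tgr(n, moment_t, beta, moment_c, rng):
    """Synthetic moments obeying Eq. (2).

    The minimum of a Pareto variate (survival (Mt/M)**beta) and an
    independent shifted exponential (survival exp((Mt - M)/Mc)) has the
    product survival function G of Eq. (2).
    """
    out = []
    for _ in range(n):
        pareto = moment_t * (1.0 - rng.random()) ** (-1.0 / beta)
        expo = moment_t + rng.expovariate(1.0 / moment_c)
        out.append(min(pareto, expo))
    return out
```

For example, drawing 800 synthetic moments with `sample_tgr(800, 10**17.45, 0.65, 10**21, random.Random(1))` and scanning β from 0.45 to 0.85 and log10 \(M_c\) from 18 to 22 should recover β close to 0.65. How tightly \(M_c\) is bounded depends on how many events approach the corner, which is the behavior contrasted in Fig. 3.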

Fig. 3 Likelihood surface for the tapered Gutenberg–Richter model in the 2-D parameter space of asymptotic spectral slope β (abscissa) and corner moment \(M_c\) or corner magnitude \(m_c\) (left and right ordinates). Color scale (red high, blue low) and a cross for the optimal value are used inside the contour that is three natural-log units below the peak; this area corresponds to the 95%-confidence range. Dashed contours with an interval of three natural-log units are used outside. (a) Left, tectonic zone 1: Active continent. (b) Right, tectonic zone 2: Slow-spreading ridges. Note that the 95%-confidence area is just barely closed on the high-magnitude side for the “Active continent”; ranges for the “Slow-spreading ridges” and other zones are open on the high side.

5 Branching Model of Earthquake Occurrence

Our technique for producing short-term hazard estimates is to establish a statistical model that fits the catalog of earthquake times, locations, and seismic moments, and subsequently to base forecasts on this model. While most components of the model have been tested (Kagan and Knopoff, 1987; Kagan, 1991; Jackson and Kagan, 1999; Kagan and Jackson, 2000), some require further exploration and may be modified as our research progresses.

The assumptions we make in constructing our initial model have been summarized in Kagan and Knopoff (1987) and Kagan (1991). A similar model, called the ETAS model, was proposed by Ogata (1988, 1998) and by Ogata and Zhuang (2006). In both models seismicity is approximated by a Poisson cluster process, in which clusters or sequences of earthquakes are statistically independent, although individual earthquakes in a cluster are triggered events. The clusters are assumed to form a Poisson point process with a constant temporal rate. We assume that the interrelationships between events within a cluster are closely approximated by a stochastic space-time critical branching process. Under this assumption each dependent event has a single trigger. As shown below, the space-time distribution of interrelated earthquake sources within a sequence is controlled by simple relations justified by analyzing the available statistical data on seismicity.

Below we describe the statistical distributions for the analysis of the CMT earthquake catalog, with the scalar seismic moment M as the measure of earthquake size. The model can be easily reformulated for the PDE and other catalogs (see Sect. 3), where earthquakes are characterized by a magnitude.

We construct an earthquake intensity function, \(\Lambda(t, {\bf x}, M)\), which gives the earthquake rate at time t and location x for events with scalar seismic moment M, given the history of previous seismicity. (For small time/space intervals the probability of occurrence and the occurrence rate are equivalent.) This function is given by

$$ \Lambda (t, {\bf x}, M) = \lambda \phi ({\bf x}, M) + \sum_i \psi^{(M_i)}(t-u_i, {\bf x} - {\bf y}_i, M), $$
(3)

where λ (or \(\lambda_t\)) is the rate per time unit of the Poisson occurrence of independent (spontaneous) earthquakes with \(M \ge M_t\) in the volume Y; ϕ(x, M) is their space-seismic moment distribution; and \(\psi^{(M)} (t-u,{\bf x} - {\bf y}, M)\) is the conditional distribution of succeeding events at time t and coordinates x, given that preceding earthquakes have occurred at times \(u_i\) at places with coordinates \({\bf y}_i\).

We subdivide the spatial coordinates x into s × z, where s are surface coordinates, and z is depth. If the duration of the catalog is T, then λT is the number of independent events, and λ T/N is the fraction of independent events (N is the total number of earthquakes in a catalog).

5.1 Earthquake Clusters—Independent Events

The first event in a sequence is usually the largest one and is called a main shock. Other dependent events are called aftershocks, though some of them are actually aftershocks of aftershocks. If the first event in a sequence is smaller than subsequent shocks, it is called a foreshock. Retrospectively, it is relatively easy to subdivide an earthquake catalog into fore-, main-, and aftershocks. However, in real-time forecasting it is uncertain whether the most recent event registered by a seismographic network is a foreshock or a main shock. Although the subsequent triggered events are likely to be smaller and would thus be called aftershocks, there is a significant chance that some of the subsequent earthquakes may be bigger (Kagan, 1991).

5.2 Dependent Events

Similar to the estimation of the long-term seismic hazard, we assume that the distribution of triggered events within a cluster may be broken down into a product of its marginal distributions, i.e., the conditional rate density of the jth shock dependent on the ith shock (j > i) with seismic moment \(M_i\) is modeled as

$$ \psi(\Updelta t, \rho, M_j | M_i) = \psi_{\Updelta t}(\Updelta t)\times \psi_\rho(\rho) \times \psi_M(M_i)\times \phi_M(M_j), $$
(4)

where \(\Updelta t = t_j - t_i\) and ρ is the horizontal distance between the ith and the jth centroids (or, more correctly, epicentroids) (\(\rho = |{\bf x}_j - {\bf x}_i|\)). \(\psi_{\Updelta t}\), \(\psi_\rho\), and \(\psi_M\) are the marginal temporal, spatial, and moment densities, detailed below. The total time-dependent rate density is a sum of effects from all previous earthquakes,

$$ \Psi(t_j,{\bf x}_j, M_j)= \sum_{i < j} \psi(\Updelta t, \rho, M_j| M_i). $$
(5)

The function ψ decays rapidly with time and distance (see below). Thus only neighboring events contribute substantially to the sum (5), although the influence of strong earthquakes extends considerably farther than that of weak events.
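Eqs. (3) and (5) amount to a background term plus a sum of pair kernels over earlier events. The Python sketch below leaves the kernel ψ of Eq. (4) as a caller-supplied function (its factors are given by Eqs. 6–12 below) and measures ρ as a great-circle distance on a spherical Earth of radius 6371 km, an assumption appropriate for global catalogs. The event tuple layout and the function names are hypothetical, and no cutoff for distant or old events is applied.

```python
import math
from typing import Callable, Sequence, Tuple

# An earthquake as (time_days, lat_deg, lon_deg, moment_nm).
Event = Tuple[float, float, float, float]
# Pair kernel psi(dt, rho_km, M_j, M_i) of Eq. (4), supplied by the caller.
Kernel = Callable[[float, float, float, float], float]


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2.0) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def total_rate(target: Event, history: Sequence[Event], kernel: Kernel,
               background: float) -> float:
    """Rate density at `target`: background plus the sum in Eq. (5).

    `background` stands for lambda * phi(x, M) of Eq. (3) at the target;
    only events with t_i < t_j contribute to the triggered part.
    """
    t_j, lat_j, lon_j, m_j = target
    triggered = 0.0
    for t_i, lat_i, lon_i, m_i in history:
        if t_i < t_j:
            rho = great_circle_km(lat_i, lon_i, lat_j, lon_j)
            triggered += kernel(t_j - t_i, rho, m_j, m_i)
    return background + triggered
```

Because the kernel is a parameter, the same summation can be reused with the temporal, spatial, and moment factors defined in the following subsections.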

The first three factors on the right-hand side of (4) depend on \(M_i\). We take a power-law relation for the probability density of time intervals between earthquakes within a cluster:

$$ \psi_{\Updelta t} (\Updelta t) = \theta t_M^\theta (\Updelta t)^{-1-\theta}, \quad \Updelta t \ge t_M, $$
(6)

which is similar to Omori’s law. The parameter θ is an ‘earthquake memory’ factor, and \(t_M\) is the coda duration time of an earthquake with seismic moment \(M_i\). We assume for the coda duration

$$ t_M = t_r \left(\frac{M_i}{M_r}\right)^{1/3}, $$
(7)

where \(t_r\) is the coda duration time, taken here as \(t_r = 0.0035\) days (about 5 min), of an earthquake with the reference seismic moment \(M_r = 10^{15}\) Nm, corresponding to \(m_r = 4.0\) (Kagan, 1991). The probability of the next dependent shock to occur in the time interval \((t_1, t_2)\), for \(t_M \le t_1 < t_2\), given an event of a cluster occurring at time 0, can then be calculated simply as

$$ \hbox{Prob}(t_1 < t < t_2) = t_M^\theta (t_1^{-\theta}- t_2^{-\theta}). $$
(8)
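The temporal part of the model (Eqs. 6–8) can be sketched as follows. This Python code assumes that Eq. (7) scales the reference duration as \(t_M = t_r (M_i/M_r)^{1/3}\), which is what the definition of \(t_r\) as the coda duration of the reference event \(M_r = 10^{15}\) Nm implies; the constant and function names are hypothetical. Because the density (6) vanishes for \(\Updelta t < t_M\), the sketch clips \(t_1\) to \(t_M\) before applying Eq. (8).

```python
# Constants from the text: t_r = 0.0035 days for M_r = 1e15 Nm (m_r = 4.0).
T_R_DAYS = 0.0035
M_R_NM = 1.0e15


def coda_duration(moment_i: float) -> float:
    """t_M in days, assuming t_M = t_r * (M_i / M_r)**(1/3) for Eq. (7)."""
    return T_R_DAYS * (moment_i / M_R_NM) ** (1.0 / 3.0)


def omori_density(dt: float, moment_i: float, theta: float) -> float:
    """Eq. (6): density of the delay dt (days); zero for dt < t_M."""
    t_m = coda_duration(moment_i)
    if dt < t_m:
        return 0.0
    return theta * t_m ** theta * dt ** (-1.0 - theta)


def prob_between(t1: float, t2: float, moment_i: float, theta: float) -> float:
    """Eq. (8): probability that the next dependent shock falls in (t1, t2)."""
    t_m = coda_duration(moment_i)
    t1 = max(t1, t_m)  # the density (6) vanishes below t_M
    if t2 <= t1:
        return 0.0
    return t_m ** theta * (t1 ** -theta - t2 ** -theta)
```

For the reference event, `coda_duration(1e15)` is 0.0035 days; an event with 1000 times the moment (two magnitude units larger) has a coda ten times longer, 0.035 days. For any θ > 0, `prob_between(t_M, t2, ...)` approaches 1 as t2 grows, as a normalized density should.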

The non-normalized function \(\psi_M(M_i)\), which corresponds to the average number of triggered shocks generated by an earthquake with seismic moment \(M_i\), is assumed to obey (Kagan, 1991)

$$ \psi_M(M_i)=\mu \left(\frac{M_i}{M_t}\right )^\delta. $$
(9)

To make the parameter δ of the triggered-event distribution comparable to b values, our Tables report the parameter

$$ a=\delta \times 1.5. $$
(10)

Kagan (1991, p. 142) discusses the δ-parameter and its relation to other measurements of size distribution of triggered events in more detail.
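Eq. (9) and the rescaling (10) can be checked against each other numerically. Using Eq. (1), \(M_i/M_t = 10^{1.5(m_i - m_t)}\) (the constant C cancels), so \(\psi_M = \mu\,10^{a(m_i - m_t)}\): each magnitude unit above threshold multiplies the expected number of triggered events by \(10^a\), much as b sets the decrease in event numbers per magnitude unit. The Python helpers below are hypothetical names for these two equivalent forms.

```python
def expected_offspring(moment_i, moment_t, mu, delta):
    """Eq. (9): mean number of shocks triggered by an event of moment M_i."""
    return mu * (moment_i / moment_t) ** delta


def a_from_delta(delta):
    """Eq. (10): a = 1.5 * delta."""
    return 1.5 * delta


def expected_offspring_mag(m_i, m_t, mu, a):
    """Eq. (9) rewritten with magnitudes via Eq. (1): mu * 10**(a * (m_i - m_t))."""
    return mu * 10.0 ** (a * (m_i - m_t))
```

Both forms give the same value for the same event, so the tabulated a can be read directly as the log10 increase in triggered-event productivity per magnitude unit.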

We approximate the probability density of the horizontal distance (ρ) between two earthquake epicentroids in a cluster by a Rayleigh distribution:

$$\psi_\rho(\rho) \ = \ {{\rho} \over {\sigma_\rho^2}} \times \exp \, [-\rho^2/(2 \sigma_\rho^2)]$$
(11)

where \(\sigma_\rho\) is a spatial standard deviation (that of each horizontal Cartesian component of the separation). The standard deviation is assumed to depend on the standard error of epicentroid determination and on the seismic moment \(M_i\) of the main event according to the relation (Kagan, 1991):

$$ \sigma_\rho^2 = \epsilon^2_\rho + s_r^2(M_i/M_r)^{2/3}. $$
(12)

Here \(\epsilon_\rho\) is the standard error of epicentroid determination and \(s_r\) is a characteristic size of the focal zone of an earthquake with the reference seismic moment \(M_r\) (see Eq. 7). Equations 11 and 12 roughly correspond to spatial response functions used in the ETAS model (Ogata, 1998, Eqs. 2.2–2.4).
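Eqs. (11) and (12) can be sketched as below. The Python functions (hypothetical names; distances in km, moments in Nm, \(M_r = 10^{15}\) Nm as in Sect. 5.2) also include the closed-form cumulative probability \(1 - \exp[-\rho^2/(2\sigma_\rho^2)]\) obtained by integrating Eq. (11).

```python
import math

M_R_NM = 1.0e15  # reference seismic moment M_r (Nm)


def sigma_rho(moment_i: float, eps_rho_km: float, s_r_km: float) -> float:
    """Eq. (12): spatial scale (km) of events triggered by a shock of moment M_i."""
    return math.sqrt(eps_rho_km ** 2 + s_r_km ** 2 * (moment_i / M_R_NM) ** (2.0 / 3.0))


def rayleigh_density(rho_km: float, sigma_km: float) -> float:
    """Eq. (11): density (per km) of the epicentroid separation rho."""
    return rho_km / sigma_km ** 2 * math.exp(-rho_km ** 2 / (2.0 * sigma_km ** 2))


def prob_within(radius_km: float, sigma_km: float) -> float:
    """Eq. (11) integrated from 0 to radius_km."""
    return 1.0 - math.exp(-radius_km ** 2 / (2.0 * sigma_km ** 2))
```

As an illustration with values of the size discussed in Sect. 6.1 for the CMT catalog (\(\epsilon_\rho \approx 20\) km, \(s_r = 0.3\) km), the two terms of Eq. (12) become equal only when \((M_i/M_r)^{1/3} \approx 67\), i.e. \(M_i \approx 3 \times 10^{20}\) Nm (m ≈ 7.6 with C = 9.0), so for most events the location error dominates \(\sigma_\rho\).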

6 Likelihood Analysis of Catalogs

6.1 Statistical Analysis Results

Using the branching model of Sect. 5, we can obtain a likelihood function for the earthquake process. The likelihood equations are rather bulky (see Kagan, 1991). We compare the likelihood function for the branching process with that for a spatially inhomogeneous Poisson model of seismicity. By searching for the maximum of the likelihood ratio of the two models, or equivalently of their log-likelihood difference, we obtain statistical estimates of the model parameters.
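The full space-time likelihood of Kagan (1991) is bulky, but its structure can be illustrated with a time-only simplification. The Python sketch below, which is not the authors' program, writes log L as the sum of log-rates at the events minus the integral of the rate over [0, T], using a constant background λ, the productivity of Eq. (9), and the kernel of Eq. (6). It compares the result with a constant-rate Poisson model rather than the spatially inhomogeneous one used in the text, assumes \(t_M = t_r (M_i/M_r)^{1/3}\) for Eq. (7), and reports the gain per event in natural-log units; all names are hypothetical.

```python
import math


def _coda(moment, t_r=0.0035, moment_r=1.0e15):
    """t_M of Eq. (7) in days, assuming t_M = t_r * (M / M_r)**(1/3)."""
    return t_r * (moment / moment_r) ** (1.0 / 3.0)


def temporal_loglik(events, T, lam, mu, delta, theta, moment_t):
    """Log-likelihood of a time-only branching model on [0, T].

    events: time-sorted list of (time_days, moment_nm).
    Rate: lam + sum_i mu*(M_i/M_t)**delta * psi_dt(t - t_i)  (Eqs. 6, 9).
    log L = sum_j log rate(t_j) - integral of the rate over [0, T].
    """
    log_sum = 0.0
    for j, (t_j, _) in enumerate(events):
        rate = lam
        for t_i, m_i in events[:j]:
            t_m, dt = _coda(m_i), t_j - t_i
            if dt >= t_m:
                rate += (mu * (m_i / moment_t) ** delta
                         * theta * t_m ** theta * dt ** (-1.0 - theta))
        log_sum += math.log(rate)
    integral = lam * T
    for t_i, m_i in events:
        t_m = _coda(m_i)
        if T - t_i > t_m:  # Omori kernel integrated from t_M to T - t_i
            integral += (mu * (m_i / moment_t) ** delta
                         * (1.0 - (t_m / (T - t_i)) ** theta))
    return log_sum - integral


def poisson_loglik(n, T):
    """Maximized log-likelihood of a constant-rate Poisson process."""
    return n * math.log(n / T) - n


def information_per_event(events, T, **params):
    """(log L_branching - log L_Poisson) / N, in natural-log units."""
    n = len(events)
    return (temporal_loglik(events, T, **params) - poisson_loglik(n, T)) / n
```

With μ = 0 and λ = N/T the two models coincide and the gain is zero; a tightly clustered sequence fitted with μ > 0 gives a positive gain.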

Two methods can be used to search for the maximum of the likelihood function: the Newton–Raphson optimization technique, which requires derivatives of the likelihood function, and the simplex algorithm (Press et al., 1992, p. 402), which does not. The simplex method was applied in Helmstetter et al. (2006). We used a modification of the Newton–Raphson technique in most of our studies (Kagan, 1991; Kagan and Jackson, 2000) and in the present work. We could not apply standard Newton–Raphson packages because many of the parameters of our model have either physical constraints (for example, non-negativity) or constraints based on prior geophysical results. We also censor the immediate aftershocks from the catalogs (see Eq. 7). This censoring introduces a Heaviside function into the time history of earthquake occurrence, which significantly complicates the calculation of the derivatives. It also explains why we did not apply a triangular window to remove aftershocks in our computations (Kagan, 2004; Helmstetter et al., 2006): obtaining the likelihood-function derivatives would be even more difficult in that case. (Formula (7) above defines a rectangular-in-time window for aftershock removal.) In our calculations below we applied the same likelihood-optimization program to two catalog types: one in its original form, and the other with all early aftershocks that occur within \(t_M\) (see Eq. 7) of the previous shock deleted.
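The rectangular-in-time censoring rule can be sketched as follows. The Python function assumes that "the previous shock" means the immediately preceding event in the original, time-sorted catalog, that only time (not distance) is tested, and that \(t_M = t_r (M/M_r)^{1/3}\) as in Eq. (7) with \(t_r = 0.0035\) days and \(M_r = 10^{15}\) Nm; the function name is hypothetical.

```python
def censor_immediate_aftershocks(events, t_r=0.0035, moment_r=1.0e15):
    """Delete events that follow the preceding event by less than its t_M.

    events: time-sorted list of (time_days, moment_nm) pairs.
    Each event is compared with the preceding event of the original
    catalog; the rectangular window length is t_M of that preceding event.
    """
    kept = []
    for k, (t, m) in enumerate(events):
        if k > 0:
            t_prev, m_prev = events[k - 1]
            t_m = t_r * (m_prev / moment_r) ** (1.0 / 3.0)
            if t - t_prev < t_m:
                continue  # immediate aftershock: inside the coda window
        kept.append((t, m))
    return kept
```

For example, with a \(10^{18}\) Nm shock at t = 0 (\(t_M \approx 0.035\) days), a small event 0.01 days later is deleted, while an event a day later is kept.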

We started our search with the initial model of a spatially inhomogeneous, temporally constant Poisson process, and iterated the computations until the changes in the log-likelihood difference were smaller than \(10^{-5}\). For about 90,000 events (see Table 5) one iteration took less than 30 min using a FORTRAN program on an Alpha HP 600 MHz workstation; the total number of iterations was usually 90 to 120.

In Tables 2, 3, 4, 5, 6 and 7 we display some of the likelihood search results. Additional results are shown on our Wiki page http://bemlar.ism.ac.jp/wiki/index.php/Bird%27s_Zones. The Tables show the branching-model parameter values for four catalogs, with varying magnitude thresholds, and for different tectonic zones. The last one, Table 7, analyzes only one tectonic zone (Active continent).

Figure 1 shows that the tectonic zones described in Sect. 2 contain many small regions spaced out at large distances throughout the zone. Most of the parameters which depend on the earthquake location separation (like λ, μ, θ) can be expected to be significantly different for these zones compared to the full catalog (‘All’ column in the Tables). However, such a variation is not observed, most likely because these parameters are defined by a close distance interaction between the earthquakes.

Earthquake number ratios (\(N/N_{\rm All}\)) in the tectonic zones for the CMT and PDE catalogs in Tables 2, 3, 4, 5 and 6 show some differences, which are especially large for the “Fast-spreading ridges”: 10.8% for the CMT catalog versus 5.4% for the \(m_t = 4.7\) PDE catalog. Several factors may influence this variation, including the change in the β and corner-moment values (Table 1) and various systematic effects in the determination of the \(m_b\) magnitude in the PDE catalog.

The b values in Tables 2, 3, 4, 5, 6 and 7 were computed using the Aki/Utsu formula (Utsu, 1999). These values are close to 1.0 for all subdivisions of the catalogs; the β value (see Table 1) can be converted to b as b = 1.5 × β. The only significant deviations from this b universality are the values for oceanic ridges. The increased b value for these earthquakes is caused by their relatively low corner magnitude (Table 1), which biases the estimates (cf. Fig. 2) toward higher slope values.
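For reference, the Aki/Utsu maximum-likelihood estimate in its commonly used form is \(b = \log_{10} e / [\bar m - (m_t - \Delta m/2)]\), where \(\bar m\) is the mean magnitude at or above the threshold \(m_t\) and Δm is the magnitude bin width (Δm = 0 for unbinned values). The Python sketch below implements that form together with the conversion β = b/1.5; the function names are hypothetical.

```python
import math


def b_value_aki_utsu(mags, m_t, dm=0.0):
    """Maximum-likelihood b value from magnitudes at or above m_t.

    dm is the magnitude bin width (0 for unbinned values).
    """
    selected = [m for m in mags if m >= m_t]
    if not selected:
        raise ValueError("no magnitudes at or above the threshold")
    excess = sum(selected) / len(selected) - (m_t - dm / 2.0)
    if excess <= 0.0:
        raise ValueError("mean magnitude must exceed the corrected threshold")
    return math.log10(math.e) / excess


def beta_from_b(b):
    """Convert a b value to the moment-distribution slope: beta = b / 1.5."""
    return b / 1.5
```

A low corner magnitude thins the upper tail of the distribution, which lowers \(\bar m\) and so raises this estimate; this is the bias described above for oceanic ridges.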

The parameters \(s_r\) and \(\epsilon_\rho\) (see Eq. 12) characterize the spread of dependent earthquakes (usually aftershocks) in the neighborhood of a ‘parent’ event. The estimates of the horizontal location accuracy (\(\epsilon_\rho\)) are in good agreement with independent evaluations of these uncertainties (Kagan, 2003): for the CMT catalog the accuracy is considered to be of the order of 20 km, and for the PDE catalog it is on average close to 10 km. The location uncertainties for the PDE catalog depend on the magnitude threshold, \(m_t\). As one should expect, for the higher threshold (\(m_t = 5.3\) in Table 3) the accuracy is better (\(\epsilon_\rho\) from 8 to 12 km) than for the lower threshold (\(m_t = 4.7\) in Table 5), where \(\epsilon_\rho\) varies from 12 to 16 km. The lowest \(\epsilon_\rho\) values in the PDE catalog are for zone 1 (Active continent), because most seismographic stations are located on continents.

Table 3 Parameters for various subdivisions of PDE catalog, 1968–2007/01/01, m ≥ 5.3.

The vertical error \( \epsilon_h\) in Tables 2, 3, 4, 5, 6 and 7 is not informative, since many shallow events are assigned a depth of 10 or 15 km in the CMT catalog and 33 km in the PDE catalog. We kept this parameter in our likelihood procedure because it is useful for the analysis of deep and intermediate earthquakes, whose depths are located more accurately. It is also appropriate for local catalogs with small vertical errors.

The \(s_r\) estimate for the regional catalogs (CIT and ANSS) had to be constrained (Table 7), most likely because the location accuracy of these catalogs varies significantly over the long observation period, which starts in 1932. Table 7 displays four sets of inversions in which \(s_r\) was constrained at two different values, 0.3 and 0.15 km, to see whether such a constraint causes significant changes in the estimates of other parameters. The changes appear to be minor.

The global catalogs cover considerably more recent periods, and the variations of their location accuracy are not as extensive (Kagan, 2003). The \(s_r\) values for all the global catalogs (CMT and PDE) are close to 0.30–0.35 km. This would correspond to a focal-zone size of about 1 km for an m4 earthquake, a value consistent with most estimates from other geophysical measurements (Abercrombie, 1995; Kagan, 2002). The confirmation of the \(s_r\) and \(\epsilon_\rho\) values discussed above makes our results on the statistical analysis of seismicity patterns more credible.

Comparing the results for the original catalogs with those for the catalogs with immediate aftershocks removed (see Eq. 7 and Tables 2, 3, 4, 5, 6 and 7, where these catalogs are marked by the ‘*/d’ note) shows that most parameter values are not strongly influenced by this procedure. However, three of the table entries do change. The information score per event (I/N) changes significantly because these aftershocks, which occur shortly after a parent event when the conditional rate of occurrence is high, strongly increase the log-likelihood ratio. This ratio is used in selecting and comparing the best models for approximating earthquake occurrence. Therefore, in any such testing it is important to know how the immediate aftershocks have been processed.

The ratio of the number of independent events to the total number, λT/N, increases because fewer triggered earthquakes remain in a catalog once the immediate aftershocks are removed. The branching coefficient μ apparently increases because the immediate aftershocks are incompletely sampled in a catalog and therefore bias the μ estimate downward.

These changes are less evident in the CMT catalog (Table 2). This catalog is based on the interpretation of long-period waves and the coda of these waves interferes with the identification of close aftershocks. For that reason fewer such aftershocks (to be deleted) are present in the catalog.

Figure 4 shows the iterations of the likelihood search for m ≥ 5 earthquakes in the PDE catalog. It demonstrates that the estimates of parameters λ and μ are negatively correlated; the other parameters appear less correlated during the search. In principle, to evaluate parameter uncertainties we need to obtain the Hessian matrix (the second partial derivatives of the likelihood function) at the parameter estimates, or at least to make two-dimensional plots of the parameter estimates similar to those in Bird and Kagan (2004) or in Fig. 3. Such plots have an advantage: if the relation is nonlinear or a parameter value needs to be restricted (for example, if it goes into the negative domain or to infinity), they provide more accurate information than the second-derivative matrix.
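As a sketch of the Hessian-based approach mentioned above, the code below approximates the Hessian by central finite differences and inverts its negative to obtain an asymptotic covariance matrix and the implied correlation. The quadratic log-likelihood in (λ, μ) and all function names are hypothetical, chosen only so that the exact answer is known; for this toy the correlation comes out negative, in the spirit of the λ–μ tradeoff seen in Fig. 4.

```python
import numpy as np

def numerical_hessian(loglik, theta, h=1e-4):
    """Central-difference matrix of second partial derivatives of loglik at theta."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    hess = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            di = np.zeros(k)
            di[i] = h
            dj = np.zeros(k)
            dj[j] = h
            hess[i, j] = (loglik(theta + di + dj) - loglik(theta + di - dj)
                          - loglik(theta - di + dj) + loglik(theta - di - dj)) / (4.0 * h * h)
    return hess

def asymptotic_covariance(loglik, theta_hat, h=1e-4):
    """Inverse of the negative Hessian (observed information) at the maximum."""
    return np.linalg.inv(-numerical_hessian(loglik, theta_hat, h))

def correlation_matrix(cov):
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical quadratic log-likelihood with its maximum at (lambda, mu) = (0.5, 0.2).
A = np.array([[4.0, 2.0], [2.0, 3.0]])
theta_hat = np.array([0.5, 0.2])

def toy_loglik(theta):
    d = np.asarray(theta) - theta_hat
    return -0.5 * d @ A @ d

cov = asymptotic_covariance(toy_loglik, theta_hat)
rho = correlation_matrix(cov)[0, 1]
print(cov, rho)
```

For a real branching-model likelihood, hard constraints and censoring can make these derivatives unreliable, which is why the two-dimensional plots are preferred above.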

Fig. 4

Plot of the log-likelihood function (dotted line), independent event rate λ (dashed line) and the branching coefficient μ (solid line). These variables have been normalized to their final values. All shallow earthquakes with m ≥ 5 in the PDE catalog are processed

To assess the statistical significance of the differences in the parameter estimates across tectonic regimes, we need a reliable way to estimate multidimensional confidence regions. For overdetermined linear problems with Gaussian data errors, the confidence regions can be estimated simply from the covariance matrix of the parameter estimates and an estimate of the data uncertainty. For mildly nonlinear overdetermined problems, asymptotic methods can yield approximate confidence regions. These methods assume local linearity of the relationship between model perturbations and data perturbations, and they use the Hessian matrix to estimate a covariance matrix. In a particular instance, the validity of the asymptotic assumption depends on how well the parameters are constrained by observations and prior information, and on the differentiability of the likelihood function (Jackson and Matsu’ura, 1985). For catalogs with many earthquakes over a wide magnitude range and with many isolated earthquake clusters, the asymptotic methods might be applicable to branching models. As explained above, however, positivity and other hard constraints on the parameters, as well as the censoring of immediate aftershocks, introduce discontinuities that prevent differentiation of the likelihood function. We have not yet made a serious study of the confidence regions for the parameters, but indirect evidence suggests that the asymptotic method may not be appropriate, even without aftershock censoring. That evidence comes from (1) the shape of the likelihood contours for two relevant parameters in Fig. 3, which would be elliptical under the asymptotic assumptions but are not, and (2) the behavior of the likelihood during convergence of the parameters in the search for the optimum estimates (Fig. 4): the likelihood converges relatively slowly, and the tradeoffs between parameters clearly vary during the process. We believe that a trustworthy confidence map may require laborious grid search or Monte Carlo techniques.
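A brute-force alternative to the Hessian is to evaluate the likelihood on a grid and keep the points whose log-likelihood lies within a chosen drop of the maximum. The minimal sketch below assumes a likelihood-ratio criterion with the 95% χ² point for two parameters; this threshold is itself an asymptotic approximation, which a Monte Carlo calibration on simulated catalogs could replace. The curved toy likelihood and the function names are hypothetical.

```python
import numpy as np

# 95% point of the chi-squared distribution with 2 degrees of freedom: -2 ln(0.05), about 5.99.
CHI2_2DOF_95 = -2.0 * np.log(0.05)

def grid_confidence_region(loglik, xs, ys, threshold=CHI2_2DOF_95):
    """Evaluate loglik on a grid; flag points with 2 * (max - loglik) <= threshold."""
    ll = np.array([[loglik(x, y) for y in ys] for x in xs])  # ll[i, j] = loglik(xs[i], ys[j])
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    inside = 2.0 * (ll[i, j] - ll) <= threshold
    return (xs[i], ys[j]), inside

def banana_loglik(x, y):
    """Hypothetical curved ('banana') log-likelihood with its maximum at (0.5, 0.2)."""
    return -0.5 * ((x - 0.5) / 0.1) ** 2 - 0.5 * ((y - 0.2 + 2.0 * (x - 0.5) ** 2) / 0.05) ** 2

xs = np.linspace(0.0, 1.0, 101)
ys = np.linspace(0.0, 0.5, 101)
best, inside = grid_confidence_region(banana_loglik, xs, ys)
print(best, inside.sum())
```

At x = 0.7 the accepted region is centered near y = 0.12 rather than at the maximum's y = 0.2, a curved tradeoff that a covariance ellipse cannot represent.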

6.2 Comparison of Models and Results with ETAS

Our branching model, which was first proposed and applied to the central California earthquake record by Kagan and Knopoff (1987), is essentially similar in design to the ETAS model (Ogata, 1988, 1998). Both models use the theory of stochastic branching processes to approximate the temporal development of earthquake occurrence. Seismicity is approximated by a time-magnitude-space Poisson cluster process (see Sect. 5), with dependent (cluster) events concentrated around an earlier ‘parent’ event (see more detail in Kagan, 2006, Section 8). The main difference between the two models lies in the parameterization of the influence functions of a dependent event and in the normalization of these functions.

Our Eq. 6 is similar to the time influence function used in the ETAS model (Ogata, 1998, Eq. 1.1), where the aftershock rate is expressed by the Omori–Utsu formula (Utsu, 1961)

$$ n_i(t)={\frac{1}{(t - t_i + c)^p}}, $$
(13)

where \(t_i\) is the occurrence time of the previous (parent) event, so that \(t - t_i\) is the time elapsed since it, and p and c are parameters. The p parameter can be identified with (1 + θ) in (6), whereas the c parameter is introduced to account for the lack of aftershocks immediately following a strong earthquake (Kagan, 2004). In our model we simply exclude such aftershocks (Eq. 7) from the catalog, in a way similar to deleting all earthquakes smaller than the moment threshold \(M_t\) (see Eq. 2).

However, many evaluations of the c parameter show that it varies strongly, by orders of magnitude, even for main shocks of similar size. Moreover, the c estimates vary by an order of magnitude even for the same main shock when different data (catalogs) are used. Kagan (2004) discusses the reasons for this phenomenon: the frequency range of a network, the distance to the closest station, the installation of temporary stations following a strong event, workforce availability, etc. This effect may be of little importance when individual sequences are considered, but in the statistical analysis of earthquake catalogs we should expect the model parameters to be statistically stable. Otherwise, the derived parameter values are an average of various nonpertinent factors and as such may not be reproducible. Many researchers interpret the c value as physically significant; they should then be able to explain why c estimates are so unstable. This contrasts with other seismicity parameters, such as the p value of Omori’s law or the b value of the Gutenberg–Richter relation, which change little, if at all (Kagan, 2006).

Another problem is that the coefficient c in present ETAS implementations (as in Eq. 13) does not depend on the magnitude of the preceding event. Our Eq. 7, which defines which aftershocks are removed from a catalog because of their incomplete sampling, scales with magnitude: for an m4 main shock the ‘dead-time’ is about 5 min, whereas for an m8 earthquake it extends to more than 8 h. As this example suggests, the relative temporal decrease in aftershock numbers modeled by c should vary strongly with the magnitude of the main shock or preceding event. A model that requires the same short-time behavior for earthquakes of different sizes can therefore be expected to bias parameter estimates and seismicity forecasts significantly.
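The magnitude scaling of the dead time can be illustrated with a hypothetical log-linear rule fixed by the two values quoted above: about 5 min at m4, growing by a factor of \(10^{0.5}\) per magnitude unit, which gives about 8.3 h at m8. This interpolation is an illustration, not a restatement of Eq. 7, and the censoring rule (drop an event that follows a larger earlier event within that event's dead time) and all function names are likewise assumptions.

```python
def dead_time_minutes(magnitude, t_m4=5.0, slope=0.5):
    """Hypothetical dead time: t_m4 minutes at m = 4, times 10**(slope * (m - 4))."""
    return t_m4 * 10 ** (slope * (magnitude - 4.0))

def censor_immediate_aftershocks(events):
    """Drop events that follow a larger earlier event within its dead time.

    events: list of (time_in_minutes, magnitude) tuples sorted by time.
    Censored events may still shadow later, smaller events (a simplification).
    """
    kept = []
    for k, (t, m) in enumerate(events):
        shadowed = any(
            m_i > m and 0.0 < t - t_i <= dead_time_minutes(m_i)
            for t_i, m_i in events[:k]
        )
        if not shadowed:
            kept.append((t, m))
    return kept

print(dead_time_minutes(4.0), dead_time_minutes(8.0) / 60.0)  # minutes at m4, hours at m8
catalog = [(0.0, 8.0), (60.0, 5.0), (600.0, 5.0), (601.0, 4.0)]
print(censor_immediate_aftershocks(catalog))
```

A single magnitude-independent c cannot mimic this behavior: under the assumed rule the blind interval differs by a factor of 100 between m4 and m8 parents.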

In calculations with Eq. 13 it is assumed that immediately after a parent event (\(t = t_i\)) the rate is equal to \(c^{-p}\) (approximately 1/c for p close to 1). However, the coda waves of the previous earthquake may prevent aftershocks from being identified for minutes or hours, depending on the event magnitude. Ogata (1983) introduced a coefficient ‘S’ to describe the practical absence of aftershocks in the very initial part of a sequence; his estimate of S is 1/2 day (p. 120). Therefore, (13) would bias the evaluation of the branching coefficient \(K_0\) (see below).

The estimated temporal parameters for the ETAS model (Tables 1–3 in Ogata, 1998; Tables 1–3 in Ogata and Zhuang, 2006) demonstrate that the c value in (13) depends on the magnitude threshold: it is smaller for a smaller \(m_t\). This is an additional argument against treating this parameter as physically significant.

The p value is usually slightly larger than 1.0 (ibid), although for some models the estimate is less than 1.0, a value that can be rejected on general grounds because it implies that each earthquake has an infinite number of aftershocks. Taking into account that p = 1 + θ, these p values are thus in principle consistent with our Tables 2, 3, 4, 5, 6 and 7.
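The consequence of small p can be checked directly by integrating Eq. 13. The sketch below (time units arbitrary, e.g., days; the c and p values and function names are hypothetical) gives the rate at \(t = t_i\) and the expected number of aftershocks, for unit productivity, up to an elapsed time T. For p > 1 the count tends to the finite limit \(c^{1-p}/(p-1)\), whereas for p ≤ 1 it grows without bound.

```python
import math

def omori_rate(elapsed, c, p):
    """Eq. 13 evaluated at elapsed time t - t_i."""
    return (elapsed + c) ** (-p)

def omori_count(T, c, p):
    """Integral of Eq. 13 from t - t_i = 0 to T (unit productivity)."""
    if math.isclose(p, 1.0):
        return math.log((T + c) / c)  # logarithmic growth: unbounded as T grows
    return ((T + c) ** (1.0 - p) - c ** (1.0 - p)) / (1.0 - p)

c = 0.01  # hypothetical, in days
for p in (0.9, 1.0, 1.2):
    print(p, [round(omori_count(T, c, p), 1) for T in (1e2, 1e6, 1e12)])
print("finite limit for p = 1.2:", c ** (1 - 1.2) / (1.2 - 1))
```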

With regard to the spatial distribution (Ogata, 1998, Eqs. 2.2–2.4), there are two problems in the ETAS model. (1) The two effects, earthquake location errors and the dimension of the aftershock zone, need to be separated in the modeling (cf. Eq. 12) since, as discussed above (Sect. 6.1), there are good geophysical estimates of both quantities; we can then test whether the inversion results are trustworthy. (2) The rupture area is used to characterize the source size in the ETAS model. The area is more difficult to measure and less well known than a simple linear size of the focal zone, especially for earthquakes with a near-vertical fault plane, because of the large vertical location errors for shallow earthquakes. Because the two models are parameterized differently, it is therefore difficult to compare our results for the spatial clustering parameters (\(s_r\) and \( \epsilon_p\) in Tables 2, 3, 4, 5, 6 and 7) with the corresponding ETAS values.

Our Eq. 9 corresponds to the right-hand side of the ETAS model dependence (Ogata, 1998, Eq. 1.1),

$$ \nu_i(t) = n_i(t) K_0 \times \exp [\alpha (m_i - m_t)], $$
(14)

where \(K_0\) and α are parameters, \(m_i\) is the magnitude of a ‘parent’ event, and \(m_t\) is the magnitude threshold. Our a (see Eq. 10) should then be equal to \(\alpha/\ln 10 \approx \alpha/2.3\). In the above-mentioned publications the most likely α estimates are between 1.2 and 1.6, which corresponds to a ≈ 0.52–0.69 and roughly matches our a values in Tables 2, 3, 4, 5, 6 and 7. It is unclear exactly how our branching parameter μ is related to \(K_0\): the dimensionality and normalization of the parameters differ, so their estimates are difficult to compare.
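The conversion between the natural-log ETAS exponent α and our base-10 exponent a follows from \(e^{\alpha \Delta m} = 10^{a \Delta m}\). A minimal sketch (the function name is ours, for illustration only):

```python
import math

def alpha_to_a(alpha):
    """ETAS exponent alpha (natural-log base) -> base-10 exponent a = alpha / ln 10."""
    return alpha / math.log(10.0)

for alpha in (1.2, 1.6):
    print(alpha, round(alpha_to_a(alpha), 2))
```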

Users of the ETAS model do not generally determine the likelihood score per event. Thus, we cannot compare our information score I/N with the fit of the ETAS model to the earthquake occurrence record, nor can we check that model’s performance against an inhomogeneous spatial Poisson model.

Helmstetter et al. (2006) performed calculations for southern California seismicity using the ANSS catalog and a model they named ETES, which is largely similar to ETAS. The parameter values they obtained (ibid, their Table 2) are close to those shown in Table 7. The p value is around 1.2. Their \(\alpha \approx 0.4{\text{--}}0.8\) can be compared directly to our \(a\) value, since it is a base-10 exponent. Their focal zone size \(f_d\) translates to about 0.4 km for an m4 earthquake (they fixed the location error at 0.5 km; see their Eq. 19). Their information score \(\log_2 G \approx 3.5\) is also close to our values for an \(m_t = 3\) subcatalog.

7 Discussion

What differences in the zone parameters can be seen in Tables 1, 2, 3, 4, 5, 6 and 7? The β value is essentially the same for all five zones. The ‘slow-spreading ridges’ may have a statistically significantly higher β value, although that could be caused by a mixture of different earthquake populations. Our more exhaustive analysis (Bird and Kagan, 2004) suggests that the hypothesis of a universal β value (\(\beta \approx 0.63\)) cannot be rejected on the basis of the present data. The results of the corner moment evaluation (Table 1) are also consistent with our previous analysis (ibid). As discussed above (Sect. 6.1), the differences in the b values for the various tectonic zones can most likely be attributed to the nonlinear tapering of the magnitude–frequency relation caused by the corner magnitude and to various biases and systematic effects in magnitude determination for the PDE and local catalogs (Kagan, 2003).
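The tapering effect on apparent b values can be sketched with the tapered Gutenberg–Richter (tapered Pareto) distribution of seismic moment, assuming the survival function \(F(M) = (M_t/M)^{\beta} \exp[(M_t - M)/M_c]\) with corner moment \(M_c\) and the moment–magnitude scaling \(\log_{10} M = 1.5\,m + \text{const}\). Under these assumptions the local slope in magnitude is \(b(M) = 1.5(\beta + M/M_c)\): it equals 1.5β (about 0.95 for β ≈ 0.63) far below the corner and steepens as M approaches \(M_c\). The function names and the corner value below are hypothetical.

```python
import math

def tapered_gr_survival(moment, threshold, corner, beta=0.63):
    """Fraction of events with moment >= `moment`, for moment >= threshold (tapered Pareto)."""
    return (threshold / moment) ** beta * math.exp((threshold - moment) / corner)

def local_b_value(moment, corner, beta=0.63):
    """Local slope -d log10 F / dm, assuming log10 M = 1.5 m + const."""
    return 1.5 * (beta + moment / corner)

# Moments in units of the threshold moment; hypothetical corner 1000 times the threshold.
for ratio in (1.0, 10.0, 100.0, 1000.0):
    print(ratio, round(local_b_value(ratio, 1000.0), 3))
```

Under this form, a zone whose threshold lies closer to its corner moment will show a larger apparent b even if β is universal.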

Can we distinguish the differences in the earthquake occurrence patterns and in particular the earthquake clustering in various tectonic zones? The parameters I/N, λT/N,  μ, a, and θ need to be compared to see the difference. The θ estimates exhibit no consistent pattern. For the remaining parameters, in all the Tables 2, 3, 4, 5, 6 and 7 oceanic ridges seem to stand out compared to continental areas. The ridges exhibit significantly less clustering than the continents: the information score I/N is smaller and the ratio of spontaneous events (λT/N) is higher for the oceanic earthquakes.

Two parameters that characterize the degree of branching, μ and the size distribution of descendant events (a), also differ between these zones, although the dissimilarity is not the same for the CMT (Table 2) and PDE catalogs (Tables 3, 4, 5, and 6). In the CMT results the ridges’ a value is similar to the continental estimates (zones 0, 1, and 4), but their μ values are significantly smaller. The PDE catalog exhibits the opposite pattern: the a values drop for the ridges, whereas the μ estimates essentially stay the same. In the PDE data set the size of smaller earthquakes is characterized by various magnitudes, most often the \(m_b\) magnitude, which exhibits many biases and systematic errors (Kagan, 2003). Therefore, the size distribution of triggered events is likely to be biased in the PDE catalog. Unfortunately, the magnitude range of the CMT catalog is too small to show differences in the branching rate and earthquake clustering due to variation of the moment threshold. Among continental zones, subduction earthquakes in the CMT catalog exhibit the strongest clustering pattern; however, this feature is not prominent in the PDE catalog.

Table 4 Parameters for various subdivisions of PDE catalog, 1968–2007/01/01, m ≥ 5.0.
Table 5 Parameters for various subdivisions of PDE catalog, 1968–2007/01/01, m ≥ 4.7.
Table 6 Parameters for various subdivisions of PDE catalog, 1968–2007/01/01, close aftershocks removed, m ≥ 5.0.

The variations in maximum-likelihood estimates of branching parameters across tectonic zones displayed in Tables 2, 3, 4, 5, 6 and 7 result from a complex interplay of physical variability in the Earth, varying deficiencies of the seismic catalogs, and instability and tradeoffs in the maximum-likelihood estimation process. One view is that it is not important to separate these factors, because the branching parameters are expected to have practical value in improved forecasts, which can be tested by their performance. However, it is of great interest to know whether we can reject the null hypothesis that the shallow earthquake process (including interevent triggering) can be described as uniform, in some carefully qualified sense, around the planet. A major obstacle to answering this question is that our present software provides neither a tradeoff analysis between the parameters we solve for nor formal uncertainty ranges (either for parameters in isolation or for combinations of linked parameters). We therefore have to use empirical comparisons to put lower bounds on residual uncertainties, by examining how results change with threshold magnitude, with catalog, and with processing method (i.e., raw vs. windowed or censored catalogs, discussed previously).

We first examine results from tectonic zone 1 (Active continent) because there we have the widest range of catalogs and threshold magnitudes to compare. Figure 5a combines productivity coefficients μ from global zone-1 subcatalogs of CMT (\(m_t = 5.55\)) and PDE (\(m_t = 5.3\), 5.0, or 4.7) and from the regional California–Nevada catalogs CIT and ANSS (\(m_t = 4.7\), 4.0, or 3.0). The abscissa chosen for this comparison is the assumed threshold magnitude (used to truncate the catalog).

Table 7 Parameters for California/Nevada catalogs.
Fig. 5

Productivity coefficients and fraction of independent earthquakes for tectonic zone 1: Active continent. See text for details of catalogs, and tables for sources of μ and λT/N. a Upper plot, branching parameter μ. b Lower plot, fraction of spontaneous earthquakes λT/N

We see a strong tendency for the apparent productivity coefficient μ to increase with decreasing threshold magnitude, even though existing theory suggests no reason why this should occur. Until we understand this better, it will be important to use equal threshold magnitudes when comparing inferred branching parameters across tectonic zones. Fig. 5a may also show a small offset (by ∼0.04) between the trends for the global zone-1 subcatalogs and the two California/Nevada catalogs. However, we cannot assign any confidence to this possible difference, because our present analysis does not provide confidence ranges for either trend. A different measure of branching is given by the fraction of independent events (λT/N in Tables 3, 4, 5, and 6). This measure is negatively correlated with the productivity coefficient μ, but it is also influenced by the parent productivity exponent a (which we estimate consistently as ∼0.6 in tectonic zone 1) and by the geometry and time relations of earthquakes in the catalog. In Fig. 5b we plot this measure for tectonic zone 1 and see that it also varies consistently with the threshold magnitude (either real or assumed) at which we truncate these catalogs. In the results for this measure of branching, the possible offset between global and local catalogs is less clear, as results from raw PDE align with those from raw CIT.
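Why λT/N falls as branching rises can be seen in an idealized stationary branching process in which every event independently has a Poisson-distributed number of direct descendants with mean n < 1. Each cluster then contains on average 1/(1 − n) events, so the fraction of independent events is 1 − n. The simulation below is a minimal sketch of that idealization; n is a generic branching ratio, not identical to our μ, whose relation to λT/N also depends on a and on the catalog geometry, and the function name is hypothetical.

```python
import numpy as np

def fraction_independent(n, n_clusters=20000, seed=0):
    """Simulate branching clusters with Poisson(n) direct descendants per event.

    Returns (number of independent events) / (total events). A sum of independent
    Poisson(n) counts over k parents is Poisson(n * k), so each generation of all
    clusters together can be drawn with a single Poisson variate.
    """
    if not 0.0 <= n < 1.0:
        raise ValueError("subcritical branching requires 0 <= n < 1")
    rng = np.random.default_rng(seed)
    generation = n_clusters  # independent (spontaneous) events start the clusters
    total = n_clusters
    while generation > 0:
        generation = int(rng.poisson(n * generation))
        total += generation
    return n_clusters / total

for n in (0.1, 0.3, 0.5):
    print(n, round(fraction_independent(n), 3), "expected", 1 - n)
```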

Tectonic zone 4 (Trenches) produces the majority (∼65%) of Earth’s moderate and large shallow earthquakes; therefore, the branching parameters for zone 4 are not very different from global values. In Fig. 6a and 6b we display various inferred values of productivity coefficients μ and independence estimates λT/N for tectonic zone 4, obtained from the CMT catalog (\(m_t = 5.55\)) and the PDE catalog (\(m_t = 5.3\), 5.0, or 4.7), together with two alternative results computed with windowing of early aftershock times. Figure 6a shows that windowing has a large effect on the μ inferred from the PDE catalog, suggesting that PDE is missing numerous early aftershocks. In contrast, the estimate from CMT is hardly affected by windowing: as discussed in Sect. 6.1, because the CMT technique uses long-period waves, fewer such early aftershocks are present in that catalog. With windowing, μ for tectonic zone 4 is fairly consistent at 0.17–0.18 between the two catalogs. Figure 6b shows that the mean fraction of independent events is easier to measure.

Fig. 6

Productivity coefficients and fraction of independent earthquakes for tectonic zone 4: trenches. See text for details of catalogs, and tables for sources of μ and λT/N. a Upper plot, branching parameter μ. b Lower plot, fraction of spontaneous earthquakes λT/N

Results for tectonic zone 0 (Plate-interior) suggest higher productivity coefficients μ (0.10–0.24) and lower parent productivity exponents a (0.27–0.45) than in the more active tectonic zones 1 and 4. These two anomalies tend to offset each other, so that the fraction of independent events λT/N is in a range (0.68–0.89) that overlaps those of zones 1 and 4. This indicates the need for a multi-dimensional trade-off analysis of the acceptable ranges of triggering parameters before reaching any firm conclusions regarding potential differences between zones. Another potential problem is that plate-interior events more often participate in cross-zone triggering to/from adjacent plate-boundary zones, although such cross-zone triggering is not considered when we analyze this subcatalog in isolation.

Tectonic zones 2 and 3 (Slow-spreading and Fast-spreading ridges) may also have productivity coefficients μ (0.03–0.23) that have been biased by failures to resolve the parent productivity exponent a. Parameter a has very low values (0–0.24) when inferred from PDE, but the values inferred from CMT (0.52 and 0.59) are more consistent with those of other tectonic zones. Because we cannot trust the lower values of a that we obtain, we also cannot trust the associated values of μ for these two tectonic zones; μ will be biased upward whenever a is too low. Therefore, in Fig. 7a and b we plot only the estimates of the fraction of independent events (λT/N), which we believe to be less affected by these problems.
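The direction of the μ–a tradeoff can be illustrated under a hypothetical productivity law \(\nu(m) = \mu \cdot 10^{a(m - m_t)}\), with parent magnitudes exponentially distributed above \(m_t\) (unbounded, b value b > a). Then \(E[10^{a(m - m_t)}] = b/(b - a)\), so if the data fix the mean number of direct descendants per parent, the fitted μ must equal that mean times \((b - a)/b\). This is a sketch under these stated assumptions, which may not match the normalization used in our model; the function name and numbers are hypothetical.

```python
def mu_for_fixed_offspring(a, offspring_per_parent, b=1.0):
    """Hypothetical: nu(m) = mu * 10**(a * (m - m_t)), magnitudes exponential with b > a.

    E[10**(a * (m - m_t))] = b / (b - a), so matching a fixed mean number of direct
    descendants per parent requires mu = offspring_per_parent * (b - a) / b.
    """
    if not 0.0 <= a < b:
        raise ValueError("requires 0 <= a < b")
    return offspring_per_parent * (b - a) / b

# The same observed offspring rate (hypothetical 0.2 per parent) fitted with different a:
for a in (0.0, 0.24, 0.6):
    print(a, round(mu_for_fixed_offspring(a, 0.2), 3))
```

In this idealization an a that is too low forces a larger μ to reproduce the same number of triggered events, consistent with the upward bias described above.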

Fig. 7

Fraction of independent earthquakes in oceanic ridges. See text for details of catalogs, and tables for sources of λT/N. a Upper plot, Slow-spreading ridges. b Lower plot, Fast-spreading ridges

Both Fig. 7a (zone 2: Slow-spreading ridges) and Fig. 7b (zone 3: Fast-spreading ridges) show the apparent fraction of dependent events (1 − λT/N) to be lower than in other tectonic zones, though it still increases as the threshold magnitude decreases. By analogy with Fig. 5b, we must consider that additional increases beyond the last values seen (0.25 and 0.19 for (1 − λT/N) of zones 2 and 3 in windowed PDE with \(m_t = 4.7\)) may occur with the larger subcatalogs that will presumably be obtained from hydrophones and/or ocean-bottom seismometers. Based on our results, the difference in the mean fraction of dependent events (1 − λT/N) between the spreading-ridge tectonic zones 2 and 3 and zone 4 (Trenches) seems to be at most a factor of two.

A series of studies (Boettcher and Jordan, 2004; McGuire et al., 2005) of the seismicity of oceanic transform faults (OTFs) has led to the general conclusion that OTFs have more foreshocks than continental transform faults (CTFs), although far fewer aftershocks. The seismicity of our tectonic zones 2 and 3 is clearly dominated by OTF earthquakes (relative to smaller normal-faulting earthquakes on spreading centers), and we therefore expected to see a markedly higher fraction of independent events in these two zones. Instead, we found that this difference is modest, and comparable to the differences that appear as the assumed threshold magnitude varies. It will be important to resolve this discrepancy. Boettcher and Jordan (2004) may have exaggerated the contrast between OTFs and CTFs by comparing aftershocks in a teleseismic catalog of OTFs with those in high-quality local catalogs of CTFs. (Compare our Fig. 5a, in which a fictitious variation of a factor of two or more in apparent productivity coefficients μ could be obtained by comparing the global CMT catalog to the local CIT catalog in tectonic zone 1.) The study of McGuire et al. (2005) used data from a local hydrophone array on the East Pacific Rise; while this may have lowered the threshold magnitude, it does not approach the global, 26–40-year scope of our analyses of tectonic zones. The prevalence of foreshocks and the resulting enhanced predictability of OTF earthquakes appear robust (McGuire, 2008); however, our study did not address the balance of foreshocks versus aftershocks, so we detect only the net anomaly in foreshocks plus aftershocks.

In summary, we have observed interesting potential variations in branching parameters between tectonic zones, such as a higher fraction of independent events in the spreading-ridge tectonic zones. Nevertheless we have not been able to document any variations in branching behavior among the tectonic zones with high confidence. At present the variations in apparent branching obtained at different assumed threshold magnitudes (and/or with different catalogs, and/or with different methods of processing) are comparable in size to the effects we seek. More precise and confident discrimination will require some advance in at least one of three areas: (1) Extensions to maximum-likelihood fitting software to provide multi-dimensional confidence bounds on parameters and combinations of related parameters; (2) comparisons of apparent branching parameters obtained from different analysis codes; and/or (3) propagation of apparent differences into earthquake forecasts of global scope which can be scored against future seismicity, to determine whether a forecast that discriminates between zones will outperform one that treats all zones as equivalent.