1 Introduction

In seismic risk assessments, the vulnerability of buildings provides a relationship between the loss caused by earthquakes and a measure of the ground motion intensity. Loss is generally expressed as a Damage Index (DI), i.e. the ratio of repair cost to replacement cost for a building population of a given type. The ground motion intensity measure quantifies the severity of ground shaking at the site where the buildings are located (Rossetto et al. 2014). It is generally represented either by instrumental measures, such as peak ground acceleration or spectral displacement, or by macroseismic intensity scales, such as the Modified Mercalli Intensity (MMI) Scale (Wood and Neumann 1931) and the European Macroseismic Scale (Grünthal 1998), among others.

The vulnerability of a building class can be assessed directly by developing vulnerability curves, i.e. continuous functions relating ground motion intensity to loss for the studied building class, either empirically, from post-earthquake loss and ground motion intensity data, or through expert elicitation.

In empirical vulnerability assessment, loss can be expressed in terms of the repair and replacement costs of damaged buildings or in terms of insurance claims and policy cover. Empirical vulnerability curves are considered to be the best option because they are based on real data (Jaiswal et al. 2013). However, this approach also involves significant uncertainties associated with data quality and with the estimation of ground-shaking intensity (Rossetto et al. 2014). Further, it is necessarily restricted to building types for which adequate loss data are available (Edwards et al. 2004).

The expert judgment-based approach is generally applied in a workshop environment, where a group of experts builds a consensus on building vulnerability based on their past experience. This approach was first introduced in ATC-13 (ATC 1985), in which expert earthquake engineers were asked to provide their judgment on the vulnerability of selected building types found in California, along with their confidence level. More recently, Cooke’s (1991) elicitation process was adopted in the 2015 UN global risk assessment report (UNISDR 2015), for which vulnerability functions were developed for the Asia–Pacific region (Maqsood et al. 2014). In general, this approach is associated with significant uncertainties arising from the selection of experts and their experience, and from the weighting schemes used to combine the judgments of different experts. Nevertheless, it remains valid in the absence of statistically significant post-earthquake data and analytical studies (Jaiswal et al. 2013).

Vulnerability curves for a building class can also be obtained indirectly by coupling the fragility of the studied class (i.e. a ground motion intensity-to-damage function) with an appropriate damage-to-loss function. In this case, the fragility of a building class can be assessed analytically, empirically, by expert judgement, or by a combination of at least two of these approaches. The analytical approach results in fragility curves, which express the probability that the damage sustained by a building at a given intensity level reaches or exceeds a given damage state. This approach uses software to analyse the response of representative building models to earthquakes, and requires a characterisation of the seismic hazard and the selection of a nonlinear analysis type, a damage model, and damage threshold criteria (Calvi et al. 2006). The reliability of the analytical approach is significantly affected by the uncertainties in these input parameters. However, models developed analytically are not considered to be region-specific and can be applied globally if sufficient sensitivity analysis and calibration are carried out (D’Ayala and Meslem 2012). In empirical fragility assessment, post-earthquake damage data have been used to construct fragility curves (Calvi et al. 2006; Rossetto et al. 2013), damage probability matrices, which express in discrete form the probability of a building sustaining a given damage state at a given ground motion intensity level (Whitman et al. 1973; Gulkan et al. 1992; Giovinazzi and Lagomarsino 2004), and vulnerability/fragility curves (Rossetto and Elnashai 2003; Rota et al. 2008). 
With regard to expert elicitation, Cooke’s (1991) elicitation process was applied within a Global Earthquake Model Foundation (GEM) project to solicit expert judgment on collapse fragility for selected building types (Jaiswal et al. 2014).

Australia has relatively low seismicity and has not experienced frequent damaging earthquakes (Dhu and Jones 2002). Therefore, little data have been available to assess the seismic vulnerability of the Australian building stock, and only a few Australian studies have been conducted. Studies conducted after the 1989 Newcastle earthquake include Walker (1991), Page (1991), Blackie (1991), and Gohil et al. (1991). More recently, the Canterbury earthquake sequence in New Zealand, from 2010 to 2012, provided opportunities to document building performance and assess the factors that affect building vulnerability. As building typologies and construction practices in New Zealand are similar to those in Australia, the lessons learnt from these events are largely applicable to Australia (Griffith et al. 2013; Turner et al. 2012; Griffith et al. 2010; Russell and Ingham 2008). Several studies conducted after these events recorded building performance and failure mechanisms for typical unreinforced masonry (URM) and retrofitted structures (Cattari et al. 2015; Dizhur et al. 2010, 2015; Moon et al. 2012, 2014; Ingham et al. 2012; Turner et al. 2012; Griffith et al. 2010). A few studies also examined the performance of timber frame structures during the Canterbury earthquakes (Dizhur et al. 2013; Ingham et al. 2011). All of these studies primarily documented the observed damage, the factors contributing to it, and building performance during the earthquakes; they did not aim to develop vulnerability curves for use in seismic risk assessment. Moreover, New Zealand has a national programme to upgrade older earthquake-prone structures to achieve greater compliance with the current building code (Russell and Ingham 2010), which significantly reduced the damage to retrofitted buildings during the Canterbury earthquakes (Ingham et al. 2012). 
No comparable initiative has been undertaken in Australia, even though damage would likely be greater if an earthquake similar to the 2011 Christchurch main event struck Adelaide (Griffith et al. 2013).

One effort to develop vulnerability curves for buildings in Australia was made by Edwards et al. (2004) using a limited data set. Later, Lumantarna et al. (2006) developed experiment-based fragility curves for URM from tests on URM wall specimens. Given the low seismicity and the scarcity of building damage data, this study aims to use the best available information in Australia, collected during the last two major earthquakes (the 1989 Newcastle and 2010 Kalgoorlie earthquakes).

This study uses a large loss database (approximately 14,000 insurance claims from the 1989 Newcastle earthquake and about 400 surveyed buildings in Kalgoorlie following the 2010 earthquake) and follows the GEM empirical vulnerability assessment guidelines (Rossetto et al. 2014), developed within a Global Earthquake Model Foundation project (GEM 2015), to develop empirical vulnerability functions for URM and timber frame structures. The two building classes are further subdivided into two age categories, pre- and post-1945, to distinguish the vulnerability of the older legacy building stock from that of relatively newer buildings. The vulnerability functions are developed in the following steps: preparing a loss database, selecting an appropriate intensity measure, selecting and applying a suitable statistical approach to develop vulnerability curves, and identifying the optimum curves based on goodness-of-fit tests. The resulting curves are the first publicly available curves based on Australian building data. They can be applied in seismic risk assessment studies in Australia involving URM and timber structures, and the calculated risk can inform the development of appropriate mitigation strategies.

2 Definition of loss and intensity measure

Australia’s low seismicity is due to its location towards the centre of the Indo-Australian Plate. Australian earthquakes are termed intraplate because they occur far from active tectonic plate boundaries. Owing to limited experience with major damaging events, Australian seismicity was considered small enough to be largely ignored in building design before the 1989 Newcastle earthquake. However, the Newcastle earthquake prompted a re-examination of earthquake hazard in the region and its significance for infrastructure design (Dhu and Jones 2002). Table 1 lists major earthquakes in Australia since 1950 that resulted in building damage; the Newcastle and Kalgoorlie events caused the greatest earthquake-related losses in Australia to date. Further details of Australia’s earthquake history can be found in Dhu and Jones (2002).

Table 1 Major damaging earthquakes in Australia from 1950 to 2015 (adapted from Dhu and Jones 2002)

2.1 Newcastle earthquake

On 28 December 1989, a local magnitude \(M_L = 5.6\) earthquake occurred in Newcastle, causing extensive damage and the loss of 13 lives (Dhu and Jones 2002). Owing to a lack of strong-motion recordings, the seismic intensity for this event is available only in terms of the MMI scale. Rynn et al. (1992) produced a local intensity map for the Newcastle and Lake Macquarie area, with MMI ranging from VI to VIII. For this study, each suburb in the study area is assigned an MMI value from this intensity map. Where a suburb spans two or more isoseismal contours, an averaged intensity is assigned based on the intensity map and the number of claims within the suburb. Figure 1 shows the MMI values for each suburb within the study area.
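The text does not state exactly how the averaged intensity is formed. One plausible reading, assumed here for illustration only, is a claim-weighted mean of the MMI values of the isoseismal zones that a suburb spans. The function name and inputs below are hypothetical.

```python
def suburb_mmi(zone_mmi, zone_claims):
    """Claim-weighted average MMI for a suburb that spans several isoseismal zones.

    zone_mmi: MMI value of each zone overlapping the suburb (hypothetical input)
    zone_claims: number of claims lying in each of those zones
    """
    if len(zone_mmi) != len(zone_claims) or not zone_mmi:
        raise ValueError("need one claim count per zone, and at least one zone")
    total_claims = sum(zone_claims)
    if total_claims <= 0:
        raise ValueError("at least one claim is required")
    # Weight each zone's MMI by the share of the suburb's claims falling in it.
    return sum(m * n for m, n in zip(zone_mmi, zone_claims)) / total_claims


# Hypothetical suburb straddling the MMI VI and VII contours (30 and 10 claims).
print(suburb_mmi([6, 7], [30, 10]))  # 6.25
```

A suburb lying entirely within one contour simply keeps that contour's MMI value.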

Fig. 1 Study area and intensity map of the 1989 Newcastle earthquake

Insurance claims settled by Insurance Australia Group (IAG) were obtained from the Newcastle City Council to estimate the cost of damage to buildings from this event. There are approximately 14,000 insurance claims in total, covering both building and contents damage. Each claim records the suburb, the insured value, the payout, and whether the claim concerns a brick building, a timber building, or contents. Contents claims are excluded from this study, since the aim is to derive vulnerability functions for the building structure only.

For the study region, the insurance data include total building claims of approximately $86 million (1989 US dollars) and a total insured value for buildings of $8981 million (1989 US dollars). However, these data are an incomplete sample of building loss, as they do not include damaged buildings for which claims were made to other insurers. Furthermore, there is uncertainty regarding the percentage of buildings in the study region that were insured by IAG but did not make a claim, as well as the level of underinsurance and the deductible (excess) applied to each claim.

To address these issues and make optimal use of the loss database, the authors consulted IAG. The consultation involved estimating the claim rates for URM and timber structures at each intensity level, and evaluating the underinsurance factor and the typical deductible value. Demand surge (post-event inflation), which could have distorted the claims, is believed to have been minor, because the 1989 Newcastle earthquake occurred at a time of softening demand in the building industry. Demand surge is therefore neglected in the analysis.

The insurance claim data do not provide street addresses, so the claims are aggregated at the suburb level (114 suburbs). Only suburbs with 20 or more claims are included in the subsequent analysis. Using the outcomes of an exposure survey of more than 6000 properties conducted by Geoscience Australia in Newcastle in 1999, an indicative age category (pre-1945 or post-1945) is attributed to each suburb to differentiate the older building stock from the relatively newer one. The year 1945 was not a pivotal year in building regulation or enforcement, but it is chosen as a demarcation line between the two vintages. The older (pre-1945) building stock is considered to have deteriorated more with time (e.g. corrosion of ties and degradation of mortar) and to have been constructed with poorer building practices and limited controls on construction quality. The post-1945 building stock is newer and was built with better construction practices, materials, and quality control.

For each suburb, the claims are subsampled by construction material (brick or timber) and age category (pre-1945 or post-1945). The number of buildings and total cover in the suburb are then expanded to a notional portfolio using an agreed claim rate for each of the four categories and intensity levels. Adjustments are then made for underinsurance and deductibles. Finally, the DI is calculated as the ratio of the adjusted claim to the adjusted cover for each building type in each suburb. Figure 2 presents the loss distribution due to the 1989 Newcastle earthquake in terms of DI for the four building categories in each suburb within the study area.
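The exact adjustment formulas agreed with IAG are not given here, so the following is only a minimal sketch under stated assumptions: the claiming buildings are taken to be a fraction `claim_rate` of all insured buildings with the same average cover, underinsurance is a multiplicative factor on cover, and payouts are net of a fixed deductible per claim. The class, function, and parameter names and all numbers are hypothetical; only the 20-claim threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class SuburbClaims:
    """Aggregated building claims for one suburb and one category (e.g. pre-1945 URM)."""
    n_claims: int         # number of building claims in the suburb
    total_payout: float   # sum of payouts, assumed net of the deductible
    claimed_cover: float  # sum of insured values of the claiming buildings


def suburb_damage_index(claims, claim_rate, underinsurance_factor, deductible,
                        min_claims=20):
    """Hypothetical DI for one suburb/category; None if the suburb is excluded."""
    if claims.n_claims < min_claims:
        return None  # fewer than 20 claims: suburb dropped from the analysis
    if not 0.0 < claim_rate <= 1.0:
        raise ValueError("claim_rate must lie in (0, 1]")
    # Notional portfolio: claimants are assumed to be a fraction `claim_rate`
    # of all insured buildings, with the same average cover.
    portfolio_cover = claims.claimed_cover / claim_rate
    # Underinsurance (assumed): true replacement value = insured value x factor.
    adjusted_cover = portfolio_cover * underinsurance_factor
    # Deductible (assumed): add back the excess borne by the owner on each claim.
    adjusted_claim = claims.total_payout + claims.n_claims * deductible
    return adjusted_claim / adjusted_cover


example = SuburbClaims(n_claims=50, total_payout=230_000.0, claimed_cover=5_000_000.0)
di = suburb_damage_index(example, claim_rate=0.25, underinsurance_factor=1.2,
                         deductible=200.0)
print(round(di, 4))  # 0.01
```

Keeping the four adjustments as separate, named steps makes it easy to swap in different assumptions (for example, a deductible that varies by policy) without touching the rest of the calculation.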

Fig. 2 Loss distribution in the 1989 Newcastle earthquake

2.2 Kalgoorlie earthquake

On 20 April 2010, a local magnitude \(M_L = 5.0\) earthquake shook Kalgoorlie-Boulder and neighbouring areas in Western Australia. The resulting ground motion varied markedly across the town because of the shallow focus of the event (Edwards et al. 2010). Figure 3 shows the locations of the surveyed buildings and the MMI values within the study area, which were derived from interviews with residents. The estimated MMI values in Kalgoorlie and Boulder were V and VI, respectively.

Fig. 3 Location and intensity map of the 2010 Kalgoorlie earthquake

Geoscience Australia conducted an initial reconnaissance and captured street-view imagery of 12,000 buildings in Kalgoorlie using a vehicle-mounted camera system. A subsequent foot survey collected detailed information on nearly 400 URM structures in Kalgoorlie and Boulder. The survey template consisted of 290 data fields characterising the surveyed buildings and the severity and extent of earthquake damage, including parameters such as address, building usage, year of construction, wall material, roof material, number of storeys, and the level and type of damage. The shaking caused widespread damage to pre-World War I unreinforced masonry buildings. More modern masonry buildings in the vicinity of Boulder also experienced some damage.

In Kalgoorlie, damage to brick veneer structures was observed to be minor. Timber-clad framed structures were not surveyed, but anecdotal discussions with owners indicated that this type of structure sustained no discernible damage other than to masonry components such as chimneys (Edwards et al. 2010).

The DI for each surveyed building is calculated in three steps. First, damage to the different building elements is recorded and assigned a damage state (none, slight, moderate, extensive, or complete) matching the HAZUS damage states (FEMA 2003). Second, a percentage damage is assigned to each element. Finally, the percentage loss for each building is determined as the sum over all building elements of (% of building cost contributed by the element) × (% damage) × (% of element so damaged). The Kalgoorlie data set provides estimates of the average DI for older (pre-1945) URM at two macroseismic intensities (MMI V and MMI VI) and for post-1945 URM at a single intensity (MMI VI).
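The element-by-element sum described above can be sketched as follows. Percentages are expressed as fractions, and the element breakdown and values in the example are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the Kalgoorlie survey.

```python
def building_loss(elements):
    """Loss ratio of one building as a fraction of replacement cost.

    elements: iterable of (cost_share, damage_ratio, extent) tuples, one per element,
    all expressed as fractions in [0, 1]:
      cost_share   = share of building cost contributed by the element
      damage_ratio = percentage damage assigned to the element
      extent       = fraction of the element so damaged
    """
    loss = 0.0
    for cost_share, damage_ratio, extent in elements:
        for value in (cost_share, damage_ratio, extent):
            if not 0.0 <= value <= 1.0:
                raise ValueError("all inputs must be fractions in [0, 1]")
        loss += cost_share * damage_ratio * extent
    return loss


# Hypothetical pre-1945 URM house (values for illustration only).
house = [
    (0.40, 0.20, 0.50),  # walls: moderate damage over half of the wall area
    (0.05, 1.00, 1.00),  # chimney: complete damage
    (0.15, 0.10, 0.40),  # ceilings and cornices: slight damage over 40 % of them
]
print(round(building_loss(house), 3))  # 0.096
```

Averaging such per-building values over all surveyed buildings of a class at a given MMI would give the class-average DI referred to in the text.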

3 Building classes

The buildings in the database are classified according to their primary structural system (URM and timber frame) and the year of construction (pre- and post-1945). A brief description of the four building classes and of their structural performance during the two events is provided below.

3.1 Unreinforced masonry structures

URM structures can be found in all parts of Australia. This was the most common building type in Newcastle until the 1960s, after which its use declined sharply in Newcastle and the rest of eastern Australia (Dhu and Jones 2002). However, it is still used as the primary residential construction form in western parts of the country. URM structures are typically one to three storeys high and are used for a wide range of purposes, including residential, commercial, government, and administrative buildings (Walker 1991). Figure 4 presents photographs of typical older and newer URM structures.

Fig. 4 Example of unreinforced masonry (URM) structures. a An example of pre-1945 brick commercial building. b An example of post-1945 brick residential building

URM structures can perform poorly during earthquakes if they are not well designed and constructed according to good building standards (Maqsood and Schwarz 2008; Russell and Ingham 2008). During the Newcastle and Kalgoorlie earthquakes, older (pre-1945) masonry performed poorly, and most of the damage (structural and non-structural) occurred in this type of structure. The most common factors contributing to damage were identified as poor workmanship, lack of supervision, use of unsuitable materials, general building deterioration, poor building layout, excessive diaphragm deflection, poor design, and poor detailing of components (Page 1991; Blackie 1991; Page 2002). Another common deficiency in this type of structure was the lack of effective ties between the two leaves of double-brick cavity walls (Page 1991; Pedersen 1991; Gohil et al. 1991; Melchers 1990). This deficiency may result from corrosion or simply from missing or incorrectly placed ties (Dhu and Jones 2002). Poorly graded sand was commonly used in brick mortar, resulting in a harsh mix that required plasticisers to improve its workability. The excessive use of these additives results in low mortar bond strength, which contributes to structural weakness (Page 1991; Pedersen 1991). These deficiencies commonly manifest themselves in the failure of parapets, gable roof ends, corners, and chimneys, and in the out-of-plane failure of walls (Edwards et al. 2010; Page 1991). Similar damage mechanisms and failure modes were also observed in older URM structures in New Zealand during the Canterbury earthquakes (Moon et al. 2014; Senaldi et al. 2014; Ingham et al. 2011; Dizhur et al. 2010).

Compared to older buildings, better performance has been observed in newer construction (post-1945). Damage to these buildings during both seismic events was considered to be mostly non-structural. Internal damage in the form of minor wall cracking and cornice damage associated with relative movement between the roof and the internal wall was observed (Edwards et al. 2010). This demonstrates that URM buildings are capable of resisting a moderate level of earthquake shaking (Walker 1991). Table 2 provides an overview of the characteristics of pre- and post-1945 URM structures.

Table 2 Typical characteristics of pre- and post-1945 unreinforced masonry buildings

3.2 Timber frame structures

In Australia, timber frame housing is typically clad with brick veneer, timber, or sometimes fibreboard (Dhu and Jones 2002). Light timber frame buildings in northern Australia are supported on either short (low-set) or tall (high-set) piers. The latter are often poorly braced and can exhibit a soft-storey failure mechanism during ground shaking. Brick veneer clad buildings, which have a light timber frame as the load-bearing system, can easily be confused with unreinforced masonry buildings. They have been more common for residential construction since the 1960s in eastern and southern Australia, although URM construction is still common in western Australia. Veneers are non-structural elements that rely on wall ties to the timber frame for their out-of-plane stability. Although they are non-load-bearing, their seismic performance is important to consider because of their widespread use and high cost of repair (Page 2002). Figure 5 presents photographs of typical older and newer timber residential structures.

Fig. 5 Examples of timber structure. a An example of pre-1945 timber frame building. b An example of post-1945 timber frame building (brick veneer)

Timber frame buildings have traditionally performed well in earthquakes, although non-structural damage has been widely observed. Brick veneer cladding, chimneys, plasterboard linings, and cornices are commonly damaged by earthquake shaking. Serious structural damage can also occur in the foundations, particularly where brick pier or soft storey-type foundations are used or where there is a lack of continuity in the structural system (Dhu and Jones 2002).

Timber frame buildings suffered non-structural damage during the 1989 Newcastle earthquake. However, little difference was noted in the severity of damage between timber structures of different construction ages, owing to the inherent resilience of this form of construction. Recent improvements relate to the performance of wall ties and to reducing the mass of clay bricks by introducing hollow cores and reducing brick size (Dizhur et al. 2013). The only factors observed to contribute to damage in older timber structures in Newcastle were corrosion of fasteners and termite problems. Although the Kalgoorlie survey did not focus on timber frame buildings, no significant seismic damage was observed in them, as these buildings withstood the moderate level of shaking well (Edwards et al. 2010).

Table 3 describes light timber frame structures and gives an overview of the typical characteristics of pre- and post-1945 structures.

Table 3 Typical characteristics of pre- and post-1945 timber buildings

4 Direct vulnerability assessment methodology

After preparing the loss database, the framework of the direct empirical vulnerability assessment consists of four steps (see Rossetto et al. 2014), depicted in Fig. 6. Firstly, a statistical model is developed based on the exploratory analysis. Then, the model is fitted to the loss data, its goodness of fit is assessed, and finally, for the best-fitted model, the 90 % prediction intervals are constructed by bootstrap analysis. The proposed procedure is based on the assumptions that the loss data are of high quality, and the measurement error of the explanatory variables (i.e. the intensity measure levels, construction material, and year of construction) is negligible. Such assumptions are common in the vulnerability literature.

Fig. 6 Direct vulnerability assessment framework (Rossetto et al. 2014)

4.1 Exploratory analysis

A single database is produced by merging the two data sets (the 1989 Newcastle and 2010 Kalgoorlie databases). Inherent in this is the assumption that the data in the 1989 Newcastle database would be reproduced if the sampling technique used to collect the 2010 Kalgoorlie data were adopted. This is a common assumption in empirical fragility studies that use multiple databases (Rossetto and Elnashai 2003; Rota et al. 2008). The merged database contains 109 data points, each with information on four variables, namely loss, intensity measure, construction material, and year of construction, summarised in Table 4.

Table 4 Characteristics of the four variables found in the post-earthquake database

This study aims to construct the statistical model that best fits the available data. Such a model should capture the relationship between each of the four variables and the other three. Figure 7 shows a matrix of plots of each variable against the other three, to assist in constructing a statistical model that fits the available database well. Most data points are concentrated at MMI VII.

Fig. 7 Matrix of plots for the four variables, namely loss, IM, material, and year

4.2 Selection of statistical model

In general, a statistical model consists of a random and a systematic component. The random component defines the probability distribution of the response variable (i.e. a loss measure). Then, the parameters of that probability distribution are linked to a systematic component which is typically a function of explanatory variables (e.g. intensity measure, construction material). In the framework of direct empirical vulnerability assessment, the systematic component is used to control the relationship of the vulnerability curve to the explanatory variables. This curve is a continuous function which relates the mean loss measure with the intensity measure, and in this study, its shape is also influenced by two additional explanatory variables, i.e. the construction material and the year of construction.

4.2.1 Selection of random component

The identification of suitable random components depends on the properties of the response variable. In this study, economic loss, L, is expressed in terms of DI. This loss measure is a continuous variable that is bounded in the unit interval (0, 1), and given the remarks in the exploratory analysis, for the purposes of this work the loss is assumed to follow a beta distribution \((L\sim\beta (\mu ,\varphi ))\). In order to link the loss with given observed values of explanatory variables, the beta distribution of the loss is first parameterised in terms of its mean μ and its precision φ (Ferrari and Cribari-Neto 2004), i.e. it is assumed that the probability density function, expected value, and variance of L given μ and φ are:

$$\begin{array}{*{20}l} {f(l;\mu ,\varphi ) = \frac{\varGamma \left( \varphi \right)}{{\varGamma \left( {\mu \varphi } \right)\varGamma \left( {\left( {1 - \mu } \right)\varphi } \right)}}l^{\mu \varphi - 1} \left( {1 - l} \right)^{{\left( {1 - \mu } \right)\varphi - 1}} } \hfill & {(0 < l < 1)} \hfill \\ {E\left[ {L;\mu ,\varphi } \right] = \mu } \hfill & {(0 < \mu < 1)} \hfill \\ {\operatorname{Var}\left[ {L;\mu ,\varphi } \right] = \frac{{\mu \left( {1 - \mu } \right)}}{1 + \varphi }} \hfill & {(\varphi > 0)} \hfill \\ \end{array}$$
(1)

For fixed μ, it can be noted that the larger the value of the precision φ is, the smaller is the variance of loss. Then, a beta regression model links μ and possibly φ with a systematic component that is a function of a vector of explanatory variables. The explanatory variables that are available for the current analysis are the intensity measure, IM; the construction material, M; and the year of construction, Y.
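The mean/precision parameterisation of Eq. (1) maps onto the usual beta shape parameters as \(a = \mu\varphi\) and \(b = (1-\mu)\varphi\). The sketch below, using only the Python standard library, evaluates the log-density and the variance of Eq. (1) and shows the variance shrinking as \(\varphi\) grows for a fixed mean; the numerical values are arbitrary.

```python
import math


def beta_shapes(mu, phi):
    """Mean/precision (mu, phi) to shape parameters (a, b) = (mu*phi, (1-mu)*phi)."""
    if not (0.0 < mu < 1.0 and phi > 0.0):
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi


def beta_logpdf(l, mu, phi):
    """log f(l; mu, phi) from Eq. (1), valid for 0 < l < 1."""
    if not 0.0 < l < 1.0:
        raise ValueError("l must lie in (0, 1)")
    a, b = beta_shapes(mu, phi)
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(l) + (b - 1.0) * math.log(1.0 - l))


def beta_variance(mu, phi):
    """Var[L; mu, phi] = mu (1 - mu) / (1 + phi) from Eq. (1)."""
    beta_shapes(mu, phi)  # reuse the parameter checks
    return mu * (1.0 - mu) / (1.0 + phi)


# Same mean loss, increasing precision: the variance shrinks.
for phi in (5.0, 20.0, 100.0):
    print(phi, beta_variance(0.1, phi))
```

As a check, \(\mu = 0.5\), \(\varphi = 2\) gives \(a = b = 1\), the uniform density, whose log-density is zero everywhere on (0, 1).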

Then, for N loss observations \(l_{1} , \ldots , l_{N}\) and corresponding vectors \(\varvec{x}_{1} , \ldots ,\varvec{x}_{N}\) of explanatory variables, we assume that

$$L_{i} \sim\beta \left( {\mu_{i} ,\,\varphi_{i} } \right)$$
(2)

Equation (2) allows the precision parameter to vary across observations, which may be helpful given the variability in the scatter of the loss for given explanatory variables noted in the exploratory analysis.

4.2.2 Definition of systematic component for the mean

The systematic component for the mean is defined as a real-valued linear predictor \(\eta_{1i}\), which is a linear combination of regression parameters and explanatory variables. Because the mean of the beta distribution takes values in the unit interval whereas \(\eta_{1i}\) is real-valued, \(\mu_{i}\) is linked to \(\eta_{1i}\) through a link function \(g_{1}\) that maps the unit interval onto the real line:

$$\mu_{i} = g_{1}^{ - 1} \left( {\eta_{1i} } \right)$$
(3)

A standard link function that is used in the beta regression literature, and the one that is used in this work, is the logit link:

$$g_{1} \left( {\mu_{i} } \right) = \log \left( {\frac{{\mu_{i} }}{{1 - \mu_{i} }}} \right)$$
(4)

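The logit link is widely used because it offers a direct interpretation of the regression parameters: a unit increase in an explanatory variable multiplies the odds \(\mu_{i}/(1-\mu_{i})\) by the exponential of the corresponding regression coefficient (see Ferrari and Cribari-Neto 2004 for a detailed explanation).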
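A minimal sketch of the logit link of Eq. (4), its inverse from Eq. (3), and the odds-ratio interpretation of a regression coefficient. The coefficient `beta_im` and the value of the linear predictor are hypothetical, chosen only for illustration.

```python
import math


def logit(mu):
    """Logit link g1 of Eq. (4): maps the unit interval (0, 1) to the real line."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie in (0, 1)")
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse link g1^{-1} of Eq. (3), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def odds(mu):
    """Odds of the mean loss, mu / (1 - mu)."""
    return mu / (1.0 - mu)


beta_im = 0.7   # hypothetical regression coefficient on the intensity measure
eta = -3.0      # hypothetical value of the linear predictor
ratio = odds(inv_logit(eta + beta_im)) / odds(inv_logit(eta))
print(ratio, math.exp(beta_im))  # the two numbers agree up to rounding
```

The ratio does not depend on the starting value of `eta`, which is what makes the coefficients directly interpretable on the odds scale.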

As far as the linear predictor is concerned, the exploratory analysis showed that all three explanatory variables (IM, M, and Y) appear to influence the loss, so all three should be included in the linear predictor. This raises the question of whether these variables should simply be added or whether their interactions should also be taken into account. A plot of the marginal relationships of L with IM, M, and Y is used to identify the best combinations for the available data (see Fig. 8).
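To make the additive-versus-interaction choice concrete, the sketch below builds the covariate vector \(\varvec{x}_i\) for \(\eta_{1i}\) under an assumed dummy coding (URM and pre-1945 as baseline levels). Both the coding and the particular interaction terms (MMI by material and MMI by age) are hypothetical illustrations, not the terms selected in this study.

```python
def design_row(mmi, material, year, interactions=False):
    """Covariate vector x_i so that eta_1i = x_i . beta (hypothetical coding).

    material: "URM" or "timber"; year: "pre1945" or "post1945".
    URM and pre-1945 are used as baseline levels (dummy = 0).
    """
    if material not in ("URM", "timber") or year not in ("pre1945", "post1945"):
        raise ValueError("unknown material or year category")
    timber = 1.0 if material == "timber" else 0.0
    post = 1.0 if year == "post1945" else 0.0
    row = [1.0, float(mmi), timber, post]   # intercept + additive main effects
    if interactions:
        # Let the slope with respect to MMI differ by material and by age category.
        row += [mmi * timber, mmi * post]
    return row


def linear_predictor(row, beta):
    """eta = sum_k x_k * beta_k."""
    if len(row) != len(beta):
        raise ValueError("row and beta must have the same length")
    return sum(x * b for x, b in zip(row, beta))


print(design_row(7, "timber", "post1945"))
print(design_row(7, "timber", "post1945", interactions=True))
```

In the additive model the material and age dummies only shift the curve on the logit scale; adding the interaction columns also lets its slope with intensity differ between classes.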

Fig. 8 Marginal relationships of L with IM, M, and Y

4.2.3 Definition of the systematic component for the precision

The precision \(\varphi_{i}\) can also be modelled as a function of a linear predictor \(\eta_{2i}\):

$$\varphi_{i} = g_{2}^{ - 1} (\eta_{2i} )$$
(5)

where \(g_{2}(\cdot)\) is the link function, taken here to be the log function:

$$g_{2} \left( {\varphi_{i} } \right) = \log \left( {\varphi_{i} } \right)$$
(6)

and \(\eta_{2i}\) is a real-valued linear predictor, defined analogously to \(\eta_{1i}\).
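Putting Eqs. (3) to (6) together, each observation's beta parameters follow from its two linear predictors. The sketch below maps hypothetical values of \(\eta_{1i}\) and \(\eta_{2i}\) to \(\mu_i\) and \(\varphi_i\) and then to the variance of Eq. (1).

```python
import math


def inv_logit(eta):
    """Numerically stable logistic function, the inverse of the logit link g1."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def mean_and_precision(eta1, eta2):
    """Map linear predictors to beta parameters: mu = g1^{-1}(eta1), phi = exp(eta2)."""
    mu = inv_logit(eta1)
    phi = math.exp(eta2)   # inverse of the log link g2, so phi > 0 always
    return mu, phi


# Hypothetical linear-predictor values for one observation.
mu, phi = mean_and_precision(-2.0, math.log(40.0))
variance = mu * (1.0 - mu) / (1.0 + phi)   # Eq. (1)
print(round(mu, 4), round(phi, 4), round(variance, 6))
```

The log link guarantees a positive precision for any real \(\eta_{2i}\), just as the logit link keeps \(\mu_i\) inside the unit interval.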

4.3 Statistical model fitting technique

The statistical models described above are then fitted to the field data by estimating their unknown parameters through maximisation of the log-likelihood function, using the ‘betareg’ package (Cribari-Neto and Zeileis 2010; Gruen et al. 2012) in ‘R’ (R Core Team 2014):

$${\varvec{\uptheta}}^{\text{opt}} = \arg \hbox{max} \left[ {\log \left( {L\left( {\varvec{\uptheta}} \right)} \right)} \right] = \arg \hbox{max} \left[ {\log \left( {\prod\limits_{j = 1}^{N} {\frac{{\varGamma \left( {\varphi_{j} } \right)}}{{\varGamma \left( {\mu_{j} \varphi_{j} } \right)\varGamma \left( {\left( {1 - \mu_{j} } \right)\varphi_{j} } \right)}}l_{j}^{{\mu_{j} \varphi_{j} - 1}} \left( {1 - l_{j} } \right)^{{\left( {1 - \mu_{j} } \right)\varphi_{j} - 1}} } } \right)} \right]$$
(7)

where N is the total number of data points.
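The study maximises Eq. (7) with ‘betareg’ in R. Purely to illustrate what the objective is, the sketch below evaluates the same log-likelihood in Python and maximises it by brute-force grid search for an intercept-only model (one \(\mu\) and one \(\varphi\) for all data). The grid search and the loss values are hypothetical stand-ins, not the fitting method or data of the study.

```python
import math


def beta_loglik(losses, mus, phis):
    """Log-likelihood of Eq. (7): sum_j log f(l_j; mu_j, phi_j)."""
    if not len(losses) == len(mus) == len(phis):
        raise ValueError("losses, mus and phis must have equal length")
    total = 0.0
    for l, mu, phi in zip(losses, mus, phis):
        a, b = mu * phi, (1.0 - mu) * phi   # shape parameters
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(l) + (b - 1.0) * math.log(1.0 - l))
    return total


def grid_fit_constant(losses, mu_grid, phi_grid):
    """Toy arg max of Eq. (7) for an intercept-only model (one mu and phi for all data)."""
    n = len(losses)
    return max((beta_loglik(losses, [m] * n, [p] * n), m, p)
               for m in mu_grid for p in phi_grid)


example_losses = [0.01, 0.02, 0.05, 0.03, 0.015, 0.04]   # hypothetical DI values
best_ll, best_mu, best_phi = grid_fit_constant(
    example_losses,
    mu_grid=[k / 1000 for k in range(1, 200)],
    phi_grid=[float(p) for p in range(5, 301, 5)],
)
print(best_mu, best_phi, round(best_ll, 3))
```

A real fit would use a gradient-based optimiser over the regression coefficients of \(\eta_{1i}\) and \(\eta_{2i}\), which is what ‘betareg’ provides.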

It has been shown (Gruen et al. 2012; Kosmidis and Firth 2010) that, under maximum likelihood, the parameters of the systematic component for the mean are estimated almost without bias. However, the maximum likelihood estimator of the precision parameter usually suffers from significant bias, which in turn leads to underestimated standard errors for the beta regression model. This can substantially affect the reported significance of the explanatory variables. To obtain more realistic estimates of the standard errors of the model parameters, we use the bias reduction method supplied in the ‘betareg’ package (Firth 1993; Kosmidis and Firth 2009).

The 90 % point-wise prediction intervals for the vulnerability functions are calculated using the bootstrap procedure in Espinheira et al. (2014).
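The bootstrap procedure of Espinheira et al. (2014) also accounts for parameter-estimation uncertainty. The simplified sketch below is a different and cruder technique: it only simulates new losses from a fitted beta distribution at fixed covariates and takes empirical quantiles, so it ignores parameter uncertainty and will tend to give narrower intervals. The fitted values `mu` and `phi` used in the example are hypothetical.

```python
import random


def simulated_prediction_interval(mu, phi, level=0.90, n_sim=20000, seed=1):
    """Approximate pointwise prediction interval for a new loss at fixed covariates,
    by simulating from the fitted beta distribution (parameter uncertainty ignored)."""
    if not (0.0 < mu < 1.0 and phi > 0.0 and 0.0 < level < 1.0):
        raise ValueError("need 0 < mu < 1, phi > 0 and 0 < level < 1")
    rng = random.Random(seed)   # fixed seed for reproducibility
    a, b = mu * phi, (1.0 - mu) * phi
    draws = sorted(rng.betavariate(a, b) for _ in range(n_sim))
    lo_idx = int((1.0 - level) / 2.0 * n_sim)
    hi_idx = int((1.0 + level) / 2.0 * n_sim) - 1
    return draws[lo_idx], draws[hi_idx]


# Hypothetical fitted mean loss and precision at one intensity level.
lower, upper = simulated_prediction_interval(0.05, 50.0)
print(round(lower, 4), round(upper, 4))
```

Repeating this over a grid of intensity values traces pointwise bands around the mean vulnerability curve.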

4.4 Goodness-of-fit assessment

The proposed procedure develops a number of realistic statistical models, which are then fitted to the available data. To determine which one provides the best fit, both the relative and the absolute goodness of fit of the proposed models are assessed. Model comparison tools aim to identify the model that provides the best fit among the available alternatives. Model checking tools aim to explore whether the modelling assumptions are violated, and in doing so they provide hints for improving the model.

4.4.1 Model comparison tools

The likelihood ratio test can be used to compare the fit of a complex model with that of a simpler, nested model. The nested model is obtained by constraining some of the parameters of the complex model (e.g. fixing a few regression parameters to zero). The more complex model will generally fit the data better, since it has more parameters; the question is whether the improvement is statistically significant. The likelihood ratio test addresses this by testing the hypothesis that the simpler model fits the data as well as the complex model does. Under that hypothesis, the statistic \(D = -2\left[\log\left(L_{\text{simple model}}\right) - \log\left(L_{\text{complex model}}\right)\right]\) asymptotically follows a Chi-square distribution whose degrees of freedom equal the number of parameters constrained in the nested model, i.e. the residual degrees of freedom of the simple model minus those of the complex model. This is used to calculate p values; in this study, the evidence against the hypothesis is considered significant if the p value is less than 0.05, in which case the complex model is considered to provide a better fit to the given data.
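The test can be sketched in a few lines. To stay within Python's standard library, the Chi-square upper-tail probability below uses the standard closed-form expressions for integer degrees of freedom (in R the same quantity is `pchisq(D, df, lower.tail = FALSE)`). The function names are hypothetical, and the log-likelihoods would in practice come from the two fitted models.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a Chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(loglik_simple: float, loglik_complex: float,
                          n_params_simple: int, n_params_complex: int) -> tuple[float, float]:
    """Return D = -2 [log L_simple - log L_complex] and its asymptotic p value."""
    df = n_params_complex - n_params_simple
    if df < 1:
        raise ValueError("the complex model must have more parameters than the nested one")
    d = -2.0 * (loglik_simple - loglik_complex)
    return d, chi2_sf(d, df)


# p values for two of the one-degree-of-freedom statistics reported in Sect. 5
for stat in (21.323, 6.168):
    print(stat, chi2_sf(stat, 1))
```

For these two statistics the sketch gives p < 0.001 and p ≈ 0.013, matching the values reported for dropping the M × IM and Y × IM interactions from ‘Model 2’.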

4.4.2 Model checking tools

The goodness of fit of a model to the given database can be assessed by informal graphical tools. For beta regressions, the adequacy of the assumptions for the random and systematic components can be checked through the behaviour of the residuals termed ‘standardised weighted residuals 2’ (Espinheira et al. 2008). Under the model assumptions, these residuals should lie between −3 and 3 with high probability, and scatterplots of the residuals against the observation index or against the linear predictor should show a random scatter around the zero line. For the goodness-of-fit assessment of the models in this study, these two scatterplots are adopted, together with a half-normal plot of the residuals with simulated envelopes and a plot of the observed loss against the predicted loss. In the two latter plots, the points are expected to lie around a 45° line.
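As an illustration of how a ‘standardised weighted residual 2’ is formed, the sketch below follows the definition as we understand it from Espinheira et al. (2008): the logit of the observed loss, y* = log(l/(1 − l)), is centred by its expectation μ* = ψ(μφ) − ψ((1 − μ)φ) and scaled by √(v(1 − h)), where v = ψ′(μφ) + ψ′((1 − μ)φ) is the variance of y* and h is the observation's leverage (diagonal element of the hat matrix). To avoid third-party dependencies, the digamma ψ and trigamma ψ′ functions are approximated by their recurrence relations plus asymptotic series. The leverage is assumed to be supplied by the fitted model; all function names and input values are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """psi(x), x > 0: recurrence psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + (math.log(x) - 0.5 / x
                     - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def trigamma(x: float) -> float:
    """psi'(x), x > 0: recurrence psi'(x) = psi'(x + 1) + 1/x**2, then asymptotic series."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv * (1.0 + inv * (0.5 + inv * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))))


def sweighted2_residual(loss: float, mu: float, phi: float, leverage: float) -> float:
    """Standardised weighted residual 2 for one observation of a beta regression.
    ASSUMPTION: mu, phi and the leverage h come from an already fitted model."""
    if not (0.0 < loss < 1.0 and 0.0 < mu < 1.0 and phi > 0.0 and 0.0 <= leverage < 1.0):
        raise ValueError("invalid input")
    y_star = math.log(loss / (1.0 - loss))                   # logit of the observed loss
    mu_star = digamma(mu * phi) - digamma((1.0 - mu) * phi)  # E[y_star]
    v = trigamma(mu * phi) + trigamma((1.0 - mu) * phi)      # Var[y_star]
    return (y_star - mu_star) / math.sqrt(v * (1.0 - leverage))


# Hypothetical fitted values for a single data point
r = sweighted2_residual(loss=0.08, mu=0.05, phi=25.0, leverage=0.04)
print(r, abs(r) <= 3.0)
```

Computing this residual for every data point gives the values that are plotted against the observation index and the linear predictor, and that feed the half-normal plot.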

5 Results and discussion

The vulnerability of selected Australian building types at intensity measure levels ranging from MMI VI to VIII is empirically assessed by fitting statistical models to all 109 data points from the two seismic events.

The first model examined (termed ‘Model 1’ in what follows) assumes that the loss for each combination of the three explanatory variables (IM, M, and Y) follows a beta distribution (according to Eq. 1). This distribution is characterised by the mean, \(\mu_i\), and the precision, \(\varphi\). The mean, \(\mu_i\), is related to the three explanatory variables through a logit link function (see Eq. 4), and the precision parameter is assumed constant, i.e. \(\varphi_i = \varphi\). The exploratory analysis showed that all three explanatory variables, i.e. IM, M, and Y, affect the loss, and they are therefore included in the linear predictor \(\eta_{1i}\). To examine the need for interactions between the explanatory variables in \(\eta_{1i}\), the marginal relationships of the logit of L with IM, M, and Y are plotted in Fig. 8. Figure 8 shows that IM seems to influence the logit of L differently depending on the values of M and Y. This indicates that the interactions between IM and M, and between IM and Y, should also be taken into account, at least initially. Similarly, the right-most plot in Fig. 8 shows a marked change in the distribution of the logit loss across building types for the two construction periods, so the interaction between M and Y is also included in the model. Thus, the linear predictor \(\eta_{1i}\) for ‘Model 1’ can be written as:

$$\eta_{1} = \theta_{0} + \theta_{1} \cdot IM + \theta_{2} \cdot M + \theta_{3} \cdot Y + \theta_{4} \cdot IM \cdot M + \theta_{5} \cdot IM \cdot Y + \theta_{6} \cdot M \cdot Y$$
(8)
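The sketch below evaluates the linear predictor of Eq. (8) and maps it to the mean loss through the inverse of the logit link, μ = 1/(1 + exp(−η_1)). The 0/1 indicator coding of M and Y, the function name, and the coefficient values in the usage lines are illustrative assumptions, not the estimates of this study.

```python
import math
from typing import Sequence


def mean_loss_model1(theta: Sequence[float], im: float, m: int, y: int) -> float:
    """Mean loss mu = 1 / (1 + exp(-eta_1)), with eta_1 given by Eq. (8).
    ASSUMPTION: m and y are 0/1 indicators for material and construction period."""
    if len(theta) != 7:
        raise ValueError("Eq. (8) has seven coefficients, theta_0 ... theta_6")
    t0, t1, t2, t3, t4, t5, t6 = theta
    eta1 = (t0 + t1 * im + t2 * m + t3 * y
            + t4 * im * m + t5 * im * y + t6 * m * y)
    return 1.0 / (1.0 + math.exp(-eta1))  # inverse logit keeps mu inside (0, 1)


# Hypothetical coefficients (NOT the estimates of this study), MMI VI to VIII
theta_demo = [-14.0, 1.3, 1.0, 0.5, 0.2, 0.1, 0.0]
for mmi in (6.0, 7.0, 8.0):
    print(mmi, mean_loss_model1(theta_demo, mmi, m=1, y=1))
```

The interaction terms let the slope of the curve with respect to IM differ between building classes; fixing θ6 = 0 gives the reduced predictor of ‘Model 3’ in Eq. (10).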

‘Model 1’ is then fitted to the 109 data points via the ‘betareg’ package in ‘R’. Its absolute goodness of fit is assessed with the four informal graphical tools described in Sect. 4.4.2, presented in Fig. 9. The points on the scatterplot of observed versus predicted losses lie roughly around the 45° line, but their variability increases markedly as the observed losses increase. This heteroscedasticity is also evident in the scatterplots of residuals versus observation index and versus linear predictor, and can be attributed to the inability of the selected model to fully capture the differences in the variability of loss between the two structure types and the two construction periods.

Fig. 9

Diagnostic plots for ‘Model 1’: a residuals versus indices of observations, b residuals versus linear predictor, c half-normal plot of residuals, d predicted versus observed values

These issues with ‘Model 1’ can be addressed by relaxing the assumption of constant precision φ. The updated model, termed ‘Model 2’, assumes that the precision is related to the construction material and the year of construction through a log link function (see Sect. 4.2.3):

$$\eta_{2} = \theta_{7} + \theta_{8} \cdot M + \theta_{9} \cdot Y$$
(9)
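Assuming a log link for the precision, Eq. (9) gives φ = exp(η_2). Under the mean/precision parametrisation of the beta distribution, the variance of the loss is Var(L) = μ(1 − μ)/(1 + φ), so a lower precision means more scatter around the same mean. The sketch below (hypothetical names and coefficients, 0/1 coding of M and Y assumed) shows how letting φ vary by material and construction period allows the model to assign a different loss variability to each building class, which is what ‘Model 2’ adds over ‘Model 1’.

```python
import math


def precision_model2(theta7: float, theta8: float, theta9: float, m: int, y: int) -> float:
    """phi = exp(eta_2), with eta_2 = theta_7 + theta_8 * M + theta_9 * Y as in Eq. (9).
    ASSUMPTION: log link for the precision; m and y are 0/1 indicators."""
    return math.exp(theta7 + theta8 * m + theta9 * y)


def beta_variance(mu: float, phi: float) -> float:
    """Variance of a beta-distributed loss with mean mu and precision phi."""
    if not (0.0 < mu < 1.0 and phi > 0.0):
        raise ValueError("require 0 < mu < 1 and phi > 0")
    return mu * (1.0 - mu) / (1.0 + phi)


# Hypothetical coefficients: same mean loss, two classes with different precision
for m, y in ((1, 1), (0, 0)):
    phi = precision_model2(2.0, -1.0, -0.5, m, y)
    print(m, y, phi, beta_variance(0.1, phi))
```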

‘Model 2’ is then fitted to the 109 data points. The diagnostic plots in Fig. 10 show no direct evidence against the model assumptions.

Fig. 10

Diagnostic plots for ‘Model 2’: a residuals versus indices of observations, b residuals versus linear predictor, c half-normal plot of residuals, d predicted versus observed values

The next question is whether the mean specification of ‘Model 2’ can be further simplified without compromising its good fit. To answer it, we examine whether any of the interaction terms can be dropped from the model.

Using likelihood ratio tests, the p value for dropping the interaction between M and IM is less than 0.001 (Chi-squared statistic of 21.323 on one degree of freedom), that for dropping the interaction between M and Y is 0.992 (Chi-squared statistic of 0.001 on one degree of freedom), and that for dropping the interaction between Y and IM is 0.013 (Chi-squared statistic of 6.168 on one degree of freedom). Since only the interaction between M and Y is not significant at the 0.05 level, it is removed from ‘Model 2’, giving rise to ‘Model 3’ with linear predictor:

$$\eta_{1} = \theta_{0} + \theta_{1} \cdot IM + \theta_{2} \cdot M + \theta_{3} \cdot Y + \theta_{4} \cdot IM \cdot M + \theta_{5} \cdot IM \cdot Y$$
(10)

Residual analysis and the scatterplot of predicted versus observed values (not shown here) indicate that the fit of ‘Model 3’ is of the same quality as that of ‘Model 2’. Table 5 gives the reduced-bias estimates of the parameters of this simpler model, together with their estimated standard errors, z statistics, and Wald test p values (see Sect. 4.3 for the justification of the use of bias reduction). Both the residual analysis and the reported significance of the coefficients in Table 5 indicate that ‘Model 3’ provides a good fit to the data.

Table 5 Estimates of the regression parameters of the best-fitted model ‘Model 3’, their standard errors, z values and p values obtained from the Wald test

Figure 11 displays the mean vulnerability curves obtained with ‘Model 3’, along with 90 % point-wise bootstrap prediction intervals; the bootstrap analysis involved 9999 iterations. The URM buildings built before 1945 appear to be the most vulnerable, followed by the post-1945 URM buildings. Timber buildings appear to be the least vulnerable, with little difference between the vulnerability of timber buildings built before and after 1945. The selected statistical model fits the observed damage data collected in the aftermath of the Newcastle and Kalgoorlie earthquakes well; in these events, most of the damage occurred to URM buildings in terms of loss of chimneys, gable and parapet failures, and extensive cracking of walls (Page 1991; Blackie 1991). Timber buildings generally suffered slight non-structural damage such as cracks in wall linings and cornices (Edwards et al. 2010). The vulnerability curves developed in this study can be used to predict future losses for the four building classes not only in Newcastle and Kalgoorlie but also for similar building types across Australia, as construction practices are similar throughout the country. Differentiating by building material (masonry and timber) improves the predictability of losses, because the two types differ considerably in their earthquake resistance and in the severity and nature of the damage they sustain. Furthermore, the vintage of a building helps to distinguish the more vulnerable older building stock, which has deteriorated and is affected by poor material quality, poor construction practices, and non-conformance with earthquake standards, from the newer stock, which is not.

Fig. 11

Vulnerability functions and their 90 % prediction intervals (90 % CI) for the four building classes based on the best-fitted model ‘Model 3’ are compared with existing vulnerability curves from New Zealand buildings (Uma, personal communication)

Although the selected model fits the damage data well, the moderate quality of the damage and intensity data raises concerns regarding the reliability of the loss predictions. This moderate quality stems from the lack of ground motion intensity measurements, the use of aggregated data points in the regression, and the measures taken to reduce the bias of the larger Newcastle database. The impact of the first two factors on the shape of fragility curves, rather than vulnerability curves, has been studied in the literature (e.g. Ioannou et al. 2015). The reliability of empirical vulnerability curves could therefore be improved by improving data quality, for example by surveying a relatively small sample of buildings that captures the variability in the building stock and represents the impact of the earthquake on these buildings.

The reliability of the predictions of the vulnerability curves constructed herein could be assessed using cross-validation procedures with independent post-earthquake data that have not been used in constructing these curves. In the absence of such data, the resultant vulnerability curves are compared with existing curves in the region. Unfortunately, there are no publicly available curves in Australia owing to a paucity of vulnerability studies; however, a few vulnerability curves have been developed in New Zealand based on observed damage from New Zealand and overseas earthquakes (Dowrick 1991; Dowrick and Rhoades 1993; Dowrick et al. 2001; Dowrick and Rhoades 2002) as well as expert judgement (Uma, personal communication, 2015). These curves include observations from the 1942 Wairarapa, 1968 Inangahua, and 1987 Edgecumbe earthquakes but do not include data from the Canterbury earthquakes. The comparison with the New Zealand functions (see Fig. 11) shows that the New Zealand building types appear to be less vulnerable than their Australian counterparts. This means that the curves constructed herein produce conservative estimates of the loss. It is, however, anticipated that the difference might reduce when the data from the 2010 Canterbury earthquake are used to update the New Zealand curves.

6 Conclusions

This study describes the two most common building types (URM and timber frame structures) in Australia and provides typical characteristics of older legacy buildings (pre-1945) and relatively newer ones constructed after 1945. This study also provides an overview of the building performance in the 1989 Newcastle and the 2010 Kalgoorlie earthquakes along with common failure modes and the factors which contributed to the damage.

This study utilises a large body of empirical data from the two earthquakes mentioned above and develops the first publicly available empirical vulnerability curves using the best available Australian data sets. The curves provide mean building population losses and their uncertainties for four Australian building classes. Building on the latest research, the empirical functions for Australian building types are developed following a novel methodology presented in the GEM empirical vulnerability assessment guidelines.

From the vulnerability functions developed, it is concluded that URM structures are more vulnerable than timber structures. Moreover, the analysis showed that the uncertainty in the loss is higher for structures built before 1945, in particular for URM structures. The functions not only represent the vulnerability of the Newcastle and Kalgoorlie building stock but, more generally, can be used to quantify the vulnerability of buildings throughout Australia, given the common construction practices used across the country. The curves and associated uncertainties could be refined using a richer data set when available, including data from any future damaging event. The study can also be extended to consider a wider variety of building types and age categories.

In their current form, the developed curves can be applied in any seismic risk assessment study in Australia involving low-rise URM and timber structures, provided that the required intensity is within the MMI V to VIII range. Based on such risk studies, retrofit strategies can be developed to reduce the future risk associated with the more vulnerable of these building types. Building on this research and utilising support from the Australian Government, Geoscience Australia is collaborating in a mitigation strategy development project within the Bushfire and Natural Hazards Cooperative Research Centre (BNHCRC 2015) to provide an evidence base for strengthening more vulnerable building types in the existing Australian building stock.