1 Introduction

Crash prediction models (CPMs) are mathematical equations which link road safety performance and crash risk factors. First applications of CPMs appeared in the 1980s (for a review, see e.g. [48, 54, 66]). CPMs express the predicted crash frequency of a road (e.g. road segment or intersection) as a function of explanatory variables. These variables (risk factors) describe exposure to crash risk and other characteristics related to cross section, road design and other road and traffic attributes. The typical model form is:

$$ {N}_i=\exp \left({\beta}_0\right)\cdot {\left( EXP{O}_i\right)}^{\beta_1}\cdot \exp \left({\sum}_{j=2}^n\left({\beta}_j\cdot {x}_j\right)\right) $$
(1)

where

$N_i$ … crash frequency on road i in a specific time period

$\beta_0$ … intercept

$EXPO_i$ … exposure on road i in a specific time period

$\beta_j$ … regression coefficients

$x_j$ … explanatory variables
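As a minimal sketch of how Eq. (1) is evaluated, the following Python function multiplies the intercept term, the exposure term and the exponential of the linear combination of the remaining variables. The function name, the coefficient values and the two example variables are hypothetical and chosen for illustration only; they are not taken from any published CPM.

```python
import math

def predicted_crash_frequency(exposure, beta0, beta_exposure, betas, xs):
    """Evaluate Eq. (1): exp(b0) * EXPO^b_exposure * exp(sum(b_j * x_j))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    linear_part = sum(b * x for b, x in zip(betas, xs))
    return math.exp(beta0) * exposure ** beta_exposure * math.exp(linear_part)

# Hypothetical inputs, for illustration only.
n_hat = predicted_crash_frequency(
    exposure=8000,         # e.g. AADT in vehicles per day
    beta0=-7.5,            # intercept
    beta_exposure=0.85,    # exponent of the exposure term
    betas=[0.12, -0.30],   # e.g. curve density, paved shoulder present
    xs=[1.5, 1.0],
)
print(f"predicted crashes in the period: {n_hat:.2f}")
```

Because every term is an exponential or a power, the same model is linear on the log scale, which is why it can be fitted as a GLM with a log link.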

In order to correctly account for the discrete, non-negative character of crash frequencies, generalized linear modelling (GLM) methods are typically used. The first models used Poisson regression as a starting point; however, it was found that Poisson regression cannot handle overdispersion (the variance exceeding the mean), which is typical for crash data [66]. This motivated the use of negative binomial (or Poisson-gamma) models, which assume that the Poisson parameter follows a gamma probability distribution. According to an extensive review by Lord and Mannering [47], negative binomial (NB) models are the most frequently used in crash-frequency modelling. Given this fact, the following text focuses on NB models; for more information on other model types, such as zero-inflated, generalized estimating equations (GEE), generalized additive models (GAM), random-effects, random-parameters, hierarchical/multilevel or neural networks, see e.g. [5, 47, 50].
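The overdispersion argument can be illustrated with a small simulation. The sketch below, which uses only the Python standard library, draws counts from a pure Poisson distribution and from a Poisson distribution whose rate is gamma distributed. The mean of 2.0 and the dispersion value of 0.8 are arbitrary assumptions, and the parameterisation (variance = mean + dispersion × mean²) is one common convention rather than something prescribed by the text.

```python
import math
import random
import statistics

def poisson_sample(rng, lam):
    """Draw one Poisson variate (Knuth's multiplication method, for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def poisson_gamma_sample(rng, mean, dispersion):
    """Poisson count whose rate follows a gamma distribution with the given mean.

    Shape 1/dispersion and scale mean*dispersion give a count with
    variance mean + dispersion * mean**2.
    """
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson_sample(rng, rate)

rng = random.Random(42)
mean, dispersion = 2.0, 0.8   # assumed values, for illustration only
pure = [poisson_sample(rng, mean) for _ in range(20000)]
mixed = [poisson_gamma_sample(rng, mean, dispersion) for _ in range(20000)]

print("Poisson:       mean", statistics.mean(pure), "variance", statistics.variance(pure))
print("Poisson-gamma: mean", statistics.mean(mixed), "variance", statistics.variance(mixed))
```

Both samples have roughly the same mean, but the variance of the Poisson-gamma sample is clearly larger than its mean, which is the pattern a plain Poisson model cannot reproduce.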

CPMs analyse and highlight potential safety issues, help to identify potential for safety improvements and estimate their benefits [87]. Over the past decades, building on road infrastructure data, CPMs have become the fundamental scientific tools in quantitative road safety management, forming the foundation of the AASHTO Highway Safety Manual (HSM) or the Australian National Risk Assessment Model (ANRAM). First edition of HSM (2010) became a recognized source of information and methods for science-based decision making, allowing safety to be quantitatively evaluated alongside other transportation performance measures such as traffic operations, environmental impacts, pavement durability or construction costs. The methods in HSM, based on CPMs, provide an opportunity to: (1) improve the reliability of common activities, such as screening a network for sites at which to reduce crashes, and (2) expand analysis to include assessments of new or alternative geometric and operational characteristics [1].

CPMs may be used for various key functions, including network safety screening, development of crash modification factors (CMFs), road safety impact assessments and economic analysis. However, there are gaps between state-of-the-art (what is published by researchers) and state-of-the-practice (what is needed/used by practitioners), which limit the application of CPMs.

This paper will assist road safety practitioners in understanding why and how they might use CPMs to improve road safety. The paper presents a review of how CPMs are developed and applied. In particular, the paper explores the challenges of optimising scientific validity and practical applicability. These challenges are discussed in the context of opportunities and potential solutions that might assist practitioners in incorporating CPMs into road safety management.

2 Methods

The goal of the review was to critically summarize international experience in the development and application of CPMs for crash frequency estimation, with a focus on practical use by road transport agencies. In this regard, both scientific and practice-oriented literature was retrieved based on the following criteria:

  • Sources:

    • academic: Web of Science and Scopus, including selected references (snowballing)

    • practical: reports of agencies (e.g. Federal Highway Administration, Austroads, NZ Transport Agency)

    • both: ARRB Knowledge Base, TRID database, reports of European institutes, EU project deliverables

  • Keywords: accident prediction model, crash prediction model, safety performance function

  • Language: English

  • Time frame restriction: none

To focus on the typical road settings (the main road network, i.e. motorways/freeways/expressways and national roads), the following specific issues were not considered:

  • Macro/planning-level applications (analysis based on jurisdiction, GDP, or land-use zones in assignment models)

  • Specific CPMs for vulnerable road users, such as pedestrians or bicyclists

  • CPMs for specific road elements (e.g. railway level crossings, bridges, tunnels, etc.)

  • Logistic binary modelling of crash characteristics (e.g. victim gender and age, vehicle age, etc.)

  • Use of CPMs for evaluation of safety effectiveness of safety treatments or programmes (before/after studies)

These CPM applications are important in broader road safety context and may be explored using findings presented in this paper as a starting point.

The retrieved materials were mainly from Europe, Australia, New Zealand and North America. In order to stress the practical focus, the aim was to select the works related to the most frequent applications of CPMs. The final literature selection focused on developing and using CPMs of road segments and intersections from an international perspective of fulfilling important road safety management functions. These include network screening for high-risk road sections, identifying significant crash risk factors, and road safety impact assessment of potential treatment options. The review is structured along the following sections, given by the hierarchical nature of considering, developing and applying CPMs:

  1. Data collection, sample size and time period

  2. Road network segmentation

  3. Selection of explanatory variables

  4. Model function and variable forms

  5. Model validation

  6. Using CPMs in network screening

  7. Using CPMs in developing crash modification factors (CMFs)

  8. Using CPM tools, e.g. for road safety impact assessment

Previous reviews related to CPMs [5, 47, 54, 86] usually considered some of these steps only, mainly 3 and 4. The presented review fills the gap by compiling information on all eight steps, followed by summarised challenges and opportunities, with available solutions.

3 Review

3.1 CPMs and their uses

CPMs may be used to accomplish various road safety management functions, such as:

  1. Exploring and comparing combinations of individual risk factors that make some road locations unsafe

  2. Network safety screening, i.e. safety ranking road locations, or identification of hazardous locations

  3. Impact assessments, i.e. assessing safety of contemplated (re)constructions or safety treatments

  4. Economic analysis of project costs vs. safety benefits

It should be noted that Task 1 is rather research-oriented; Tasks 2, 3 and 4 represent typical practical tasks undertaken by many road agencies. According to a review of North American practices [62], network screening is the most common application of CPMs. In the European project PRACT, cost-benefit analysis was identified as a common application of CPMs [85, 86].

As noted, CPMs may be developed for road segments of a particular road type (e.g. rural undivided highway), for all intersections, for individual intersection types, or any combination of these. CPMs can be developed for all recorded crashes, casualty crashes, or severe crashes only; the approach depends on the purpose of the model. Very broad CPMs may be useful in high-level network screening or highlighting strategic issues. More specific safety management or research objectives will require more specific models. Given the range of potential applications, CPMs have been acknowledged worldwide as recommended tools, on which rational road safety management should be based. However, at the same time, it has been known that prediction modelling is not a simple task [15, 18, 77] and involves various analytical choices, which are often made without explicit justification. This may explain why there are gaps between state-of-the-art and state-of-the-practice; and this may in turn limit the practical use of CPMs. For example, a survey among European road agencies found that 70% of them rarely or never systematically use CPMs in their decision-making [85].

Regarding the selection of research for inclusion in the review, another distinction needs to be made. The HSM introduces a set of CPMs (referred to as safety performance functions, SPFs) and crash modification factors (CMFs). Crash prediction in the HSM has two main steps: (1) prediction of baseline crash rates using SPFs/CPMs for nominal route and intersection conditions, and (2) multiplying the ‘baseline’ models by crash modification factors (CMFs) to capture changes in geometric design and operational characteristics (deviations from nominal conditions). This approach has gained popularity, being incorporated into the Interactive Highway Safety Design Model (IHSDM), and recently adopted in the European CPM [86], as well as the Australian ANRAM [41] and the New Zealand Crash Estimation Compendium [53].
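The two-step logic described above can be sketched in a few lines of Python. The baseline function form, its coefficients and the CMF values below are hypothetical placeholders and do not reproduce any HSM function; the sketch only shows the order of operations (baseline prediction first, multiplicative adjustment second).

```python
import math

def baseline_spf(aadt, length_km, b0, b1):
    """Hypothetical baseline SPF for nominal conditions: exp(b0) * AADT^b1 * L."""
    return math.exp(b0) * aadt ** b1 * length_km

def apply_cmfs(baseline_crashes, cmfs):
    """Step 2: multiply the baseline prediction by each CMF in turn."""
    return baseline_crashes * math.prod(cmfs)

# Step 1: baseline prediction for nominal conditions (assumed coefficients).
baseline = baseline_spf(aadt=6000, length_km=2.0, b0=-8.0, b1=0.9)

# Step 2: adjust for deviations from nominal conditions (assumed CMF values).
cmfs = {"narrow lanes": 1.10, "sealed shoulder": 0.92}
predicted = apply_cmfs(baseline, cmfs.values())

print(f"baseline: {baseline:.3f}, adjusted: {predicted:.3f} crashes per year")
```

A CMF above 1 raises the prediction relative to nominal conditions and a CMF below 1 lowers it; an empty list of CMFs leaves the baseline unchanged.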

The CPMs/SPFs in the HSM and IHSDM, developed from data in several US states, are not directly transferable to other jurisdictions (inside or outside the US). Some studies confirmed good transferability, mainly between US states [7, 74, 84], but others were less successful when applied abroad, for example in Canada, Italy or Korea [42, 63, 64, 69, 88]. Therefore, it is recommended that each country and jurisdiction (e.g. State) develops its own specific CPMs. The present review, written by non-US authors, adopts this perspective.

3.2 Data collection

In theory, to obtain sufficiently representative models, one should randomly sample data from the population of similar road types or intersections. In this regard, given the variance of crash frequencies, several authors recommended minimum sample sizes, such as at least 50 sites [77], 200 crashes [39] or 300 crashes [73]. The HSM [1] advises using a sample of 30–50 locations with a total of at least 100 crashes per year. However, others were critical of the one-size-fits-all approach. For example, Lord [46] provided guidance on the necessary sample size based on the sample mean, for example 200 segments for an average of 5 crashes per segment, or 1000 segments for an average of 1 crash per segment. (Note that these considerations do not apply in the case of network screening, whose goal is to screen the complete network.)

In addition, unlike in the case of large US and Canadian samples, smaller countries are limited in their samples of network and crash data. For example, Turner et al. [77] mentioned that the size of the New Zealand road network limits the development of models for some segment and site types, e.g. interchanges. This factor also reduces opportunities for disaggregating CPMs into all crash types and severity levels.

Data on crashes, traffic volumes and other relevant road attributes need to be assigned to all the sample sites. Crash data are known for various biases, such as underreporting, location errors, severity misclassification or inaccurate identification of contributory factors. Traffic volume data may also be prone to errors: the typical measure of traffic volume, annual average daily traffic (AADT), is an average aggregated over various vehicle types [18]; in addition, location errors also exist, as traffic volumes typically measured at one location are assumed to apply to the entire section, and often to multiple sections. Thus the actual variation in traffic flow is difficult to reflect in the data.

The choice of time period for crash and AADT data requires another decision. A 1- to 5-year period is usually recommended for safety ranking, with a 3-year period being the most frequent [16]. Using longer time periods (beyond 5 years) may cause problems due to changes in conditions over the period, such as substantial increases in traffic volumes or layout changes. Probably due to these issues, there are no specific guidelines for the choice of time period. An exception was the simulation study of Cheng and Washington [13], which concluded that there is little gain in network screening accuracy when using a period longer than 6 years. Also, using several consistency tests, 4 years were found sufficient for developing a CPM in a study by Ambros et al. [2]. Usually a compromise is accepted between the need for early analysis of new treatments and the need for accumulating sufficient crashes to permit robust analysis [18].

Differences between rural and urban settings are also worth mentioning. Traditionally most focus has been given to rural roads (as also evident from CPM reviews [66, 85, 86]). In contrast, modelling urban safety is more challenging, due to higher presence of vulnerable road users and complex environments, including facilities for different road users, mixed land use, or higher density of various intersection types. Detailed crash data is likely to be needed if crash type-specific models are to be developed later on. More road attributes also need to be collected for urban roads, then tested for correlation, autocorrelation, and only then considered in models [50].

Ideal data sources are road agency asset inventories. Unfortunately, these may not be complete or up to date, and a modeller thus needs to combine various data sources. Additional surveys can be also conducted, either in the field (pedestrian counts, signal timing, speeds, etc.), drive-through digital video collection, or via online maps. Recent emergence of big data and open government policies (e.g. open data initiatives such as data.vic.gov.au) have aided these efforts substantially. It is feasible to pull together substantial amounts of road data from publicly available and road agencies’ own sources. Cross-checking of data for the same attributes between different sets also adds to reducing errors and better data quality management.

3.3 Road network segmentation

CPMs are typically developed either for road intersections or segments. In the latter case, segmentation has to be conducted, in order to divide the network into homogeneous segments, i.e. with constant values of explanatory variables. However, in case of multiple variables, this practice can naturally lead to short segments. This may complicate accurate assigning of crashes to individual segments. In addition, crash concentration is heterogeneous and random; many short segments may also have zero crash counts during the selected time period.
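The effect described in this paragraph (more variables lead to shorter homogeneous segments) can be shown with a short Python sketch. It merges consecutive elementary sections that share identical values of the chosen attributes. The data layout, the attribute names and the function name are assumptions made for the example; real road databanks will be structured differently.

```python
from itertools import groupby

def homogeneous_segments(sections, keys):
    """Merge consecutive sections whose values of `keys` are identical.

    `sections` is assumed to be sorted by start_m and contiguous.
    """
    def signature(section):
        return tuple(section[k] for k in keys)

    merged = []
    for attrs, group in groupby(sections, key=signature):
        group = list(group)
        segment = {"start_m": group[0]["start_m"], "end_m": group[-1]["end_m"]}
        segment.update(zip(keys, attrs))
        merged.append(segment)
    return merged

# Hypothetical 2 km route described in 500 m elementary sections.
sections = [
    {"start_m": 0, "end_m": 500, "lanes": 2, "speed_limit": 90},
    {"start_m": 500, "end_m": 1000, "lanes": 2, "speed_limit": 90},
    {"start_m": 1000, "end_m": 1500, "lanes": 2, "speed_limit": 70},
    {"start_m": 1500, "end_m": 2000, "lanes": 3, "speed_limit": 70},
]

by_lanes = homogeneous_segments(sections, ["lanes"])
by_both = homogeneous_segments(sections, ["lanes", "speed_limit"])
print(len(by_lanes), "segments using lanes only")
print(len(by_both), "segments using lanes and speed limit")
```

With one variable the route splits into two segments; adding the second variable splits it into three shorter ones, each of which will collect fewer crashes.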

For segmentation, some authors set fixed lengths of several hundred meters [12, 14, 26], or used patterns based on tangents and curves [10, 44, 79]. Long segments can lead to forced homogenisation of variables by aggregating continuous variables into categories (e.g. pavement width bands), and this can lead to loss of applicability. In short, segmentation should consider the overall purpose of the modelling exercise. Longer segments (1–5 km) are often used for network screening [27, 57, 65]. Shorter segments are used to develop more meaningful CMFs, or to estimate localised benefits of safety treatments. Variable segment length can be included in the model. The HSM assumes crash frequency to be directly proportional to segment length; however, many published models which include segment length as a variable suggest otherwise (e.g. [79]).

In practice, the division of a road network into segments is likely to be dictated by the structure of national road databanks. For example, in the Czech Republic the national traffic census (the main source of AADT data) does not cover all minor roads; aggregating segments into longer segments that include minor intersections was therefore found feasible [3]. As the segments may be subject to further investigations, their length should be feasible for on-site visits or crash analyses.

3.4 Explanatory variables

Selection of explanatory variables should be guided by previously documented crash and injury risk factor evidence available from research literature. However, in practice it is often dictated simply by data availability. Explanatory variables generally include exposure, transport function, cross section, traffic control; less often variables describing alignment, vehicle types or road user behaviour are used [66]. When actual variables are not available, proxy variables may be used, e.g. abutting land use as a proxy for pedestrian movement counts.

The first step in variable selection involves identifying variables which are correlated with each other. For each such pair the researcher should remove one variable which is less useful to the purpose of the model (e.g. if sealed shoulder provision is strongly correlated with line marking presence, then remove the latter). In order to further identify the statistically significant variables, a stepwise regression approach is typically used. It may be applied either in a forward selection or a backward elimination manner; in both cases selected goodness-of-fit (GOF) measures are used to assess the statistical significance. Common GOF measures include information criteria such as AIC or BIC, while others use for example scaled deviance [22, 77] or proportion of explained systematic variance [2, 45].
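The first screening step (finding pairs of candidate variables that are strongly correlated with each other) might look as follows. This is a sketch under stated assumptions: the Pearson coefficient is used as the correlation measure, the threshold of 0.7 is an arbitrary choice, and the variable names and values are invented for the example.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Hypothetical site attributes, one value per site.
columns = {
    "sealed_shoulder_width": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
    "line_marking_present": [0, 0, 1, 1, 1, 1],
    "curve_density": [3.0, 0.5, 2.0, 1.0, 2.5, 1.5],
}

for name_a, name_b, r in correlated_pairs(columns):
    print(f"{name_a} and {name_b} are correlated (r = {r}); keep only one")
```

Which member of a flagged pair to drop remains a judgement about usefulness for the model's purpose, as the paragraph above notes; the code only flags candidates.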

Based on the number of explanatory variables (model complexity), CPMs may be simple (exposure-only) or multivariate (fully-specified) [62]. Sawalha and Sayed [71] warned against the temptation to build overfitted models, i.e. models containing too many insignificant variables. In fact, a number of studies found that additional predictors are not as beneficial as expected [59, 70, 82]. One should strive for parsimonious models, i.e. the ones containing as few explanatory variables as possible [66]. Such models enable simple interpretation and understanding, as well as easy updating [2].

A practice-driven approach was adopted in developing New Zealand rural road CPMs [79]. When it was found that the statistically significant variables did not include the parameters that were of most interest to practitioners, two distinct model types were developed. Statistical models are the best-performing models according to goodness-of-fit measures at 95% confidence levels. Practitioner models contain additional variables of interest to safety professionals, at confidence levels of 70% or more.

On the other hand, in case of leaving out an influential explanatory variable due to unavailable data, so called “omitted variable bias” occurs. The bias results in biased parameter estimates that can produce erroneous inferences and crash frequency predictions [47, 50, 51].

Another bias may be caused by spatial correlation, given the fact that adjacent road segments may share unobserved effects [47]. This bias can be handled by using random-effect models, where the common unobserved effects are assumed to be distributed over the road segments according to some distribution and shared unobserved effects are assumed to be uncorrelated with explanatory variables [47].

3.5 Model function and variable forms

Before carrying out the modelling task, exploratory data analysis should be conducted, in order to detect potential outliers, check the extreme values, potential mistakes, etc.

As previously mentioned, crash data are typically overdispersed. The degree of overdispersion in a negative binomial model is represented by the overdispersion parameter, which is estimated during modelling along with the regression coefficients. The overdispersion parameter is used to determine the value of a weight factor for use in the empirical Bayes (EB) method. This method combines predicted (modelled) and recorded (observed) crash frequencies, in order to improve the reliability of the safety level estimate for a specific site [32]. Applications of EB methods are described in later sections of the review.
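A minimal sketch of the EB combination follows. It assumes the weight form commonly used in HSM-style applications, w = 1 / (1 + k · predicted), where k is the overdispersion parameter under the convention variance = mean + k · mean². Other parameterisations of the NB model exist, so the weight formula should be checked against the convention of the fitted model. The numeric inputs are invented.

```python
def eb_weight(predicted, overdispersion):
    """Weight given to the model prediction (assumed HSM-style form)."""
    return 1.0 / (1.0 + overdispersion * predicted)

def eb_estimate(predicted, observed, overdispersion):
    """Weighted average of predicted and observed crash frequency."""
    w = eb_weight(predicted, overdispersion)
    return w * predicted + (1.0 - w) * observed

# Hypothetical site: the model predicts 2.4 crashes, 6 were recorded.
predicted, observed, k = 2.4, 6, 0.5
w = eb_weight(predicted, k)
eb = eb_estimate(predicted, observed, k)
print(f"weight on prediction: {w:.3f}, EB estimate: {eb:.3f}")
```

The EB estimate always lies between the predicted and the observed value. A small overdispersion parameter (a model that explains the data well) pushes the weight towards the prediction; a large one pushes it towards the site's own crash record.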

Crash frequency (i.e. response variable) ideally should not involve mixed levels of crash severity and crash types, as it may produce uninterpretable results [18]. It is thus recommended to develop disaggregated CPMs [66]. Alternatively one may use the observed proportion of a given crash type or severity and apply it to the CPM that has been estimated for total crashes [72]. However, this has been found a questionable practice, leading to estimation errors [40]. The current recommendation is to estimate separate CPMs by crash type. New Zealand practice is to develop models for key (or common) crash types and, if necessary, to scale their predictions to represent total crash frequency, to allow for less common crash types [77]. Some studies [24, 27] used sub-samples (for example stratification based on AADT under/over specific limits) in order to improve model quality. In any case, developing disaggregated CPMs obviously requires larger sample sizes. In terms of severity, models are developed by injury severity level (usually with fatal and serious injury crashes combined), as with the ANRAM models [41]. Alternatively, severity factors (proportions) are applied to models developed for all injury crashes or all crashes (including non-injury) [53].

Regarding functional forms of explanatory variables, there is no universal guidance and various forms are used in the literature. To select the most suitable mathematical forms of explanatory variables, one may use graphical relationships between crash frequency and a road variable (i.e. univariate analysis) [4], or use more complex techniques, such as empirical integral functions and cumulative residuals (CURE) [33]. According to Hauer [35], the model equation may have both multiplicative components (to represent the influence of continuous factors, such as lane width or shoulder type), and additive components (to account for the influence of point hazards, such as driveways or narrow bridges). Despite these recommendations, the typical modelling approach is often simple. The general model form of Eq. (1) is widely adopted.
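The cumulative residuals idea can be sketched as follows: sites are sorted by the variable of interest, residuals (observed minus predicted) are accumulated, and the running sum is compared with approximate ±2σ limits. The limit formula used here, based on the cumulative sum of squared residuals, is a commonly cited form and is stated as an assumption; it is not taken from the text above. All data are invented.

```python
import math

def cure_series(x, observed, predicted):
    """Return sorted x, cumulative residuals and approximate 2-sigma limits.

    Assumed limit: 2 * sqrt(s2(n) * (1 - s2(n) / s2(N))), where s2(n) is the
    cumulative sum of squared residuals up to site n and N is the last site.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    cumulative, cumulative_sq = [], []
    running, running_sq = 0.0, 0.0
    for i in order:
        residual = observed[i] - predicted[i]
        running += residual
        running_sq += residual ** 2
        cumulative.append(running)
        cumulative_sq.append(running_sq)
    total_sq = cumulative_sq[-1]
    limits = [
        2.0 * math.sqrt(max(0.0, s2 * (1.0 - s2 / total_sq))) if total_sq > 0 else 0.0
        for s2 in cumulative_sq
    ]
    return [x[i] for i in order], cumulative, limits

# Hypothetical sites: AADT, observed crashes and model predictions.
aadt = [5000, 1200, 8000, 3000, 10000]
observed = [3, 1, 6, 2, 5]
predicted = [2.5, 0.9, 4.8, 1.8, 6.1]

xs, cumulative, limits = cure_series(aadt, observed, predicted)
for value, c, lim in zip(xs, cumulative, limits):
    flag = "outside" if abs(c) > lim else "inside"
    print(f"AADT {value:>6}: cumulative residual {c:+.2f}, limit ±{lim:.2f} ({flag})")
```

A cumulative residual curve that drifts steadily in one direction, or leaves the limits over a range of the variable, suggests that the chosen functional form does not fit in that range.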

Exposure is usually modelled in terms of traffic volume, i.e. a single AADT value for road segments, or the product of major and minor AADTs for road intersections. The function is typically a power form, but some authors considered it jointly with an exponential form (the so-called Ricker model [68]). Traffic volumes (flows) should be adapted to the specific segment and intersection types. For example, New Zealand CPMs [53] apply either the product of flows or conflicting flows, based on the type of intersection, urban/rural settings and speed limits. As discussed, a segment length variable is often used where road segments are not of equal length. For intersections, a standard approach length is typically used, e.g. 50–100 m, and not modelled as a variable.

Another example is segment length, usually applied as an offset, i.e. with regression coefficient = 1, but often also in a power form [30, 67, 68]. According to Hauer [34], segment length should also be considered when estimating the over-dispersion parameter for the frequency models to be used in the empirical Bayes approach. However, the exact form of the relationship is not definite [9]; in fact, not only length but also other variables may play a role [25].
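The exposure and length forms mentioned in the preceding paragraphs can be written side by side for comparison. The function names and all coefficient values below are hypothetical; in particular, giving the major and minor flows separate exponents is one common way of writing the product-of-flows form and is an assumption here.

```python
import math

def power_exposure(aadt, b1):
    """Power form: AADT^b1."""
    return aadt ** b1

def ricker_exposure(aadt, b1, b2):
    """Power form combined with an exponential form: AADT^b1 * exp(b2 * AADT)."""
    return aadt ** b1 * math.exp(b2 * aadt)

def intersection_exposure(major_aadt, minor_aadt, b1, b2):
    """Product of flows, each with its own exponent (assumed form)."""
    return major_aadt ** b1 * minor_aadt ** b2

def length_term(length_km, b_len=1.0):
    """Segment length as an offset (b_len = 1) or in a power form (b_len != 1)."""
    return length_km ** b_len

# Doubling segment length under both treatments of length.
offset_ratio = length_term(2.0) / length_term(1.0)
power_ratio = length_term(2.0, b_len=0.8) / length_term(1.0, b_len=0.8)
print(f"offset: x{offset_ratio:.2f}, power form (0.8): x{power_ratio:.2f}")
```

With length as an offset, doubling the segment doubles the predicted crashes. With an estimated exponent below 1, the increase is less than proportional, which is the kind of departure from proportionality that models with length as a variable can capture.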

Creation of a model is undertaken by running relevant statistical regression processes on the sample data. The most common tools for this are statistical software packages such as R, SPSS, SAS or Matlab. Microsoft Excel is not considered appropriate for this task as it lacks many of the necessary statistical features.

In practice, the modelling process is highly iterative. Variables are added, and then removed if shown to add little or nothing to explanation of the response variable. Often data for a given variable is re-categorised to improve its significance if it is borderline. Often borderline or non-significant variables are retained if they add to better understanding of crash problem. Optimisation of the model fit vs. number of variables vs. applicability is gradually achieved. This iterative process can be stopped when little further improvement in the model is achieved with each iteration [10, 25].

3.6 Model validation

The goal of validation is to establish whether the developed model is acceptable from both scientific and practical perspectives. It is thus surprising that most modelling guidelines seem to overlook this step [1, 23, 35, 36, 48, 71, 72, 83].

According to Oh et al. [56], one may distinguish between internal validity and external validity.

  • Internal validity means that CPM findings should be consistent with established knowledge on the subject; the CPM should also possess the features of the underlying phenomenon; and finally the CPM should agree with fundamental information and knowledge, such as the physical mechanics and dynamics involved in crashes [56]. Newly developed CPMs may be compared to previous literature in terms of signs and magnitudes of regression coefficients, or for example their marginal effects [81].

  • External validity (goodness-of-fit) may be evaluated by comparing either models from two independent samples, or a model from a complete sample applied on selected sub-samples that have not been used in the model building (e.g. randomly-chosen 20%). Various goodness-of-fit indicators may be applied; often proportion of systematic variation in the original accident dataset explained by the model (also known as Elvik index) is used [22, 45].
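The hold-out idea in the second bullet can be sketched as follows: a random 20% of sites is set aside before model building, and the fitted model is then scored on those sites. The two scores shown, mean prediction bias and mean absolute deviation, are generic measures chosen for the example (the Elvik index mentioned above is not implemented here), and the function names and data are assumptions.

```python
import random

def holdout_split(sites, fraction=0.2, seed=1):
    """Randomly split sites into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

def mean_prediction_bias(observed, predicted):
    """Average of (predicted - observed); positive means over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between predicted and observed counts."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical site identifiers; the model would be fitted on `training` only.
training, validation = holdout_split(list(range(10)))

# Hypothetical observed counts and model predictions at the validation sites.
observed = [2, 0, 3]
predicted = [1.5, 0.5, 2.0]
print("bias:", mean_prediction_bias(observed, predicted))
print("MAD: ", mean_absolute_deviation(observed, predicted))
```

Fixing the random seed makes the split reproducible, which helps when several candidate models are compared on the same validation sites.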

3.7 Using CPMs in network screening

Previous reviews [16, 52] indicated that the current state-of-practice is generally behind the state-of-the-art. According to the EB methodology, predicted crash frequency from CPMs should be combined with observed historical crash frequency to obtain the so-called “expected average crash frequency with empirical Bayes adjustment” (in short, the EB estimate). These EB estimates benefit the practitioner by removing much of the random statistical variation associated with historical crash data, especially at low frequencies [1, 41]. Apart from EB estimates, other safety indicators can be developed for network screening purposes, for example potential for safety improvement (PSI) [61], level of service of safety (LOSS) [43] or scaled difference [8].

In Australia and New Zealand, where low-volume rural roads generate very low numbers of crashes per kilometre per 5 years (or zero), CPMs provide a continuous proxy measure of safety. In Australia the ANRAM model uses EB estimates of severe casualty crashes to remove the random variation in observed crash data at 1–3 km segment level: sites are prioritised simply on the EB estimate [41]. Differences of more than two standard errors between the EB estimate and observed crashes are noted as a possible indicator of non-infrastructure based influences of safety (e.g. localised speeding or drink-driving) [41].

Given the variety of available methods, HSM [1] notes that “using multiple performance measures to evaluate each site may improve the level of confidence in the results.” Hence sites may be ranked for treatment based on several different methods [49, 52, 89]. Those that rank consistently high using several methods are the sites where treatment should be focused.
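Ranking by several measures and keeping the sites that rank high on all of them can be sketched as below. Two measures are used: the EB estimate itself and a potential for safety improvement taken as the EB estimate minus the model prediction. That definition of PSI is an assumption made for the example, as are the site values and the choice of the top two sites.

```python
def rank_sites(sites, measure):
    """Return site ids ordered from the highest to the lowest measure value."""
    return [s["id"] for s in sorted(sites, key=measure, reverse=True)]

def consistently_high(sites, measures, top_n):
    """Site ids that appear in the top_n of every measure."""
    top_sets = [set(rank_sites(sites, m)[:top_n]) for m in measures]
    return set.intersection(*top_sets)

def eb_value(site):
    return site["eb"]

def psi_value(site):
    # Assumed definition: EB estimate minus predicted frequency.
    return site["eb"] - site["predicted"]

# Hypothetical sites with EB estimates and model predictions.
sites = [
    {"id": "A", "eb": 4.4, "predicted": 2.4},
    {"id": "B", "eb": 5.0, "predicted": 4.8},
    {"id": "C", "eb": 3.1, "predicted": 1.0},
    {"id": "D", "eb": 1.2, "predicted": 1.5},
]

print("by EB: ", rank_sites(sites, eb_value))
print("by PSI:", rank_sites(sites, psi_value))
print("in the top 2 of both:", consistently_high(sites, [eb_value, psi_value], top_n=2))
```

Site B has the highest EB estimate but almost no excess over what the model predicts for such a site, so it drops out once the second measure is considered.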

3.8 Using CPMs in developing crash modification factors

Crash modification factor (CMF) is a multiplicative factor used to compute the expected number of crashes after implementing a given countermeasure or a design change at a location. CMFs may be derived from before-after or cross-sectional studies; however, each method has its own challenges, and available CMFs can often be highly inconsistent between literature sources [28]. Before and after studies are generally the preferred source of CMFs, particularly for the HSM. However they typically only look at features in isolation and so when the combined effects of features on crash occurrence is not the sum of the effects of each individual feature, then they may provide misleading results. Several solutions to developing multiple treatment CMFs have been proposed, without reaching definite conclusions [17, 29, 58].

Cross-sectional studies (i.e. the ones based on CPMs) have been criticised for being more prone to non-causal safety effects, due to bias-by-selection [11, 19, 36]. Bias-by-selection can occur when a treatment (e.g. a crash barrier) is applied more often to sites that already have a crash problem than to those that do not. Cross-sectional studies do, however, provide a much better crash prediction for the combination of road features. In some cases, CMFs are developed from CPMs where limited before and after studies are available.

Although the practice of deriving crash modification factors (CMFs) from cross-sectional CPMs has been criticised, it is relatively common. Again, there are various approaches: for example, Park et al. [58] tested six different methods of combining CMFs and concluded that one should not rely on only one of them. An interim solution is to apply rules of thumb, such as using the product of no more than three separate independent countermeasures [55] or reducing the product through multiplying by a ratio of 2/3 [76].
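The two rules of thumb can be sketched in Python as follows. The first function multiplies CMFs but refuses more than three. The second applies the 2/3 ratio to the combined crash reduction, i.e. CMF = 1 − (2/3) · (1 − product); this is one reading of the rule as it is often presented, and it is an interpretation rather than something the text above specifies. The CMF values and the crash frequency are invented.

```python
import math

def product_cmf(cmfs, max_count=3):
    """Product of independent CMFs, limited to at most `max_count` of them."""
    if len(cmfs) > max_count:
        raise ValueError(f"rule of thumb allows at most {max_count} CMFs")
    return math.prod(cmfs)

def dampened_cmf(cmfs, ratio=2 / 3):
    """Apply the ratio to the combined reduction: 1 - ratio * (1 - product).

    Assumed interpretation of the 2/3 rule of thumb.
    """
    return 1.0 - ratio * (1.0 - math.prod(cmfs))

cmfs = [0.80, 0.90]     # hypothetical CMFs of two treatments
expected_before = 10.0  # hypothetical expected crashes without treatment

simple = product_cmf(cmfs)
dampened = dampened_cmf(cmfs)
print(f"simple product: CMF {simple:.3f}, expected after {expected_before * simple:.2f}")
print(f"2/3 dampening:  CMF {dampened:.3f}, expected after {expected_before * dampened:.2f}")
```

The dampened value is always closer to 1 than the simple product, so it credits the combined treatments with a smaller reduction, reflecting the concern that effects of multiple treatments are not fully independent.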

3.9 Using CPM tools

The above-mentioned analytical steps (data preparation, exploratory analysis, modelling, calculations) are typically conducted in statistical software or spreadsheets. Nevertheless, for an end user it is beneficial to be able to visualize the results. These may take form of tables or map outputs, for example the identified hotspots or the lists of ranked segments. A number of practitioner tools are worthy of mention, especially as they apply to network screening and analysis of safety impacts of potential treatments.

One option is using stand-alone software solutions, such as the following two from the USA:

  • IHSDM Crash Prediction Module [20] estimates the frequency and severity of crashes on a highway using geometric design and traffic characteristics. This helps users evaluate an existing highway, compare the relative safety performance of design alternatives, and assess the safety cost-effectiveness of design decisions.

  • SafetyAnalyst (commercial software) Network Screening Tool [21] identifies sites with potential for safety improvement. In addition, it is able to identify sites with high crash severities and with high proportions of specific crash types.

Note that there are close links between IHSDM, SafetyAnalyst and the Highway Safety Manual. According to Harwood et al. [31], SafetyAnalyst Module 1 (network screening) is to be applied first, followed by Module 2 (diagnosis and countermeasure selection), Module 3 (economic appraisal and priority ranking) and IHSDM to perform safety analyses as part of the design process.

The Finnish evaluation tool TARVA [60] also deserves mentioning. Its purpose is to provide a common method and database for (1) predicting the expected number of crashes, and (2) estimating the safety effects of road safety improvements. Based on simple CPMs and pre-determined CMFs, it currently exists in Finnish and Lithuanian versions, with planned applications in other countries.

Capabilities of network screening and road safety impact assessment are built into the commercial software PTV Visum Safety. There are also applications in the form of Excel spreadsheets, for example British COBALT, Swedish TS-EVA or Norwegian CPMs for national and county roads [37, 38]. In the US, spreadsheets were developed for safety analysis of freeway segments and interchanges (ISAT [75] and ISATe [6]).

The Australian National Risk Assessment Model (ANRAM) tool, available to road agencies, is a network screening and prioritisation tool, which uses CPMs for different road stereotypes, together with CMFs and observed crash data to estimate severe injury crashes across segmented road network [41]. ANRAM allows users to develop and estimate benefits of road network and corridor treatment programs. This tool has gained wide use among state road agencies in Australia, particularly for the rural road networks where actual severe crashes are randomly distributed. ANRAM is available in a spreadsheet form, with planned online adaptations.

New Zealand also has a history of various safety prediction tools. Turner et al. [78] stressed the practical need for such tools and, after reviewing overseas applications, considered IHSDM worth transferring to New Zealand conditions for assessing new road designs. A later work [80] reviewed New Zealand spreadsheet applications, as well as experience with using and calibrating the ISAT tool from the USA.

Increasingly, online business analytics software has been used to display CPM results in map format, often with dynamic filtering and computational functions. Examples include open-source tools and platforms with free versions, such as ArcGIS Online, QGIS, Tableau, or Microsoft Power BI. These solutions make it easier for practitioners to access CPM results and understand the value of CPMs.

4 Challenges and opportunities

The review has presented an opportunity to synthesise the key challenges practitioners are likely to face in translating the scientific state-of-the-art into practice. Opportunities and potential solutions are proposed for addressing these challenges and making CPMs more accessible to road safety practitioners – see Table 1.

Table 1 Overview of identified challenges, opportunities and potential solutions

5 Summary and conclusions

Greater uptake of state-of-the-art analytical techniques is necessary for continuing improvement in road safety. This paper aimed to improve practitioner understanding of modelling road safety performance using CPMs, so that this useful analytical technique could become more accessible.

A number of steps have been reviewed: from data collection and road network segmentation to choosing variables and functional forms, validating models and using them in practice, including a description of available tools. The review highlighted that developing CPMs is not a straightforward task: there are many alternative choices and decisions to be made during the process (without definite guidance), which explains the diversity of approaches and techniques. While this may be interesting from a research perspective, the current diverse state-of-the-art limits understanding and application by practitioners, and complicates international comparability or transferability. There is a need to identify opportunities and solutions that are scientifically sound while also meeting the needs of practitioners.

The main consideration for researchers should be the application of their models by the intended practitioners. This applies equally in the context of basic research, such as seeking understanding of a new challenge, and in the context of applied research, such as development of algorithms for inclusion in practitioner software. Either way, the end users of CPMs are the practitioners, i.e. road agency engineers, policy makers, or data analysts.

The review aimed to improve practitioner understanding of CPMs to bolster their use in improving road safety. The question of how and why practitioners should consider using CPMs can be answered as follows:

  • CPMs are valuable tools, which help link crashes with risk factors. This is especially valuable in current conditions of scattered crash occurrence (fewer crash black spots), where traditional crash-based approaches do not work well.

  • Developing and using CPMs has its challenges. However, these may be overcome by better communication of CPM benefits and applications, so that practitioners have a basic understanding of CPMs and can make basic application decisions (e.g. whether to use or calibrate available models).

  • Applying network-wide CPMs enables effective road safety impact assessment and network screening.

  • Ongoing investment in developing CPM-based practitioner tools, big data management and visualisation platforms offers potential for improved accessibility and uptake of CPMs in road safety management.