Background

It is generally recognised that as well as studying the prevalence of medication errors, their clinical importance must be taken into account when comparing drug distribution systems or assessing the effects of interventions [1, 2]. Medication errors range from those with very serious consequences to those that have little or no effect on the patient. Assessing the clinical importance of errors therefore increases the clinical relevance of studies’ findings compared with studies based on prevalence alone. In many studies, actual outcomes are not known, either because there is no longitudinal patient follow-up or because researchers intervene to prevent errors from causing patient harm. Methods of measuring the potential severity, or clinical importance, of medication errors are therefore needed to evaluate the effectiveness of interventions designed to reduce them.

A systematic review of methods for measuring the clinical importance of prescribing errors identified a wide range of available methods but no comparative studies [3]. While similar principles are likely to apply, no comprehensive review has focused on methods to assess clinical importance of medication administration errors. However, in the studies included in a systematic review of the prevalence and nature of medication administration errors, Keers et al. [4] noted that the two most commonly cited severity assessment methods were the US National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) severity index [5] and Dean and Barber’s method [6]. The former is an ordinal scale designed to be used by local staff who would usually have knowledge of actual outcomes and other contextual information, and the latter is an interval scale used by experts using descriptions of the errors without knowledge of their outcomes. In this paper, we describe a comparison of these two methods of assessment, using a dataset of errors identified in the administration of intravenous infusions in English hospitals, to inform future comparisons between studies and to shed further light on the strengths and weaknesses of each.

Methods

Data source

We used data from an observational point prevalence study of the frequency and types of errors in the infusion of intravenous medication, as described in detail elsewhere [7]. The study took place between April 2015 and December 2016 in multiple clinical areas (general surgery, general medicine, critical care, paediatrics and day-case chemotherapy units) at 16 purposively sampled English hospital sites. Two data collectors at each site, usually a nurse and a pharmacist, observed medication being infused at the time of their visit to each clinical area. The data collectors compared the medication being administered against the patient’s prescribed medication and relevant medication administration policies to identify any errors. This included a comparison of the medication or fluid name, the concentration, and rate of infusion. Data were collected from 1326 patients and 2008 infusions in total. Details of any errors identified were recorded in an online database.

Assessing the potential clinical importance of the errors identified

First, errors were classified by the local data collectors using an adapted version of the NCC MERP index. The original NCC MERP index comprises nine discrete categories ranging from A (circumstances or events that have the capacity to cause error) to I (an error occurred that resulted in the patient’s death) [5]. The adaptation rephrased the descriptions of each category to allow for ratings to be based on the most likely outcome of the error in terms of patient harm if it had not been intercepted, rather than actual patient harm; the original and adapted categories are described elsewhere [7]. Guidance was provided to data collectors on how to categorise errors, including illustrative examples. The two data collectors at each participating site then discussed each error and reached a consensus on the severity rating according to the adapted NCC MERP index. If a consensus could not be reached or if major inconsistencies were identified between sites, final assignment of clinical importance was determined by consensus among clinical members of the research team. For the present analysis, only events that were identified as errors and that reached the patient were included; any events that would have been considered A or B were therefore excluded from our dataset.

Second, we combined data from all sites into one database and applied the Dean and Barber method for assessing the potential clinical importance of medication administration errors. This method was developed and validated in the UK and involves four experienced healthcare professionals each assessing each error on a 0 to 10 visual analogue scale, where 0 represents an error with no potential consequences to the patient and 10 an error that would result in death [6]. The mean score across the four judges is then used as an index of clinical importance, with the requirement for four judges calculated using generalizability theory to take into account inter-rater variation [6]. Scores of less than 3 are considered to be ‘minor’, those between 3 and 7 to be ‘moderate’ and those above 7 to be ‘severe’ [6]. We recruited two experienced clinical pharmacists and two experienced nurses as the four judges. Judges were given a description of each error, blinded to the NCC MERP scores previously allocated, and asked to rate each on the 0 to 10 scale. If identical errors occurred several times, only one instance was assessed.
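The scoring rule described above can be illustrated with a minimal Python sketch. It is not the study's own code: the function names and the example scores are hypothetical, and the treatment of the boundary values (exactly 3 and exactly 7 counted as ‘moderate’) is our reading of the thresholds stated above.

```python
def dean_barber_index(judge_scores):
    """Return the mean of four judges' scores (each 0 to 10) for one error."""
    if len(judge_scores) != 4:
        raise ValueError("expected scores from exactly four judges")
    if any(not 0 <= s <= 10 for s in judge_scores):
        raise ValueError("each score must lie on the 0 to 10 scale")
    return sum(judge_scores) / len(judge_scores)


def severity_category(index):
    """Map a mean score to a category: <3 minor, 3 to 7 moderate, >7 severe."""
    if index < 3:
        return "minor"
    if index <= 7:
        return "moderate"
    return "severe"


# Hypothetical scores from four judges for a single error.
example_scores = [4, 5, 5, 5]
example_index = dean_barber_index(example_scores)
print(example_index, severity_category(example_index))
```

Keeping the mean and the categorisation as separate steps reflects the method's use of the mean as an interval-scale index, with the three categories applied only as a summary.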

Comparing the methods for assessing the potential clinical importance of errors

Scores from both methods were entered into SPSS (version 21) for descriptive analysis. A scatter plot was produced to allow visual comparison between the two sets of scores, and the correlation between them was assessed using Spearman’s correlation coefficient.
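For readers without access to SPSS, the rank correlation can be sketched in plain Python as follows. This is an illustrative sketch rather than the analysis code used in the study: the numeric coding of NCC MERP categories (C = 1, D = 2, E = 3), the function names and the example data are all assumptions, and no p value is computed.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Assumed ordinal coding of the adapted NCC MERP categories used here.
NCC_MERP_ORDER = {"C": 1, "D": 2, "E": 3}

# Hypothetical paired ratings for six errors (not study data).
ncc_merp = ["C", "C", "D", "C", "E", "D"]
dean_barber = [0.5, 1.75, 3.25, 2.0, 4.75, 1.0]

rho = spearman_rho([NCC_MERP_ORDER[c] for c in ncc_merp], dean_barber)
print(round(rho, 2))
```

Averaging the ranks of tied values matters here because the NCC MERP ratings fall into only a few categories, so most observations are tied on that variable.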

Results

In total, 155 errors were assessed. Using the NCC MERP method, 137 (88%) were rated C (‘an error occurred but was unlikely to cause harm despite reaching the patient’), 17 (11%) rated D (‘an error occurred that would be likely to have required increased monitoring’) and one (1%) rated E (‘an error occurred that would be likely to have caused temporary harm’). Using the Dean and Barber method, scores ranged from 0 to 4.75 with a mean of 1.7, with 138 (89%) errors rated minor and 17 (11%) moderate. Of the 17 errors rated ‘moderate’ on the Dean and Barber scale, 10 were rated C using the NCC MERP method, six as D and one as E. Of the 17 errors rated D using the NCC MERP method, 11 were rated minor using the Dean and Barber method, and six as moderate. Figure 1 presents the scores using each method. Of the 137 errors scored as C using NCC MERP, and the 138 errors classified as minor using the Dean and Barber method, 127 were common to both. Scores from the two methods were significantly but weakly correlated (Spearman’s rho = 0.36, p < 0.01).

Fig. 1. Dean and Barber scores versus NCC MERP severity scores. Some symbols represent more than one error.

Discussion

We compared two methods for assessing the clinical importance of a sample of 155 medication administration errors and found a significant but weak correlation between them. Most errors were of relatively minor clinical importance, with 88% scoring C on the adapted NCC MERP scale and 89% being classified as minor on the Dean and Barber scale. The scatter plot suggests that the correlation is too weak to allow direct comparison of results obtained using the two methods.

It is not possible to determine whether the weakness of the correlation arose from characteristics of the scales themselves or from the people who assessed the errors. This is because we used each scale in line with its intended method of use rather than simply asking the same judges to use two different scales. The NCC MERP scale is generally used by local clinicians whereas the Dean and Barber scale is applied by experts viewing a description of each error; our study reflected this difference. Both approaches have advantages and disadvantages. Scoring by local clinicians means that full contextual knowledge about the error can be taken into account. However, using large numbers of different assessors could potentially result in a lack of consistency. The Dean and Barber method potentially provides more granularity and, because it is an interval scale, permits more robust statistical analysis. It may therefore be more suitable for use in research. Conversely, the NCC MERP method is less time consuming and may be more suitable for use in routine clinical practice. If rating is conducted by clinical staff close to the time of the incident, it may also be possible to take into account more contextual information than that available retrospectively.

Strengths and limitations

Strengths of this study are that it is, to our knowledge, the first to compare two different methods of assessing the potential clinical importance of medication errors, and that the data were gathered from 16 hospital sites. However, a limitation is that the majority of errors were of minor clinical importance. This means that the comparisons were limited to less severe errors rather than those across the whole range of potential clinical outcomes. Further work is needed to compare the methods across a wider range of errors in terms of their clinical importance.

Assessing the potential severity of error is a complex judgement. It will be influenced by who is judging and the information they have about the error and its context. For example, doses that are unusual but clinically appropriate for one patient could be harmful to another. These variations between methods need to be understood more deeply in future work.

Conclusions

Scores from the adapted NCC MERP and Dean and Barber methods are only weakly correlated in the assessment of medication administration errors. In the absence of a uniformly agreed standard method for assessing errors’ clinical importance, researchers should be aware that comparisons between studies are likely to have limitations. In the meantime, choice of method should take into account the purpose for which clinical importance is being assessed.